After several days of pretty consistent work, I managed to get OctrayRewrite running and performing reasonably well on AMD cards.
I started by sending demos to several AMD users in the server. However, due to a number of serious issues in both OptiFine and the AMD drivers, I ended up having to source my own AMD system for development. This turned out to be a wise decision: there were so many mysterious bugs that it would have been impossible to fix them all without the tight testing loop I got on my own system.
Luckily, a side effect of this is that the shader performs about 50% faster with the same settings, even on my Nvidia system (I was not expecting this). These performance gains mostly came when I stopped using geometry shaders for anything. Using newer techniques (imageLoad/imageStore), it is now possible to do voxelization without geometry shaders or instancing (both super slow).
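To give a rough idea of what voxelization by direct image writes looks like, here is a small CPU-side sketch in C++. It is not the actual OctrayRewrite code: the grid size, the `worldToVoxel` mapping, and the `VoxelVolume` type are all hypothetical stand-ins. In a real shader, the `store` call would be an `imageStore` into a 3D image, issued directly from the vertex or fragment stage, so no geometry shader is needed to route the data.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed grid size (hypothetical): a 64^3 volume centred on the player.
constexpr int kGridSize = 64;

struct Vec3  { float x, y, z; };
struct IVec3 { int x, y, z; };

// Map a position (relative to the grid centre) to integer voxel coordinates.
// Returns false when the position falls outside the volume.
inline bool worldToVoxel(Vec3 p, IVec3& out) {
    const int half = kGridSize / 2;
    out = IVec3{ static_cast<int>(std::floor(p.x)) + half,
                 static_cast<int>(std::floor(p.y)) + half,
                 static_cast<int>(std::floor(p.z)) + half };
    return out.x >= 0 && out.x < kGridSize &&
           out.y >= 0 && out.y < kGridSize &&
           out.z >= 0 && out.z < kGridSize;
}

// Stand-in for a 3D image with one 32-bit unsigned value per voxel.
struct VoxelVolume {
    std::vector<std::uint32_t> data =
        std::vector<std::uint32_t>(static_cast<std::size_t>(kGridSize) * kGridSize * kGridSize, 0u);

    static std::size_t index(IVec3 v) {
        return (static_cast<std::size_t>(v.z) * kGridSize + v.y) * kGridSize + v.x;
    }
    // Plays the role of imageStore(volume, ivec3(v), value) in GLSL.
    void store(IVec3 v, std::uint32_t value) { data[index(v)] = value; }
    // Plays the role of imageLoad(volume, ivec3(v)) in GLSL.
    std::uint32_t load(IVec3 v) const { return data[index(v)]; }
};

// What each shader invocation would do: compute its voxel and write to it
// directly, skipping the write entirely when the block is out of range.
inline bool voxelize(VoxelVolume& vol, Vec3 blockCentre, std::uint32_t blockId) {
    IVec3 v;
    if (!worldToVoxel(blockCentre, v)) return false;
    vol.store(v, blockId);
    return true;
}
```

The point of the sketch is only that each invocation writes to an arbitrary voxel on its own, which is what makes the geometry shader and instancing unnecessary.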
For future shader developers, here are the things I learned about supporting AMD GPUs:
- Don't ever use geometry shaders (slow on both Nvidia and AMD)
- Warp/Wavefront operations are poorly supported, regardless of whether the GLSL extensions are reported as supported.
- AMD shader validators sometimes get confused when you have multiple sampler types being used in a single shader (e.g. sampler2D, usampler2D, sampler3D). If validation fails for this reason, find a way to only use one sampler type.
- AMD shader compilers introduce bugs. Don't be surprised if the values of variables change randomly (register corruption). These bugs appear to be fairly consistent, however, so if you work around them on one system, the workarounds will probably hold on others.
- Keep buffer sizes as small as possible. Performance degrades severely as buffers grow (less so on Nvidia).
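
On the last point, one common way to shrink a buffer is to pack several small fields into a single 32-bit value instead of storing each in its own float channel. The sketch below is a C++ model of that idea; the layout (8 bits each for red, green, blue, and a block id) and the names `packVoxel`/`unpackVoxel` are assumptions for illustration, not the layout OctrayRewrite uses. Stored this way, a voxel takes 4 bytes instead of the 16 bytes of a four-channel 32-bit float texel.

```cpp
#include <cstdint>

// Hypothetical per-voxel payload: colour plus a small block id.
struct Voxel {
    std::uint8_t r, g, b, id;
};

// Pack four 8-bit fields into one 32-bit word (r in the lowest byte).
// Each field is widened before shifting so the shift happens on unsigned
// 32-bit values.
inline std::uint32_t packVoxel(Voxel v) {
    return  static_cast<std::uint32_t>(v.r)
         | (static_cast<std::uint32_t>(v.g)  << 8)
         | (static_cast<std::uint32_t>(v.b)  << 16)
         | (static_cast<std::uint32_t>(v.id) << 24);
}

// Reverse of packVoxel: mask out each byte again.
inline Voxel unpackVoxel(std::uint32_t p) {
    return Voxel{ static_cast<std::uint8_t>( p        & 0xFFu),
                  static_cast<std::uint8_t>((p >> 8)  & 0xFFu),
                  static_cast<std::uint8_t>((p >> 16) & 0xFFu),
                  static_cast<std::uint8_t>((p >> 24) & 0xFFu) };
}
```

The same shifts and masks work on `uint` values in GLSL, so the packing can be done in the shader right before the write and undone right after the read.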