
Pitfalls that I fell into when using OpenGL

In this post I will walk through the small OpenGL pitfalls I fell into, so that you don't have to.

The first pitfall was using glMapBuffer() instead of glBufferSubData(), which caused so-called implicit synchronization. At first glMapBuffer() might seem much faster, because it does not create an additional copy of the data but lets you read/write directly into the driver-side buffer. That can be very harmful if you don't take precautions. Remember that the GPU typically runs about a frame behind the CPU, so when you reach the glMapBuffer() call it can stall the pipeline: for the duration of the upload, OpenGL cannot execute any previous or pending shaders bound to this uniform buffer object (UBO). Using the same UBO across several shaders magnified the issue even more. A simple solution is to revert to glBufferSubData(), which copies your data and lets the driver swap buffer pointers whenever it decides the time is right, so no cycles are lost. And if you really want to use glMapBuffer(), you have to create two buffers and ping-pong between them manually.
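A minimal sketch of the manual ping-pong approach, assuming a valid GL context is current; the names `ubo`, `frame`, `cpu_data`, and `UBO_SIZE` are hypothetical placeholders:

```c
/* Double-buffer the UBO so we never map the buffer the GPU is reading. */
GLuint ubo[2];
glGenBuffers(2, ubo);
for (int i = 0; i < 2; ++i) {
    glBindBuffer(GL_UNIFORM_BUFFER, ubo[i]);
    glBufferData(GL_UNIFORM_BUFFER, UBO_SIZE, NULL, GL_DYNAMIC_DRAW);
}

/* Each frame, write into the buffer the GPU is NOT using right now. */
int write = frame & 1;
glBindBuffer(GL_UNIFORM_BUFFER, ubo[write]);
void *ptr = glMapBufferRange(GL_UNIFORM_BUFFER, 0, UBO_SIZE,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(ptr, cpu_data, UBO_SIZE);
glUnmapBuffer(GL_UNIFORM_BUFFER);

/* Bind this frame's copy for the draw calls that follow. */
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo[write]);
```

GL_MAP_INVALIDATE_BUFFER_BIT additionally tells the driver it may discard the old contents instead of waiting on them.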

The second pitfall I fell for was using begin/end occlusion queries while optimizing, believing that when no samples passed the query after a full-screen blend we could break out of the depth-peeling loop early. Well, it turned out that if you instead set a hard number of iterations (geometry passes, or peels), even a slightly higher one, and remove the query entirely, you will be much faster. The reason is that the fragment-shader execution for the screen blend has to finish for all fragments before the result requested via glEndQuery() can be read back. Tough 🙁
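Here is a sketch of the stalling pattern, assuming a GL context; `render_peel`, `blend_fullscreen`, and `MAX_PEELS` are hypothetical names standing in for the depth-peeling passes:

```c
GLuint query;
glGenQueries(1, &query);
for (int peel = 0; peel < MAX_PEELS; ++peel) {
    glBeginQuery(GL_SAMPLES_PASSED, query);
    render_peel(peel);                 /* geometry pass for this layer */
    glEndQuery(GL_SAMPLES_PASSED);
    blend_fullscreen(peel);            /* full-screen blend */

    GLuint samples = 0;
    /* This readback synchronizes CPU and GPU -- the expensive part:
       all outstanding fragment work must finish first. */
    glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
    if (samples == 0)
        break;                         /* early out -- but at what cost? */
}
/* Often faster in practice: drop the query and always run a fixed,
   slightly conservative number of peels. */
```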

The third pitfall came when I decided to cut down the color attachment sizes of my main framebuffer. What I did was shrink my custom depth texture down to 16 bits per channel using the format GL_RG16F. When you do that, however, your packing function has to comply with the IEEE 754 standard for 16-bit floats. And I did exactly the opposite: I decided I would just move some bits around! Producing that garbage got me a black screen 🙁 (I expected artifacts, but not complete darkness).
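For reference, a standard-compliant float-to-half conversion is not just "moving bits around": the exponent has to be re-biased from 127 to 15 and the mantissa truncated from 23 to 10 bits. A minimal sketch (zeros, normals, and Inf/NaN handled; denormals flushed to zero for brevity):

```c
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit float to an IEEE 754 binary16 (half float) bit
   pattern. Denormal results are flushed to signed zero for brevity. */
uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* type-pun safely   */

    uint32_t sign = (bits >> 16) & 0x8000u;      /* sign moves across */
    uint32_t efld = (bits >> 23) & 0xFFu;        /* 8-bit exponent    */
    uint32_t mant = bits & 0x007FFFFFu;          /* 23-bit mantissa   */
    int32_t  exp  = (int32_t)efld - 127 + 15;    /* re-bias 127 -> 15 */

    if (efld == 0xFFu)                           /* Inf or NaN        */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0));
    if (exp >= 31)                               /* overflow -> Inf   */
        return (uint16_t)(sign | 0x7C00u);
    if (exp <= 0)                                /* underflow -> 0    */
        return (uint16_t)sign;

    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}
```

For example, 1.0f maps to 0x3C00 and -2.0f to 0xC000.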

The fourth pitfall was when my data got implicitly converted from int to float. I thought that the type enumeration (the third argument) of glVertexAttribPointer() simply specified what the data type was, so it would be submitted directly. What actually happened was that I submitted an int that got converted and handed to the shader as a float. If you want to submit integers without the conversion, you have to use glVertexAttribIPointer() instead. A subtle difference, but oh boy does it matter.
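The one-letter difference in a sketch, assuming a bound VAO/VBO whose vertices start with a single int attribute (a hypothetical layout):

```c
/* WRONG for an `in int` shader input: despite GL_INT here, the GL
   converts the integer data to float before the shader sees it. */
glVertexAttribPointer(0, 1, GL_INT, GL_FALSE, sizeof(int), (void *)0);

/* RIGHT: the "I" variant keeps the data as integers, matching an
   int/ivec input in the vertex shader. Note there is no `normalized`
   parameter in this signature. */
glVertexAttribIPointer(0, 1, GL_INT, sizeof(int), (void *)0);

glEnableVertexAttribArray(0);
```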
