gprof and OpenGL

Syth · 16-04-2006 03:45AM #1

I'm trying to profile an OpenGL programme I've written myself using gprof, to see if and how I can optimise it. The gprof output shows how many times each function is called, but all the times are zero, as if every function took no time at all to execute.

For example I get output like the following:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  0.0       0.00     0.00    72854     0.00     0.00  _drawTaperedCylinder [34246]
  0.0       0.00     0.00    25449     0.00     0.00  _drawSphere [34247]
  0.0       0.00     0.00    19461     0.00     0.00  _drawVertCylinder [34248]

This is for all functions, including main()! I'm using GLUT, where you start your programme by calling glutMainLoop(), and that function never returns. Could this mess up the profiling? If so, how can one profile OpenGL/GLUT applications?

satchmo · 16-04-2006 12:55PM

Looks like there's something wrong with the way you're running gprof alright - I've never used it so I can't really help you there.

However, you should keep a couple of things in mind when profiling 3D apps like this. If you're only measuring the time spent inside API functions like a bunch of glVertex calls, you're not getting the actual time spent rendering. All you're getting is how long the driver took to submit those calls to the graphics hardware's command processor, after which it can return and let the CPU carry on with its work. The simplest way to get a real benchmark is to issue some GL calls, call glFinish() to make sure the command queue has been emptied and all the rendering calls have actually executed, and then measure the elapsed time.
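A minimal sketch of that glFinish() approach in C, assuming a POSIX system with clock_gettime(). The names now_seconds, no_finish and time_gl_calls are hypothetical helpers, not part of OpenGL or GLUT. The queue is drained once before the clock starts and again before it stops, so the measurement covers the GPU actually executing the calls rather than just the driver accepting them:

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock (POSIX). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: does nothing. Passing it as `finish` times only
 * how long the driver takes to accept the calls, not to execute them. */
static void no_finish(void) {}

/* Hypothetical helper: times `draw`, bracketed by `finish` (normally a
 * small wrapper around glFinish()). Returns elapsed seconds. */
static double time_gl_calls(void (*draw)(void), void (*finish)(void))
{
    assert(draw != NULL && finish != NULL);
    finish();                 /* drain work queued by earlier calls */
    double start = now_seconds();
    draw();                   /* issue the GL calls being measured */
    finish();                 /* block until they have actually executed */
    return now_seconds() - start;
}
```

In a GLUT display callback you would pass your drawing routine as `draw` and a wrapper such as `static void finish_gl(void) { glFinish(); }` as `finish`. A wrapper is used rather than glFinish itself because on Windows the GL entry points use the `APIENTRY` (`__stdcall`) calling convention. Timing the same routine with `no_finish` instead shows only the submission cost, which makes the gap between "driver accepted the calls" and "GPU finished them" easy to see.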

There are several places where a 3D app can hit a bottleneck. The first, and most likely, is the CPU: physics, simulation and other processing are usually where most of the frame time goes, and even issuing too many API calls can become a bottleneck in the driver. The next most likely is vertex submission: if you're pushing too many vertices over the bus every frame using immediate mode, transferring them will take longer than actually rendering them. Use something like VBOs (vertex buffer objects) to transfer them once and keep them in video memory for subsequent frames. Otherwise, the bottleneck could be the vertex processor (too many vertices to transform), the fragment processor (a lot of complex fragment operations), or memory bandwidth/fill rate (too many large textures, too much alpha blending, too much overdraw, etc.).
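To put rough numbers on the vertex-submission point, here's a back-of-the-envelope sketch in C. The helper names and the example figures (100,000 vertices per frame; 24 bytes per vertex for a glVertex3f position plus a glNormal3f normal, assuming 4-byte floats; 60 frames per second) are illustrative assumptions, not measurements from the programme above. It ignores per-call driver overhead, so the immediate-mode figure is a lower bound:

```c
#include <assert.h>

/* Hypothetical helper: vertex bytes sent over the bus per second when
 * the same geometry is re-submitted every frame (immediate mode).
 * Per-call driver overhead is ignored, so this is a lower bound. */
static double immediate_mode_bytes_per_sec(long vertices_per_frame,
                                           int bytes_per_vertex,
                                           double frames_per_sec)
{
    assert(vertices_per_frame >= 0 && bytes_per_vertex >= 0);
    assert(frames_per_sec >= 0.0);
    return (double)vertices_per_frame * bytes_per_vertex * frames_per_sec;
}

/* Hypothetical helper: with static geometry in a VBO, the same data
 * crosses the bus once at upload time and is then reused from video
 * memory on later frames. */
static double vbo_upload_bytes(long vertices, int bytes_per_vertex)
{
    assert(vertices >= 0 && bytes_per_vertex >= 0);
    return (double)vertices * bytes_per_vertex;
}
```

With those assumed numbers, `immediate_mode_bytes_per_sec(100000, 24, 60.0)` gives 144,000,000 bytes (about 144 MB) re-sent every second, while `vbo_upload_bytes(100000, 24)` gives 2,400,000 bytes uploaded once.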

There are a few dedicated 3D profiling packages, all Win32 unfortunately, so they mightn't be any use to you if you're on Linux. gDEBugger is a good independent profiler that can also be used to debug OpenGL apps: insert breakpoints, step through calls, replace textures, etc. NVIDIA has NVPerfKit, which gives detailed hardware performance numbers. ATI uses gDEBugger to interface with its hardware counters and give similar profiling results, such as vertex processor usage as a percentage of the entire frame, fragment processor usage, video and AGP memory usage, number of pixels processed and so on. You can use these numbers to pinpoint the bottleneck in your application.

This presentation goes into more detail about GPU pipeline optimization.
