Trying to diagnose an issue with VirtualGL and Slicer

I am the principal developer and sole maintainer of VirtualGL, a Linux OpenGL remote display framework. Without going into too much gory detail about how VirtualGL works, suffice it to say that one particular VirtualGL mode emulates the GLX API using EGL/DRI commands, and that mode and only that mode has a bug that causes Slicer to render incorrectly when depth peeling and Display ROI are both enabled.

The other VirtualGL modes work fine with Slicer, and the VirtualGL mode that fails with Slicer works fine with other applications. Also, the failure only occurs when both depth peeling and Display ROI are enabled. If either is disabled, everything works fine. Thus, I’m hoping that someone can provide details regarding what happens behind the scenes in Slicer when both depth peeling and Display ROI are enabled. I’ve pored over the source code, both for Slicer and VTK, and I can’t determine exactly what’s happening at the OpenGL level. Also, very oddly, when I use apitrace to capture the Slicer workflow that would normally cause the issue, the issue cannot be reproduced when the captured OpenGL and GLX commands are replayed through apitrace.

To be 100% clear, this is my bug, not yours. I’m just asking for help in understanding what Slicer is doing at the OpenGL level when both depth peeling and Display ROI are enabled. Any assistance is greatly appreciated.

It is awesome that you are working on this, thank you!

The ROI widget is special because its sides are semitransparent. Since volume rendering is semitransparent as well, depth peeling is needed for correct rendering.

Depth peeling relies on the Z buffer. Maybe the Z buffer is somehow not initialized or reset correctly, or maybe its resolution is not sufficient? I can see some glitches with volume rendering + ROI widget display + depth peeling when using Intel Iris graphics (it does not happen with Nvidia) when zooming in and rotating around.

Is this similar to what you are seeing?
Does it make any difference if you zoom in/out?
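To make the Z-buffer dependence above a bit more concrete, here is a toy, CPU-only model of classic front-to-back depth peeling for a single pixel. It is not VTK's implementation (vtkDualDepthPeelingPass peels from the front and the back at the same time and also has to integrate volumes); it only shows why each peel needs a freshly cleared depth buffer. The fragment tuples, the function names, and the `clear_depth_each_pass` switch are all hypothetical, made up for illustration.

```python
from typing import List, Tuple

# One fragment covering the pixel: (depth in [0, 1], (r, g, b), alpha).
# Smaller depth means closer to the camera.
Fragment = Tuple[float, Tuple[float, float, float], float]


def depth_peel(fragments: List[Fragment], max_peels: int,
               clear_depth_each_pass: bool = True) -> List[Fragment]:
    """Extract one layer per pass, nearest first (classic depth peeling)."""
    layers: List[Fragment] = []
    prev_depth = -1.0   # depth of the previously peeled layer (none yet)
    z_buffer = 1.0      # this pass's depth buffer, starting at the far plane
    for _ in range(max_peels):
        if clear_depth_each_pass:
            z_buffer = 1.0  # correct behaviour: reset Z before every peel
        nearest = None
        for frag in fragments:
            depth = frag[0]
            # Peel test: drop anything at or in front of the last peeled layer.
            # Z test (LESS): keep the nearest survivor in this pass.
            if prev_depth < depth < z_buffer:
                z_buffer = depth
                nearest = frag
        if nearest is None:
            break  # no layers left, or a stale Z buffer rejected everything
        layers.append(nearest)
        prev_depth = nearest[0]
    return layers


def composite_front_to_back(
        layers: List[Fragment]) -> Tuple[Tuple[float, float, float], float]:
    """Blend peeled layers front to back (the "under" operator)."""
    rgb = [0.0, 0.0, 0.0]
    alpha = 0.0
    for _, color, a in layers:
        weight = (1.0 - alpha) * a  # how much of this layer is still visible
        for i in range(3):
            rgb[i] += weight * color[i]
        alpha += weight
    return (rgb[0], rgb[1], rgb[2]), alpha


if __name__ == "__main__":
    pixel = [
        (0.9, (1.0, 0.0, 0.0), 0.5),  # back face of a translucent ROI box
        (0.3, (1.0, 0.0, 0.0), 0.5),  # front face of the same box
        (0.6, (0.5, 0.5, 0.5), 0.5),  # a volume sample between the faces
    ]
    print(composite_front_to_back(depth_peel(pixel, 10)))
    # ((0.75, 0.125, 0.125), 0.875)
    print(composite_front_to_back(
        depth_peel(pixel, 10, clear_depth_each_pass=False)))
    # ((0.5, 0.0, 0.0), 0.5)
```

With `clear_depth_each_pass=False`, the stale depth left over from the first pass rejects every deeper fragment, so only the nearest layer survives and the rest of the translucent stack silently disappears. A real driver-level bug would not necessarily look like this; the model just shows one kind of failure that the Z-buffer question above is probing for.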

Note that recent Slicer Preview Releases use VTK9, which has lots of changes in the rendering pipeline. It would be interesting to see if this newer version works any better.

I can confirm that the issue continues to happen with the latest Linux preview.

Does it make a difference if you use perspective projection (instead of orthogonal) in the 3D view?
Does it make any difference if you zoom in/out a lot?
Does the artifact disappear if you disable and re-enable depth peeling or hide and show the ROI?

If I disable and then re-enable depth peeling, the artifact appears again.
When I hide the ROI, the artifact disappears. Showing it brings it back.
Zooming in/out to extremes has no impact.
Perspective vs. orthogonal projection has no impact.
All tested with the latest preview.

Thank you. These behaviors all differ from what happens on Intel Iris graphics, so they seem to be unrelated problems.

@muratmaga could you try if you see the same issue if you display the new markups ROI (not the old annotation ROI)?

It does, but in a different way.

VirtualGL’s EGL back end (the mode that emulates GLX using EGL/DRI) uses FBOs behind the scenes in order to emulate double buffering (because EGL doesn’t support double buffering for off-screen surfaces). I do notice that Slicer and/or VTK are using FBOs as well, so my guess is that VirtualGL is interfering somehow. Can someone point me to the place in the Slicer and/or VTK code where the low-level OpenGL work that implements depth peeling and Display ROI happens?
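As a rough illustration of the FBO-based double-buffer emulation described above, the following toy model (plain Python, no OpenGL) shows a layer that emulates a double-buffered default framebuffer with two FBOs. `FakeGL`, `DoubleBufferInterposer`, and every method name are hypothetical; this is not VirtualGL's code, and it ignores the separate read/draw framebuffer bindings that real OpenGL has. What it illustrates is that only binding ID 0 is redirected, and that binding queries have to be translated back to 0 so that an application which saves and restores its own FBO bindings (as VTK's render passes may do) never sees the internal IDs.

```python
from typing import Dict, List


class FakeGL:
    """Stand-in for the driver: allocates FBO IDs and records draws per FBO."""

    def __init__(self) -> None:
        self.next_id = 1
        self.bound = 0
        self.contents: Dict[int, List[str]] = {}

    def gen_framebuffer(self) -> int:
        fbo = self.next_id
        self.next_id += 1
        self.contents[fbo] = []
        return fbo

    def bind_framebuffer(self, fbo: int) -> None:
        self.bound = fbo

    def draw(self, what: str) -> None:
        # Draw calls are not intercepted; they land in whatever FBO is bound.
        self.contents[self.bound].append(what)


class DoubleBufferInterposer:
    """Hypothetical layer emulating a double-buffered default framebuffer."""

    def __init__(self, gl: FakeGL) -> None:
        self.gl = gl
        self.front = gl.gen_framebuffer()
        self.back = gl.gen_framebuffer()
        gl.bind_framebuffer(self.back)  # the app starts out on "FBO 0"

    def gen_framebuffer(self) -> int:
        return self.gl.gen_framebuffer()  # app FBOs come straight from the driver

    def bind_framebuffer(self, fbo: int) -> None:
        # Only the default framebuffer (ID 0) is redirected.
        self.gl.bind_framebuffer(self.back if fbo == 0 else fbo)

    def get_framebuffer_binding(self) -> int:
        # Report 0 for the internal buffers so save/restore code stays consistent.
        bound = self.gl.bound
        return 0 if bound in (self.front, self.back) else bound

    def swap_buffers(self) -> List[str]:
        app_on_default = self.gl.bound == self.back
        self.front, self.back = self.back, self.front
        if app_on_default:
            self.gl.bind_framebuffer(self.back)  # "FBO 0" now means the new back buffer
        return list(self.gl.contents[self.front])  # what would be displayed


if __name__ == "__main__":
    gl = FakeGL()
    vgl = DoubleBufferInterposer(gl)
    app_fbo = vgl.gen_framebuffer()
    vgl.bind_framebuffer(app_fbo)
    gl.draw("offscreen pass")      # e.g. a depth-peeling pass into an app FBO
    vgl.bind_framebuffer(0)
    gl.draw("final frame")
    print(vgl.swap_buffers())            # ['final frame']
    print(vgl.get_framebuffer_binding())  # 0
```

Because application-created FBOs are passed through untouched, a layer built this way should only affect rendering that targets ID 0, which is the property the rest of this discussion hinges on.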

Depth peeling is implemented in this rendering pass: https://github.com/Kitware/VTK/blob/master/Rendering/OpenGL2/vtkDualDepthPeelingPass.cxx. Have a look at how it is used in vtkOpenGLRenderer and vtkOpenGLGPUVolumeRayCastMapper classes. If you need help with debugging lower-level rendering mechanisms, it may make sense to ask VTK developers on the VTK forum.

It may be tricky, but I’m sure folks on the VTK forum would be very interested in a pure-VTK example of this issue to help with debugging (that is, one that removes the Slicer-related complexities). For example, one could start with an example like this one and add some depth peeling and transparent objects.

If I could bring up a VTK-only example, it would greatly facilitate me debugging the issue myself. Alas, I have tried to bring up a few of them without success. The main issue is that all of the examples I have found require newer versions of VTK, but the only machine I have that supports EGL/DRI is still running CentOS 7, which is too old to build those newer versions of VTK. I will examine the source code, as advised above.

If you can diagnose from the source code that’s great.

But just FYI, if you are able to run Slicer on the machine, you can run VTK examples too. If you download the example linked above to /tmp/SampleRayCast.py and then give it a .vtk volume file as an argument, it should work for you like this example:

~/Downloads/Slicer-4.11.20210226-linux-amd64/bin/PythonSlicer /tmp/SampleRayCast.py ~/Documents/MRHead.vtk

(Here I made MRHead.vtk by downloading MRHead from the SampleData module and saving it in .vtk format.)

Thanks for the tip about running VTK examples. I guess I would just need an example that demonstrates the same type of rendering that Slicer performs when depth peeling and Display ROI are both enabled. Unfortunately, such an example is beyond my ability to produce.

After examining the code and the apitrace logs more closely, I am still clueless as to what the problem is. I may be wrong, but I don’t think it’s FBO-related. VirtualGL’s EGL back end only interferes with FBO functions when the default framebuffer is being used, and since VTK creates and uses its own FBOs, there should be no interference. It seems to me that if Z buffer initialization or resolution were the problem, then we should see the same rendering bug with the GLX back end, but we don’t. (It’s also really telling that playing back the apitrace capture through VirtualGL’s EGL back end fails to reproduce the bug. To be clear, that capture was made on the local display without VirtualGL. That’s the part that is really confusing me.) In addition, VirtualGL’s own unit tests are fairly comprehensive and include quite a few conformance checks to ensure that we are properly emulating double buffering for the default framebuffer without interfering with application-generated FBOs. Finally, although the EGL back end still lacks a few of the GLX and OpenGL features that the GLX back end supports, the apitrace logs don’t indicate that the application is using any of those features.

I’m afraid that, without a more stripped-down program that demonstrates the problem, there isn’t much more I can do. I don’t have the resources to continue focusing on this. I will also note that I have tried this example: VTK/Tutorials/TranslucentGeometry - KitwarePublic, but it doesn’t reproduce the issue.


GitHub issue filed against VirtualGL: EGL back end: incorrect rendering with 3D Slicer when depth peeling and "Display ROI" are both enabled (VirtualGL/virtualgl issue #168)


without a more stripped-down program

Could you try this implementation of the example (see CorrectlyRenderTranslucentGeometry), which is the up-to-date and maintained one? (Thanks @Sankhesh_Jhaveri for pointing this out :pray:)

Note that I didn’t have a chance to check whether it differs from the one on the wiki.

version of VTK

Lastly, the version of VTK 9 we use in Slicer is Slicer/VTK@slicer-v9.0.20201111-733234c785. Could you try building the example against this specific version to check whether you can reproduce the problem?

Thanks @jcfr :pray:

@DRC will you be able to build and test the example?

If not, @muratmaga and I have been discussing the option of creating a Python test script to see if we can replicate the issue in pure VTK without Slicer.

@pieper I was finally able to get that exact version of VTK to build on my machine, but unfortunately, the issue does not reproduce with the CorrectlyRenderTranslucentGeometry example.

You could try adding depth peeling to the example code @alireza added here: Incorrect volume rendering bounds - first and last slice - #5 by alireza

@pieper I have no experience whatsoever with that and would not even know how to begin.