Hello everyone
I am having an issue with my custom app on Linux, and I am not sure whether it is caused by my modifications or by SlicerCAT in general.
Can someone test the following on a custom app on Linux with Python 3.9, please?
It crashes on import in libhdf5.so, which both packages carry as a dependency. On Slicer 5.1 this is handled properly, i.e. both packages install and load fine. I don’t even know how to approach it. One idea is to build both packages against the same libhdf5, but if something is wrong with the build I would rather find and fix that. Thanks
Thanks @cpinter. Mine is based on Slicer 4.11 as well, but with Python 3.9. Which Python version are you using?
Just to be sure, the crash you are talking about is Slicer exiting completely, right? Thanks
On Windows, using a Slicer custom app based on Slicer 4.13 (commit just prior to the 5.0 tag), the following installs and imports were successful. This Slicer version uses Python 3.9.
Note that Slicer 4.11.20210226 was using ITK 5.1.2 which specifies HDF5 1.10.4 (see here), while latest Slicer uses ITK 5.3rc03 which specifies HDF5 1.12.1 (see here).
On Windows, it appears that the latest h5py wheels (3.6.0) are built against HDF5 1.12.1 (see here).
ITK and VTK use a custom prefix in the HDF shared library name and symbols, so there should not be any conflicts (ABI compatibility issues) with the system HDF or any other HDF versions installed by Python packages.
Have you configured your custom Slicer build to use the system HDF? That could break this isolation.
If Python packages include HDF as a shared library, that may lead to conflicts, too.
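To see which HDF5 shared libraries actually end up mapped into a process, something like the following could be run from the Python console of the app. It is only a rough sketch: it assumes Linux (it reads /proc/self/maps), and the function names are made up for illustration.

```python
import os
from pathlib import Path


def hdf5_libs_from_maps(maps_text):
    """Return sorted, de-duplicated paths of mapped files whose file name contains 'hdf5'."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if "hdf5" in os.path.basename(path).lower():
                found.add(path)
    return sorted(found)


def loaded_hdf5_libs():
    """Inspect the current process; returns [] where /proc is not available."""
    maps = Path("/proc/self/maps")
    return hdf5_libs_from_maps(maps.read_text()) if maps.exists() else []
```

Calling `loaded_hdf5_libs()` before and after each import shows which package pulls in which HDF5 copy. More than one entry is not a problem by itself; it only matters when two copies export the same unmangled symbols.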
Hey @lassoan, thanks for the tips. I will check the build configs, but I can see the HDF libs with custom prefixes in the package (libitkIO and libvtk). One thing to note is that if I use the console (PythonSlicer), importing h5py and netCDF4 works fine. So it must be something that Slicer imports or does.
I’ve built the SlicerCAT example with Slicer 260478f (Slicer 5) and the problems I am reporting here are solved.
I am at a crossroads now. Slicer 4.11 is still the supported version, but I understand that we are on the verge of switching to v5. I will spend a little more time on this; if no easy solution is found, I will abandon it in favor of Slicer v5.
I can see that in Slicer (4, I think) both VTK and ITK ship their own compiled HDF5 libs.
Also, as far as I remember (worth checking), h5py uses the native C HDF5 library (not the high-level _hl one).
Thus I can find:
/home/kerim/Documents/Slicer/d/VTK-build/lib/libvtkhdf5-9.1.so
/home/kerim/Documents/Slicer/d/ITK-build/lib/libitkhdf5-shared-5.3_debug.so
and also I expect there should be `site-packages/h5py/libhdf5.so`
As you said, there is no problem importing h5py in Python without Slicer. That might be because VTK’s and ITK’s HDF5 are not loaded alongside h5py’s HDF5 in that case.
As HDF5 is a pure C library I would try to import them to PythonSlicer and then to Slicer directly. I haven’t done this before but google tells me that it may be done via:
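A minimal sketch of that idea using the standard `ctypes` module. The helper name `try_load` is made up for illustration, and the paths in the usage note are the ones listed above, so they have to be adapted to your build tree.

```python
import ctypes


def try_load(paths):
    """Try to dlopen each shared library; return {path: error message, or None on success}."""
    results = {}
    for path in paths:
        try:
            # This only loads the library; no HDF5 function is called.
            ctypes.CDLL(path)
            results[path] = None
        except OSError as exc:
            results[path] = str(exc)
    return results
```

For example, `try_load(["/home/kerim/Documents/Slicer/d/VTK-build/lib/libvtkhdf5-9.1.so", "/home/kerim/Documents/Slicer/d/ITK-build/lib/libitkhdf5-shared-5.3_debug.so"])`, first in PythonSlicer and then in the Slicer Python console. Note that a segfault inside the loader would still take the whole process down; the `OSError` branch only covers libraries that cannot be found or loaded.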
@fbordignon sorry, my previous post didn’t work.
I tested it on pure Slicer 4 LTS.
I discovered that if I import netCDF4 first and then import h5py, the app fails in h5py’s `__init__.py`, on the line that imports the _errors.cpython-36m-x86_64-linux-gnu.so lib (you can put a print() before it to check).
The _errors lib contains the H5E_ functions (exceptions). I got this with nm -gD _errors.cpython-36m-x86_64-linux-gnu.so
@keri thanks for your suggestions, but when doing cdll.LoadLibrary no functions from the libraries are called.
I believe there is something wrong with the initialization of the library that is issuing a segfault.
These are the backtraces I am getting. The error always occurs in the last lib imported, i.e. if I import netCDF4 and then h5py, the error is in h5py, and vice versa.
This error happens in the interactive Python console of the custom app. If I issue the same commands in a PythonSlicer console, both libs import fine, i.e. they are somehow isolated from each other there.
Error netCDF4
Thread 2.1 "GeoSlicerApp-re" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbafbd540 (LWP 3789)]
0x00007ffe4661d91d in ?? () from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
(gdb) bt
#0 0x00007ffe4661d91d in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#1 0x00007ffe466221f8 in H5SL_create ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#2 0x00007ffe464f2989 in H5I_register_type ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#3 0x00007ffe46721aa2 in H5VL__init_package ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#4 0x00007ffe46721b9a in H5VL_init_phase1 ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#5 0x00007ffe4630d43e in H5_init_library ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#6 0x00007ffe4630dedf in H5get_libversion ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/…/netCDF4.libs/libhdf5-02fc656b.so.200.0.0
#7 0x00007ffe46a8bbe1 in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/_netCDF4.cpython-39-x86_64-linux-gnu.so
#8 0x00007ffe46a85b86 in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/_netCDF4.cpython-39-x86_64-linux-gnu.so
#9 0x00007ffe46a777d6 in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/netCDF4/_netCDF4.cpython-39-x86_64-linux-gnu.so
#10 0x00007fffe18bf33a in PyModule_ExecDef ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/bin/…/lib/Python/lib/libpython3.9.so
#11 0x00007fffe19a386b in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/bin/…/lib/Python/lib/libpython3.9.so
#12 0x00007fffe18bdcf3 in ?? ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/bin/…/lib/Python/lib/libpython3.9.so
#13 0x00007fffe1852954 in PyVectorcall_Call ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/bin/…/lib/Python/lib/libpython3.9.so
Error h5py
Thread 2.1 "GeoSlicerApp-re" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbafbd540 (LWP 3848)]
0x00007fffbf9178b1 in __vfprintf_internal (s=s@entry=0x7fffff7ff450, format=format@entry=0x7ffe47dd8bb5 "can't locate ID", ap=ap@entry=0x7fffff7ff5a8, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
1289 vfprintf-internal.c: No such file or directory.
(gdb) bt
#0 0x00007fffbf9178b1 in __vfprintf_internal (s=s@entry=0x7fffff7ff450, format=format@entry=0x7ffe47dd8bb5 "can't locate ID",
   ap=ap@entry=0x7fffff7ff5a8, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
#1 0x00007fffbf92cbfa in __vasprintf_internal (result_ptr=0x7fffff7ff5a0, format=0x7ffe47dd8bb5 "can't locate ID", args=0x7fffff7ff5a8, mode_flags=0)
   at vasprintf.c:57
#2 0x00007ffe47b55417 in H5E_printf_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#3 0x00007ffe47be470c in H5I_inc_ref ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#4 0x00007ffe47b55243 in H5E__push_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#5 0x00007ffe47b55444 in H5E_printf_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#6 0x00007ffe47be470c in H5I_inc_ref ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#7 0x00007ffe47b55243 in H5E__push_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#8 0x00007ffe47b55444 in H5E_printf_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#9 0x00007ffe47be470c in H5I_inc_ref ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#10 0x00007ffe47b55243 in H5E__push_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#11 0x00007ffe47b55444 in H5E_printf_stack ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.200.1.0
#12 0x00007ffe47be470c in H5I_inc_ref ()
   from /home/fernando/Downloads/GeoSlicer-1.6.0-2022-05-11-linux-amd64/lib/Python/lib/python3.9/site-packages/h5py/…/h5py.libs/libhdf5-346dbfc8.so.20
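Since the crash follows the import order, one way to probe it without losing the interactive session is to run each order in a child interpreter and look at the exit code. This is just a sketch: `import_in_subprocess` is a made-up helper, and the `python` argument has to point at an interpreter that reproduces the app environment. With plain PythonSlicer both orders import fine, as noted above, so there it only confirms the baseline.

```python
import subprocess
import sys


def import_in_subprocess(modules, python=sys.executable):
    """Import the modules in the given order in a child interpreter; return its exit code.

    On POSIX, a negative code -N means the child was killed by signal N
    (for example -11 for SIGSEGV).
    """
    code = "; ".join(f"import {name}" for name in modules)
    proc = subprocess.run([python, "-c", code], capture_output=True, text=True)
    return proc.returncode
```

Comparing `import_in_subprocess(["netCDF4", "h5py"])` with `import_in_subprocess(["h5py", "netCDF4"])` then shows whether either order dies with a signal in that interpreter.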
I’ve updated my custom Slicer app to the latest ITK v5.3rc something and it did not solve the issue.
I noticed that on Linux and Windows if I issue import sitkUtils it crashes the custom app.
I think the problem is that there are too many HDF5 libs (ITK, VTK, h5py, netCDF4).
In my SlicerCAT I build HDF5 as an external project and pass it to VTK and ITK, so my system has only one instance of HDF5.
I just reproduced your “fail” steps on my SlicerCAT on clean Ubuntu 22.04:
Lately we had a conversation with @lassoan that it would be good to build HDF5 as an external project.
It is not a problem, but I don’t have time for this now.
In the future I will add a PR.
If you can’t wait (one or two months, I think), you can take a look at how I configured External_VTK.cmake and External_ITK.cmake. There you can also find External_HDF5.cmake, but it probably needs to be modified to allow building a specific HDF5 version (you would probably want h5py to work with the same HDF5 version).
This is only a problem if an application or library brings its own HDF5 as a shared library, without shared library name or symbol mangling.
ITK and VTK behave well: their HDF5 will not clash with anything (other than potentially another ITK or VTK), unless you force them to use the “system” HDF5. If you force ITK or VTK to use the “system” HDF5, then it is entirely up to you to manage version conflicts.
I’ve checked the wheel files of netcdf4 and h5py.
netcdf4 behaves well: it links HDF5 statically, so there are no conflicts.
h5py is messed up: it includes shared libraries with non-mangled names (hdf5.dll, hdf5_hl.dll, and even a zlib.dll). This can crash any application that uses HDF5 or zlib. An application or the system may provide a common shared library without name and symbol mangling, but a Python package must never do this (it must not dictate to an application which HDF5 it may use). I would recommend not using this package (or maybe building it yourself as a static build, if that is supported).
I don’t understand why a non-mangled HDF5 may crash the app in that situation.
This is how I understand this:
VTK and ITK have completely separate, name-mangled HDF5 copies. That means they use only their own HDF5: instead of H5D_close, ITK uses itk_H5D_close.
Then h5py uses unmangled HDF5: it uses H5D_close.
If netCDF4 links to HDF5 statically, does that mean that it can’t use h5py’s HDF5? I don’t know, but the app crashes after h5py and netCDF4 are imported together (along with ITK and VTK).
I can see two hints why this may happen:
If I look at the symbol names in the HDF5 lib (I believe this can be done with nm -gD libitkhdf5-shared-5.3_debug.so), I can see that most function names are prefixed with itk_, but some of them are not name-mangled:
Can two HDF5 libs of the same version without name mangling be loaded together, or is that impossible, or does it lead to a crash?
Probably these two unmangled HDF5 libs are of different versions or were compiled with different compilers (or compiler flags).
I don’t have enough of theoretical knowledge so I’m not sure about it.
But:
VTK+ITK+h5py = OK
VTK+ITK+netCDF4 = OK
h5py+netCDF4 = OK
VTK+ITK+h5py+netCDF4 = CRASH
(ITK and VTK here are meant as VTK’s and ITK’s HDF5)
I tested on my SlicerCAT where ITK=VTK=HDF5 v1.12 and h5py and netCDF4 are installed with pip_install then:
VTK+ITK+h5py+netCDF4 = OK
I thought that it would be better to build HDF5 externally and set it as a VTK and ITK dependency, with the ability to choose the HDF5 version, so that if h5py is intended to be used, a matching version can be prepared.
VTK’s and ITK’s HDF5 are only isolated if you don’t force them to use your own (“system”) HDF5.
It means netCDF is self-contained, it does not rely on a shared library. It does not use or interfere with any HDF5 shared libraries.
If you build HDF5 externally and want to use that in ITK and VTK then it is up to you to ensure that your custom HDF5 does not clash with h5py’s HDF5. You either have to use the exact same version, built with the same options, or build your HDF5 with library name and symbol mangling.
These seem to be some private/debug functions. It may be normal that they are not name-mangled, but maybe it is an error. Mixing libraries built in debug and release mode (or built with slightly different options) may cause crashes, too.
Even if you build the same HDF5 version, there may be compiler or library build option differences between your build and the third-party-built binary that can cause crash. It is safer to isolate your binaries.
It seems that netCDF4 uses an HDF5 shared library on Linux. So, unless you are lucky (they all happen to use the same HDF5 version, built with the same options), you cannot use both the h5py and netCDF4 wheels at the same time on Linux. You should be able to build both from source though, forcing both of them to use a common HDF5 version.
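To check whether the two wheels happen to bundle the same HDF5 version, each package can report the version it was built against. The sketch below queries them in separate child interpreters, so a crash in one import does not take the other down. `hdf5_version` and `PROBES` are illustrative names; the attributes used are `h5py.version.hdf5_version` and `netCDF4.__hdf5libversion__`.

```python
import subprocess
import sys

# One-liners that print the HDF5 version each package reports.
PROBES = {
    "h5py": "import h5py; print(h5py.version.hdf5_version)",
    "netCDF4": "import netCDF4; print(netCDF4.__hdf5libversion__)",
}


def hdf5_version(snippet, python=sys.executable):
    """Run the snippet in a child interpreter; return what it printed, or None on failure."""
    proc = subprocess.run([python, "-c", snippet], capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None
```

For example, `{name: hdf5_version(code) for name, code in PROBES.items()}`. Matching version strings are necessary but not sufficient, since build options may still differ between the two binaries.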
From what I looked up recently, the wheels distributed under manylinux tags are made compatible with the various standards by auditwheel, which renames the bundled .so libs so that several wheels can carry dependencies that would otherwise clash. This issue discusses some problems with this approach. It seems that if a third-party lib loads the symbols globally, then the mechanism implemented in auditwheel does not work, i.e. something loads global HDF5 symbols, and the Python wheels then call those symbols and crash.
It seems to me that even though the symbols of netCDF4 and h5py are not mangled, they can work together because of the auditwheel steps (which also exist on macOS via delocate).
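As a small sanity check on the "loaded globally" part: on Unix, Python exposes the dlopen flags it uses for extension modules. The sketch below only reports the interpreter's own flags; it says nothing about how the host application loaded its C++ libraries, so treat it as one data point. The function name is made up.

```python
import os
import sys


def extensions_load_globally():
    """True if extension modules imported from now on would be dlopen'ed with RTLD_GLOBAL."""
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)
```

If this returns True in the app console but False in PythonSlicer, that would hint that symbols from one wheel's HDF5 can leak into the other.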
00000000005dd0be B H5AC_init_g
00000000005dd0bc B H5A_init_g
00000000005dcc46 B H5_api_entered_g
00000000005dd0c0 B H5B2_init_g
00000000005dd0bf B H5B_init_g
00000000005dd0c1 B H5C_init_g
00000000005dee00 B H5D_def_dxpl_cache
00000000005decc0 B H5_debug_g
00000000005dd0c2 B H5D_init_g
00000000005de331 B H5EA_init_g
00000000005de330 B H5E_init_g
00000000005dee60 B H5E_stack_g
00000000005de340 B H5FA_init_g
00000000005de348 B H5FD_init_g
00000000005de332 B H5F_init_g
00000000005de3b0 B H5FL_init_g
00000000005de338 B H5F_sfile_head_g
00000000005de400 B H5FS_init_g
00000000005de401 B H5G_init_g
00000000005de403 B H5HF_init_g
00000000005de404 B H5HG_init_g
00000000005de405 B H5HL_init_g
00000000005de420 B H5I_init_g
00000000005dcc45 B H5_libinit_g
00000000005dcc44 B H5_libterm_g
00000000005de838 B H5L_init_g
00000000005de858 B H5MF_init_g
00000000005de859 B H5MP_init_g
00000000005de85a B H5O_init_g
00000000005de948 B H5PB_init_g
On Slicer 5, running
nm -gD lib/libvtkhdf5-9.1.so.1 | grep -v vtk
shows no H5* symbols in the output.
This corroborates my hypothesis that VTK is exposing those symbols in some manner, and that this is corrected in Slicer 5 by the new VTK version, which includes the fix linked earlier. I am trying to update HDF5 for VTK on Slicer 4.11 to see whether that is really the case.
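The same check can be scripted so it is easy to repeat on every bundled library. A rough sketch, assuming GNU `nm` is on the PATH; the function names are made up, and the parser simply keeps exported symbols whose name starts with `H5`, i.e. without a vtk or itk prefix.

```python
import subprocess


def unmangled_h5_symbols(nm_output):
    """Return defined symbols from `nm -gD` output whose names start with 'H5'."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        sym_type, name = parts[-2], parts[-1]
        # 'U' marks undefined symbols (imports from another library), not exports.
        if sym_type != "U" and name.startswith("H5"):
            names.append(name)
    return names


def check_library(path):
    """Run nm on a shared library and return its unmangled H5* exports."""
    out = subprocess.run(["nm", "-gD", path], capture_output=True, text=True, check=True)
    return unmangled_h5_symbols(out.stdout)
```

An empty list from `check_library("lib/libvtkhdf5-9.1.so.1")` corresponds to the "no H5* symbols" result above.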
Updating the HDF5 lib inside VTK proved difficult because it requires a big version jump, so I forked VTK at the commit hash used by Slicer 4.11 and only added the mangling defines from VTK 9.1. It works! This commit fixed it: no symbols are left unmangled in libvtkhdf5. I merged the defines from VTK 8 and 9, so some of them are probably not needed, but I figure it does not hurt.
I don’t know whether Slicer 4 will be maintained after Slicer 5 is released, but it is a small enough fix. Now I can import h5py and netCDF4 without crashing.