Long delay after disk activity stops when loading very large data

jamesjcook · September 11, 2020, 9:08pm

I’m using slicer to work with very large files. After disk activity stops, there is a (surprisingly) long delay before the volume shows up in viewers.
Disk time to load is a few minutes, the delay after that has been as long as 40 minutes.
One cpu core is busy during this long delay time.

I think it is the min/max calculation, but i’m not currently in a position to verify that.

Is there a way to skip the min/max calc on load, or are there bare-bones programmatic ways to load which don’t do any processing?

lassoan · September 11, 2020, 9:13pm

Min/max computation may add significant amount of time (maybe up to a few ten %) but not something like 4000%. Could you attach the header of your image file in nhdr format?

Slicer retrieves min/max scalar value from some file formats that supports this, this saving some loading time, but the drastic slowdown you are experiencing is probably due to some other issue.

muratmaga · September 11, 2020, 9:21pm

Are your files compressed? I think slicer reads, and then uncompresses everything as oppose to stream. For files in the order of 10GB, these had been in dozens of minutes range for me…

jamesjcook · September 11, 2020, 9:27pm

This was tried with and without compression. Compression is also a serious detriment in loading these, when used with a single data file this took hours to load, my memory is 7, but I didnt find my record of it. When using multiple compressed files, (one per slice) loading time was very long, but could be accomplished in one working day. I didnt time that, i started loading in the morning and it was ready at lunch time.

type: unsigned short
dimension: 3
space: right-anterior-superior
sizes: 6263 10186 2621
space directions: (0.0018,0,0) (0,0.0018,0) (0,0,0.0040)
kinds: domain domain domain
endian: big
encoding: raw
space origin:(-5.36155006705028,	-9.68239733742919,	-4.23750168693362)
data file: big.dat

muratmaga · September 11, 2020, 10:13pm

I have noticed a delay between the end of the loading my large NRRD files (not compressed) and the appearance of slice views, which can be a minute or two (never really timed either). I always assumed those are overheads associated with generating MRML nodes etc. But I never tested with such large dataset, my large files are 10-20GB range. Yours is almost 400GB.

lassoan · September 12, 2020, 4:02am

I’ve tested loading of a 6.4GB uncompressed NRRD file and only a few seconds passed after file loading finished:

While cold loading of the file (after computer restarted), resource monitor showed 100MB/sec disk activity for about 64 seconds. A few seconds after disk activity went down to zero, the image appeared.
While warm loading the file (loading it again right after it was loaded), resource monitor did not show any activity and image appeared in 6 seconds.

A few things that would be nice if you could do to diagnose what’s happening:

Disk activity light is not reliable source of information. If you use a resource monitor that shows the actual disk transfer then you will know if data is actually read. Note that if you load a recently loaded file then it may be retrieved from memory instead of re-reading from disk, so you cannot blindly trust resource monitors either.
Your image is stored as big endian, so each value has to be byte-swapped after loading. It should not cause significant delay but it may worth a try to save the image (as it is saved as little endian) and see if loading this image is any faster.
You may try to instantiate a simple NRRD reader and see if it makes loading any faster:

volumeNode=slicer.mrmlScene.AddNewNodeByClass("vtkMRMLScalarVolumeNode")
storageNode=slicer.mrmlScene.AddNewNodeByClass("vtkMRMLNRRDStorageNode")
storageNode.SetFileName("path/to/your/file.nhdr")
storageNode.ReadData(volumeNode)
setSliceViewerLayers(volumeNode)

jamesjcook · September 12, 2020, 4:31am

The disk activity i’m referring to is the read activity in windows server 2016 process monitoring, specifically watching the open file listing, on big.dat.

I have a significant processing chain problem because ImageJ saves big endian, hopefully thats only a minor contributor to my load speed.

Thanks for the simple reader hint, I should get a change to test that next week.

lassoan · September 12, 2020, 4:48am

You can also try VerySleepy tool, which provides you function list and call stacks of methods that the application spends the most time with. For release-mode builds, without debug information, the data is not fully reliable but often gives useful hints about what the application does.

Instead of just watching the open file listing, use a tool that gives you real-time information about the amount of data transfer (such as Resource Monitor or Performance Monitor). You can then check if the disk activity makes sense (time x bandwidth should equal file size).

aiden.zhu · December 13, 2021, 9:42pm

@ lassoan
I do have large data as well. I tried this way to load data, but seems the data were not read in. Need I do some command to execute or force the data read from the local disk?
Thanks a lot.

==>
volumeNode=slicer.mrmlScene.AddNewNodeByClass(“vtkMRMLScalarVolumeNode”)
storageNode=slicer.mrmlScene.AddNewNodeByClass(“vtkMRMLNRRDStorageNode”)
storageNode.SetFileName(“path/to/your/file.nhdr”)
storageNode.ReadData(volumeNode)
setSliceViewerLayers(volumeNode)

jamesjcook · December 14, 2021, 4:40pm

When that happens to me, it is generally because I wrote my nhdr file incorrectly.
You can see slicer errors in the slicer log. Errors don’t always get from the python console to the main slicer log.

On the bottom right corner of the slier interface there is a circle with an X in it.
The circle is grey normally, if the circle is red, there has been an error.

Click the circle to open the log. You can do this before loading your data and “clear” the current messages to make it easier to see your errors.

aiden.zhu · December 14, 2021, 7:18pm

Hi James,
I got no problem to load data actually. But while I tried the method (with codes mentioned by @lassoan ), all shows fine but just no data does get loaded into Slicer.

My purpose is to check if it’s faster via @lassoan’s codes or no, since I am doing the similar things as you do with huge data.

PS: normally I do load data using slicer.util.loadVolume, which gets me sense slow loading.

lassoan · December 14, 2021, 8:09pm

Loading should be fast if you store the images uncompressed. For most medical images, zlib compression only provides about 50% size reduction, so likely it is not worth the increased load time.

jamesjcook · December 15, 2021, 1:29am

I did my testing on 4.10.2 in 2020 and this helped a great deal. It also seemed to successfully get around a big part of the single threaded process which was happening to me after load.

I’ve got similar code running on 20200930 and 20210226 builds. I think those were full release versions. I’ve not tested these as extensively for run time.

The newer builds have been better behaved with my large data.

When you say “it seems the data was not read in” what do you mean exactly?
It loads up instantly? or you have 0 for all voxels? or it is not displayed/all black?

(Also thank you Andras, this snippet you gave me has been exceedingly helpful. Sorry i lost track of this thread.)

aiden.zhu · December 16, 2021, 4:12pm

Yeah, no data shown/displayed at all after I ran that code part. And I did not see any errors coming out.

Topic		Replies	Views
Can Slicer utilise multiple cores on linux? Support	5	896	November 21, 2019
loading a huge amount of data Support	0	199	March 10, 2020
Loading in huge data Support	3	342	November 25, 2020
Nifti loading times Support	6	527	March 4, 2020
Slicer open .001 .002 files Support	21	1132	October 27, 2018

Long delay after disk activity stops when loading very large data

Related topics