Reading and writing hundreds of files has significant overhead, so dcmqi should probably support storing multiple segments in one 3D volume, and allow multiple 3D volumes for storing overlapping segments (so the segmentation file is 3D if all segments are non-overlapping, and 4D if there are overlaps). We have implemented functions in Slicer for splitting and merging overlapping/non-overlapping segments that you might find useful for this.
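The 3D-versus-4D layout described above amounts to packing segments into as few mutually non-overlapping layers as possible. Below is a minimal Python sketch, not Slicer's or dcmqi's implementation. It assumes each segment is given as a set of (k, j, i) voxel indices, and `pack_segments` / `to_labelmaps` are hypothetical helper names. Each segment goes into the first layer it does not overlap. A single resulting layer corresponds to a 3D labelmap, and several layers correspond to a 4D one.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a segment is the set of (k, j, i) voxel indices it covers.
Voxel = Tuple[int, int, int]


def pack_segments(segments: Dict[str, Set[Voxel]]) -> List[Dict[str, Set[Voxel]]]:
    """Greedy first-fit: put each segment in the first layer it does not overlap."""
    layers: List[Dict[str, Set[Voxel]]] = []
    occupied: List[Set[Voxel]] = []  # union of voxels already used in each layer
    for name, voxels in segments.items():
        for layer, used in zip(layers, occupied):
            if used.isdisjoint(voxels):
                layer[name] = voxels
                used |= voxels  # in-place update of this layer's occupied set
                break
        else:  # overlaps every existing layer (or there are none yet): open a new layer
            layers.append({name: voxels})
            occupied.append(set(voxels))
    return layers


def to_labelmaps(layers, shape):
    """Render each layer as a dense 3D labelmap (0 = background).

    Returns one volume per layer plus the segment -> label value mapping.
    """
    nk, nj, ni = shape
    label_of: Dict[str, int] = {}
    volumes = []
    for layer in layers:
        vol = [[[0] * ni for _ in range(nj)] for _ in range(nk)]
        for name, voxels in layer.items():
            label = label_of.setdefault(name, len(label_of) + 1)
            for k, j, i in voxels:
                vol[k][j][i] = label
        volumes.append(vol)
    return volumes, label_of


# Toy example with made-up segment names on a 1 x 2 x 2 grid.
segments = {
    "liver": {(0, 0, 0), (0, 0, 1)},
    "kidney": {(0, 1, 1)},
    "tumor": {(0, 0, 1)},  # overlaps "liver"
}
layers = pack_segments(segments)
volumes, labels = to_labelmaps(layers, shape=(1, 2, 2))
print(len(layers))  # 2 -> overlaps present, so a layered (4D) labelmap is needed
print(volumes[0])   # [[[1, 1], [0, 2]]]
print(volumes[1])   # [[[0, 3], [0, 0]]]
```

Greedy first-fit does not always find the minimum number of layers (that is a graph-coloring problem), but it guarantees that a segmentation whose segments are all disjoint stays in a single 3D layer.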
We have found that even when a single 3D file is used to store all the segments, splitting them into separate volumes and processing them one by one is much slower than working with the shared labelmap directly. For example, when Slicer stored each segment as a separate volume, it took about 30 seconds to load the Brain Atlas; after we switched to storing all segments in the usual single shared 3D labelmap representation, loading time went down to a fraction of a second. I know that supporting this would require changes to the DICOM standard, but I think it would be worth it, not only because of the speed improvement but also because of the simplicity and reduced memory needs.
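To put the memory argument in rough numbers, here is a back-of-the-envelope Python sketch for non-overlapping segments. The 256³ grid and 100 segments are hypothetical values, not measurements from the Brain Atlas. Separate segments are assumed to be stored as one byte-per-voxel binary mask each, while the shared labelmap uses the smallest unsigned integer width that can hold every label value.

```python
def labelmap_bytes_per_voxel(n_labels: int) -> int:
    """Smallest unsigned integer width (1, 2, 4, ... bytes) holding values 0..n_labels."""
    width = 1
    while n_labels >= 256 ** width:
        width *= 2
    return width


def memory_bytes(shape, n_segments):
    """Return (bytes for one binary mask per segment, bytes for one shared labelmap)."""
    n_voxels = shape[0] * shape[1] * shape[2]
    separate = n_segments * n_voxels  # assumption: 1 byte per voxel per binary mask
    shared = n_voxels * labelmap_bytes_per_voxel(n_segments)
    return separate, shared


separate, shared = memory_bytes((256, 256, 256), n_segments=100)
print(separate, shared)    # 1677721600 16777216
print(separate // shared)  # 100
```

Even if each binary mask were bit-packed (1 bit per voxel), the separate-volume layout in this example would still take 12.5 times more memory than the shared 8-bit labelmap.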