A load/save time of 10-30 seconds is normal for a 1GB volume if it is compressed. If you can zip a 1GB file on your computer much faster with any software, then let us know.
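If you want to check what compression alone costs on your machine, a minimal sketch along these lines may help. It uses Python's standard `zlib` module as a stand-in for whatever compressor your file format actually uses (an assumption). The 16 MiB sample buffer is hypothetical test data, and real image data usually compresses at a different speed, so treat the extrapolated number only as a rough indication.

```python
import time
import zlib


def time_compression(data, level=6):
    """Return (seconds elapsed, compressed size in bytes) for one zlib pass."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return elapsed, len(compressed)


# Hypothetical stand-in for image data: 16 MiB of a repeating byte ramp.
# Replace this with the bytes of your own file for a meaningful measurement.
sample = bytes(range(256)) * (64 * 1024)

elapsed, compressed_size = time_compression(sample)
throughput = len(sample) / (1024 * 1024) / elapsed  # MiB per second

print(f"compressed {len(sample)} bytes to {compressed_size} bytes in {elapsed:.3f} s")
print(f"throughput: {throughput:.0f} MiB/s")
print(f"extrapolated time for 1 GiB at this rate: {1024 / throughput:.1f} s")
```

Running the same measurement on the bytes of your own volume file gives a fairer baseline to compare against the load/save time you observe.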
A GPU (or many-core CPU) can only make an algorithm faster if you can split the processing into many independent tasks. Unfortunately, most algorithms are sequential in nature, and you need to make a significant effort to redesign them so that parts of them can run in parallel. GPU implementations of algorithms also have many disadvantages compared to CPU versions: they are much more complex (you can only utilize a GPU if parts of the code run concurrently, and this is always complex); they are not as flexible as CPU implementations (they only work for certain data types, require certain graphics hardware and software capabilities, etc.); they are much harder to debug (no direct access to data during interactive debugging); and it takes a significant amount of time to move data between GPU memory and the rest of the application (CPU memory).
In practice, for generic computational algorithms (that are not particularly easy to parallelize) you can only achieve a modest performance improvement (let’s say computation will be 2x faster overall), which is of course nice, but not worth the effort of implementing and maintaining parallel CPU/GPU implementations of large parts of the code.
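One way to see why the overall gain stays modest is Amdahl's law, which is not mentioned above but is the standard way to estimate this. The sketch below is only illustrative: the parallel fractions and speedup factors are hypothetical numbers, not measurements, and it ignores the CPU/GPU data transfer time, which would reduce the gain further.

```python
def overall_speedup(parallel_fraction, parallel_speedup):
    """Amdahl's law: overall speedup when only part of the work is accelerated.

    parallel_fraction: share of the original run time that can be parallelized (0..1)
    parallel_speedup:  how many times faster that share runs on the GPU
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Hypothetical scenarios: half or 90% of the run time is parallelizable,
# and that part becomes 10x or 100x faster on the GPU.
for parallel_fraction in (0.5, 0.9):
    for parallel_speedup in (10, 100):
        gain = overall_speedup(parallel_fraction, parallel_speedup)
        print(f"parallel part {parallel_fraction:.0%}, "
              f"{parallel_speedup}x faster -> {gain:.2f}x overall")
```

If only half of the run time can be parallelized, the overall speedup stays below 2x no matter how fast the GPU part becomes.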
We heavily utilize GPUs for rendering and other processing steps that are very close to the end of the data processing and rendering pipeline, because GPUs are well positioned to perform these steps for many reasons: the data is already on the GPU, many rendering operations are relatively easy to parallelize, you don’t need to send the results back to the CPU, rendering can run in parallel with processing, the rendering pipeline is relatively rigid compared to data processing pipelines, etc.
We have infrastructure and a few examples for doing other data processing on the GPU. Therefore, if you identify performance bottlenecks in your workflow, you may decide to reimplement those steps so that they run a bit faster on the GPU. However, in general, the best use of your resources is to utilize the GPU for tasks that naturally fit its special architecture.
To improve performance, make sure that the input volume is cropped and downsampled to the minimum necessary size (you can use the Crop volume module). If you cannot go any lower and performance is still not good enough, then use a computer that has a high CPU clock rate and lots of fast RAM. If that’s not feasible, then you can cut up your volume into smaller ones, edit these pieces one by one, and in the end put the pieces back together into one large volume.
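The cut-up approach can be sketched roughly as follows. This is a plain-Python illustration, not code from the application: the volume is a nested list of slices, and `slab_ranges`, `process_in_slabs`, and `threshold_slab` are hypothetical helper names. A real volume would typically be a NumPy array, but the slicing logic is the same. Note that this simple version is only correct for per-voxel operations; anything that looks at neighboring slices would need overlapping pieces.

```python
def slab_ranges(num_slices, slab_size):
    """Return (start, stop) index pairs that cover range(num_slices) in order."""
    return [(start, min(start + slab_size, num_slices))
            for start in range(0, num_slices, slab_size)]


def process_in_slabs(volume, slab_size, process_slab):
    """Process the volume one slab at a time and join the results back together."""
    result = []
    for start, stop in slab_ranges(len(volume), slab_size):
        slab = volume[start:stop]          # one smaller piece of the volume
        result.extend(process_slab(slab))  # append processed slices in order
    return result


def threshold_slab(slab):
    """Hypothetical per-voxel operation: mark voxels with value >= 5."""
    return [[[1 if voxel >= 5 else 0 for voxel in row] for row in image]
            for image in slab]


# Toy volume: 10 slices of 2x2 voxels, where each voxel holds its slice index.
volume = [[[z, z], [z, z]] for z in range(10)]

pieces = process_in_slabs(volume, slab_size=4, process_slab=threshold_slab)
print(slab_ranges(len(volume), 4))
print(pieces == threshold_slab(volume))
```

For a per-voxel operation like this one, processing the pieces separately gives the same result as processing the whole volume at once, while only one piece needs to be worked on at a time.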