Seeing @muratmaga above in an apparently similar situation, I felt I should not keep my thoughts to myself, so I am submitting them here for comments, brainstorming, and also to collect criticism.
Regarding the question of whether it is OK to install large Python packages in user-writable folders:
So far my goal is to create a centralized installation that covers the typical scenarios of typical users, while keeping the possibility to install user extensions.
The hypothetical situation of each user having their very own PyTorch is indeed suboptimal; an ideal solution could build on concepts known from standard Python distributions, where anything on PYTHONPATH is treated as a Python library… So introducing e.g. a SLICERPATH would make this much easier for us…
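Just to illustrate the idea, a minimal sketch: SLICERPATH is purely hypothetical, but as far as I can tell a site wrapper could already emulate it with Slicer's existing --additional-module-paths option:

```bash
# Hypothetical: a SLICERPATH honored by Slicer itself, analogous to PYTHONPATH
export SLICERPATH="/opt/site/slicer-modules:$HOME/.local/slicer-modules"

# What a site wrapper script could do today: translate the variable into
# Slicer's command-line option (paths are illustrative)
./Slicer --additional-module-paths ${SLICERPATH//:/ }
```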
To me it looks like Slicer is built around Windows-desktop-centric concepts, and I understand that HPC/cloud computing approaches, with features such as immutable app directories and software module systems, might seem exotic at first glance…
…but there are reasons why the evolution of scientific computing / HPC went in this direction:
science reproducibility - team members often onboard at different times; if they installed the tools themselves, they would end up with different versions, possibly producing different results. With software modules it is easy to e.g. require that everybody uses a particular version of a tool for a given project, and the latest one for casual work.
aim to save human labor - why ask every single user to download a tool and tweak it in a certain way, if this could be done centrally? Some users are not IT-savvy; a simple module load Software then works the same way across all tools (see the sketch after this list). There are projects where multiple team members need identical setups; their time is valuable, and they should not spend it repeating the install procedure.
governance of software updates - not every researcher reads the announcements of major tools. Scientific support engineers do.
efficiency of the software repository - today we have modern ways to distribute scientific software, e.g. CVMFS, which lets institutions share software repositories so that users just “bring their own data”, or use CVMFS-distributed software at collaborating institutions.
platform optimization - with knowledge of the platform you can provide users with optimized numeric libraries, depending on e.g. CPU microarchitecture or CUDA compute capability.
EDIT: and possibly one last reason:
Slicer is one of many pieces of software we support - the users of our infrastructure use various apps in various ways: Conda, HCP Pipelines and ConnectomeWorkbench, Matlab, MRtrix, DCMTK, FSL, FreeSurfer… and perhaps tomorrow Slicer too. When a new researcher arrives, (s)he can quickly start with any supported tool. The low barrier is key.
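To make the module load idea above concrete, here is what a user session could look like with Environment Modules or Lmod; the module names and versions are hypothetical:

```bash
# See which Slicer versions the site provides (names are hypothetical)
module avail slicer

# Either pin an exact version for a project, for reproducibility:
module load slicer/5.6.1
# ...or take the site default for casual work:
#   module load slicer

# The centrally installed, pre-configured Slicer is now on PATH
Slicer --no-splash
```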
About 95% of Slicer users are on Windows or Mac, and the remaining portion are probably using Linux boxes that they have full control over (like a desktop computer). The current design has a long history, and it was aimed at making Slicer portable (i.e., if you put the entire Slicer directory on a USB drive it should work, and it does). So it is normal that a lot of things are designed for a proper desktop-like environment with interactivity in mind.
Up until recently it was not easy (or sometimes impossible) to deploy interactive UI applications on queue-based HPC systems, but it is becoming more and more common and doable. With powerful nodes and GPUs, if one has access to a local HPC it makes sense to use it instead of the cloud, since costs are probably lower, plus your data is easier to get in (or out) than with a remote cloud. So the discussion is more about how we can make this work without breaking what the other 95% of users are used to.
I think from a reproducibility point of view, what Slicer does makes sense. Every package and extension you used to process the data is under the installation tree; if you want to preserve it, all you have to do is archive (tar/zip) it. Far simpler than creating a Docker or similar container: your zip file is the container.
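For example (version and directory name are illustrative):

```bash
# Snapshot the self-contained tree: the application, extensions and
# pip-installed Python packages all live under it
tar -czf slicer-project-snapshot.tar.gz Slicer-5.6.1-linux-amd64/

# Restoring the exact processing environment later is just:
tar -xzf slicer-project-snapshot.tar.gz
./Slicer-5.6.1-linux-amd64/Slicer
```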
My institutional environment has additional administrative challenges: proxies, firewalls, and self-signed certificates make a user-installed package like Slicer not work very well (e.g. the Extension Manager fails on HTTPS calls). Otherwise I would probably let them install the software themselves, but then everyone would have to figure out how to make the certificates work, which is a big ask. So in this particular case maintaining it centrally makes sense…
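For reference, these are the kinds of settings a central install can bake in once for everyone; whether every component of Slicer (Qt networking, the bundled pip/requests) honors each variable is an assumption worth verifying per site:

```bash
# Route HTTP(S) traffic through the institutional proxy (values illustrative)
export http_proxy="http://proxy.example.edu:3128"
export https_proxy="http://proxy.example.edu:3128"

# Make Python-based tooling (pip, requests) trust the institutional CA
export REQUESTS_CA_BUNDLE="/etc/pki/tls/certs/site-ca-bundle.crt"
export SSL_CERT_FILE="/etc/pki/tls/certs/site-ca-bundle.crt"
```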
We are also starting a new project to deploy Slicer in the cloud, and some of these discussion points are relevant. So I think we should keep talking, and I will definitely experiment with your install script and see if it helps us.
So what is going to happen if people need different (or specific) versions of these packages (e.g., due to API differences, or an extension requiring a specific version of a Python package), if the Python packages are maintained centrally?
This is a very inspiring question; I have to say that I came here exactly for this. Thank you! The more I learn about Slicer, the more I understand the current state of things…
I see the need for some kind of hybrid approach.
I am converging on the concept of providing our users with:
(1) “Slicer + goodies” - Slicer + common modules + the possibility to install custom add-ons somewhere in $HOME…
(2) “bare” Slicer - no site-installed add-ons - to be used when the Slicer+goodies distribution is not satisfactory, dependencies block custom add-ons, etc. Extensions are then installed on top of it in $HOME (see the sketch below).
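A minimal sketch of that $HOME overlay, assuming a read-only central install; the overlay path and the pip --target approach are my assumptions (from inside Slicer's Python console, slicer.util.pip_install would be the documented route), and $SLICER_DIR stands for wherever the central tree lives:

```bash
# Install a user/project-specific package version into $HOME, leaving
# the central read-only tree untouched (path and version illustrative)
"$SLICER_DIR/bin/PythonSlicer" -m pip install \
    --target="$HOME/.local/slicer-overlay" torch==2.1.2

# Put the overlay first so Slicer's Python picks it up before the
# centrally installed copy (assumes PYTHONPATH is honored)
export PYTHONPATH="$HOME/.local/slicer-overlay:$PYTHONPATH"
```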
We need to investigate the use cases of our researchers. Do they tend to use the same set of extensions, or do they go shopping for add-ons quite often? Do we have some idea of how a typical user behaves?
Based on my experience with Python: we are often able to cover the need for basic modules, but in the end every researcher ends up with some custom per-project packages installed via pip/conda/venv.
Yes. And the amount of data we are collecting about our research subjects, together with the increased awareness of data security, is also a significant motivation to keep data in well-defined and secured server environments…