Like others, I have been using LLMs to help develop functionality in Slicer. The typical workflow is writing code in VS Code, then loading it into Slicer and testing it, then copying and pasting the error back to the agent to iterate and fix it.
I found this a bit intimidating (particularly VS Code) as someone who doesn’t do programming as a day job. Especially given that our typical advice on the forum nowadays is “you can do this with ChatGPT”, I thought I could streamline a few things and do all the work directly inside Slicer. The idea is to do the prompting in Slicer (whether for a script or a module), let it run, and if it fails, capture the error and send it back to the agent to correct. The resulting script is then loaded into the scene as a Python text node and can be opened and edited inside our Script Editor (to which I added a few more things that I think make it more usable, certainly for me).
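To illustrate the idea, here is a minimal sketch of such a run-and-self-correct loop (this is not the actual implementation; `query_llm()` is a hypothetical helper that sends a prompt to the model and returns generated Python code):

```python
import traceback
import slicer

def run_with_self_correction(task_prompt, max_attempts=3):
    """Ask the model for code, run it, and feed any error back for a fix."""
    prompt = task_prompt
    for attempt in range(max_attempts):
        code = query_llm(prompt)  # hypothetical helper, not a Slicer API
        try:
            exec(code, {"slicer": slicer})
        except Exception:
            # Send the traceback (and the failing code) back to the model
            error = traceback.format_exc()
            prompt = (task_prompt
                      + "\n\nThe previous attempt failed with this error:\n" + error
                      + "\n\nPrevious code:\n" + code
                      + "\n\nPlease return a corrected script.")
        else:
            # Keep the working script in the scene as a text node, so it can
            # be opened and edited in the Script Editor
            textNode = slicer.mrmlScene.AddNewNodeByClass("vtkMRMLTextNode", "GeneratedScript")
            textNode.SetText(code)
            return textNode
    return None
```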
At this point, as far as I can tell, all of this is working. I am reaching the point where my technical skills and understanding are not sufficient to guide the internal prompting about how the agent should behave (as opposed to the user’s prompt about the task). I think it needs an insider’s view to edit those prompts and create agent behavior that is accurate on the first attempt rather than iterating endlessly. By the way, this was all written via vibe coding, so I am not even claiming the code is meaningful, but it does seem to do the job in the way I thought might be useful. I am sure there are better ways to accomplish this, and probably ways to use tokens more sparingly.
The goal is not to replace IDEs or people’s development environments, but to lower the barrier for people who don’t do programming for a living, or who would not even consider doing this because of the prerequisites involved. Maybe start with a simple script. Once that’s working, perhaps try to convert it to a module that is more user-friendly. Or edit the existing Python modules for additional tasks. I don’t think I can expand this any further on my own (due to time and the reasons above), but if anyone wants to help out, please do. This might be a nice PW activity.
This is amazing, Murat! I haven’t tried it yet, just looked at the code quickly. As I understand it, there is no domain-specific training or knowledge integration involved; there is simply a base prompt to which the user’s request is added. Is this correct?
I want to try this eventually as well when I have time. I think this could be a great tool for onboarding people and for actually doing smaller projects. Thanks for your efforts!
No, there is nothing specific; it is all off-the-shelf stuff. I am sure it can be improved a lot if we can somehow do fine-tuning specific to Slicer (and related things). Yes, there is a “persona” prompt that gets appended to the user’s task prompt.
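In other words, something like this minimal sketch (the prompt text and names here are illustrative, not the actual code):

```python
PERSONA_PROMPT = (
    "You are an experienced 3D Slicer developer. Write Python code that runs "
    "in Slicer's embedded interpreter and uses the slicer.* API."
)

def build_prompt(user_task):
    # The fixed persona text is appended to whatever task the user typed
    return user_task + "\n\n" + PERSONA_PROMPT
```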
This is the part where we need to provide domain-specific guidelines to the model for programming tasks. For whatever reason, the quality of the results I get through this is lower than if I provide the same prompt directly to the agent in VS Code using the same model.
My guess would be that VS Code considers at least the file in the active tab (plus any files that you explicitly specify). That could already be enough context to perform better.
It is not at all trivial to provide your existing code as context for LLMs.
Cursor is a company that does just this, and it is currently valued at $30 billion. For me, VS Code works just as well as Cursor. So it is not surprising that code generation works better inside the VS Code IDE.
For creating and developing a full module, it may be hard to do better than in an IDE. To make the first steps easier, it may be enough to provide users with templates, tutorials, and a good agentic coding manifest (for example, see some information here).
However, it could make sense to have a module in Slicer for creating and running code snippets to automate simple tasks (load some data, do some processing, set up visualization, export results, etc.). The module could automatically provide all the necessary context (where to look for code snippets, what data is in the scene, etc.). To facilitate this, we could:
Develop a set of useful high-level functions that would implement the tasks users frequently ask about. I would guess that a few hundred small functions would provide all the necessary building blocks. The list of functions and their documentation would be used as context for the code generation (see the first sketch after this list).
Set up a Model Context Protocol (MCP) server in Slicer that would allow LLMs to verify themselves by testing code in Slicer. There was a post about it not too long ago: MCP-Slicer: 3D Slicer Model Context Protocol Integration. The main difficulty is that you would want to run the LLM-generated code in a sandbox, or you would really need to be careful about the API you expose (see the second sketch below).
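For the first idea, one such building block and the way its documentation could be assembled into context might look like this minimal sketch (all names are illustrative):

```python
import inspect
import slicer

def load_volume_from_file(path):
    """Load an image volume from disk and return the volume node."""
    return slicer.util.loadVolume(path)

def show_volume_in_slice_views(volumeNode):
    """Show the given volume in the slice views."""
    slicer.util.setSliceViewerLayers(background=volumeNode)

def build_context(functions):
    """Concatenate function signatures and docstrings into a context string
    that can be prepended to the code-generation prompt."""
    return "\n".join(
        f"{f.__name__}{inspect.signature(f)}: {inspect.getdoc(f)}"
        for f in functions)

# context = build_context([load_volume_from_file, show_volume_in_slice_views])
```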
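For the second idea, here is a minimal sketch of a deliberately narrow MCP tool (assuming the `mcp` Python SDK’s FastMCP server; this is not the MCP-Slicer project’s actual code):

```python
from mcp.server.fastmcp import FastMCP
import slicer

server = FastMCP("slicer-context")

@server.tool()
def list_scene_nodes() -> str:
    """Return the names and types of the nodes currently in the Slicer scene.
    Read-only on purpose: exposing exec() would let the model run arbitrary code."""
    nodes = slicer.util.getNodes().values()
    return "\n".join(f"{n.GetName()} ({n.GetClassName()})" for n in nodes)

# server.run() blocks, so inside Slicer it would have to be started on a
# background thread or in a separate process.
```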