Should we start collecting software usage data?

lassoan · July 29, 2023, 7:52pm

Several extension developers have asked about, and are toying with, the idea of starting to collect software usage data. I completely agree that such data can be useful for prioritizing development and can motivate developers to devote more of their time to a project. However, I also see a huge risk of eroding users’ trust if extension developers start implementing home-made mechanisms for phoning home from their modules to collect such data.

I believe there is a way to fulfill this request of developers while also respecting user privacy, because it has become standard practice in many software packages to let users opt in to sharing usage data. If we develop such a feature carefully and do everything very transparently, then it should not have a negative impact. What do you think?

For example, what I could imagine to be acceptable is an opt-in feature that would allow collecting some counter values (e.g., how many times a certain feature is used) and reporting them to a Slicer-community-maintained server, with the results publicly available. The mechanism could be used by both Slicer core and extensions. Module developers could specify what events they want to count. Users could inspect what data is sent and they could opt out anytime.
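A minimal sketch of what such an opt-in counter could look like, assuming a hypothetical `UsageCounter` class (nothing here is an existing Slicer API, and the event name is made up). It only illustrates the three properties above: off by default, inspectable, and revocable. Sending the payload to a server is deliberately left out.

```python
import json
from collections import Counter


class UsageCounter:
    """Hypothetical opt-in event counter; not an existing Slicer API."""

    def __init__(self, opted_in=False):
        self.opted_in = opted_in  # off by default: the user must opt in
        self._counts = Counter()

    def count(self, event_name):
        # Events are dropped entirely unless the user has opted in.
        if self.opted_in:
            self._counts[event_name] += 1

    def opt_out(self):
        # Opting out stops counting and discards anything not yet sent.
        self.opted_in = False
        self._counts.clear()

    def pending_payload(self):
        # Exactly what would be sent, as readable JSON the user can inspect.
        return json.dumps(dict(self._counts), sort_keys=True)


counter = UsageCounter()
counter.count("SegmentEditor.applyThreshold")  # ignored: not opted in yet
counter.opted_in = True
counter.count("SegmentEditor.applyThreshold")
counter.count("SegmentEditor.applyThreshold")
print(counter.pending_payload())  # {"SegmentEditor.applyThreshold": 2}
```

Keeping the payload as plain JSON is one way to make "inspect what data is sent" trivial: the preview shown to the user and the transmitted body can be the same string.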

mau_igna_06 · July 29, 2023, 11:25pm

In BoneReconstructionPlanner, the Mandible Reconstruction extension, we would like to collect at least a metric related to ‘a unique Virtual Surgical Plan and Personalized Surgical Guide design per patient per case’. I guess this could involve detecting when a 3D model (tagged as surgical guide) is saved as a file (to be 3D printed later). Maybe also distinguish per patient by converting the patient MRN or other personal data to a SHA hash string. Maybe also detect between scene created/imported and scene closed events.
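To make the hashing idea concrete, here is a rough sketch using only the Python standard library. It assumes a keyed hash (HMAC-SHA-256) with a secret that stays on the local machine, because a plain hash of a short identifier such as an MRN can be reversed by trying all candidates. The `pseudonymize` helper and the identifiers are made up for illustration, and whether any such fingerprinting is acceptable remains a policy question.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patient_id: str, local_key: bytes) -> str:
    """Return a keyed SHA-256 digest of patient_id (hypothetical helper)."""
    return hmac.new(local_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: the key is generated once per installation and never leaves the machine.
local_key = secrets.token_bytes(32)

seen_cases = set()
for mrn in ["MRN-0001", "MRN-0002", "MRN-0001"]:  # made-up identifiers
    seen_cases.add(pseudonymize(mrn, local_key))

# Only this count would need to be reported, not the digests themselves.
print(len(seen_cases))  # 2
```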

lassoan · July 30, 2023, 1:28pm

Thanks for the feedback. So, it seems that a few event counters specific to your module would suffice. I guess the counts should be accumulated per day.
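As a sketch of per-day accumulation (the `DailyCounts` class and the event names are hypothetical), counts could be bucketed by calendar date so that no time-of-day information is stored:

```python
from collections import defaultdict
from datetime import date


class DailyCounts:
    """Hypothetical accumulator: event counts bucketed by calendar day."""

    def __init__(self):
        self._by_day = defaultdict(lambda: defaultdict(int))

    def record(self, event_name, day=None):
        # Only the date is kept; the time of day is never stored.
        day = day or date.today()
        self._by_day[day.isoformat()][event_name] += 1

    def report(self):
        # Plain nested dict, sorted by day, ready to serialize.
        return {day: dict(events) for day, events in sorted(self._by_day.items())}


counts = DailyCounts()
counts.record("guideModelSaved", date(2023, 7, 30))
counts.record("guideModelSaved", date(2023, 7, 30))
counts.record("sceneClosed", date(2023, 7, 31))
print(counts.report())
```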

This is a good point, we should decide if fingerprinting user or patient data is acceptable and describe some guidelines for module developers.

jamesobutler · July 30, 2023, 2:08pm

Extension developers could do this today, which is a concern that has to be accepted with the current use of the Slicer Extensions Index. For extensions to make it into the Slicer Extensions Index, there is an approval process, which implies that Slicer leadership has put their stamp of approval on the extension and that it is safe to use. However, this may not be the intention.

Is there any funding to moderate the content on the Slicer extensions index to enforce a Slicer defined privacy policy? This is to provide confidence in the First Party App Store solution.
- If minimal funding, maybe the Slicer extensions index has to set all extensions to a specific git hash, so that every update of the extension can be reviewed to confirm compliance. This would certainly reduce convenience, but the value gain would be increased trust of the first party App Store. Maybe compliance review could be semi-automated instead of fully manual?
- If no specific funding, maybe finding a way for extensions to self-publish their extension where they don’t have to follow the Slicer Extensions Index privacy policy. This would be the Third Party App Store solution. The Extensions Manager could point to a web address to the third party location of the extension.

What might be the process for Slicer custom applications that use extensions to verify that an extension is not sending data to an outside source, to make sure the extensions are not breaking their custom app’s privacy policy?

Would the user have any rights to request deletion of data that they submitted (not just opting out of future sending of data)? This request might originate when the user accidentally submitted details when using Slicer in a shared computing environment like an academic lab computer. A shared user might have accepted the policy, but not the current end user.
- To avoid accidental submission, showing a pop-up at startup whenever data sharing is turned on could be a possible solution.

lassoan · July 30, 2023, 5:35pm

Thank you, these are all interesting points.

They can do it because we have not set rules to explicitly forbid this. We could simply say that extensions must not collect and transfer usage data. However, I feel that it could be a fair request from developers to get some feedback, so we should not simply reject this request but find a controlled way to do it.

The extension approval process is an interesting topic, too, but I think it can be discussed separately from the software telemetry rules. We may bring it up later, when we decide on mechanisms to enforce rules. However, since we allow usage of PyPI for installing extensions, it does not make sense to be more strict than PyPI, which means that you need to trust the maintainers of the packages that you install.

The most important mechanism is trusting the developers of the extension. For further guarantees, the custom application developer could provide its own extensions catalog that only contains a set of reviewed and approved extension packages.

I think we should only submit anonymized data. This way we would not need to worry about GDPR either. It would clearly provide more accurate and robust data if individual users were identified, but I don’t think it is worth the enormous amount of extra complexity of handling such data. I’m not sure about adding location (e.g., country or city) to submissions: it would be extremely useful, but it may be considered personal data.

rbumm · July 31, 2023, 9:01am

Having access to anonymous usage data for 3D Slicer and all its extensions would be incredibly beneficial. Since May 2023, I have been compiling such data for my extension. It has proven enlightening to understand which components of my extension are utilized more frequently, enabling me to focus my developmental efforts appropriately.

It would be fantastic to establish a universal counter mechanism for 3D Slicer, thus simplifying data collection. I posit that an opt-in may not be necessary as long as we ensure the data transmission is strictly anonymous (no IP or time tracking) and solely conducted via a 3D Slicer-specific mechanism.

In fact, I would even advocate for an automatic crash reporting feature within 3D Slicer and its extensions; this one would, of course, be contingent on explicit user consent.

If an extension collects outcome-related data (quality of airway segmentation such as number of branches generated, resolution of the CT used, time used for AI analysis, CPU or GPU used for AI, etc.), there should be a clear opt-in for that kind of data transfer.

lassoan · July 31, 2023, 3:58pm

Any kind of non-networking task (e.g., starting a segmentation task) initiating network communication without explicit user approval could be interpreted by some users as spying. Users who are sensitive about tracking of their activities install firewalls that immediately notify them about all network requests. They get upset if they find out that software attempted to send information without consent and may start a smear campaign, which we should absolutely avoid. I think sending network requests without explicit approval of network communication could be acceptable only for operations whose main goal is to request information from a server (e.g., update check).

@muratmaga as far as I remember you have been asking about more detailed usage data some time ago. Could you comment about this discussion and describe your needs?

@pieper @jcfr do you have anything to add?

pieper · July 31, 2023, 6:39pm

I’m very curious to hear from Slicer users, particularly those with privacy concerns, about their levels of concern for various options that have been discussed here.

For example, is opt-in required or could we have a default setting to collect and share anonymous data as long as the user is notified at least on the first use? Do you want to be able to inspect the data that is being sent to our servers?

I also think it’s important to keep in mind that Slicer is a community project and we benefit most from those community members who share the most, be it through code, forum posts, or spreading the word to their colleagues. Without dismissing the concerns of those who prefer to remain private, I think it’s fair to prioritize the needs and opinions of those who are active in improving the community. From this point of view I favor collecting and sharing user data that will benefit developers in making Slicer better.

fedorov · July 31, 2023, 8:58pm

For the sake of transparency and accountability, I would suggest that every extension should clearly declare to the user whether any information is collected, and this declaration should be available in the extension UI, and populated in the extension template.

Since the resources available for the review of extensions will always be limited, similar to license considerations, users will have to rely on the declarations made by the extension developers. While it is ok to communicate license details in the code repository, an equivalent to the privacy statement should be readily available to the non-developer users. Whether opt-in is implemented or not could be the choice of the developer and may depend on various considerations. But it should be the choice of the user to use the extension or not given the privacy statement declaration.

I think it is potentially a dangerous situation at the moment, where (as it appears from the discussion above), some extensions already are collecting and communicating certain usage information without users necessarily being aware of this.

rbumm · August 1, 2023, 5:56am

Good suggestion @fedorov, it is implemented in the help text of the LungCTAnalyzer now.

jcfr · August 1, 2023, 3:24pm

Related resources we found while @mau_igna_06, @lassoan and I were discussing during the weekly meeting:

Policies/Telemetry Policy - KDE Community Wiki

lassoan · August 2, 2023, 4:22am

We also talked a bit about possible implementations. It seems that adding a Telemetry extension could be a good idea, as this way we could avoid adding any tracking related code to Slicer core, we could make more frequent changes to the telemetry infrastructure, and any extensions that need to track software usage would need to declare that by depending on the Telemetry extension.

pieper · August 2, 2023, 9:43pm

I started something like this a few years ago:

I was using Google Analytics for the back end, but I think now it’s not a good fit. Still the basic idea could be adapted to a different backend.

muratmaga · August 2, 2023, 10:37pm

We do not have very specific needs. As a grant-funded project, it would be nice to have some kind of independent way of assessing user counts/activity beyond raw extension download stats; that’s the primary motivation. I think our usage would be things like which modules were activated, for how long, etc. We already have a sense of this through user interactions and workshops, but it would be nice to get a bit more quantitative.

As for how to handle this, I have mixed feelings about a separate telemetry extension. As a user, I wouldn’t go through the hassle of installing yet another extension to allow the devs to collect data (it just seems too cumbersome). I would be OK with a very explicit message at the end of the installation process that asks my permission to do background telemetry about usage (with the default set to not collect), and that even provides a link to the GitHub repo for that section of the code for the inquisitive ones to review, as well as information about what is being collected.

The issue, as has been pointed out, is institutions like the one where I work, which have very tight internet security policies. The key, I think, as I said, is to make this very explicit (and opt-in by actively selecting an option), and to point directly to the telemetry code (for people to review, if they want to).

jamesobutler · August 3, 2023, 2:01am

Also a heads up that maintainers of Slicer extensions such as SlicerTotalSegmentator will need to be aware of associated Python packages (TotalSegmentator) that contain telemetry that is ON by default. SlicerTotalSegmentator currently uses TotalSegmentator v1.5.3, but starting in v1.5.4 there is telemetry.

lassoan · August 3, 2023, 3:29am

By default, extension dependencies are installed automatically, so users would not need to do anything extra.

This is very useful information. TotalSegmentator offers huge value, so asking a little favor in return could be fair. However, phoning home without user consent violates the Control rule of the KDE telemetry policy.

Maybe a solution could be to add a global telemetry kill switch to the Slicer application, which would apply the setting to all extensions and external dependencies that otherwise would not behave properly (e.g., would do silent opt-in by default). This would address the issue with both TotalSegmentator and LungCTAnalyzer.
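A rough sketch of the kill-switch logic, assuming a hypothetical `TelemetrySwitch` class (not an existing Slicer class): the global setting always overrides any per-extension consent. How an external Python package would be made to honor the switch is a separate problem and is not shown here.

```python
class TelemetrySwitch:
    """Hypothetical global kill switch with per-extension consent."""

    def __init__(self):
        self.global_enabled = False  # assumption: telemetry is off until the user enables it
        self._extension_consent = {}

    def set_extension_consent(self, name, allowed):
        self._extension_consent[name] = bool(allowed)

    def is_allowed(self, name):
        # The global switch always wins over per-extension settings.
        return self.global_enabled and self._extension_consent.get(name, False)


switch = TelemetrySwitch()
switch.set_extension_consent("LungCTAnalyzer", True)
print(switch.is_allowed("LungCTAnalyzer"))    # False: global switch is still off
switch.global_enabled = True
print(switch.is_allowed("LungCTAnalyzer"))    # True
print(switch.is_allowed("TotalSegmentator"))  # False: no consent recorded
```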

cpinter · August 9, 2023, 11:43am

I like the idea of the Telemetry extension, and I think it’s a good approach that the extensions collecting such data must depend on it. But if the user does not install any such extension, Telemetry would not be collected for Slicer core either, until the user installs the Telemetry extension manually. My feeling is that this way almost no usage information would be collected about Slicer core (<1% of the users would have this enabled, unless a very popular extension depends on it).

Have you discussed this use case at the dev meeting, when the user only uses Slicer core or only extensions that do not collect usage data? For example a popup on first start offering to install the Telemetry extension (which could be an annoyance)…

lassoan · August 9, 2023, 12:25pm

I agree, for collecting usage data about Slicer core, the user would need to opt in in the installer or in the welcome dialog, and this would need to be in Slicer core. If the user consented, then Slicer would install the telemetry extension.

It would be nice if even the question about telemetry was implemented outside Slicer core, but I don’t think it is possible. We can put telemetry configuration into a separate module and include it in the build depending on some CMake flag.

pieper · August 10, 2023, 2:18pm

Since we know that many people will opt out of tracking and telemetry, why don’t we try to make sharing of usage information fun and easy for people who do want to share?

I’m thinking we could track usage locally and generate badges for users that pass a certain number of hours of usage, or try certain modules or extensions. Then we could offer a button to post these achievements on social media or discourse where we would get statistics automatically. This could also be a place where people could opt-in to automatically posting their progress.

Badges could be for things we want to collect usage information on like “Loaded 10 dicom studies”, “Loaded 1000 dicom studies”, “Saved and reloaded an MRB scene file”, “Installed an extension”, “Used the Python console”, etc. Even seeing the list of potential badges might encourage people to explore features they didn’t know about. This approach would be very transparent and might encourage community building.
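As an illustration only, badges could be derived from locally stored counters with a simple threshold table. The counter names and the `earned_badges` helper are hypothetical; the badge titles are taken from the examples above.

```python
# Hypothetical mapping: local counter name -> list of (threshold, badge title).
BADGE_THRESHOLDS = {
    "dicom_studies_loaded": [
        (10, "Loaded 10 dicom studies"),
        (1000, "Loaded 1000 dicom studies"),
    ],
    "extensions_installed": [(1, "Installed an extension")],
    "python_console_used": [(1, "Used the Python console")],
}


def earned_badges(local_counts):
    """Return titles of all badges whose threshold has been reached."""
    badges = []
    for counter_name, levels in BADGE_THRESHOLDS.items():
        value = local_counts.get(counter_name, 0)
        badges.extend(title for threshold, title in levels if value >= threshold)
    return badges


# Counts stay on the user's machine; only badges the user chooses to post are shared.
print(earned_badges({"dicom_studies_loaded": 25, "python_console_used": 3}))
# ['Loaded 10 dicom studies', 'Used the Python console']
```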

mau_igna_06 · August 10, 2023, 2:46pm

I would suggest we make a quick and very simple implementation.

We could infer the number of users who opted out from the number of downloads of Slicer and the number of users that sent telemetry data on a given day (e.g., the only collected variable could be the boolean opted-in).
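The arithmetic could be as simple as the sketch below. The function name and the numbers are made up, and the result is only a rough estimate because downloads are not the same as active users.

```python
def estimate_opt_in_rate(reporting_users, downloads):
    """Rough opt-in rate: users who sent telemetry divided by downloads."""
    if downloads <= 0:
        raise ValueError("downloads must be positive")
    # Cap at 1.0 because one download can serve several users.
    return min(reporting_users / downloads, 1.0)


rate = estimate_opt_in_rate(reporting_users=120, downloads=4000)  # made-up numbers
print(f"opted in: {rate:.1%}, opted out: {1 - rate:.1%}")
```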

After we do that, then we may take other long-term telemetry design decisions. Or do other small experiments.

I hope this opinion is helpful for the discussion.
