Hello.
I have put landmarks on some models created with the segmentation editor. I’m trying to load them in GPA, but they aren’t recognized when I select the landmark folder. I think it’s probably a storage format issue… please let me know how to do it. Any help on this is greatly appreciated.
Landmarks need to be saved in fcsv format. If you are using a recent version, the default is json. Make sure you change that to fcsv when you are saving, or use "Export as" from the Data module.
Thank you for the explanation! I have changed it to FCSV and saved it, but the message “warning: fcsv file format only stores control point coordinates and a limited set of display properties” is displayed. And then, when I select the landmark folder in GPA, the fcsv in the folder is not recognized …
You can ignore that warning message. I am not entirely sure what you mean by
What is not recognized? Are you getting an error message (Ctrl + 0)? You need to have multiple fcsv files (belonging to multiple specimens) in that folder for GPA to work. You may want to review the instructions for GPA here
Thanks a lot. I don’t know how I can say…,
this is Python Interactor’s message when I try to execute GPA
Again this is only part of the log. In this window, if you click CTRL+A it will select all the text, and then you can do CTRL+C and CTRL+V to copy and paste. By the way, are you using non-English characters in your filename?
@smrolfe can this be a non-ascii filename issue?
What do you see in the first few lines of the application log (menu: Help / Report a bug)? In particular, what is the content of the line starting with “Operating system”?
This is important because you need to use a recent Windows 10 version if you want to use unicode strings in text files or directory and file names that contain non-ASCII characters. If you cannot update to the latest Windows 10, then move your data into files and folders that do not have any special characters in their names.
I think it may be due to a non-ascii character in the fcsv file. @hajime_osaki if you can share a sample landmark file in addition to the error log that would be helpful.
@smrolfe you are right, the error occurs at the line for row in datafile, which means that the file contains a string that does not use UTF-8 encoding (e.g., it may use some Japanese code page), or the application does not support the UTF-8 code page because the application is not recent enough.
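To narrow down which of these two causes applies, a small check can be run on each landmark file before it is handed to GPA. This is only a sketch: `check_landmark_file` and `is_ascii` are hypothetical helper names, not part of Slicer or SlicerMorph, and the check only tests for strict UTF-8 decoding and an ASCII-only path.

```python
from pathlib import Path


def is_ascii(text):
    """Return True if the string contains only ASCII characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def check_landmark_file(path):
    """Return a list of human-readable problems found for one landmark file."""
    path = Path(path)
    problems = []
    # Non-ASCII characters anywhere in the path can be a problem on older systems.
    if not is_ascii(str(path)):
        problems.append("path contains non-ASCII characters")
    # Strict UTF-8 decoding of the raw bytes; a failure suggests another code page was used.
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append("content is not valid UTF-8 (byte offset {0})".format(err.start))
    return problems
```

An empty list means neither problem was detected; it does not guarantee that the file is otherwise well formed.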
This is yet another example of why we need to get rid of this old .fcsv format. This error could not have happened with the new .json file format:
- Text encoding is specified in json standard (it is required to be UTF-8), therefore it is always known how to decode the file content.
- The 10-line handcrafted Python code that parses the csv file is relatively complex and very fragile. In contrast, parsing the json file is a single command and you can get any content from the file incredibly easily.
Read and parse markup json file:
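import json
# Open explicitly as UTF-8 so that decoding does not depend on the system code page
with open('path/to/my.mrk.json', encoding='utf-8') as f:
    json_data = json.loads(f.read())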
- The parser contains the comment “Imports the landmarks from a .fcsv file. Does not import sample if a landmark is -1000”. I think it is obvious just how bad this is. We should do better. And fortunately, we can: the markup json file includes a positionStatus property for each control point, which is accessible simply as controlPoint["positionStatus"].
Print position and status of each control point:
for controlPoint in json_data["markups"][0]["controlPoints"]:
    print("position: {0}, defined: {1}".format(controlPoint["position"], controlPoint["positionStatus"] == "defined"))
@muratmaga @smrolfe Please move away from fragile and inflexible legacy formats, especially because we now support a modern and sustainable file format.
We will move to json, but not immediately.
For one, we (as in my lab) have thousands of datasets that are already collected and archived, and converting them takes time. I remember when Slicer 4 was released in 2012 and the decision was made that each fiducial would be saved as a single acsv file. It took years to stabilize around fcsv, and we had to do a similar conversion.
As for designating -1000 as a missing landmark, that’s a legacy of Slicer not being able to generate a blank fiducial. That has been a request of mine since long before we started the SlicerMorph project, and we still can’t do it. Since we have to maintain landmark order and numbers across datasets, we had to invent a way to encode this, and we have to make that known. Switching to JSON will not fix this problem.
Landmarks are a very important part of morphometric analysis. FCSV is not perfect, but it is easy, and labs (not just mine) that use Slicer for data collection have been using it for a while. Forcing them to use JSON will make their historical data inaccessible to the GPA module.
Many decisions in Slicer development turn out well, while some others don’t. We are not always smart enough and/or not always lucky - it is hard to predict what technologies become mainstream in 5-10 years.
We aim to preserve Slicer’s backward compatibility with hardware, software, and data formats for at least 5 years (so you don’t have to upgrade, if you don’t want to), which is consistent with industry standard for long-term support timeline (typically 3-5 years).
We have implemented all the infrastructure for this, and the only thing missing is exposing it on the GUI - about a few days of work. I thought your team was working on it and assumed that it was delayed because it was not that important. If you need help with this then let me know.
If we could demonstrate clear advantages of using a richer and more standard file format then I think users would transition voluntarily. Probably some more features (landmark placement templates with undefined point positions, etc.), better documentation, training, and examples (e.g., to show how easy it is to create data tables from json files) would help.
What is important is to provide as good support for json as for fcsv, in all modules. There should not be modules that can only accept fcsv format and not json.
I would also add standard csv export (single-line header, columns in first row) option for better compatibility with table-based analysis. This should also help in reducing usage of the proprietary fcsv format.
I am not saying json would be a poor decision. I think its benefits, particularly for measurement-type markups, will outweigh its complications in the long run. It will happen, but it will take time, as there will be inertia among groups due to concerns similar to mine. When the switch to json was first raised on Discourse earlier this year, I did probe the morphometrics community through the listserv, and there was pushback from other groups that use Slicer for types of research similar to ours. That’s mostly due to lack of familiarity with the format, and JSON being quite verbose. Saving 3 control points in fcsv takes 6 lines; the same thing in json takes 74. To people who have been used to looking at flat files for a long time, that nested structure does appear confusing at first, and it seems like an unnecessarily complex format for extracting 9 coordinates.
That is far too short a time for the types of research (natural history) that our community does.
I think the main issue is that it is new and people just don’t know how much simpler and more efficient it actually is to work with json files than with csv. The main idea is that you can store all data in json (any number of markups, any metadata) and generate a table view from any parts of it, using a single line of code.
Yes, json is verbose, but it is not that bad. Many of the elements are optional (see schema). A minimal file consists of 1-2 lines of header, one row per fiducial, and one closing line, something like this:
{"@schema": "https://raw.githubusercontent.com/slicer/slicer/master/Modules/Loadable/Markups/Resources/Schema/markups-schema-v1.0.0.json#",
"markups": [{"type": "Fiducial", "coordinateSystem": "LPS", "controlPoints": [
{ "label": "F-1", "position": [-53.388409961685827, -73.33572796934868, 0.0] },
{ "label": "F-2", "position": [49.8682950191571, -88.58955938697324, 0.0] },
{ "label": "F-3", "position": [-25.22749042145594, 59.255268199233729, 0.0] }
]}]}
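A file of this minimal shape can also be generated from a plain list of labels and coordinates. The following is a sketch only: `make_minimal_markups` is a hypothetical helper, it writes just the fields shown in the example above, and whether a given Slicer version accepts a file with only these fields has not been checked here.

```python
import json

# Schema URL copied from the minimal example above.
SCHEMA_URL = ("https://raw.githubusercontent.com/slicer/slicer/master/Modules/"
              "Loadable/Markups/Resources/Schema/markups-schema-v1.0.0.json#")


def make_minimal_markups(points, coordinate_system="LPS"):
    """Build a minimal markups dictionary from (label, (x, y, z)) pairs."""
    control_points = [
        {"label": label, "position": [float(c) for c in position]}
        for label, position in points
    ]
    return {
        "@schema": SCHEMA_URL,
        "markups": [{
            "type": "Fiducial",
            "coordinateSystem": coordinate_system,
            "controlPoints": control_points,
        }],
    }


# Example values for illustration, not taken from a real dataset.
points = [("F-1", (-53.4, -73.3, 0.0)), ("F-2", (49.9, -88.6, 0.0))]
text = json.dumps(make_minimal_markups(points), indent=2)
```

The resulting `text` can be written to a file with a `.mrk.json` extension, opened with `encoding='utf-8'`.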
Manual editing is not much worse either, because all modern text editors help json editing with features such as syntax highlighting, automatic formatting, validation, etc.
Running analysis on a json file should be as simple as on a csv file, because you can read it into a dataframe using a single line of code. For example, getting a table of control point labels and positions using pandas (pip_install('pandas'); import pandas as pd):
controlPointsTable = pd.DataFrame.from_dict(pd.read_json(input_json_filename)['markups'][0]['controlPoints'])
Result:
>>> controlPointsTable
label position
0 F-1 [-53.388409961685824, -73.33572796934868, 0.0]
1 F-2 [49.8682950191571, -88.58955938697324, 0.0]
2 F-3 [-25.22749042145594, 59.255268199233726, 0.0]
Splitting the position vector column into separate x, y, z columns and writing the table to csv takes 3 lines:
controlPointsTable[['x','y','z']] = pd.DataFrame(controlPointsTable['position'].to_list())
del controlPointsTable['position']
controlPointsTable.to_csv(output_csv_filename)
Resulting csv file:
,label,x,y,z
0,F-1,-53.388409961685824,-73.33572796934868,0.0
1,F-2,49.8682950191571,-88.58955938697324,0.0
2,F-3,-25.22749042145594,59.255268199233726,0.0
For what it’s worth, I had planned on taking half a day to convert my code that uses fcsv to using json. It took me about 15 minutes.
Oh, and that includes keeping in the old code - basically I check for a json file, if it exists, I use that, otherwise it parses the old fcsv file. (This is a script that runs through a hundred scans or so to tally what fiducials/markups have been placed on what scan, and where.)
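The "json if it exists, otherwise fcsv" approach described above could look roughly like this. It is a sketch, not the script mentioned in the post: `load_landmarks` is a hypothetical name, the fcsv column positions (x, y, z in columns 1-3, label in column 11) are an assumption based on the usual fcsv column header, and no conversion between the coordinate systems of the two formats is attempted.

```python
import csv
import json
from pathlib import Path


def load_landmarks(base_path):
    """Return a list of (label, [x, y, z]); prefer <base>.mrk.json, fall back to <base>.fcsv."""
    base = Path(base_path)
    json_path = base.with_name(base.name + ".mrk.json")
    fcsv_path = base.with_name(base.name + ".fcsv")

    if json_path.exists():
        with open(json_path, encoding="utf-8") as f:
            data = json.load(f)
        return [(cp["label"], cp["position"])
                for cp in data["markups"][0]["controlPoints"]]

    landmarks = []
    with open(fcsv_path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f):
            # Skip empty lines and the "#"-prefixed header lines.
            if not row or row[0].startswith("#"):
                continue
            # Assumed layout: id,x,y,z,ow,ox,oy,oz,vis,sel,lock,label,desc,associatedNodeID
            landmarks.append((row[11], [float(v) for v in row[1:4]]))
    return landmarks
```

Calling `load_landmarks("path/to/scan01")` would look for `scan01.mrk.json` first and only read `scan01.fcsv` if the json file is absent.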
We maintain SlicerMorph primarily for 3D landmark digitization and for basic shape analysis and visualization. Most of our users end up using R shape analysis packages for the more complex analyses they want to do. It looks like it is going to be fairly straightforward to read coordinates directly from json as a data matrix, like we currently do with fcsv. We will show it in our next user check-in (this week). But again, it will take time.
Does this mean, you guys are planning to switch to saving all markups generated in a scene to a single json file? Currently every node is saved as a distinct file. I would prefer keeping the current behavior.
Also I still do not see a units tag in the json.
This is not a big deal for your own data, but it becomes complicated when you have collaborators, because everyone has to update their workflow at the same time.
Yeah, I’m pretty spoiled with a user base of… one. I didn’t mean to sound insensitive to issues of keeping many others happy.
We need to keep one-to-one mapping between MRML node and data file for scene saving, so we don’t plan to change the current save/load behavior. However, we can export an entire folder of markups into a single file, and import an entire group of markups from a single file (don’t try it, not implemented yet, but can be implemented anytime, with a few-hour effort).
This has not been implemented yet. I’ve added a ticket to make sure we don’t forget about it: Save length unit in markups files · Issue #5261 · Slicer/Slicer · GitHub
Again, the beauty of json is that we can make such changes in a clean and backward compatible way: we add a new “length unit” property, describe its type, valid values, etc. and the default value, so that we know how to interpret all those files which were created before this field was introduced. Such mechanisms can help with keeping all older mrk.json files valid and well defined, even as the format evolves. We can of course still make bad decisions and need to live with the consequences, but at least we have a number of mechanisms to properly handle some unavoidable mistakes.
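On the reading side, this kind of backward compatible default could be handled as in the sketch below. Note that the property name `lengthUnit` and the default value `mm` are assumptions made up for this illustration; the actual name and default are not defined yet (see the ticket above).

```python
import json

# Assumed default, for illustration only; the real default would come from the schema.
DEFAULT_LENGTH_UNIT = "mm"


def read_length_unit(markup):
    """Return the length unit of a markup, falling back to the default for older files."""
    # "lengthUnit" is a hypothetical property name, not part of the current schema.
    return markup.get("lengthUnit", DEFAULT_LENGTH_UNIT)


# A file written before the property existed, and one written after.
old_file = json.loads('{"markups": [{"type": "Fiducial", "controlPoints": []}]}')
new_file = json.loads('{"markups": [{"type": "Fiducial", "lengthUnit": "cm", "controlPoints": []}]}')

old_unit = read_length_unit(old_file["markups"][0])
new_unit = read_length_unit(new_file["markups"][0])
```

Older files keep a well-defined meaning because the reader supplies the documented default whenever the property is missing.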
Thanks @lassoan. We still plan to work on this although the project got moved back due to other deadlines.