Read performance between json and fcsv

I have the same markups file with many points saved as fcsv and json. Reading the same file via JSON is instantenous, whereas fcsv takes minutes. This replicates all the time in the windows table. Files are here:

https://app.box.com/s/chydl0iv3pvoq6o4x6cx1wme5n779789

We use a highly optimized json parser library (rapidjson) while fcsv files are just read by a custom parser, so such performance differences are expected.

We should not use fcsv files at all, but we should support importing/exporting csv files instead. For standard csv files we could use fast csv parsers, such as in pandas.

I don’t like the idea of exporting csv’s because it is even more lossy than fcsv. Given that fcsv are essentially csv with headers, can’t we find a middle ground where the specialized parser only reads the header and pandas is used to read the rest?

You can store anything that you need in a csv file. Only those properties cause complications that are common the all control points, but you can store that in either a separate csv file or repeat the value for each control point.

Alternatively, you can parse the special fcsv header and after you determined the columns and the number of first lines that are skip, you can read the rest using pandas.

What is your use case? How did you end up with with a large fcsv file?

We generate large number of landmarks via the tools in SlicerMorph (e.g., PseudoLMGenerator, or PlaceSemiLMPatches) for shape analysis. While we try to steer our users towards JSON pointing out to issues with fcsv, due to both community and paper reviewer requests we need to continue to support FCSV.

Yes, that’s exactly the solution. But I would like to see this done at the core Slicer level.

It would be too much effort fixing and maintaining fcsv as a data saving/loading format (format that allows lossless storage). How would you save all display properties, measurement results, store multiple markups in a single file, etc. while maintaining backward compatibility? It would be practically impossible to use csv-based format for this.

CSV as export/import format can be very useful, but proper generic implementation (field chooser, etc.) would be a lot of time to implement and would never be as convenient as specialized exporters/importers (that allow loading markups to/from well-established standard csv-based file formats).

If you can get funding for developing a generic markup csv importer/exporter then it should be OK to keep it in the Slicer core, but it is more convenient for users and less work for developers if efforts are focused on implementing domain-specific csv importers/exporters in extensions.

The performance differences when loading the two datasets is pretty striking and there’s clearly a bug in the fcsv loading code. When I sample the process using the ActivityMonitor on macOS it shows that 95% of the time is in vtkMRMLDisplayNode::UpdateScalarRange() on each point insert, which leads to a parametric curve evaluation even though these are fiducials and not curves. So the logic needs some work.

I filed an issue to track it.

    +                                                                                       783 vtkSlicerMarkupsLogic::LoadMarkups(char const*, char const*)  (in libvtkSlicerMarkupsModuleLogic.dylib) + 1381  [0x15b7c3145]
    +                                                                                         783 vtkSlicerMarkupsLogic::LoadMarkupsFromFcsv(char const*, char const*)  (in libvtkSlicerMarkupsModuleLogic.dylib) + 1202  [0x15b7c3e32]
    +                                                                                           783 vtkMRMLStorageNode::ReadData(vtkMRMLNode*, bool)  (in libMRMLCore.dylib) + 253  [0x10ecf2f9d]
    +                                                                                             782 vtkMRMLMarkupsFiducialStorageNode::ReadDataInternal(vtkMRMLNode*)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 3691  [0x15b026b0b]
    +                                                                                             ! 759 vtkMRMLMarkupsFiducialStorageNode::SetPointFromString(vtkMRMLMarkupsNode*, int, char const*)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 3464  [0x15b024238]
    +                                                                                             ! : 747 vtkMRMLMarkupsNode::SetNthControlPointPosition(int, double, double, double, int)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 371  [0x15b04c8c3]
    +                                                                                             ! : | 746 vtkMRMLDisplayNode::UpdateScalarRange()  (in libMRMLCore.dylib) + 23  [0x10ebb2fb7]
    +                                                                                             ! : | + 746 vtkMRMLMarkupsDisplayNode::GetScalarDataSet()  (in libvtkSlicerMarkupsModuleMRML.dylib) + 114  [0x15b020122]
    +                                                                                             ! : | +   746 vtkMRMLMarkupsNode::GetCurveWorld()  (in libvtkSlicerMarkupsModuleMRML.dylib) + 48  [0x15b04ecc0]
    +                                                                                             ! : | +     744 vtkStreamingDemandDrivenPipeline::Update(int, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 283  [0x11de507ab]
    +                                                                                             ! : | +     ! 744 vtkStreamingDemandDrivenPipeline::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 706  [0x11de50242]
    +                                                                                             ! : | +     !   719 vtkDemandDrivenPipeline::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 792  [0x11de29758]
    +                                                                                             ! : | +     !   : 719 vtkCompositeDataPipeline::ForwardUpstream(vtkInformation*)  (in libvtkCommon-8.2.1.dylib) + 297  [0x11de25d59]
    +                                                                                             ! : | +     !   :   719 vtkStreamingDemandDrivenPipeline::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 706  [0x11de50242]
    +                                                                                             ! : | +     !   :     719 vtkDemandDrivenPipeline::ProcessRequest(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 1175  [0x11de298d7]
    +                                                                                             ! : | +     !   :       719 vtkCompositeDataPipeline::ExecuteData(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 107  [0x11de24b1b]
    +                                                                                             ! : | +     !   :         718 vtkDemandDrivenPipeline::ExecuteData(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 61  [0x11de2a13d]
    +                                                                                             ! : | +     !   :         | 718 vtkExecutive::CallAlgorithm(vtkInformation*, int, vtkInformationVector**, vtkInformationVector*)  (in libvtkCommon-8.2.1.dylib) + 69  [0x11de2f515]
    +                                                                                             ! : | +     !   :         |   717 vtkCurveGenerator::RequestData(vtkInformation*, vtkInformationVector**, vtkInformationVector*)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 289  [0x15b0601a1]
    +                                                                                             ! : | +     !   :         |   + 717 vtkCurveGenerator::GeneratePoints(vtkPoints*, vtkPolyData*, vtkPolyData*)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 139  [0x15b06026b]
    +                                                                                             ! : | +     !   :         |   +   666 vtkCurveGenerator::GeneratePointsFromFunction(vtkPoints*, vtkPoints*, vtkDoubleArray*)  (in libvtkSlicerMarkupsModuleMRML.dylib) + 636  [0x15b06278c]
    +                                                                                             ! : | +     !   :         |   +   ! 635 vtkParametricSpline::Evaluate(double*, double*, double*)  (in libvtkCommon-8.2.1.dylib) + 53  [0x11de66ff5]
    +                                                                                             ! : | +     !   :         |   +   ! : 211 vtkParametricSpline::Initialize()  (in libvtkCommon-8.2.1.dylib) + 2504  [0x11de67ab8]
    +                                                                                             ! : | +     !   :         |   +   ! : | 138 vtkPiecewiseFunction::AddPoint(double, double, double, double)  (in libvtkCommon-8.2.1.dylib) + 780  [0x11dcf0a9c]
    +                                                                                             ! : | +     !   :         |   +   ! : | + 57 void std::__1::__sort<vtkPiecewiseFunctionCompareNodes&, vtkPiecewiseFunctionNode**>(vtkPiecewiseFunctionNode**, vtkPiecewiseFunctionNode**, vtkPiecewiseFunctionCompareNodes&)  (in libvtkCommon-8.2.1.dylib) + 524,556,...  [0x11dcf2fac,0x11dcf2fcc,...]
1 Like