morphoclass.console.dataset_preprocessor module

Implementation of the data preprocessor.

class morphoclass.console.dataset_preprocessor.Preprocessor(neurondb_path, morphology_dir, report_path, csv_path)

Bases: object

The data preprocessor.

Parameters
  • neurondb_path – The path to the neurondb file compatible with the morph_tool specifications. Typical file extensions for this file are DAT or XML. See morph_tool.morphdb.MorphDB for more details.

  • morphology_dir – The directory with all the dataset morphology files.

  • report_path – The path for the HTML report file.

  • csv_path – The path for the output CSV file.

collect_duplicated_morphologies() → None

Collect duplicated morphologies based on file content.

Collect, for the report, the neurons that have the same morphology but different file names. Such duplicates should not occur in a clean dataset, so the report flags them.
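The idea of comparing files by content rather than by name can be sketched as follows. This is a minimal illustration and not the morphoclass implementation: the helper name group_by_content and the use of SHA-256 digests are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(paths):
    """Map a content digest to the sorted file names that share it.

    Hypothetical helper for illustration; not part of morphoclass.
    """
    groups = defaultdict(list)
    for path in map(Path, paths):
        # Hash the raw bytes, so two files match only if they are byte-identical.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)
    # Keep only the digests that appear under more than one file name.
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}
```

Each value in the returned dictionary is one group of file names with identical content, which is the kind of information a duplicates section of a report would need.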

custom_preprocess(dataset_type: morphoclass.constants.DatasetType) → None

Run custom processing depending on the dataset type.

This method lets the user modify the dataset according to specific requirements. Some of the things that can be modified are:

  • ml_data.df_morph is a dataframe with the columns name, mtype, and path. The user can filter out the rows with morphologies that are not needed in this dataset (e.g. the interneurons dataset contained some PC cells).

  • ml_data.template_vars is the dictionary of jinja2 template variables used to generate the report with raw data statistics (report_rawdata.pdf). If you add new keys, remember to update the corresponding report_template.html. To add a key, use ml_data.template_vars.update({"new_key": "new_value"}), which leaves the other entries in the dictionary intact.

  • For other options, check out the data.OrganizeMLData class.
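The two customisations above can be sketched with plain-Python stand-ins. Everything here is hypothetical: rows stands in for ml_data.df_morph (a list of dictionaries instead of a dataframe), template_vars stands in for ml_data.template_vars, the m-type names are illustrative, and the rule "an m-type ending in PC is a pyramidal cell" is an assumption made for the example.

```python
# Hypothetical stand-ins for ml_data.df_morph and ml_data.template_vars.
rows = [
    {"name": "cell_a", "mtype": "L1_DAC", "path": "cell_a.h5"},
    {"name": "cell_b", "mtype": "L5_TPC", "path": "cell_b.h5"},
]
template_vars = {"title": "Dataset Report"}


def drop_pyramidal_cells(rows):
    """Keep the rows whose m-type does not end in 'PC' (assumed naming rule)."""
    return [row for row in rows if not row["mtype"].endswith("PC")]


kept = drop_pyramidal_cells(rows)

# update() adds the new key and leaves the existing "title" entry in place;
# assigning a new dictionary to template_vars would discard it instead.
template_vars.update({"n_dropped": len(rows) - len(kept)})
```

With a real dataframe the same filter would typically be written as a boolean mask on the mtype column.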

drop_duplicate_files() → None

Filter out the duplicate files, but keep a record of the dropped ones for the report.
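A minimal sketch of "drop, but remember what was dropped" is shown below. It assumes each record already carries a content digest under the key "digest" and that the first occurrence is the one to keep. Both are assumptions for the example and not documented behaviour of this method.

```python
def split_duplicates(records):
    """Return (kept, dropped), keeping the first record for each digest.

    Hypothetical helper for illustration; not part of morphoclass.
    """
    seen = set()
    kept, dropped = [], []
    for record in records:
        if record["digest"] in seen:
            # Same content as an earlier record: set it aside for the report.
            dropped.append(record)
        else:
            seen.add(record["digest"])
            kept.append(record)
    return kept, dropped
```

The kept list would go on to the next preprocessing step, while the dropped list would only feed the report.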

drop_mtypes_with_one_neuron() → None

Keep only the classes (m-types) that have at least two neurons.
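The filtering rule can be illustrated with a short sketch. The function name and the list-of-dictionaries representation are assumptions made for the example; the real method operates on the preprocessor's own data.

```python
from collections import Counter


def drop_singleton_mtypes(rows):
    """Keep only the rows whose m-type occurs at least twice.

    Hypothetical helper for illustration; not part of morphoclass.
    """
    # Count how many neurons belong to each m-type.
    counts = Counter(row["mtype"] for row in rows)
    return [row for row in rows if counts[row["mtype"]] >= 2]
```

For instance, given two neurons of one m-type and a single neuron of another, only the first two rows remain.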

run(dataset_type: morphoclass.constants.DatasetType) → None

Preprocess the data and save dataset.csv.

save_dataset_csv() → None

Save the dataset CSV file to disk.

save_report(report_title: str = 'Dataset Report') → None

Write the report to disk.