morphoclass.console.dataset_preprocessor module
Implementation of the data preprocessor.
-
class morphoclass.console.dataset_preprocessor.Preprocessor(neurondb_path, morphology_dir, report_path, csv_path)
Bases: object
The data preprocessor.
- Parameters
neurondb_path – The path to the neurondb file compatible with the morph_tool specifications. Typical file extensions for this file are DAT or XML. See morph_tool.morphdb.MorphDB for more details.
morphology_dir – The directory with all the dataset morphology files.
report_path – The path for the HTML report file.
csv_path – The path for the output CSV file.
-
collect_duplicated_morphologies() → None
Collect duplicated morphologies based on file content.
For the report, collect the neurons that have the same morphology but different file names (this should not happen in a clean dataset).
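One way to detect duplicates by file content is to group files by a hash of their bytes. The following is a minimal sketch of that idea, not the method's actual implementation; the helper name find_duplicate_files, the choice of SHA-256, and the sample file names are assumptions made for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(directory):
    """Group files in `directory` by the SHA-256 digest of their bytes.

    Hypothetical helper: returns a list of groups, each group being the
    names of files whose content is byte-for-byte identical.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only the digests shared by more than one file name.
    return [names for names in by_digest.values() if len(names) > 1]


# Small demonstration with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.swc").write_text("1 1 0 0 0 1 -1\n")
    Path(tmp, "b.swc").write_text("1 1 0 0 0 1 -1\n")  # same content as a.swc
    Path(tmp, "c.swc").write_text("1 1 0 0 0 2 -1\n")
    groups = find_duplicate_files(tmp)

print(groups)  # [['a.swc', 'b.swc']]
```

Hashing compares raw bytes, so two files that describe the same morphology with different whitespace or comments would not be grouped by this sketch.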
-
custom_preprocess(dataset_type: morphoclass.constants.DatasetType) → None
Run custom processing depending on the dataset type.
In this method, the user can modify the dataset depending on specific requirements. Some things that can be modified are:
ml_data.df_morph is a dataframe with the columns name, mtype, and path. The user can filter out the rows containing morphologies not needed in this dataset (e.g. the interneurons dataset contained some PC cells).
ml_data.template_vars is a jinja2 dictionary that will be used to generate the report with raw data statistics (report_rawdata.pdf). If you add new keys, don't forget to modify the corresponding report_template.html. To add a new key, use ml_data.template_vars.update({"new_key": "new_value"}), which adds the key without replacing the other entries already in the dictionary.
For other options, check out the data.OrganizeMLData class.
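The advice about adding keys comes down to standard Python dictionary behaviour. The sketch below uses a plain dict as a stand-in for ml_data.template_vars. The existing keys and values are invented for illustration and are not the real template variables.

```python
# Stand-in for ml_data.template_vars; these entries are hypothetical.
template_vars = {"title": "Dataset Report", "n_neurons": 120}

# update() adds the new key and leaves the other entries in place.
template_vars.update({"new_key": "new_value"})

# Note that update() does replace the value of a key that already exists.
# setdefault() only writes when the key is missing, so "title" is unchanged.
template_vars.setdefault("title", "Another Title")

# Reassigning the whole dictionary, by contrast, would discard every
# existing entry, which is what the advice above is meant to avoid.
reassigned = {"new_key": "new_value"}

print(template_vars)
print(reassigned)
```

Remember that a key added this way only appears in the report if report_template.html references it.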
-
drop_duplicate_files() → None
Filter out the duplicates, but store the dropped entries for the report.
-
drop_mtypes_with_one_neuron() → None
Keep only the classes (m-types) that have at least two neurons.
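The filtering rule can be illustrated by counting neurons per m-type and keeping only the rows whose m-type occurs at least twice. This is a sketch on a plain list of records rather than the real dataframe; the neuron names and m-type labels are made up.

```python
from collections import Counter

# Hypothetical records standing in for rows with "name" and "mtype" columns.
rows = [
    {"name": "cell_a", "mtype": "L5_TPC"},
    {"name": "cell_b", "mtype": "L5_TPC"},
    {"name": "cell_c", "mtype": "L1_DAC"},  # the only neuron of its m-type
    {"name": "cell_d", "mtype": "L6_BPC"},
    {"name": "cell_e", "mtype": "L6_BPC"},
]

# Count how many neurons each m-type has.
counts = Counter(row["mtype"] for row in rows)

# Keep rows whose m-type has at least two neurons; remember the rest.
kept = [row for row in rows if counts[row["mtype"]] >= 2]
dropped = [row for row in rows if counts[row["mtype"]] < 2]

print([row["name"] for row in kept])     # ['cell_a', 'cell_b', 'cell_d', 'cell_e']
print([row["name"] for row in dropped])  # ['cell_c']
```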
-
run(dataset_type: morphoclass.constants.DatasetType) → None
Preprocess the data and save dataset.csv.
-
save_dataset_csv() → None
Save the dataset CSV file to disk.
-
save_report(report_title: str = 'Dataset Report') → None
Write the report to disk.