morphoclass.data.morphology_dataset module

Implementation of the MorphologyDataset class.

class morphoclass.data.morphology_dataset.MorphologyDataset(data, transform=None, pre_transform=None, pre_filter=None)

Bases: Generic[torch.utils.data.dataset.T_co]

Dataset class for neuron morphologies.

This class is derived from torch_geometric.data.Dataset and loads morphologies using the MorphIO library. The morphology data is integrated into the samples by setting the MorphologyData.morphology attribute.

To extract features from the morphology data use the appropriate transformers in morphoclass.transforms.

Parameters
  • data (iterable) – A sequence of instances of Data.

  • transform – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean value.
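
The timing of the three hooks can be illustrated with a toy stand-in. This is a minimal sketch, not the MorphologyDataset implementation: ToyDataset, the dictionary samples, and the helper functions are all hypothetical, and applying pre_filter before pre_transform is an assumption borrowed from the usual torch_geometric convention.

```python
class ToyDataset:
    """Toy stand-in mimicking the three hooks; NOT the morphoclass class."""

    def __init__(self, data, transform=None, pre_transform=None, pre_filter=None):
        self.transform = transform
        self._samples = []
        for sample in data:
            # Assumption: the filter runs first, on the untransformed sample.
            if pre_filter is not None and not pre_filter(sample):
                continue
            # The pre-transform runs once, before the sample is stored.
            if pre_transform is not None:
                sample = pre_transform(sample)
            self._samples.append(sample)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        sample = self._samples[idx]
        # The transform runs on every retrieval.
        return sample if self.transform is None else self.transform(sample)


calls = {"pre_transform": 0, "transform": 0}


def keep_labelled(sample):
    """Hypothetical pre_filter: keep only samples that carry a label."""
    return sample.get("label") is not None


def add_name_length(sample):
    """Hypothetical pre_transform: attach a derived value once."""
    calls["pre_transform"] += 1
    return {**sample, "name_length": len(sample["name"])}


def upper_name(sample):
    """Hypothetical transform: returns a modified copy on each access."""
    calls["transform"] += 1
    return {**sample, "name": sample["name"].upper()}


raw = [
    {"name": "cell_a", "label": "L5_TPC_A"},
    {"name": "cell_b", "label": None},
    {"name": "cell_c", "label": "L5_UPC"},
]
dataset = ToyDataset(
    raw,
    transform=upper_name,
    pre_transform=add_name_length,
    pre_filter=keep_labelled,
)
first = dataset[0]
again = dataset[0]  # triggers the transform a second time
```

In this sketch the pre-transform runs once per stored sample, while the transform counter grows with every retrieval, which is why random augmentations belong in transform rather than pre_transform.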

classmethod from_csv(csv_file, transform=None, pre_transform=None, pre_filter=None)

Load data listed in a CSV file.

The CSV file should have no header. The first column should list paths to morphology files. The second optional column should contain labels.

The paths can be either absolute or relative. The relative paths should be relative to the directory with the CSV file.

Parameters
  • csv_file (str or pathlib.Path) – The CSV file with data paths and labels.

  • transform (callable or None) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable or None) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable or None) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
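
The expected CSV layout can be sketched with the standard library alone. The helper read_morphology_csv below is hypothetical and only mirrors the format described above (no header, path in the first column, optional label in the second, relative paths resolved against the CSV's directory); it does not load any morphologies and is not the code used by from_csv. The file names are made up for illustration.

```python
import csv
import pathlib
import tempfile


def read_morphology_csv(csv_file):
    """Return (paths, labels) from a header-less CSV; labels is None if absent."""
    csv_file = pathlib.Path(csv_file)
    paths, labels = [], []
    with csv_file.open(newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            path = pathlib.Path(row[0].strip())
            # Relative paths are interpreted relative to the CSV's directory.
            if not path.is_absolute():
                path = csv_file.parent / path
            paths.append(path)
            # The second column is optional.
            labels.append(row[1].strip() if len(row) > 1 else None)
    if all(label is None for label in labels):
        labels = None
    return paths, labels


with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / "dataset.csv"
    csv_path.write_text("L5_TPC_A/cell_a.h5,L5_TPC_A\nL5_UPC/cell_b.h5,L5_UPC\n")
    paths, labels = read_morphology_csv(csv_path)
    resolved_ok = paths[0] == pathlib.Path(tmp) / "L5_TPC_A" / "cell_a.h5"
```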

classmethod from_features(features: Iterable[dict]) → morphoclass.data.morphology_dataset.MorphologyDataset

Create a dataset from pre-extracted features.

Parameters

features – The pre-extracted features of the data. Each dictionary in the iterable corresponds to the features of one morphology. If the samples of an existing MorphologyDataset are instances of torch_geometric.data.Data, the features of each sample can be obtained via sample.to_dict().

Returns

A dataset instance constructed from pre-extracted data features.

Return type

MorphologyDataset

classmethod from_paths(morph_paths, labels=None, transform=None, pre_transform=None, pre_filter=None)

Load data from given file paths.

This just calls the constructor and serves as a convenience alternative to it.

Parameters
  • morph_paths (list_like) – A list of paths to morphology files.

  • labels (list_like, optional) – A list of labels for the morphology files in morph_paths.

  • transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset

classmethod from_structured_dir(data_path, layer='', transform=None, pre_transform=None, pre_filter=None, ignore_unknown_filetypes=True)

Load data from a structured directory.

The directory data_path should have sub-directories, which, in turn, contain the morphologies. The names of these sub-directories will be interpreted as the morphology types and will be used as the class labels.

Parameters
  • data_path (str or pathlib.Path) – Path pointing to the location of the data. The different morphology types should be organised in different sub-directories in data_path.

  • layer (str) – The cortical layer for which the morphology data should be loaded. The corresponding m-type folders should start with the string specified in layer, e.g. for layer 5 the valid m-type folders are L5_TPC_A, L5_UPC, etc. If no layer is provided then all data is loaded.

  • transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
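
The directory layout described above can be sketched as follows. The helper scan_structured_dir is hypothetical and only collects file paths and labels; it is not the implementation of from_structured_dir. Whether the real method expects a layer value such as "L5" or "5" is not stated here, so the sketch simply compares the prefix verbatim, and the folder and file names are made up.

```python
import pathlib
import tempfile


def scan_structured_dir(data_path, layer=""):
    """Collect (paths, labels), using sub-directory names as labels."""
    data_path = pathlib.Path(data_path)
    paths, labels = [], []
    for mtype_dir in sorted(p for p in data_path.iterdir() if p.is_dir()):
        # Assumption: the layer string is matched as a plain prefix.
        # An empty prefix matches every folder, so all data is kept.
        if not mtype_dir.name.startswith(layer):
            continue
        for morph_file in sorted(mtype_dir.iterdir()):
            if morph_file.is_file():
                paths.append(morph_file)
                labels.append(mtype_dir.name)
    return paths, labels


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    layout = [
        ("L5_TPC_A", "cell_a.h5"),
        ("L5_UPC", "cell_b.h5"),
        ("L2_IPC", "cell_c.h5"),
    ]
    for mtype, name in layout:
        (root / mtype).mkdir()
        (root / mtype / name).touch()
    _, all_labels = scan_structured_dir(root)
    _, l5_labels = scan_structured_dir(root, layer="L5")
```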

get(idx)

Get one data sample.

Parameters

idx (int) – The index of the data sample.

Returns

The data sample corresponding to idx.

Return type

torch_geometric.data.Data

get_sample_by_morph_name(morph_name)

Get morphology sample by name.

Parameters

morph_name (str) – Morphology name.

Returns

sample – Sample instance.

Return type

torch_geometric.data.Data

guess_layer()

Guess the layer based on the labels.

M-types of the form “L5_TPC_A” contain the layer information in the second character of the m-type string. If all labels contain the same number in that position then this is the guessed layer. Otherwise None is returned.

Returns

The guessed layer number or None in case the guess was unsuccessful.

Return type

int or None
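
The rule described above can be sketched as a standalone function. The name guess_layer_from_labels is hypothetical, and the handling of an empty label list or of labels without a digit in the second position (returning None) is an assumption rather than documented behaviour.

```python
def guess_layer_from_labels(labels):
    """Return the common layer digit of all labels, or None if ambiguous."""
    layers = set()
    for label in labels:
        # The layer is expected in the second character, e.g. "L5_TPC_A".
        if len(label) < 2 or label[1] not in "0123456789":
            return None
        layers.add(int(label[1]))
    # Succeed only if every label agrees on a single layer.
    return layers.pop() if len(layers) == 1 else None


same_layer = guess_layer_from_labels(["L5_TPC_A", "L5_UPC"])
mixed_layers = guess_layer_from_labels(["L5_TPC_A", "L2_IPC"])
no_labels = guess_layer_from_labels([])
```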

property labels

Get label strings.

Returns

List of label strings.

Return type

list

len()

Compute the length of the dataset.

Returns

The number of samples in the dataset.

Return type

int

save_data(output_dir, indices=None)

Save the dataset or a subset of it to disk.

Note that the saving is done by copying the original files from which the data was read to a new location in the output_dir, so the original files must be accessible.

Note also that since it is just a copy of the original files no transforms or pre-transforms assigned to this dataset are applied to the data.

It is important that any pre-transforms assigned to this dataset keep the file attribute of the samples intact since this is how the original morphology file is located.

Parameters
  • output_dir (str or pathlib.Path) – The output directory.

  • indices (list_like, optional) – The indices of the samples to be saved. If None then all samples will be saved.
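
The copy-based saving described above can be sketched as follows. The helper copy_sample_files and the SimpleNamespace samples are hypothetical stand-ins; only the reliance on a file attribute comes from the description. Copying into a flat output directory is an assumption, and the layout produced by save_data may differ.

```python
import pathlib
import shutil
import tempfile
from types import SimpleNamespace


def copy_sample_files(samples, output_dir, indices=None):
    """Copy the original file of each selected sample into output_dir."""
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    if indices is None:
        indices = range(len(samples))  # default: every sample
    copied = []
    for idx in indices:
        # The original file is located through the sample's file attribute,
        # so no transform or pre-transform affects what is written.
        source = pathlib.Path(samples[idx].file)
        target = output_dir / source.name  # assumption: flat layout
        shutil.copy2(source, target)
        copied.append(target)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    source_dir = pathlib.Path(tmp) / "source"
    source_dir.mkdir()
    samples = []
    for name in ["cell_a.h5", "cell_b.h5", "cell_c.h5"]:
        path = source_dir / name
        path.write_text("placeholder")
        samples.append(SimpleNamespace(file=str(path)))
    copied = copy_sample_files(samples, pathlib.Path(tmp) / "subset", indices=[0, 2])
    copied_names = [p.name for p in copied]
    all_copied_exist = all(p.exists() for p in copied)
```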

to_csv(path, write_labels=True)

Write data paths and labels to a CSV file.

Parameters
  • path (str or pathlib.Path) – The target CSV file.

  • write_labels (bool) – If True, labels will be included in a separate column.

to_labels(ys)

Convert custom label IDs to class names.

Parameters

ys (list) – List of label IDs.

Returns

List of label names (as str).

Return type

list
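
The conversion from IDs to names amounts to a lookup. The sketch below is purely illustrative: build_label_lookup and ids_to_labels are hypothetical helpers, and assigning IDs by the sorted order of the unique label strings is an assumption, since the mapping used by the dataset is not described here.

```python
def build_label_lookup(labels):
    """Map integer IDs to label names (assumed: sorted unique labels)."""
    return dict(enumerate(sorted(set(labels))))


def ids_to_labels(ys, lookup):
    """Translate a list of label IDs into their string names."""
    return [lookup[y] for y in ys]


lookup = build_label_lookup(["L5_UPC", "L5_TPC_A", "L5_UPC", "L5_TPC_B"])
names = ids_to_labels([2, 0, 0, 1], lookup)
```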

property ys

Get the y-values of all samples.

Returns

List of the y-values of all samples.

Return type

list