morphoclass.data package

Module contents

Dataset abstractions and data helper functions.

class morphoclass.data.MorphologyDataLoader(dataset: morphoclass.data.morphology_dataset.MorphologyDataset, **kwargs: Any)

Bases: Generic[torch.utils.data.dataloader.T_co]

A data loader for the morphology data set.

This class is derived from torch.utils.data.DataLoader and, unlike torch_geometric.data.DataLoader, can handle Data objects with non-numeric fields. These fields are simply ignored when batches are constructed.

Parameters
  • dataset (torch.utils.data.dataset.Dataset[T_co]) – The data set to apply the data loader to.

  • kwargs – Further parameters to pass on to the PyTorch DataLoader base class.
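
As an illustration of what ignoring non-numeric fields during batching can look like, here is a minimal, library-free sketch. The helper names `is_numeric` and `collate_numeric_fields` and the dict-based samples are hypothetical stand-ins; this is not the actual MorphologyDataLoader implementation, which works on Data objects and tensors.

```python
from numbers import Number


def is_numeric(value):
    """Return True for numbers and (nested) lists/tuples of numbers."""
    if isinstance(value, Number):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_numeric(item) for item in value)
    return False


def collate_numeric_fields(samples):
    """Batch dict-like samples field by field, skipping non-numeric fields."""
    batch = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        if all(is_numeric(value) for value in values):
            batch[key] = values
    return batch


# Hypothetical samples: "path" is a string and is therefore dropped.
samples = [
    {"x": [0.1, 0.2], "y": 0, "path": "a.swc"},
    {"x": [0.3, 0.4], "y": 1, "path": "b.swc"},
]
batch = collate_numeric_fields(samples)
print(sorted(batch))
```

In this sketch the string-valued field is left out of the batch while the numeric fields are grouped per key.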

batch_size: Optional[int]
dataset: torch.utils.data.dataset.Dataset[T_co]
drop_last: bool
num_workers: int
pin_memory: bool
prefetch_factor: int
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
timeout: float
class morphoclass.data.MorphologyDataset(data, transform=None, pre_transform=None, pre_filter=None)

Bases: Generic[torch.utils.data.dataset.T_co]

Dataset class for neuron morphologies.

This class is derived from torch_geometric.data.Dataset and loads morphologies using the MorphIO library. The morphology data is integrated into the samples by setting the MorphologyData.morphology attribute.

To extract features from the morphology data use the appropriate transformers in morphoclass.transforms.

Parameters
  • data (iterable) – A sequence of instances of Data.

  • transform – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean value.

classmethod from_csv(csv_file, transform=None, pre_transform=None, pre_filter=None)

Load data listed in a CSV file.

The CSV file should have no header. The first column should list paths to morphology files. The second optional column should contain labels.

The paths can be either absolute or relative. The relative paths should be relative to the directory with the CSV file.

Parameters
  • csv_file (str or pathlib.Path) – The CSV file with data paths and labels.

  • transform (callable or None) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable or None) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable or None) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
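
The CSV convention described for from_csv (no header, paths in the first column, optional labels in the second, relative paths resolved against the CSV file's directory) can be sketched with the standard library alone. The function `read_morphology_csv` is a hypothetical helper for illustration and is not part of morphoclass.

```python
import csv
import tempfile
from pathlib import Path


def read_morphology_csv(csv_file):
    """Return (paths, labels) from a header-less CSV file.

    Relative paths are resolved against the directory containing the CSV
    file. The label is None for rows without a second column.
    """
    csv_file = Path(csv_file)
    paths, labels = [], []
    with csv_file.open(newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            path = Path(row[0])
            if not path.is_absolute():
                path = csv_file.parent / path
            paths.append(path)
            labels.append(row[1] if len(row) > 1 else None)
    return paths, labels


# Demonstration with a temporary CSV file and made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "dataset.csv"
    csv_path.write_text("L5_TPC_A/cell1.h5,L5_TPC_A\nL5_UPC/cell2.h5,L5_UPC\n")
    paths, labels = read_morphology_csv(csv_path)
    relative_paths = [p.relative_to(tmp).as_posix() for p in paths]
```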

classmethod from_features(features: Iterable[dict]) → morphoclass.data.morphology_dataset.MorphologyDataset

Create a dataset from pre-extracted features.

Parameters

features – The pre-extracted features of the data. Each dictionary in the iterable corresponds to the features of one morphology. If the samples of an existing MorphologyDataset are instances of torch_geometric.data.Data, the features of each sample can be obtained via sample.to_dict().

Returns

A dataset instance constructed from pre-extracted data features.

Return type

MorphologyDataset

classmethod from_paths(morph_paths, labels=None, transform=None, pre_transform=None, pre_filter=None)

Load data from given file paths.

This simply calls the constructor and serves as a convenient alternative to it.

Parameters
  • morph_paths (list_like) – A list of paths to morphology files.

  • labels (list_like, optional) – A list of labels for the morphology files in morph_paths.

  • transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset

classmethod from_structured_dir(data_path, layer='', transform=None, pre_transform=None, pre_filter=None, ignore_unknown_filetypes=True)

Load data from a structured directory.

The directory data_path should have sub-directories, which, in turn, contain the morphologies. The names of these sub-directories will be interpreted as the morphology types and will be used as the class labels.

Parameters
  • data_path (str or pathlib.Path) – Path pointing to the location of the data. The different morphology types should be organised in different sub-directories in data_path.

  • layer (str) – The cortical layer for which the morphology data should be loaded. The corresponding m-type folders should start with the string specified in layer, e.g. for layer 5 the valid m-type folders are L5_TPC_A, L5_UPC, etc. If no layer is provided then all data is loaded.

  • transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.

  • pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.

  • pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
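
The directory convention used by from_structured_dir (one sub-directory per m-type, optionally filtered by a layer prefix) can be sketched as follows. The helper `list_structured_dir` is hypothetical, and the sketch assumes that `layer` is matched literally as a prefix of the folder name (for example "L5"); the actual method may interpret the argument differently.

```python
import tempfile
from pathlib import Path


def list_structured_dir(data_path, layer=""):
    """Collect (path, label) pairs from a directory of m-type sub-directories."""
    samples = []
    for mtype_dir in sorted(Path(data_path).iterdir()):
        # Assumption: the layer string is a literal prefix of the folder name.
        if not mtype_dir.is_dir() or not mtype_dir.name.startswith(layer):
            continue
        for morph_file in sorted(mtype_dir.iterdir()):
            if morph_file.is_file():
                # The sub-directory name serves as the class label.
                samples.append((morph_file, mtype_dir.name))
    return samples


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for mtype, name in [("L5_TPC_A", "a.swc"), ("L5_UPC", "b.swc"), ("L6_BPC", "c.swc")]:
        (Path(tmp) / mtype).mkdir()
        (Path(tmp) / mtype / name).touch()
    all_labels = [label for _, label in list_structured_dir(tmp)]
    l5_labels = [label for _, label in list_structured_dir(tmp, layer="L5")]
```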

get(idx)

Get one data sample.

Parameters

idx (int) – The index of the data sample.

Returns

The data sample corresponding to idx.

Return type

torch_geometric.data.Data

get_sample_by_morph_name(morph_name)

Get morphology sample by name.

Parameters

morph_name (str) – Morphology name.

Returns

sample – Sample instance.

Return type

torch_geometric.data.Data

guess_layer()

Guess the layer based on the labels.

M-types of the form “L5_TPC_A” contain the layer information in the second character of the m-type string. If all labels contain the same number in that position then this is the guessed layer. Otherwise None is returned.

Returns

The guessed layer number or None in case the guess was unsuccessful.

Return type

int or None
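
The rule described for guess_layer can be written down in a few lines. The standalone function below is a sketch of that rule under the stated m-type naming convention, not the method's actual source.

```python
def guess_layer(labels):
    """Return the common layer number encoded in the labels, or None."""
    layers = set()
    for label in labels:
        # The layer is expected in the second character, e.g. "L5_TPC_A" -> 5.
        if len(label) < 2 or not label[1].isdecimal():
            return None
        layers.add(int(label[1]))
    # A guess is only made if all labels agree on a single layer.
    return layers.pop() if len(layers) == 1 else None


same_layer = guess_layer(["L5_TPC_A", "L5_UPC"])
mixed_layers = guess_layer(["L5_TPC_A", "L6_BPC"])
```

With labels from a single layer the sketch returns that layer number; with mixed layers or an empty list it returns None.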

property labels

Get label strings.

Returns

List of label strings.

Return type

list

len()

Compute the length of the dataset.

Returns

The number of samples in the dataset.

Return type

int

save_data(output_dir, indices=None)

Save the dataset or a subset of it to disk.

Note that the saving is done by copying the original files from which the data was read to a new location in the output_dir, so the original files must be accessible.

Note also that since it is just a copy of the original files no transforms or pre-transforms assigned to this dataset are applied to the data.

It is important that any pre-transforms assigned to this dataset keep the file attribute of the samples intact since this is how the original morphology file is located.

Parameters
  • output_dir (str or pathlib.Path) – The output directory.

  • indices (list_like, optional) – The indices of the samples to be saved. If None then all samples will be saved.
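
Saving by copying the original files, as described for save_data, might look roughly like the sketch below. The helper `copy_original_files` and the flat output layout are assumptions made for illustration; the directory layout that save_data actually produces is not specified here.

```python
import shutil
import tempfile
from pathlib import Path


def copy_original_files(file_paths, output_dir, indices=None):
    """Copy the source files of the selected samples into output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    if indices is None:
        indices = range(len(file_paths))  # default: all samples
    copied = []
    for idx in indices:
        source = Path(file_paths[idx])
        # Assumption: files are copied flat into output_dir under their name.
        copied.append(Path(shutil.copy2(source, output_dir / source.name)))
    return copied


# Demonstration with placeholder files standing in for morphologies.
with tempfile.TemporaryDirectory() as tmp:
    sources = []
    for name in ["a.swc", "b.swc", "c.swc"]:
        source = Path(tmp) / "raw" / name
        source.parent.mkdir(exist_ok=True)
        source.write_text("# placeholder morphology\n")
        sources.append(source)
    copied = copy_original_files(sources, Path(tmp) / "out", indices=[0, 2])
    copied_names = [path.name for path in copied]
    all_exist = all(path.exists() for path in copied)
```

Because only the source files are copied, no transform is involved at any point, which matches the note above about transforms and pre-transforms not being applied.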

to_csv(path, write_labels=True)

Write data paths and labels to a CSV file.

Parameters
  • path (str or pathlib.Path) – The target CSV file.

  • write_labels (bool) – If True labels will be included in a separate column.
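
A header-less CSV file with one row per sample, matching the format expected by from_csv, could be written as in the sketch below. The function `write_morphology_csv` is a hypothetical helper; the exact output of to_csv (for example absolute versus relative paths) is not specified here.

```python
import csv
import tempfile
from pathlib import Path


def write_morphology_csv(path, file_paths, labels, write_labels=True):
    """Write one row per sample: the file path and, optionally, its label."""
    with Path(path).open("w", newline="") as fh:
        writer = csv.writer(fh)
        for file_path, label in zip(file_paths, labels):
            row = [str(file_path)]
            if write_labels:
                row.append(label)  # the label goes into a second column
            writer.writerow(row)


# Demonstration with made-up file names and labels.
with tempfile.TemporaryDirectory() as tmp:
    files, labels = ["a.swc", "b.swc"], ["L5_TPC_A", "L5_UPC"]
    with_labels = Path(tmp) / "with_labels.csv"
    without_labels = Path(tmp) / "without_labels.csv"
    write_morphology_csv(with_labels, files, labels)
    write_morphology_csv(without_labels, files, labels, write_labels=False)
    rows_with = with_labels.read_text().splitlines()
    rows_without = without_labels.read_text().splitlines()
```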

to_labels(ys)

Convert custom label IDs to class names.

Parameters

ys (list) – List of label IDs.

Returns

List of labels (as strings).

Return type

list

property ys

Get the y-values of all samples.

Returns

List of the y-values of all samples.

Return type

list

morphoclass.data.augment_persistence_diagrams(persistence_diagrams, xlims, ylims, factor=10, maxd=2, p=0.5)

Augment persistence diagrams.

This uses a naive approach: for each persistence diagram, randomly select a subset of nodes and shift them slightly. The shifts are chosen randomly so that in the resulting persistence image the points shift by at most maxd.
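
The random-shift idea can be sketched for a single diagram as follows. This is an illustration under several assumptions, not the function's implementation: a diagram is taken to be a list of (x, y) pairs, `p` is read as the per-point selection probability, `maxd` is read as a bound in image pixels along each axis, `resolution` is a hypothetical number of pixels per image axis, and the `factor` parameter is not modelled.

```python
import random


def augment_diagram(diagram, xlims, ylims, maxd=2, p=0.5, resolution=100, rng=None):
    """Return a copy of the diagram with a random subset of points shifted.

    A shift of maxd pixels is converted to data units using the axis limits
    and the assumed image resolution.
    """
    rng = rng or random.Random()
    max_dx = maxd * (xlims[1] - xlims[0]) / resolution
    max_dy = maxd * (ylims[1] - ylims[0]) / resolution
    augmented = []
    for x, y in diagram:
        if rng.random() < p:  # select this point with probability p
            x += rng.uniform(-max_dx, max_dx)
            y += rng.uniform(-max_dy, max_dy)
        augmented.append((x, y))
    return augmented


# Made-up diagram; a seeded generator keeps the run reproducible.
diagram = [(10.0, 50.0), (20.0, 80.0), (35.0, 60.0)]
augmented = augment_diagram(
    diagram, xlims=(0, 100), ylims=(0, 100), rng=random.Random(0)
)
```

With the limits and resolution chosen here, one pixel corresponds to one data unit, so no coordinate moves by more than 2 units.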

Parameters

persistence_diagrams

Returns

morphoclass.data.augment_persistence_diagrams_v2(persistence_diagrams, labels, xlims, ylims, factor=10, maxd=2, p=0.5)

Augment persistence diagrams.

This uses a naive approach: for each persistence diagram, randomly select a subset of nodes and shift them slightly. The shifts are chosen randomly so that in the resulting persistence image the points shift by at most maxd.

Parameters

persistence_diagrams

Returns

morphoclass.data.load_apical_persistence_diagrams(folder, mtype)

Load persistence diagrams for apicals.

Loads all neurons from a given directory and extracts persistence diagrams from their apical trees.

Parameters
  • folder – Folder containing m-type folders.

  • mtype – M-type name, must be a subfolder of ‘folder’ and contain neuron files.

Returns

List of apical persistence diagrams of all neurons in the m-type folder.

morphoclass.data.persistence_diagrams_to_persistence_images(persistence_diagrams, xlims=None, ylims=None)

Convert persistence diagrams to persistence images.

Parameters
  • persistence_diagrams – The persistence diagrams to convert.

  • xlims – The x-dimension of the persistence images to create.

  • ylims – The y-dimension of the persistence images to create.

Returns

A NumPy array with the created persistence images.

morphoclass.data.pickle_data(data_dir, rewrite=False)

Cache neuron data-set by pickling every neuron.

This is helpful because loading each neuron each time can take a considerable amount of time. The given neuron files are loaded as tmd.Neuron instances and pickled into files on disk.

Note that the Dataset class is written in such a way that it is able to both process .h5 and .swc files, as well as .pickle data created by this function.

Parameters
  • data_dir

    Directory from which to load the data-set. The structure should be:

    data_dir/
        mtype1/
            neuron1
            neuron2
            …
        mtype2/
            neuron1
            neuron2
            …

  • rewrite

    • If False and the target directory already exists then an error is raised.

    • If True the existing target directory is deleted and new pickled data is created.

Returns

The target directory to which the data was written.

Raises

FileExistsError – File with output dataset exists.
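
The caching behaviour described for pickle_data, including the role of rewrite, can be sketched with the standard library. The function `pickle_dataset`, the `load_fn` argument and the "_pickled" suffix of the target directory are hypothetical; in morphoclass the files are loaded as tmd.Neuron instances and the target directory name may differ.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def pickle_dataset(data_dir, load_fn, rewrite=False):
    """Pickle every file under data_dir/<mtype>/ into a sibling directory."""
    data_dir = Path(data_dir)
    # Hypothetical naming scheme for the target directory.
    target_dir = data_dir.with_name(data_dir.name + "_pickled")
    if target_dir.exists():
        if not rewrite:
            raise FileExistsError(f"Target directory exists: {target_dir}")
        shutil.rmtree(target_dir)  # rewrite=True: start from scratch
    target_dir.mkdir()
    for mtype_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        out_dir = target_dir / mtype_dir.name
        out_dir.mkdir()
        for neuron_file in sorted(mtype_dir.iterdir()):
            with (out_dir / (neuron_file.stem + ".pickle")).open("wb") as fh:
                pickle.dump(load_fn(neuron_file), fh)
    return target_dir


# Demonstration: Path.read_text stands in for a real morphology loader.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    (data_dir / "L5_TPC_A").mkdir(parents=True)
    (data_dir / "L5_TPC_A" / "neuron1.swc").write_text("placeholder")
    target = pickle_dataset(data_dir, load_fn=Path.read_text)
    pickled_names = sorted(p.name for p in (target / "L5_TPC_A").iterdir())
    try:
        pickle_dataset(data_dir, load_fn=Path.read_text)
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True
```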

morphoclass.data.reduce_tree_to_branching(tree)

Simplifies a neurite tree to the branching points only.

An analogous method already exists: Tree.extract_simplified(). Use that method instead of this one.

Parameters

tree – Input neurite tree.

Returns

Simplified neurite tree.
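
To illustrate the kind of simplification meant here, the following sketch reduces a tree given as a plain parent-index list. The representation and the function `reduce_to_branching` are hypothetical and unrelated to the Tree class; the sketch also assumes that the root and the terminal points are kept alongside the branching points so that the result is still a tree.

```python
def reduce_to_branching(parents):
    """Reduce a tree given as a parent-index list (the root has parent -1).

    Returns (kept, new_parents): the original indices of the kept nodes and,
    for each kept node, the position of its nearest kept ancestor in kept.
    """
    n_children = [0] * len(parents)
    for parent in parents:
        if parent >= 0:
            n_children[parent] += 1
    # Keep the root, branching points (2+ children) and terminals (0 children).
    kept = [i for i, p in enumerate(parents) if p < 0 or n_children[i] != 1]
    position = {node: pos for pos, node in enumerate(kept)}
    new_parents = []
    for node in kept:
        ancestor = parents[node]
        # Walk up past nodes that were removed from the tree.
        while ancestor >= 0 and ancestor not in position:
            ancestor = parents[ancestor]
        new_parents.append(position[ancestor] if ancestor >= 0 else -1)
    return kept, new_parents


# Chain 0-1-2, a branching at node 2 into leaf 3 and chain 4-5.
kept, new_parents = reduce_to_branching([-1, 0, 1, 2, 2, 4])
```

In this example nodes 1 and 4 lie on unbranched segments and are dropped, so both remaining terminals attach directly to the branching point.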