morphoclass.data package¶

Submodules¶

Module contents¶

Dataset abstractions and data helper functions.

class morphoclass.data.MorphologyDataLoader(dataset: morphoclass.data.morphology_dataset.MorphologyDataset, **kwargs: Any)¶

Bases: Generic[torch.utils.data.dataloader.T_co]

A data loader for the morphology data set.

This class is derived from torch.utils.data.DataLoader and, unlike torch_geometric.data.DataLoader, is able to handle Data objects with non-numeric fields. These fields are simply ignored when batches are constructed.

Parameters

dataset (torch.utils.data.dataset.Dataset[T_co]) – The data set to apply the data loader to.
kwargs – Further parameters to pass on to the PyTorch DataLoader base class.

batch_size: Optional[int]¶

dataset: torch.utils.data.dataset.Dataset[T_co]¶

drop_last: bool¶

num_workers: int¶

pin_memory: bool¶

prefetch_factor: int¶

sampler: Union[torch.utils.data.sampler.Sampler, Iterable]¶

timeout: float¶
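
The batching behaviour described for MorphologyDataLoader, where non-numeric fields are ignored, can be illustrated without PyTorch. The following is only a minimal sketch in plain Python and not the library's implementation; the helper name `collate_numeric_fields` and the sample field names are hypothetical.

```python
from numbers import Number


def collate_numeric_fields(samples):
    """Group per-sample dicts into one batch dict, skipping non-numeric fields.

    Hypothetical helper: a field is kept only if, in every sample, its value
    is a number or a non-empty list of numbers.
    """

    def is_number(value):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, Number) and not isinstance(value, bool)

    def is_numeric(value):
        if is_number(value):
            return True
        return (
            isinstance(value, list)
            and len(value) > 0
            and all(is_number(v) for v in value)
        )

    if not samples:
        return {}
    batch = {}
    for key in samples[0]:
        values = [sample.get(key) for sample in samples]
        if all(is_numeric(v) for v in values):
            batch[key] = values
    return batch


samples = [
    {"x": [0.1, 0.2], "y": 0, "path": "cell_a.h5"},
    {"x": [0.3, 0.4], "y": 1, "path": "cell_b.h5"},
]
batch = collate_numeric_fields(samples)
print(batch)
```

In this sketch the string-valued `path` field is dropped from the batch while `x` and `y` are kept, which is the behaviour the class description refers to.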

class morphoclass.data.MorphologyDataset(data, transform=None, pre_transform=None, pre_filter=None)¶

Bases: Generic[torch.utils.data.dataset.T_co]

Dataset class for neuron morphologies.

This class is derived from torch_geometric.data.Dataset and loads morphologies using the MorphIO library. The morphology data is integrated into the samples by setting the MorphologyData.morphology attribute.

To extract features from the morphology data use the appropriate transformers in morphoclass.transforms.

Parameters

data (iterable) – A sequence of instances of Data.
transform – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean value.

classmethod from_csv(csv_file, transform=None, pre_transform=None, pre_filter=None)¶

Load data listed in a CSV file.

The CSV file should have no header. The first column should list paths to morphology files. The optional second column should contain labels.

The paths can be either absolute or relative. Relative paths are interpreted relative to the directory containing the CSV file.

Parameters

csv_file (str or pathlib.Path) – The CSV file with data paths and labels.
transform (callable or None) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable or None) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable or None) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
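
The CSV layout that from_csv expects can be sketched with the standard library. This is an illustration of the documented file format only, not the code morphoclass runs; `read_morphology_csv` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_morphology_csv(csv_file):
    """Return (paths, labels) from a header-less CSV file.

    Hypothetical helper mirroring the documented layout: column 1 holds the
    morphology path, the optional column 2 holds the label.
    """
    csv_file = Path(csv_file)
    paths, labels = [], []
    with csv_file.open(newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            path = Path(row[0].strip())
            if not path.is_absolute():
                # Relative paths are taken relative to the CSV's directory.
                path = csv_file.parent / path
            paths.append(path)
            labels.append(row[1].strip() if len(row) > 1 else None)
    return paths, labels


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "dataset.csv"
    csv_path.write_text("L5_TPC_A/cell_1.h5,L5_TPC_A\nL5_UPC/cell_2.h5,L5_UPC\n")
    paths, labels = read_morphology_csv(csv_path)
    relative = [p.relative_to(tmp).as_posix() for p in paths]

print(relative, labels)
```

The resulting paths and labels are the kind of input that from_paths accepts through its morph_paths and labels parameters.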

classmethod from_features(features: Iterable[dict]) → morphoclass.data.morphology_dataset.MorphologyDataset ¶

Create a dataset from pre-extracted features.

Parameters: features – The pre-extracted features of the data. Each dictionary in the iterable corresponds to the features of one morphology. If the samples of an existing MorphologyDataset are instances of torch_geometric.data.Data, the features of each sample can be obtained via sample.to_dict().
Returns: A dataset instance constructed from pre-extracted data features.
Return type: MorphologyDataset

classmethod from_paths(morph_paths, labels=None, transform=None, pre_transform=None, pre_filter=None)¶

Load data from given file paths.

This just calls the constructor and serves as a convenience alternative to it.

Parameters

morph_paths (list_like) – A list of paths to morphology files.
labels (list_like, optional) – A list of labels for the morphology files in morph_paths.
transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset

classmethod from_structured_dir(data_path, layer='', transform=None, pre_transform=None, pre_filter=None, ignore_unknown_filetypes=True)¶

Load data from a structured directory.

The directory data_path should have sub-directories, which, in turn, contain the morphologies. The names of these sub-directories will be interpreted as the morphology types and will be used as the class labels.

Parameters

data_path (str or pathlib.Path) – Path pointing to the location of the data. The different morphology types should be organised in different sub-directories in data_path.
layer (str) – The cortical layer for which the morphology data should be loaded. The corresponding m-type folders should start with the string specified in layer, e.g. for layer 5 the valid m-type folders are L5_TPC_A, L5_UPC, etc. If no layer is provided then all data is loaded.
transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.

Returns

An instance of MorphologyDataset with loaded data.

Return type

morphoclass.data.MorphologyDataset
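
The directory convention used by from_structured_dir, with sub-directory names as labels and an optional layer prefix, can be sketched as follows. This is a stand-alone illustration under the assumptions stated in the comments, not the library's implementation; `collect_labelled_paths` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_labelled_paths(data_path, layer=""):
    """Return sorted (file, label) pairs from a structured directory.

    Hypothetical helper: each sub-directory name is used as the label, and
    only sub-directories whose name starts with `layer` are visited. An
    empty `layer` matches every sub-directory.
    """
    pairs = []
    for mtype_dir in sorted(Path(data_path).iterdir()):
        if not mtype_dir.is_dir() or not mtype_dir.name.startswith(layer):
            continue
        for morph_file in sorted(mtype_dir.iterdir()):
            if morph_file.is_file():
                pairs.append((morph_file, mtype_dir.name))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    layout = [("L5_TPC_A", "a.swc"), ("L5_UPC", "b.swc"), ("L6_BPC", "c.swc")]
    for mtype, name in layout:
        (Path(tmp) / mtype).mkdir()
        (Path(tmp) / mtype / name).touch()
    layer5 = [(p.name, label) for p, label in collect_labelled_paths(tmp, layer="L5")]
    everything = [label for _, label in collect_labelled_paths(tmp)]

print(layer5)
print(everything)
```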

get(idx)¶

Get one data sample.

Parameters: idx (int) – The index of the data sample.
Returns: The data sample corresponding to idx.
Return type: torch_geometric.data.Data

get_sample_by_morph_name(morph_name)¶

Get morphology sample by name.

Parameters: morph_name (str) – Morphology name.
Returns: sample – Sample instance.
Return type: torch_geometric.data.Data

guess_layer()¶

Guess the layer based on the labels.

M-types of the form “L5_TPC_A” contain the layer information in the second character of the m-type string. If all labels contain the same number in that position then this is the guessed layer. Otherwise None is returned.

Returns: The guessed layer number or None in case the guess was unsuccessful.
Return type: int or None
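
The rule described for guess_layer can be written down in a few lines. This is a minimal sketch of the documented behaviour rather than the method's actual code; `guess_layer_from_labels` is a hypothetical free function that takes the labels as an argument.

```python
def guess_layer_from_labels(labels):
    """Return the layer number shared by all labels, or None.

    Hypothetical helper: reads the second character of each m-type string,
    e.g. the "5" in "L5_TPC_A".
    """
    layers = set()
    for label in labels:
        if len(label) < 2 or label[1] not in "0123456789":
            return None  # no layer digit in the expected position
        layers.add(int(label[1]))
    # A guess is only made when every label agrees on one layer.
    return layers.pop() if len(layers) == 1 else None


same_layer = guess_layer_from_labels(["L5_TPC_A", "L5_UPC"])
mixed_layers = guess_layer_from_labels(["L5_TPC_A", "L6_BPC"])
print(same_layer, mixed_layers)
```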

property labels¶

Get label strings.

Returns: List of label strings.
Return type: list

len()¶

Compute the length of the dataset.

Returns: The number of samples in the dataset.
Return type: int

save_data(output_dir, indices=None)¶

Save the dataset or a subset of it to disk.

Note that the saving is done by copying the original files from which the data was read to a new location in the output_dir, so the original files must be accessible.

Note also that since it is just a copy of the original files no transforms or pre-transforms assigned to this dataset are applied to the data.

It is important that any pre-transforms assigned to this dataset keep the file attribute of the samples intact since this is how the original morphology file is located.

Parameters

output_dir (str or pathlib.Path) – The output directory.
indices (list_like, optional) – The indices of the samples to be saved. If None then all samples will be saved.

to_csv(path, write_labels=True)¶

Write data paths and labels to a CSV file.

Parameters

path (str or pathlib.Path) – The target CSV file.
write_labels (bool) – If True, labels will be included in a separate column.

to_labels(ys)¶

Convert custom label IDs to class names.

Parameters: ys (list) – List of label IDs.
Returns: List of label names (as str).
Return type: list

property ys¶

Get the y-values of all samples.

Returns: List of the y-values of all samples.
Return type: list

morphoclass.data.augment_persistence_diagrams(persistence_diagrams, xlims, ylims, factor=10, maxd=2, p=0.5)¶

Augment persistence diagrams.

This uses a naive approach: for each persistence diagram, randomly select a subset of points and shift them slightly. The shifts are chosen randomly so that in the resulting persistence image the points shift by at most maxd.

Parameters: persistence_diagrams –
Returns
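
The augmentation idea described above can be sketched in plain Python. Several details are assumptions made for illustration and are not confirmed by this page: the `resolution` parameter (image pixels per axis), reading `factor` as the number of augmented copies per diagram, and drawing shifts uniformly. The function name `augment_diagram` is hypothetical and this is not the library's implementation.

```python
import random


def augment_diagram(diagram, xlims, ylims, factor=10, maxd=2, p=0.5,
                    resolution=100, rng=None):
    """Return `factor` randomly perturbed copies of one persistence diagram.

    Assumptions (not taken from the library): the persistence image has
    `resolution` pixels along each axis, and each point is shifted with
    probability `p` by at most `maxd` pixels per axis.
    """
    rng = rng or random.Random()
    # Size of one image pixel expressed in diagram coordinates.
    dx = (xlims[1] - xlims[0]) / resolution
    dy = (ylims[1] - ylims[0]) / resolution
    copies = []
    for _ in range(factor):
        copy = []
        for x, y in diagram:
            if rng.random() < p:
                x += rng.uniform(-maxd, maxd) * dx
                y += rng.uniform(-maxd, maxd) * dy
            copy.append((x, y))
        copies.append(copy)
    return copies


diagram = [(0.0, 1.0), (2.0, 5.0), (3.0, 4.0)]
copies = augment_diagram(
    diagram, xlims=(0, 10), ylims=(0, 10), factor=4, rng=random.Random(0)
)
print(len(copies), len(copies[0]))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which is convenient when comparing training runs.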

morphoclass.data.augment_persistence_diagrams_v2(persistence_diagrams, labels, xlims, ylims, factor=10, maxd=2, p=0.5)¶

Augment persistence diagrams.

This uses a naive approach: for each persistence diagram, randomly select a subset of points and shift them slightly. The shifts are chosen randomly so that in the resulting persistence image the points shift by at most maxd.

Parameters: persistence_diagrams –
Returns

morphoclass.data.load_apical_persistence_diagrams(folder, mtype)¶

Load persistence diagrams for apicals.

Loads all neurons from a given directory and extracts persistence diagrams from their apical trees.

Parameters

folder – Folder containing m-type folders.
mtype – M-type name, must be a subfolder of ‘folder’ and contain neuron files.

Returns

List of apical persistence diagrams of all neurons in the m-type folder.

morphoclass.data.persistence_diagrams_to_persistence_images(persistence_diagrams, xlims=None, ylims=None)¶

Convert persistence diagrams to persistence images.

Parameters

persistence_diagrams – The persistence diagrams to convert.
xlims – The x-dimension of the persistence images to create.
ylims – The y-dimension of the persistence images to create.

Returns

A NumPy array with the created persistence images.

morphoclass.data.pickle_data(data_dir, rewrite=False)¶

Cache neuron data-set by pickling every neuron.

This is helpful because loading each neuron each time can take a considerable amount of time. The given neuron files are loaded as tmd.Neuron instances and pickled into files on disk.

Note that the Dataset class is written in such a way that it is able to both process .h5 and .swc files, as well as .pickle data created by this function.

Parameters

data_dir – Directory from which to load the data-set. The structure should be:

    data_dir/
        mtype1/
            neuron1
            neuron2
            …
        mtype2/
            neuron1
            neuron2
            …

rewrite – If False and the target directory already exists, an error is raised. If True, the existing target directory is deleted and new pickled data is created.

Returns

The target directory to which the data was written.

Raises

FileExistsError – If the target directory already exists and rewrite is False.
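
The caching scheme and the rewrite semantics described above can be sketched with the standard library. This is not the code of pickle_data: the function name `pickle_dataset`, the explicit `target_dir` argument, and the `load_neuron` callable are hypothetical stand-ins, since this page does not state how the target directory is named, and the real function loads tmd.Neuron instances.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def pickle_dataset(data_dir, target_dir, load_neuron, rewrite=False):
    """Pickle every file under data_dir/<mtype>/ into target_dir/<mtype>/.

    Hypothetical helper: `load_neuron` turns a file path into the object
    that should be cached.
    """
    data_dir, target_dir = Path(data_dir), Path(target_dir)
    if target_dir.exists():
        if not rewrite:
            raise FileExistsError(f"Target directory exists: {target_dir}")
        shutil.rmtree(target_dir)  # rewrite=True: start from scratch
    target_dir.mkdir(parents=True)
    for mtype_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        out_dir = target_dir / mtype_dir.name
        out_dir.mkdir()
        for neuron_file in sorted(p for p in mtype_dir.iterdir() if p.is_file()):
            with (out_dir / (neuron_file.stem + ".pickle")).open("wb") as fh:
                pickle.dump(load_neuron(neuron_file), fh)
    return target_dir


def read_placeholder(path):
    # Stand-in loader used only for this demonstration.
    return path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw"
    (src / "mtype1").mkdir(parents=True)
    (src / "mtype1" / "neuron1.swc").write_text("placeholder")
    dst = Path(tmp) / "pickled"
    pickle_dataset(src, dst, load_neuron=read_placeholder)
    created = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*.pickle"))
    try:
        pickle_dataset(src, dst, load_neuron=read_placeholder)
        raised = False
    except FileExistsError:
        raised = True
    # With rewrite=True the existing target directory is replaced.
    pickle_dataset(src, dst, load_neuron=read_placeholder, rewrite=True)
    rewritten = (dst / "mtype1" / "neuron1.pickle").exists()

print(created, raised, rewritten)
```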

morphoclass.data.reduce_tree_to_branching(tree)¶

Simplifies a neurite tree to the branching points only.

An analogous method, Tree.extract_simplified(), already exists and should be used instead of this function.

Parameters: tree – Input neurite tree.
Returns: Simplified neurite tree.