morphoclass.data package¶
Submodules¶
Module contents¶
Dataset abstractions and data helper functions.
-
class
morphoclass.data.MorphologyDataLoader(dataset: morphoclass.data.morphology_dataset.MorphologyDataset, **kwargs: Any)¶
Bases: Generic[torch.utils.data.dataloader.T_co]
A data loader for the morphology data set.
This class is derived from torch.utils.data.DataLoader and unlike torch_geometric.data.DataLoader is able to handle Data objects with non-numeric fields. These fields are simply ignored upon constructing batches.
- Parameters
dataset (torch.utils.data.dataset.Dataset[T_co]) – The data set to apply the data loader to.
kwargs – Further parameters to pass on to the PyTorch DataLoader base class.
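The batching behaviour described above can be illustrated without PyTorch. The following is a minimal sketch, not the library's implementation: plain dictionaries stand in for Data objects, and the helpers `is_numeric` and `collate_numeric_fields` are hypothetical names introduced for this example. Only fields that are numeric in every sample are kept; all others are silently dropped.

```python
from numbers import Number


def is_numeric(value):
    """Return True for numbers and non-empty (nested) sequences of numbers."""
    if isinstance(value, Number):
        return True
    if isinstance(value, (list, tuple)):
        return len(value) > 0 and all(is_numeric(v) for v in value)
    return False


def collate_numeric_fields(samples):
    """Build a batch dict from the fields that are numeric in all samples."""
    if not samples:
        return {}
    batch = {}
    for key in samples[0]:
        values = [sample.get(key) for sample in samples]
        if all(is_numeric(v) for v in values):
            batch[key] = values  # numeric field: keep it in the batch
        # non-numeric fields (e.g. file paths) are ignored
    return batch


# Hypothetical samples: "path" is a non-numeric field.
samples = [
    {"x": [0.1, 0.2], "y": 0, "path": "cell_a.h5"},
    {"x": [0.3, 0.4], "y": 1, "path": "cell_b.h5"},
]
batch = collate_numeric_fields(samples)
print(sorted(batch))
```

In the sketch the `path` field does not appear in the batch, which mirrors how non-numeric fields are described as being ignored.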
-
batch_size: Optional[int]¶
-
dataset: torch.utils.data.dataset.Dataset[T_co]¶
-
drop_last: bool¶
-
num_workers: int¶
-
pin_memory: bool¶
-
prefetch_factor: int¶
-
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]¶
-
timeout: float¶
-
class
morphoclass.data.MorphologyDataset(data, transform=None, pre_transform=None, pre_filter=None)¶
Bases: Generic[torch.utils.data.dataset.T_co]
Dataset class for neuron morphologies.
This class is derived from torch_geometric.data.Dataset and loads morphologies using the MorphIO library. The morphology data is integrated into the samples by setting the MorphologyData.morphology attribute.
To extract features from the morphology data use the appropriate transformers in morphoclass.transforms.
- Parameters
data (iterable) – A sequence of instances of Data.
transform – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean value.
-
classmethod
from_csv(csv_file, transform=None, pre_transform=None, pre_filter=None)¶ Load data listed in a CSV file.
The CSV file should have no header. The first column should list paths to morphology files. The optional second column should contain labels.
The paths can be either absolute or relative. The relative paths should be relative to the directory with the CSV file.
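The file layout described above can be sketched with the standard library alone. This is an illustration of the CSV convention, not the code used by `from_csv`; the helper `read_morphology_csv` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_morphology_csv(csv_file):
    """Return (paths, labels) read from a header-less CSV file.

    Relative paths are resolved against the directory of the CSV file.
    A label is None when the row has no second column.
    """
    csv_file = Path(csv_file)
    paths, labels = [], []
    with csv_file.open(newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            path = Path(row[0])
            if not path.is_absolute():
                path = csv_file.parent / path
            paths.append(path)
            labels.append(row[1] if len(row) > 1 else None)
    return paths, labels


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "dataset.csv"
    # Write two rows without a header: path, label.
    with csv_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["L5_TPC_A/cell_1.h5", "L5_TPC_A"])
        writer.writerow(["L5_UPC/cell_2.h5", "L5_UPC"])
    paths, labels = read_morphology_csv(csv_path)
    # Both relative paths now point below the CSV file's directory.
    resolved_ok = all(p.parent.parent == Path(tmp) for p in paths)

print(labels)
```

With morphoclass installed, a file of this shape would be passed to `MorphologyDataset.from_csv(csv_file)`.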
- Parameters
csv_file (str or pathlib.Path) – The CSV file with data paths and labels
transform (callable or None) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable or None) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable or None) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.
- Returns
An instance of MorphologyDataset with loaded data.
- Return type
-
classmethod
from_features(features: Iterable[dict]) → morphoclass.data.morphology_dataset.MorphologyDataset¶ Create a dataset from pre-extracted features.
- Parameters
features – The pre-extracted features of the data. Each dictionary in the iterable corresponds to the features of one morphology. Given an existing MorphologyDataset whose samples are instances of torch_geometric.data.Data, the features of each sample can be obtained via sample.to_dict().
- Returns
A dataset instance constructed from pre-extracted data features.
- Return type
-
classmethod
from_paths(morph_paths, labels=None, transform=None, pre_transform=None, pre_filter=None)¶ Load data from given file paths.
This just calls the constructor and serves as a convenience alternative to it.
- Parameters
morph_paths (list_like) – A list of paths to morphology files.
labels (list_like, optional) – A list of labels for the morphology files in morph_paths.
transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.
- Returns
An instance of MorphologyDataset with loaded data.
- Return type
-
classmethod
from_structured_dir(data_path, layer='', transform=None, pre_transform=None, pre_filter=None, ignore_unknown_filetypes=True)¶ Load data from a structured directory.
The directory data_path should have sub-directories, which, in turn, contain the morphologies. The names of these sub-directories will be interpreted as the morphology types and will be used as the class labels.
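The directory convention can be sketched as follows. This is a standard-library illustration, not the implementation of `from_structured_dir`: the helper `list_labelled_files` is a hypothetical name, the file names and the `L2_IPC` folder are placeholders, and it is an assumption that `layer` is matched as a literal prefix of the sub-directory name (for example `"L5"`).

```python
import tempfile
from pathlib import Path


def list_labelled_files(data_path, layer=""):
    """Yield (file_path, label) pairs; the label is the sub-directory name."""
    for mtype_dir in sorted(Path(data_path).iterdir()):
        if not mtype_dir.is_dir():
            continue
        # Assumption: `layer` is a literal prefix of the folder name.
        if not mtype_dir.name.startswith(layer):
            continue
        for file_path in sorted(mtype_dir.iterdir()):
            if file_path.is_file():
                yield file_path, mtype_dir.name


with tempfile.TemporaryDirectory() as tmp:
    # Build a small placeholder directory tree with empty files.
    for mtype, name in [("L5_TPC_A", "a.swc"), ("L5_UPC", "b.swc"), ("L2_IPC", "c.swc")]:
        (Path(tmp) / mtype).mkdir(exist_ok=True)
        (Path(tmp) / mtype / name).touch()
    all_labels = [label for _, label in list_labelled_files(tmp)]
    l5_labels = [label for _, label in list_labelled_files(tmp, layer="L5")]

print(all_labels)
print(l5_labels)
```

An empty `layer` string is a prefix of every folder name, so in the sketch it selects all sub-directories, matching the documented default of loading all data.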
- Parameters
data_path (str or pathlib.Path) – Path pointing to the location of the data. The different morphology types should be organised in different sub-directories in data_path.
layer (str) – The cortical layer for which the morphology data should be loaded. The corresponding m-type folders should start with the string specified in layer, e.g. for layer 5 the valid m-type folders are L5_TPC_A, L5_UPC, etc. If no layer is provided then all data is loaded.
transform (callable, optional) – The transformation to apply to every sample whenever this sample is retrieved from the dataset. Useful for example for data augmentation.
pre_transform (callable, optional) – The transformation to apply to the data before it is stored in the dataset. This will happen only once per sample.
pre_filter (callable, optional) – The filter to apply upon loading data into the dataset. It should be a function that takes objects of type torch_geometric.data.Data and returns a boolean.
- Returns
An instance of MorphologyDataset with loaded data.
- Return type
-
get(idx)¶ Get one data sample.
- Parameters
idx (int) – The index of the data sample.
- Returns
The data sample corresponding to idx.
- Return type
torch_geometric.data.Data
-
get_sample_by_morph_name(morph_name)¶ Get morphology sample by name.
- Parameters
morph_name (str) – Morphology name.
- Returns
sample – Sample instance.
- Return type
torch_geometric.data.Data
-
guess_layer()¶ Guess the layer based on the labels.
M-types of the form “L5_TPC_A” contain the layer information in the second character of the m-type string. If all labels contain the same number in that position then this is the guessed layer. Otherwise None is returned.
- Returns
The guessed layer number or None in case the guess was unsuccessful.
- Return type
int or None
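The rule described for guess_layer can be written as a small standalone function. This is a sketch of the documented behaviour under the stated label format, not the method's actual source; the name `guess_layer_from_labels` is hypothetical, and returning None for an empty label list is an assumption.

```python
def guess_layer_from_labels(labels):
    """Return the layer number shared by all labels, or None."""
    layers = set()
    for label in labels:
        # The layer digit is expected in the second character, e.g. "L5_TPC_A".
        if len(label) < 2 or label[1] not in "0123456789":
            return None
        layers.add(int(label[1]))
    # The guess succeeds only if every label carries the same digit.
    return layers.pop() if len(layers) == 1 else None


print(guess_layer_from_labels(["L5_TPC_A", "L5_UPC"]))
print(guess_layer_from_labels(["L5_TPC_A", "L2_IPC"]))
```

The first call prints 5 because both labels agree on the layer digit; the second prints None because the digits differ.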
-
property
labels¶ Get label strings.
- Returns
List of label strings.
- Return type
list
-
len()¶ Compute the length of the dataset.
- Returns
The number of samples in the dataset.
- Return type
int
-
save_data(output_dir, indices=None)¶ Save the dataset or a subset of it to disk.
Note that the saving is done by copying the original files from which the data was read to a new location in the output_dir, so the original files must be accessible.
Note also that since it is just a copy of the original files no transforms or pre-transforms assigned to this dataset are applied to the data.
It is important that any pre-transforms assigned to this dataset keep the file attribute of the samples intact since this is how the original morphology file is located.
- Parameters
output_dir (str or pathlib.Path) – The output directory.
indices (list_like, optional) – The indices of the samples to be saved. If None then all samples will be saved.
-
to_csv(path, write_labels=True)¶ Write data paths and labels to a CSV file.
- Parameters
path (str or pathlib.Path) – The target CSV file.
write_labels (bool) – If True labels will be included in a separate column.
-
to_labels(ys)¶ Convert custom label IDs to class names.
- Parameters
ys (list) – List of label IDs.
- Returns
List of label names (as str).
- Return type
list
-
property
ys¶ Get the y-values of all samples.
- Returns
List of the y-values of all samples.
- Return type
list
-
morphoclass.data.augment_persistence_diagrams(persistence_diagrams, xlims, ylims, factor=10, maxd=2, p=0.5)¶ Augment persistence diagrams.
Try the naive approach: for each persistence diagram randomly select a subset of nodes and shift them slightly. Choose the shifts randomly so that in the resulting persistence image the points shift by at most maxd.
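The naive approach can be sketched for a single diagram as below. This is not the library's implementation and rests on several assumptions: a diagram is taken to be a list of (x, y) pairs, the persistence image is assumed to have `resolution` pixels along each axis (a hypothetical parameter), the bound `maxd` is applied per axis, and `factor` is interpreted as the number of augmented copies per diagram. The name `jitter_diagram` is hypothetical.

```python
import random


def jitter_diagram(diagram, xlims, ylims, maxd=2, p=0.5, resolution=100, rng=None):
    """Return a copy of `diagram` with a random subset of points shifted.

    Each point is selected with probability `p`. A selected point moves by
    at most `maxd` pixels along each axis of the assumed image grid.
    """
    rng = rng or random.Random()
    # Size of one image pixel in data units (assumes a square pixel grid
    # of `resolution` pixels per axis spanning xlims and ylims).
    dx = (xlims[1] - xlims[0]) / resolution
    dy = (ylims[1] - ylims[0]) / resolution
    augmented = []
    for x, y in diagram:
        if rng.random() < p:
            x += rng.uniform(-maxd, maxd) * dx
            y += rng.uniform(-maxd, maxd) * dy
        augmented.append((x, y))
    return augmented


# Placeholder diagram; ten copies correspond to the assumed meaning of factor=10.
diagram = [(0.0, 10.0), (5.0, 20.0), (12.0, 30.0)]
copies = [
    jitter_diagram(diagram, xlims=(0, 100), ylims=(0, 100), rng=random.Random(seed))
    for seed in range(10)
]
print(len(copies))
```

Seeding one `random.Random` instance per copy keeps the sketch reproducible without touching the global random state.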
- Parameters
persistence_diagrams –
- Returns
-
morphoclass.data.augment_persistence_diagrams_v2(persistence_diagrams, labels, xlims, ylims, factor=10, maxd=2, p=0.5)¶ Augment persistence diagrams.
Try the naive approach: for each persistence diagram randomly select a subset of nodes and shift them slightly. Choose the shifts randomly so that in the resulting persistence image the points shift by at most maxd.
- Parameters
persistence_diagrams –
- Returns
-
morphoclass.data.load_apical_persistence_diagrams(folder, mtype)¶ Load persistence diagrams for apicals.
Loads all neurons from a given directory and extracts persistence diagrams from their apical trees.
- Parameters
folder – Folder containing m-type folders.
mtype – M-type name, must be a subfolder of ‘folder’ and contain neuron files.
- Returns
List of apical persistence diagrams of all neurons in the m-type folder.
-
morphoclass.data.persistence_diagrams_to_persistence_images(persistence_diagrams, xlims=None, ylims=None)¶ Convert persistence diagrams to persistence images.
- Parameters
persistence_diagrams – The persistence diagrams to convert.
xlims – The x-dimension of the persistence images to create.
ylims – The y-dimension of the persistence images to create.
- Returns
A NumPy array with the created persistence images.
-
morphoclass.data.pickle_data(data_dir, rewrite=False)¶ Cache neuron data-set by pickling every neuron.
This is helpful because loading each neuron each time can take a considerable amount of time. The given neuron files are loaded as tmd.Neuron instances and pickled into files on disk.
Note that the Dataset class is written in such a way that it is able to both process .h5 and .swc files, as well as .pickle data created by this function.
- Parameters
data_dir – Directory from which to load the data-set. The structure should be:
    data_dir/
        mtype1/
            neuron1
            neuron2
            …
        mtype2/
            neuron1
            neuron2
            …
rewrite – If False and the target directory already exists, an error is raised. If True, the existing target directory is deleted and new pickled data is created.
- Returns
The target directory to which the data was written.
- Raises
FileExistsError – The target directory with the output dataset already exists and rewrite is False.
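The caching pattern and the rewrite semantics can be sketched with the standard library. This is an illustration under assumptions, not the code of pickle_data: the real function loads tmd.Neuron instances, whereas the sketch pickles the raw file bytes through a replaceable `load` callable; the helper name `cache_directory`, the explicit `target_dir` argument and the placeholder file are hypothetical.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def cache_directory(data_dir, target_dir, load=Path.read_bytes, rewrite=False):
    """Pickle every file in data_dir/<mtype>/ into target_dir/<mtype>/."""
    data_dir, target_dir = Path(data_dir), Path(target_dir)
    if target_dir.exists():
        if not rewrite:
            raise FileExistsError(f"Target directory exists: {target_dir}")
        shutil.rmtree(target_dir)  # rewrite=True: delete and start over
    target_dir.mkdir(parents=True)
    for mtype_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        out_dir = target_dir / mtype_dir.name
        out_dir.mkdir()
        for neuron_file in sorted(p for p in mtype_dir.iterdir() if p.is_file()):
            out_file = out_dir / (neuron_file.stem + ".pickle")
            with out_file.open("wb") as fh:
                # Stand-in for loading a neuron: `load` returns raw bytes here.
                pickle.dump(load(neuron_file), fh)
    return target_dir


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data"
    (src / "mtype1").mkdir(parents=True)
    (src / "mtype1" / "neuron1.swc").write_text("placeholder")
    dst = cache_directory(src, Path(tmp) / "data_pickled")
    cached = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*.pickle"))
    try:
        cache_directory(src, dst)  # target exists and rewrite is False
        raised = False
    except FileExistsError:
        raised = True
    cache_directory(src, dst, rewrite=True)  # target is deleted and recreated
    rewritten_ok = (dst / "mtype1" / "neuron1.pickle").exists()

print(cached, raised, rewritten_ok)
```

Keeping the m-type sub-directory names in the target preserves the class labels, so the cached data can be read with the same directory convention as the original files.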
-
morphoclass.data.reduce_tree_to_branching(tree)¶ Simplify a neurite tree to the branching points only.
An analogous method already exists: Tree.extract_simplified(). Use that method instead of this one.
- Parameters
tree – Input neurite tree.
- Returns
Simplified neurite tree.