3.4. Parcellating the brain into regions

Page summary

This page demonstrates how clustering can be used to parcellate the brain into homogeneous regions from resting-state time series.

3.4.1. A resting-state dataset

Here, we use a resting-state dataset from a test-retest study performed at NYU. Details on the data can be found in the documentation of the downloading function fetch_nyu_rest.

3.4.2. Preprocessing: loading and masking

We fetch the data from the Internet and load it with a dedicated function (see Loading data):

import numpy as np
from nilearn import datasets
from nilearn import input_data
from nilearn.plotting.img_plotting import plot_roi, plot_epi
nyu_dataset = datasets.fetch_nyu_rest(n_subjects=1)

# print basic information on the dataset
print('First subject anatomical nifti image (3D) is at: %s' %
      nyu_dataset.anat_anon[0])
print('First subject functional nifti image (4D) is at: %s' %
      nyu_dataset.func[0])  # 4D data

# This is resting-state data: the background has not been removed yet,
# thus we need to use mask_strategy='epi' to compute the mask from the
# EPI images
nifti_masker = input_data.NiftiMasker(memory='nilearn_cache',
                                      mask_strategy='epi', memory_level=1,
                                      standardize=False)
func_filename = nyu_dataset.func[0]
fmri_masked = nifti_masker.fit_transform(func_filename)
mask = nifti_masker.mask_img_.get_data().astype(bool)

No mask is given with the data so we let the masker compute one. The result is a niimg from which we extract a numpy array that is used to mask our original images.
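
As a quick sanity check (a small sketch added here, not part of the original example), the computed mask can be overlaid on the mean functional image, reusing plot_roi, which is already imported above, and mean_img, which is used further down this page:

from nilearn.image import mean_img

# Overlay the EPI-derived mask on the mean functional image to verify
# that it covers the brain and excludes the background
mean_func_img = mean_img(func_filename)
plot_roi(nifti_masker.mask_img_, mean_func_img, title="EPI mask",
         display_mode='xz')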

3.4.3. Applying Ward clustering

Compute a connectivity matrix

Before applying Ward's method, we compute a spatial neighborhood matrix, also known as a connectivity matrix. This is useful to constrain clusters to form contiguous parcels (see the scikit-learn documentation).

from sklearn.feature_extraction import image
shape = mask.shape
connectivity = image.grid_to_graph(n_x=shape[0], n_y=shape[1],
                                   n_z=shape[2], mask=mask)
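
As a quick check (a small sketch, not part of the original example), the result is a scipy sparse matrix with one row and one column per voxel kept by the mask, so its shape matches the number of columns of fmri_masked:

# Each in-mask voxel is a node, connected to its neighbors on the 3D grid
print(connectivity.shape)    # (n_voxels, n_voxels)
print(fmri_masked.shape[1])  # the same n_voxels as the masked time series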

Ward clustering principle

Ward's algorithm is a hierarchical clustering algorithm: it recursively merges voxels, then clusters, that have similar signal (parameters, measurements or time courses).

Caching

In practice, the implementation of Ward clustering first computes a tree of possible merges, and then, given a requested number of clusters, breaks the tree apart at the right level.

As the tree is independent of the number of clusters, we can rely on caching to speed things up when varying the number of clusters. In Ward clustering, the memory parameter is used to cache the computed component tree. You can give it either a joblib.Memory instance or the name of a directory used for caching.
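
For illustration (a small sketch, not part of the original example; it assumes the standalone joblib package is installed and uses the FeatureAgglomeration estimator introduced in the next section), the two ways of specifying the cache are equivalent:

from joblib import Memory
from sklearn.cluster import FeatureAgglomeration

# Pass either a directory name or an explicit joblib.Memory instance
ward_a = FeatureAgglomeration(n_clusters=1000, connectivity=connectivity,
                              linkage='ward', memory='nilearn_cache')
ward_b = FeatureAgglomeration(n_clusters=1000, connectivity=connectivity,
                              linkage='ward',
                              memory=Memory('nilearn_cache', verbose=0))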

3.4.3.1. Running the Ward algorithm

Here we simply launch Ward’s algorithm to find 1000 clusters and we time it.

from sklearn.cluster import FeatureAgglomeration
# If you have scikit-learn older than 0.14, you need to import
# WardAgglomeration instead of FeatureAgglomeration
import time
start = time.time()
ward = FeatureAgglomeration(n_clusters=1000, connectivity=connectivity,
                            linkage='ward', memory='nilearn_cache')
ward.fit(fmri_masked)
print("Ward agglomeration 1000 clusters: %.2fs" % (time.time() - start))

This runs in about 10 seconds (depending on your computer configuration). Now, suppose we are not satisfied with the result and want to re-cluster the image into 2000 parcels.

# the caching mechanism
start = time.time()
ward = FeatureAgglomeration(n_clusters=2000, connectivity=connectivity,
                            linkage='ward', memory='nilearn_cache')
ward.fit(fmri_masked)
print("Ward agglomeration 2000 clusters: %.2fs" % (time.time() - start))

Now that the component tree has been cached, the computation is much faster. You should have the result in less than 1 second.

3.4.4. Post-Processing and visualizing the parcels

3.4.4.1. Unmasking

After running Ward clustering, we must unmask the data. This can be done simply:

# Avoid 0 label
labels = ward.labels_ + 1
labels_img = nifti_masker.inverse_transform(labels)

The clusters are labeled from 0 to (n_clusters - 1) in ward.labels_, so we add 1 to every label: after unmasking, the background voxels that inverse_transform fills with 0 can then no longer be confused with the first cluster.
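
As a quick check (a small sketch, not part of the original example), the shifted labels run from 1 to n_clusters, and every voxel outside the mask takes the value 0 in labels_img:

# Cluster labels span 1..n_clusters; out-of-mask voxels are set to 0
print(labels.min(), labels.max())  # 1 and n_clusters (2000 here)
labels_volume = labels_img.get_data()
print((labels_volume == 0).sum() == (~mask).sum())  # True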

3.4.4.2. Label visualization

To visualize the clusters, we assign a random color to each of them.

from nilearn.image import mean_img
import matplotlib.pyplot as plt
mean_func_img = mean_img(func_filename)

# common cut coordinates for all plots

first_plot = plot_roi(labels_img, mean_func_img, title="Ward parcellation",
                      display_mode='xz')
# labels_img is a Nifti1Image object, it can be saved to file with the
# following code:
labels_img.to_filename('parcellation.nii')
[Figure: Ward parcellation overlaid on the mean functional image]
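
The colors come from a default matplotlib colormap; if you prefer another one (a small sketch, not part of the original example; cmap is an argument of plot_roi), you can pass it explicitly:

# Re-plot the parcellation with an explicit colormap
plot_roi(labels_img, mean_func_img, title="Ward parcellation",
         display_mode='xz', cmap=plt.cm.Paired)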

3.4.4.3. Compressed picture

By replacing the value of each voxel with the mean value of the cluster it belongs to, we create a compressed version of the original image. This representation is obtained with a two-step procedure:

  • call ward.transform to obtain the mean signal of each cluster (for each scan)
  • call ward.inverse_transform on the previous result to turn it back into the masked-image shape

plot_epi(nifti_masker.inverse_transform(fmri_masked[0]),
         cut_coords=first_plot.cut_coords,
         title='Original (%i voxels)' % fmri_masked.shape[1],
         display_mode='xz')

# A reduced dataset can be created by taking the parcel-level average:
# Note that, as with many objects in scikit-learn, the ward object exposes
# a transform method that modifies input features. Here it reduces their
# dimension
fmri_reduced = ward.transform(fmri_masked)

# Display the corresponding data compressed using the parcellation
fmri_compressed = ward.inverse_transform(fmri_reduced)
compressed_img = nifti_masker.inverse_transform(fmri_compressed[0])

plot_epi(compressed_img, cut_coords=first_plot.cut_coords,
         title='Compressed representation (2000 parcels)',
         display_mode='xz')

plt.show()
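
To make the dimensionality reduction explicit (a small sketch, not part of the original example), the intermediate arrays have the following shapes:

# fmri_masked:     (n_scans, n_voxels)   -- one time series per voxel
# fmri_reduced:    (n_scans, n_clusters) -- one time series per parcel
# fmri_compressed: (n_scans, n_voxels)   -- parcel means broadcast back to voxels
print(fmri_masked.shape, fmri_reduced.shape, fmri_compressed.shape)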

[Figures: original data (left) and compressed representation with 2000 parcels (right)]

We can see that using only 2000 parcels, the original image is well approximated.