.. _searchlight:

===========================================================
Searchlight : finding voxels containing information
===========================================================

.. currentmodule:: nilearn.decoding

Principle of the Searchlight
============================

Searchlight was introduced in `Information-based functional brain mapping
<http://www.pnas.org/content/103/10/3863>`_, Nikolaus Kriegeskorte,
Rainer Goebel and Peter Bandettini (PNAS 2006) and consists in scanning the
images volume with a *searchlight*. Briefly, a ball of given radius is
scanned across the brain volume and the prediction accuracy of a
classifier trained on the corresponding voxels is measured.

Preparing the data
====================

Loading
-------

Fetching the data from internet and loading it can be done with the
provided functions (see :ref:`loading_data`):

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: ### Load Haxby dataset ################################
    :end-before: ### Restrict to faces and houses #######################

Reshaping the data
-------------------

For this example we need:

- to put X in the form *n_samples* x *n_features*
- compute a mean image for visualization background
- limit our analysis to the `face` and `house` conditions
  (like in the :ref:`decoding tutorial <decoding_tutorial>`)

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: ### Restrict to faces and houses ###################
    :end-before: ### Prepare masks ###################################

Masking
-------

One of the main elements that distinguish Searchlight from other algorithms is
the notion of structuring element that scans the entire volume. If this seems
rather intuitive, it has in fact an impact on the masking procedure.

Most of the time, fMRI data is masked and then given to the algorithm. This is
not possible in the case of Searchlight because, to compute the score of
non-masked voxels, some masked voxels may be needed. This is why two masks will
be used here :

- *mask_img* is the anatomical mask
- *process_mask_img* is a subset of mask and contains voxels to be processed.

*process_mask_img* will then be used to restrain computation to one slice, in the
back of the brain. *mask_img* will ensure that no value outside the brain is
taken into account when iterating with the sphere.

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
        :start-after: #   brain to speed up computation)
        :end-before: ### Searchlight computation ######################

Third Step: Setting up the searchlight
=======================================

Classifier
----------

The classifier used by default by :class:`SearchLight` is LinearSVC with C=1 but
this can be customed easily by passing an estimator parameter to the
cross validation. See scikit-learn documentation for `other classifiers
<http://scikit-learn.org/stable/supervised_learning.html>`_.

Score function
--------------

Here we use precision as metrics to measure the proportion of true
positives among all positive results for one class. Others metrics can be
specified by the "scoring" argument to the :class:`SearchLight`, as
detailed in the `scikit-learn documentation
<http://scikit-learn.org/dev/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules>`_

Cross validation
----------------

:class:`SearchLight` will iterate on the volume and give a score to each voxel. This
score is computed by running a classifier on selected voxels. In order to make
this score as accurate as possible (and avoid overfitting), a cross validation
is made.

As :class:`SearchLight` is computationally costly, we have chosen a cross
validation method that does not take too much time.
*K*-Fold along with *K* = 4 is a
good compromise between running time and quality.

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: # set once and the others as learning sets
    :end-before: import nilearn.decoding

Running Searchlight
===================

Running :class:`SearchLight` is straightforward now that everything is set.
The only
parameter left is the radius of the ball that will run through the data.
Kriegskorte et al. use a 4mm radius because it yielded the best detection
performance in their simulation.

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: # The radius is the one of the Searchlight sphere
    :end-before: ### F-scores computation ############################
	
Visualization
=============

Searchlight
-----------

As the activation map is cropped, we use the mean image of all scans as a
background. We can see here that voxels in the visual cortex contains
information to distinguish pictures showed to the volunteers, which was the
expected result.

.. figure:: ../auto_examples/decoding/images/plot_haxby_searchlight_001.png
   :target: ../auto_examples/decoding/plot_haxby_searchlight.html
   :align: center
   :scale: 60

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: ### Visualization ####################################
    :end-before: ### Show the F_score


.. seealso::

   * :ref:`plotting`

Comparing to massively univariate analysis: F_score or SPM
----------------------------------------------------------

The standard approach to brain mapping is performed using *Statistical
Parametric Mapping* (SPM), using ANOVA (analysis of variance), and
parametric tests (F-tests ot t-tests).
Here we compute the *p-values* of the voxels [1]_.
To display the results, we use the negative log of the p-value.

.. figure:: ../auto_examples/decoding/images/plot_haxby_searchlight_002.png
   :target: ../auto_examples/decoding/plot_haxby_searchlight.html
   :align: center
   :scale: 60

.. literalinclude:: ../../examples/decoding/plot_haxby_searchlight.py
    :start-after: ### F_score results

Parametric scores can be converted into p-values using a reference
theoretical distribution, which is known under specific assumptions
(hence the name *parametric*). In practice, neuroimaging signal has a
complex structure that might not match these assumptions. An exact,
non-parametric *permutation test* can be performed as an alternative
to the parametric test: the residuals of the model are permuted so as
to break any effect and the corresponding decision statistic is
recomputed. One thus builds the distribution of the decision statistic
under the hypothesis that there is no relationship between the tested
variates and the target variates.  In neuroimaging, this is generally
done by swapping the signal values of all voxels while the tested
variables remain unchanged [2]_. A voxel-wise analysis is then
performed on the permuted data. The relationships between the image
descriptors and the tested variates are broken while the value of the
signal in each particular voxel can be observed with the same
probability than the original value associated to that voxel. Note
that it is hereby assumed that the signal distribution is the same in
every voxel. Several data permutations are performed (typically
10,000) while the scores for every voxel and every data permutation
is stored. The empirical distribution of the scores is thus
constructed (under the hypothesis that there is no relationship
between the tested variates and the neuroimaging signal, the so-called
*null-hypothesis*) and we can compare the original scores to that
distribution: The higher the rank of the original score, the smaller
is its associated p-value. The
:func:`nilearn.mass_univariate.permuted_ols` function returns the
p-values computed with a permutation test.

.. literalinclude:: ../../examples/manipulating_visualizing/plot_haxby_mass_univariate.py
   :start-after: from nilearn.input_data import NiftiMasker
   :end-before: ### Load Haxby dataset

.. literalinclude:: ../../examples/manipulating_visualizing/plot_haxby_mass_univariate.py
   :start-after: ### Perform massively univariate analysis with permuted OLS
   :end-before: neg_log_pvals_unmasked

The number of tests performed is generally large when full-brain
analysis is performed (> 50,000 voxels). This increases the
probability of finding a significant activation by chance, a
phenomenon that is known to statisticians as the *multiple comparisons
problem*. It is therefore recommended to correct the p-values to take
into account the multiple tests. *Bonferroni correction* consists of
multiplying the p-values by the number of tests (while making sure the
p-values remain smaller than 1). Thus, we control the occurrence of one
false detection *at most*, the so-called *family-wise error control*.
A similar control can be performed when performing a permutation test:
For each permutation, only the maximum value of the F-statistic across
voxels is considered and is used to build the null distribution. It is
crucial to assume that the distribution of the signal is the same in
every voxel so that the F-statistics are comparable. This correction
strategy is applied in nilearn
:func:`nilearn.mass_univariate.permuted_ols` function.

.. figure:: ../auto_examples/manipulating_visualizing/images/plot_haxby_mass_univariate_001.png
   :target: ../auto_examples/manipulating_visualizing/plot_haxby_searchlight.html
   :align: center
   :scale: 60

We observe that the results obtained with a permutation test are less
conservative than the ones obtained with a Bonferroni correction
strategy.

In nilearn :func:`nilearn.mass_univariate.permuted_ols` function, we
permute a parametric t-test. Unlike F-test, a t-test can be signed
(*one-sided test*), that is both the absolute value and the sign of an
effect are considered. Thus, only positive effects
can be focused on.  It is still possible to perform a two-sided test
equivalent to a permuted F-test by setting the argument
`two_sided_test` to `True`. In the example above, we do perform a two-sided
test but add back the sign of the effect at the end using the t-scores obtained
on the original (non-permuted) data. Thus, we can perform two one-sided tests
(a given contrast and its opposite) for the price of one single run.
The example results can be interpreted as follows: viewing faces significantly
activates the Fusiform Face Area as compared to viewing houses, while viewing
houses does not reveals significant supplementary activations as compared to
viewing faces.


.. [1]

    The *p-value* is the probability of getting the observed values
    assuming that nothing happens (i.e. under the null hypothesis).
    Therefore, a small *p-value* indicates that there is a small chance
    of getting this data if no real difference existed, so the observed
    voxel must be significant.

.. [2]

    When the variate tested is a scalar (test of the *intercept*)
    --which corresponds to a one sample test--, no swapping can be
    performed but one can estimate the null distribution by assuming
    symmetry about some reference value. When this value is zero, one can
    randomly swap the sign of the target variates (the imaging
    signal). nilearn
    :func:`nilearn.mass_univariate.permuted_ols` function automatically
    adopts the suitable strategy according to the input data.