Figure 2. Within-group analysis: voxels where the local pattern of activation discriminates between stimulus categories; effects are shown for controls (left) and CPs (right). Table 2. CPs' group effects. CPs' MVPA activity over the right fusiform gyrus, left middle occipital gyrus, and left inferior occipital gyrus could discriminate between faces and objects at above-chance levels (see Figure 1 and Table 2).
CPs also showed above-chance discrimination between faces and bodies in the right fusiform gyrus, right lingual gyrus, left middle occipital gyrus, and inferior occipital gyri (see Figure 1 and Table 2), and above-chance discrimination between faces and body parts over the left inferior occipital gyrus and right lingual gyrus (see Figure 1 and Table 2).
CPs' pattern of fMRI activity could discriminate between objects and bodies over the right inferior occipital gyrus. Finally, CPs showed an above-chance discrimination pattern between objects and body parts over the inferior occipital gyrus (bilateral), fusiform gyrus (bilateral), right lingual gyrus, left inferior temporal gyrus, and right middle temporal gyrus (see Figure 2 and Table 2). Between-group analyses: controls vs. CPs. The between-groups comparison indicated stronger face-object discrimination in controls than in CPs.
This group difference was evident in the fusiform gyri, right inferior occipital gyrus, right inferior temporal gyrus, and right parahippocampal gyrus (Figure 3 and Table 2). The two groups' MVPA activity did not differ when discriminating faces vs. bodies, faces vs. body parts, objects vs. bodies, or objects vs. body parts. Figure 3. Groups comparison. To define face-sensitive regions, we compared faces vs. objects. As for the multivariate analysis, the multiple regression approach of SPM8 was used to estimate the response to each block in each of the 8 scanning acquisition runs, for each participant, with additional regressors of no interest included to model the run means.
This yielded 8 beta estimates for each condition, one for each run. To find face-discriminating regions in each group (controls, patients), a one-sample t-test was performed for each group separately using the face-minus-object contrast images as reference images. A between-groups (controls minus patients) comparison using a two-sample independent t-test with unequal variance was performed on the face-minus-object contrast images. In addition, face-selective regions were also investigated in each subject separately (i.e., at the single-subject level).
Within-group analyses: controls and CPs. Since this lack of group activity could potentially be due to between-subject variability in the location of face-sensitive regions, we additionally performed single-subject analyses, in which we compared face vs. object responses within each individual.
Results, in line with previous studies, are reported in Table 3. Table 3. Core face regions (i.e., OFA, FFA, and face-selective STS) localized in single subjects. The group comparison did not show any statistically significant difference between controls and CPs. Thus, as predicted, mass-univariate analysis is not as sensitive as MVPA in detecting group differences. Given the small number of single-subject localized face-sensitive regions in CPs (see Table 3), we did not run any statistical analysis to compare the two groups.
We investigated the neural characteristics of CP by examining the pattern of activity to faces, objects, headless bodies, and body parts using MVPA. For the first time, we also report that this pattern (poor discrimination between faces and objects in CPs) is also evident in the right parahippocampal gyrus.
The two groups did not show any difference in face-body, face-body part, object-body, and object-body part discriminations. Given that the mass-univariate results failed to show any group difference, we can also conclude that MVPA represents a more sensitive approach than traditional univariate statistics in detecting group differences (Norman et al.).
Note that since only the face-object contrast showed group differences, and since the univariate analysis failed to show differences between controls and CPs, we can rule out the possibility that the group differences are explained in terms of general activity differences. We acknowledge that face-sensitive regions (e.g., …) […]. In the current study, due to the lack of face-sensitive (i.e., …) […].
This result is in line with previous human neuroimaging findings (Pitcher et al.). Despite the finding that occipito-temporal regions in people with CP could be used to discriminate faces vs. […]. In agreement with Avidan et al., […]. This finding further suggests the pivotal role of the anterior temporal (AT) cortex for typical face processing (Williams et al.). However, in contrast to Avidan et al.'s […]. Human […] (Rajimehr et al.). The STS has previously been implicated in processing changeable aspects of faces (Hoffman and Haxby; Puce and Perrett) and facial emotion expressions (Said et al.).
Given that we used static stimuli that did not show facial expressions, it is likely that our experimental setting was not the most appropriate for engaging STS activity. Thus, aberrant activity in a network including occipital and temporal regions mediates atypical face-processing skills in CP. It is important to note, however, that since the MVPA analysis adopted here only tests for neural discrimination accuracy between category pairs (i.e., how well two categories can be told apart), it cannot by itself attribute a group difference to one category or the other.
In theory, the CP aberrant discrimination pattern could have been equally driven by object or face processing. The lack of an object-body and object-body part group difference seems to exclude an object-specific coding problem. However, in the same fashion, the lack of face-body and face-body part group differences seems to rule out a face-specific problem in CP. Given the nature of the condition, which is often characterized by a disproportionate deficit in face processing (Duchaine and Nakayama), and given that the group differences appear in brain areas strongly implicated in face processing (Haxby et al.), a face-related interpretation nonetheless seems the most plausible.
A finding never before reported in the CP neuroimaging literature is the reduced face-object discrimination in the right parahippocampal gyrus. Given that the parahippocampal gyrus is a region strongly implicated in memory processing (Davachi et al.), this reduced discrimination may have a mnemonic origin. We note that the 1-back task did not tax memory, and CPs and controls did not differ in their performance on this task. It is thus possible that reduced face-object discrimination in the parahippocampal gyrus reflects poor face memory in CP, as highlighted by their poor performance on the CFMT (see Table 1).
Future studies that adopt tasks specifically tapping memorial aspects of face processing may clarify why reduced sensitivity was seen in this area. Our findings of face-body, face-body part, object-body, and object-body part representations within the occipital and fusiform cortices, in both controls and CPs, are consistent with previous studies (Bar et al.). The absence of group differences for face vs. body, face vs. body part, object vs. body, and object vs. body part discriminations suggests that these representations are relatively preserved in CP. The current study demonstrates that face-object discriminatory abilities in the lateral occipital cortex, fusiform gyrus, AT cortex, and parahippocampal gyrus are compromised in people with CP.
Thus, both the core and extended face networks appear to reflect the behavioral abnormality congenital prosopagnosics experience in everyday life, elucidating a neural marker of CP. Future studies should further investigate the face-specificity issue by, for instance, testing the neural representation of multiple exemplars of individual faces and objects.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. We also wish to thank the Kanwisher lab (MIT) for providing the stimuli we adopted in the one-back task.

References
Avidan, G., et al. Functional MRI reveals compromised neural integrity of the face processing network in congenital prosopagnosia.
Avidan, G., et al. Detailed exploration of face-related processing in congenital prosopagnosia: 2. Functional neuroimaging findings.
Avidan, G., et al. Selective dissociation between core and extended regions of the face processing network in congenital prosopagnosia. Cortex 24.

Furthermore, MVPA is often presented in the context of "brain reading" applications, reporting that specific mental states or representational content can be decoded from fMRI activity patterns after performing a "training" or "learning" phase.
In this context, MVPA tools are often referred to as classifiers or, more generally, learning machines. The latter name stresses that many MVPA tools originate from a field called machine learning, a branch of artificial intelligence. To control the trade-off between hyperplane complexity and training errors, a penalty factor $C$ is introduced.
The primal optimization problem becomes
$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;\frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}\quad\text{subject to}\quad y_{i}(\mathbf{w}\cdot\mathbf{x}_{i}+b)\ge 1-\xi_{i},\;\;\xi_{i}\ge 0.$$
High values of $C$ force the slack variables $\xi_{i}$ to be smaller, approximating the behaviour of the hard-margin SVM. Figure 3 shows the effect of $C$ on the decision boundary.
A large $C$ does not allow any training errors; a small $C$, however, allows some training errors. In this figure, an intermediate value of $C$ is typically preferred because it represents a trade-off between acceptable classifier performance and generalization to unseen examples (i.e., new data). To solve the aforementioned primal optimization problem, where a function has to be minimized subject to fixed outside constraints, the method of Lagrange multipliers is used.
This method provides a strategy for finding the local maxima and minima of a function subject to equality constraints. These constraints are included in the minimization objective, and Lagrange multipliers allow one to quantify how much to emphasize them (see, e.g., …). Let $\alpha_{i}\ge 0$ and $\mu_{i}\ge 0$ be the two sets of Lagrange multipliers.
We derive the so-called dual problem using the following Lagrangian of the primal problem:
$$L(\mathbf{w},b,\boldsymbol{\xi},\boldsymbol{\alpha},\boldsymbol{\mu}) = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i}\xi_{i} - \sum_{i}\alpha_{i}\left[y_{i}(\mathbf{w}\cdot\mathbf{x}_{i}+b)-1+\xi_{i}\right] - \sum_{i}\mu_{i}\xi_{i}.$$
The Lagrangian $L$ needs to be minimized with respect to $\mathbf{w}$, $b$, and $\boldsymbol{\xi}$ under the constraints $\alpha_{i}\ge 0$, $\mu_{i}\ge 0$, and $\xi_{i}\ge 0$. Consequently, the derivatives of $L$ with respect to these variables must vanish:
$$\frac{\partial L}{\partial\mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i}\alpha_{i}y_{i}\mathbf{x}_{i}, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i}\alpha_{i}y_{i} = 0, \qquad \frac{\partial L}{\partial\xi_{i}} = 0 \;\Rightarrow\; C-\alpha_{i}-\mu_{i} = 0.$$
Substituting the above results into the Lagrange form, we get the following:
$$W(\boldsymbol{\alpha}) = \sum_{i}\alpha_{i} - \frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}y_{i}y_{j}(\mathbf{x}_{i}\cdot\mathbf{x}_{j}).$$
According to Lagrange theory, in order to obtain the optimum it is enough to maximize $W$ with respect to $\boldsymbol{\alpha}$:
$$\max_{\boldsymbol{\alpha}} W(\boldsymbol{\alpha}) \quad\text{subject to}\quad 0\le\alpha_{i}\le C \;\text{ and }\; \sum_{i}\alpha_{i}y_{i}=0,$$
where the box constraint $0\le\alpha_{i}\le C$ comes from $\alpha_{i}\ge 0$, $\mu_{i}\ge 0$, and $C-\alpha_{i}-\mu_{i}=0$. Because this dual problem has a quadratic form, the solution can be found iteratively by quadratic programming (QP), sequential minimal optimization (SMO), or least squares (LS).
This solution has the property that $\mathbf{w}$ is a linear combination of a few of the training examples:
$$\mathbf{w} = \sum_{i}\alpha_{i}y_{i}\mathbf{x}_{i}.$$
The key feature of this equation is that $\alpha_{i}=0$ for every example except those that lie on or inside the margin. Those examples are called the support vectors.
They lie closest to the decision boundary and determine the margin. Note that if all nonsupport vectors were removed, the same maximum margin hyperplane would be found. In practice, most fMRI experimenters use linear SVMs because they produce linear boundaries in the original feature space, which makes the interpretation of their results straightforward.
Indeed, in this case, examining the weight maps directly allows the identification of the most discriminative features [ 27 ]. Nonlinear SVMs are often used for discrimination problems when the data are not linearly separable. Vectors are mapped to a high-dimensional feature space using a mapping function $\Phi$. In nonlinear SVMs, the decision function is based on the hyperplane
$$f(\mathbf{x}) = \mathbf{w}\cdot\Phi(\mathbf{x}) + b.$$
The kernel, $K(\mathbf{x}_{i},\mathbf{x}_{j}) = \Phi(\mathbf{x}_{i})\cdot\Phi(\mathbf{x}_{j})$, allows a nonlinear operator to be written as a linear one in a space of higher dimension.
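To make this concrete, here is a minimal Python sketch (scikit-learn, with synthetic data standing in for voxel patterns; all names and sizes are illustrative) that trains a linear SVM, verifies that $\mathbf{w}$ is a linear combination of the support vectors only, and reads out the weights as a simple discrimination map:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for fMRI data: 40 samples (e.g., blocks) x 200 voxels.
X, y = make_classification(n_samples=40, n_features=200,
                           n_informative=10, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w is a linear combination of the support vectors only:
# w = sum_i alpha_i * y_i * x_i, with alpha_i = 0 for non-support vectors.
w_manual = clf.dual_coef_ @ clf.support_vectors_  # dual_coef_ holds alpha_i * y_i
assert np.allclose(w_manual, clf.coef_)

print(f"{len(clf.support_)} of {len(y)} examples are support vectors")
# The absolute weights can be mapped back onto the voxels to identify
# the most discriminative features (a "weight map").
top_voxels = np.argsort(np.abs(clf.coef_).ravel())[::-1][:10]
print("Most discriminative voxels:", top_voxels)
```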
Several types of kernels can be used in SVM models. The most common kernels are polynomial kernels and radial basis functions (RBFs). The polynomial kernel is defined by
$$K(\mathbf{x}_{i},\mathbf{x}_{j}) = (\mathbf{x}_{i}\cdot\mathbf{x}_{j} + c)^{d}.$$
The $c$ and $d$ parameters are set to control the decision boundary curvature. Figure 4 shows the decision boundary with two different values of $c$ and $d$. We note that the case with $c=0$ and $d=1$ is a linear kernel. The radial basis function (RBF) kernel is defined by
$$K(\mathbf{x}_{i},\mathbf{x}_{j}) = \exp\!\left(-\frac{\lVert\mathbf{x}_{i}-\mathbf{x}_{j}\rVert^{2}}{2\sigma^{2}}\right),$$
where $\sigma$ is a hyperparameter. A large $\sigma$ value corresponds to a large kernel width.
This parameter controls the flexibility of the resulting classifier (Figure 5). In the fMRI domain, although nonlinear transformations sometimes provide higher prediction performance, their use limits the interpretation of the results when the feature weights are transformed back to the input space [ 28 ].
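The sketch below illustrates this flexibility on a toy nonlinear problem. Note that scikit-learn parameterizes the RBF kernel as $\exp(-\gamma\lVert\mathbf{x}_{i}-\mathbf{x}_{j}\rVert^{2})$, so $\gamma$ corresponds to $1/(2\sigma^{2})$: a large $\gamma$ means a small kernel width.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# A two-class problem that is not linearly separable.
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

for gamma in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma)
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    train_acc = clf.fit(X, y).score(X, y)
    # A very small gamma (wide kernel) underfits; a very large gamma
    # (narrow kernel) fits the training data almost perfectly but
    # generalizes worse -- i.e., it overfits.
    print(f"gamma={gamma:>6}: train acc={train_acc:.2f}, CV acc={cv_acc:.2f}")
```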
Although SVMs are efficient at dealing with large high-dimensional datasets, they are, like many other classifiers, affected by preprocessing steps such as spatial smoothing, temporal detrending, and motion correction.
LaConte et al. showed that, for both SVM and canonical variates analysis (CVA), classification of individual time samples of whole-brain data can be performed with no averaging across scans. Ku et al. […]. Misaki et al. found that normalizing the mean and standard deviation of the response patterns, either across stimuli or across voxels, had no significant effect.
On the other hand, classifier performance can be improved by reducing the data dimensionality or by selecting a set of discriminative features. Decoding performance was found to increase after applying dimensionality reduction using the recursive feature elimination (RFE) algorithm [ 31 ] or after selecting independent voxels with the highest overall responsiveness, using a priori knowledge from GLM measures [ 29 ].
However, LaConte et al. […]. Schmah et al. […]. Other studies have attempted to compare classifiers in terms of their performance or execution time.
Cox and Savoy [ 14 ] studied linear discriminant (LD) classifiers and SVMs for classifying patterns of fMRI activation evoked by the visual presentation of various categories of objects.
Classification accuracy was found to be significantly higher for both linear and polynomial SVMs than for the LD classifier. Pereira and Botvinick [ 34 ] found that the Gaussian Naive Bayes (GNB) classifier is a reasonable choice for quick mapping, that LD is likely preferable if more time is available, and that a linear SVM can achieve the same level of performance if the classifier parameters are well set using cross-validation (see Section 4).
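As a rough illustration of how such comparisons can be run yourself, the following sketch cross-validates several classifiers on the same synthetic data (the specific accuracies are not meaningful; the point is the comparison scaffold):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for a small, high-dimensional fMRI dataset.
X, y = make_classification(n_samples=80, n_features=500,
                           n_informative=20, random_state=0)

classifiers = {
    "GNB": GaussianNB(),
    "LD": LinearDiscriminantAnalysis(),
    "linear SVM": SVC(kernel="linear", C=1.0),
    "poly SVM": SVC(kernel="poly", degree=2, C=1.0),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:>10}: mean CV accuracy = {acc:.2f}")
```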
When dealing with single-subject analysis, features may be created from the $\beta$ maps estimated using a GLM. A typical feature will then consist of the pattern of $\beta$-values across voxels. The analysis is normally performed on spatially unsmoothed data to preserve fine-grained subject-specific information [ 35 ].
In such a case, the features are simply the voxels. Other authors recommend applying spatial smoothing [ 36 ]. This idea is highly debated in the fMRI literature [ 30 , 37 ] (see also Section 2). In both cases, the feature space can still be considered high dimensional when all brain voxels, or at least too-large regions of interest, are used.
Therefore, the dimensionality of the data needs to be significantly reduced, and informative features (voxels) have to be selected wisely in order to make the classification task feasible. When small regions of interest are used, there is typically no need to reduce the dimensionality (see the following Section 3). Several studies have demonstrated the relevance of feature selection. One such method is based on a simple multiple linear regression classifier in conjunction with as few as five selected voxels, which outperforms feature selection based on statistical parametric mapping (SPM) [ 41 ].
More recently, novel techniques have been developed to find informative features while ignoring uninformative sources of noise, such as principal components analysis (PCA) and independent component analysis (ICA) [ 42 , 43 ]. Such methods perform well when dealing with single-subject analyses. It is worth mentioning that feature selection can be improved by the use of cross-validation (see Section 4). The best classifier will generally include only a subset of features that are deemed truly informative.
In fact, SVM classifiers can themselves be used to perform feature selection. To do so, Martino et al. […] employed recursive feature elimination. For each voxel selection level, the RFE consists of two steps. First, an SVM classifier is trained on a subset of the training data using the current set of voxels. Second, a set of voxels is discarded according to their discriminative weights, as estimated during training.
Data set aside for testing are then classified, and generalization performance is assessed at each iteration. RFE has recently been used for the analysis of fMRI data and has been proven to improve generalization performance in discriminating visual stimuli during two different tasks [ 31 , 46 ].
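A sketch of this RFE scheme using scikit-learn's implementation (the data and parameter choices are illustrative; wrapping the selector inside the cross-validation pipeline keeps voxel selection restricted to the training data of each fold):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=300,
                           n_informative=15, random_state=0)

# At each iteration, train a linear SVM and discard the 10% of features
# with the smallest absolute weights, until 30 voxels remain.
rfe_svm = make_pipeline(
    RFE(SVC(kernel="linear"), n_features_to_select=30, step=0.1),
    SVC(kernel="linear"),
)
print("CV accuracy with RFE:", cross_val_score(rfe_svm, X, y, cv=5).mean())
```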
Multivariate classification methods are used to identify whether the fMRI signals from a given set of voxels contain a dissociable pattern of activity according to the experimental manipulation. One option is to analyze the pattern of activity across all brain voxels. In such a case, the number of voxels far exceeds the number of training patterns, which makes the classification computationally expensive.
A typical approach is to make assumptions about the anatomical regions of interest (ROIs) suspected to be correlated with the task [ 14 , 47 , 48 ]. In such cases, an ROI will represent a spatially contiguous set of voxels, though the voxels need not be adjacent. An alternative is to select fewer voxels (e.g., within a small sphere moved across the brain). This method was introduced by Kriegeskorte et al. and is known as the searchlight approach.
In other terms, the searchlight method scores a voxel by how accurately the classifier can predict the condition of each example in the training set, based on the data from that voxel and its immediately adjacent neighbours.
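A conceptual, self-contained implementation of the searchlight on a tiny synthetic volume is sketched below (pure NumPy/scikit-learn; real analyses would typically use an optimized implementation such as nilearn's SearchLight, and all sizes here are made up):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
vol_shape, n_samples = (8, 8, 8), 40
labels = np.tile([0, 1], n_samples // 2)

# Synthetic 4D data (samples x volume); add signal for class 1 in one corner.
data = rng.normal(size=(n_samples,) + vol_shape)
data[labels == 1, :3, :3, :3] += 1.0

scores = np.zeros(vol_shape)
for x in range(vol_shape[0]):
    for y in range(vol_shape[1]):
        for z in range(vol_shape[2]):
            # Cubic "sphere": the voxel plus its immediate neighbours.
            sl = data[:,
                      max(x - 1, 0):x + 2,
                      max(y - 1, 0):y + 2,
                      max(z - 1, 0):z + 2].reshape(n_samples, -1)
            # Cross-validated accuracy is assigned to the center voxel.
            scores[x, y, z] = cross_val_score(
                SVC(kernel="linear"), sl, labels, cv=4).mean()

print("Peak searchlight accuracy:", scores.max(),
      "at voxel", np.unravel_index(scores.argmax(), vol_shape))
```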
The pixels for condition $A$ are random numbers, and the pixels of condition $B$ are constructed from those of $A$, except in some patterns where a value of 1 is added. We used four runs, where each run contains 30 examples (15 for condition $A$ and 15 for condition $B$). One iteration of the algorithm consists of the brain volume being randomly divided into a number of clusters (search spheres) such that each voxel is included in one and only one cluster, and a classifier performance is computed for each cluster.
Thus, a mean performance across all the constellations in which the voxel took part is assigned to that voxel (as opposed to the searchlight, where each voxel is assigned the one value computed when the sphere was centered on it; Figure 7). To ensure unbiased testing, the data must be split into two sets: a training set and a test set. In addition, it is generally recommended to choose a larger training set in order to enhance classifier convergence. Indeed, the performance of the learned classifier depends on how the original data are partitioned into training and test sets and, most critically, on their sizes.
In other words, the more instances we leave for testing, the fewer samples remain for training, and hence the less accurate the classifier becomes. On the other hand, a classifier that explains one set of data well does not necessarily generalize to other sets of data, even if the data are drawn from the same distribution.
In fact, an excessively complex classifier will tend to overfit (i.e., fit idiosyncrasies of the training data rather than generalizable structure). This may occur, for example, when the number of features is too large with respect to the number of examples (i.e., high-dimensional, low-sample-size data). The goal is to identify the best parameters for the classifier (e.g., the penalty factor $C$).
By cross-validation, the same dataset can be used for both the training and testing of the classifier, thus increasing the number of examples with the same number of features.
In $k$-fold cross-validation, the original data are randomly partitioned into $k$ subsamples. Of the $k$ subsamples, a single subsample is retained for validating the model, and the remaining $k-1$ subsamples are used as training data.
The cross-validation procedure is then repeated $k$ times, with each of the $k$ subsamples used exactly once for testing. The results can be averaged or otherwise combined to produce a single performance estimate. A common variant in fMRI is leave-one-run-out cross-validation (LORO-CV): in this procedure, the data of one run provide the test samples, and the remaining runs provide the training samples. The second variant is leave-one-sample-out cross-validation (LOSO-CV), in which one sample is taken from each class as a test sample, and all remaining samples are used for classifier training.
The samples are randomly selected such that each sample appears in the test set at least once.

Because MVPA exploits fine-grained spatial patterns, it is often recommended to use no or minimal smoothing during pre-processing (Misaki et al.; Op de Beeck; Hendriks et al.). Additionally, since there is less anatomical correspondence across participants than within subjects (and smoothing is more beneficial when there is lower spatial correspondence between images), unsmoothed or minimally smoothed images are often used throughout the first-level (i.e., within-subject) analyses.
The amount of smoothing depends on the type of task and on how localized the relevant psychological process is (Gardumi et al.). In most cases, the whole-brain contrast images are masked to remove voxels that are uninformative (e.g., voxels outside the brain).
Sometimes, you may want to use a functional mask based on an independent dataset or meta-analysis. Feature selection (i.e., choosing a subset of the available features, such as voxels, to include in the analysis) restricts the model to the most promising signals. Reducing the overall number of features also helps decrease the time required to perform analyses and mitigates the risk of overfitting in decoding analyses. Importantly, feature selection generally must be defined on separate data (i.e., data independent of those used to test your hypothesis).
For example, it is not appropriate to create an ROI of the voxels that respond to faces based on all runs in your dataset and then use MVPA to test whether this ROI significantly discriminates between faces and other images within the same data. Instead, independent subsets of the data (e.g., separate runs or an independent localizer task) should be used for feature selection and for the decoding analysis itself.
This is true whether feature selection is based on which voxels generally respond most to the stimuli (e.g., a univariate contrast of all stimuli vs. baseline) or on some other criterion. In cases where researchers wish to use the same data for both feature selection and decoding analyses, feature selection should be performed independently within the training data for each data fold.
It is, of course, important not to try out multiple feature selection strategies on the data on which inferences will be made (i.e., the test data). Whereas feature selection reduces the number of features in a model by selecting a subset of features to include in model training, without changing those features in any way, a related family of approaches, called dimension reduction, reduces the number of features by transforming them into fewer dimensions. For example, principal components analysis (PCA) transforms features into a set of orthogonal values (i.e., principal components).
Before model training, you can specify how many components you would like to keep as features in your model or what proportion of the variance you would like the retained components to explain.
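A brief sketch of both options using scikit-learn's PCA (the component count and variance threshold are illustrative; placing PCA inside the pipeline ensures it is fit on training data only):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2000,
                           n_informative=25, random_state=0)

# Keep either a fixed number of components or enough components
# to explain a chosen proportion of the variance.
for n_components in (20, 0.90):  # 20 components, or 90% of the variance
    model = make_pipeline(PCA(n_components=n_components),
                          SVC(kernel="linear"))
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"PCA(n_components={n_components}): CV accuracy = {acc:.2f}")
```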
You can base decisions about how to set these thresholds by exploring different possibilities within the training data, using the same nested data folding techniques described in the above discussion of hyperparameter tuning. Dimension reduction techniques, such as PCA, can be beneficial in moving from a situation common in fMRI studies, where you have far more features than samples, to one where you have substantially fewer features in your model, but still retain the majority of the information contained in the entire feature set.
Just as with feature selection, this can be useful in preventing overfitting your model to the training data. In addition, transforming features that are correlated with one another into a smaller number of orthogonal components can be beneficial for improving the performance of algorithms that perform best when features are independent of one another (e.g., Gaussian Naive Bayes).
Now we will discuss how to implement MVPA in your own research. There are several software packages now available to help researchers use MVPA methods, including Python-based packages (e.g., PyMVPA, nilearn) and MATLAB-based toolboxes (e.g., CoSMoMVPA, The Decoding Toolbox).
Each software package differs somewhat in terms of its default methods or parameters and how easily certain tests are run. For clarity, we will describe the steps in terms of a simple experiment in which participants view faces of humans and dogs of varying ages.
The first two steps are always necessary, and the following steps depend on whether you are running a decoding analysis or RSA. Step 1. Define the conditions. In our example, we will consider response patterns elicited by four different stimulus conditions: pictures of baby and adult humans and dogs. Suppose stimuli from each of these conditions were presented multiple times in each of 10 runs. We could model responses to each condition in each run (resulting in 40 samples: 4 conditions x 10 runs) or to individual trials (since each stimulus was presented more than once per run).
Step 2. Select the region of interest. Next, we must decide which regions of the brain to test. The analysis is run on each region separately, irrespective of the number of regions being tested (see the discussion of selecting features above).
That is, if analyzing a single brain region (or a set of brain regions as a single ROI), then the analysis is only run once, while a whole-brain searchlight analysis consists of completing the analysis as many times as there are voxels in the brain.
In the remainder of this section, all steps before significance testing will be described as if conducted within a single ROI within a single participant. The voxels within the selected region are systematically rearranged into a vector for each condition, such that the first voxel in the resulting vectors corresponds to the same point in the brain for each condition (Figure 1C and D).
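In code, this rearrangement amounts to applying the same boolean mask to every sample's volume (a NumPy sketch with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 40                                      # 4 conditions x 10 runs
volume = rng.normal(size=(n_samples, 40, 48, 40))   # toy beta images
roi_mask = np.zeros((40, 48, 40), dtype=bool)
roi_mask[18:24, 20:28, 10:16] = True                # toy ROI

# Applying the same boolean mask to every sample guarantees that
# feature j corresponds to the same voxel in every response pattern.
X = volume[:, roi_mask]             # shape: (n_samples, n_roi_voxels)
print(X.shape)                      # (40, 288)
```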
As described earlier in this manuscript, a classification algorithm is iteratively trained on one subset of the data and then tested on an independent subset of the data via cross-validation. Step 3. Data splitting. The simplest method for partitioning a dataset into training and testing data is the holdout method, in which you select one subset of your data for model training and one for model testing (e.g., 80% for training and 20% for testing). While this method is simple and fast, the definition of the training and test sets (i.e., which samples happen to land in each) can strongly influence the resulting performance estimate.
As such, it is more common to use k-fold cross-validation, in which the data are divided into training and testing sets multiple (k) times, and the training and testing procedure is performed on each subsetting of the data (Table 2, Figure 5). Data within each of the k subsets are used as test data once and as training data k-1 times.
For example, using 5-fold cross-validation, data from our 10-run fMRI study would be divided into 5 subsets (e.g., 2 runs each). Leave-one-sample-out cross-validation is a version of k-fold cross-validation where k is the total number of samples; similarly, in leave-one-run-out cross-validation (Figure 3), k is the number of runs in the fMRI study.
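A minimal leave-one-run-out sketch in scikit-learn, where run labels are passed as groups so the folds respect run boundaries (all data here are synthetic):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_runs, n_conditions = 10, 4
X = rng.normal(size=(n_runs * n_conditions, 288))     # toy patterns
y = np.tile(np.arange(n_conditions), n_runs)          # condition labels
runs = np.repeat(np.arange(n_runs), n_conditions)     # run labels

# Leave-one-run-out: each fold trains on 9 runs, tests on the held-out run.
accs = cross_val_score(SVC(kernel="linear"), X, y,
                       groups=runs, cv=LeaveOneGroupOut())
print("Per-run accuracies:", np.round(accs, 2), "mean:", accs.mean())
```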
In cases where pattern information can be aggregated across participants, leave-one-participant-out cross-validation is also an option. Classification analysis. (A) Within each participant, an algorithm is trained on a subset (here, 9 out of 10 runs) of a participant's data and then tested on a previously unseen subset (here, the held-out run).
In the training phase, each sample (here, the multivoxel pattern for each condition in each run) is treated as a point in a representational space. For m voxels per sample, there are m dimensions in the representational space. Each sample's coordinates are defined by the magnitude of each voxel's response (i.e., its position along the corresponding dimension). In many commonly used classification methods, the algorithm then tries to define a boundary (in linear SVM learning, an (m-1)-dimensional hyperplane) in the space such that each sample is classified with its correct label (note that the illustration is merely a conceptual example; please see the main text for a more specific discussion of how particular classification algorithms work).
(B) After calibrating model parameters on the training data, the algorithm is then fed the testing data, which it has never seen, without the correct labels. Depending on where those samples fall in the representational space, the algorithm classifies them based on the distinctions it has learned from the training data. If a sample is incorrectly classified, it is counted as an error.
(C) The average accuracy across all data folds is calculated for each participant. (D) This process is repeated for each participant, and the group-level accuracy is compared to what would be expected based on random chance. Representational similarity analysis. (A) To create the neural RDM, the patterns of neural responses elicited by each condition within a particular region are compared with each other to estimate their relative distinctiveness (e.g., using correlation distance).
These distances are organized into a neural RDM. Since the RDM is symmetric about a diagonal of zeros, only the lower off-diagonal triangle of this matrix is extracted, which can be (B) visualized in a low-dimensional space using MDS or (D) compared to a behavioral dissimilarity structure.
(B) The MDS plot visualizes the dissimilarity structure by plotting conditions that are more similar closer together. Here, we can see that human faces cluster together and are separate (i.e., dissimilar) from dog faces.
We can also see that there seems to be an effect of age, such that young faces are similar to each other and separate from older faces. (C) This effect of perceived age can be tested by creating a behavioral dissimilarity structure. This is achieved by finding the absolute difference between the perceived youth of each pair of faces.
Again, the lower off-diagonal triangle is extracted. (D) The lower off-diagonal triangles of the neural and behavioral RDMs are compared with one another, often using the Spearman correlation, as it does not assume a linear mapping between RDMs.
This correlation coefficient is mapped back into the region, creating a map of how closely the neural data match the behavioral ratings. (E) A model RDM of species reflects whether two pictures are of the same species or not.
(F) Multiple RDMs can be included as predictors in a regression, and the resulting betas may be mapped back into the ROI as an indicator of how much that variable predicted the neural data over and above the other predictor(s). Nested k-fold cross-validation with hyperparameter tuning. Cross-validation consists of iteratively splitting data into training and testing datasets, training an algorithm on the training data and then testing the resulting model on the testing data.
For each of the k divisions of the data (i.e., folds), hyperparameter tuning can be performed within the training data. To do so, one would further split the training data into a number of 'sub-folds' consisting of sub-training and validation datasets. Within each of these 'sub-folds', the algorithm is trained on the sub-training data and tested on the validation data once per hyperparameter set.
Once every unique combination of hyperparameters has been tested in every 'sub-fold', the hyperparameter set with the best performance across the validation datasets within the training data is selected. The selected hyperparameter set is then used to train the algorithm on the entire set of training data for that fold. The resulting model is then tested on the testing data in that fold. This process is repeated for each fold i. Finally, the average performance of the algorithm across all testing datasets is calculated.
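This nested scheme maps directly onto placing a grid search inside an outer cross-validation loop; a scikit-learn sketch with an illustrative hyperparameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=300,
                           n_informative=20, random_state=0)

# Inner loop ("sub-folds"): choose the best hyperparameters using
# each outer fold's training data only.
inner = GridSearchCV(SVC(), param_grid={"kernel": ["linear", "rbf"],
                                        "C": [0.1, 1, 10]}, cv=3)
# Outer loop: evaluate the tuned model on held-out test data.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("Nested CV accuracy:", outer_scores.mean())
```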
To avoid biasing algorithms toward predicting one particular category, it is important to avoid class imbalance in the training data by including the same number of samples of each category (e.g., an equal number of trials per condition). A simple strategy is to have an equivalent number of samples of each category in each run and use leave-one-run-out cross-validation (Figure 3). In our study, this would amount to 10-fold cross-validation, with data from each of our four stimuli present 9 times in each training set and once in each testing set.
If performing feature selection or hyperparameter tuning on these data, then the training data within each fold must be split into sub-training and validation sub-folds (i.e., nested cross-validation). After this iterative testing is completed on every sub-fold within the training data, the best hyperparameters (and features, if conducting feature selection within the training data) are selected to be used when training the algorithm on that fold's full training set. Note that this process may result in different features, feature weights, and hyperparameters being used in each fold.
Step 4. Train model. Within each training set, we assign each sample its correct label and give this information to our algorithm.
Essentially, the model considers each multivoxel pattern as a point in a multidimensional representational space, such that each voxel corresponds to one dimension (Figure 3).
If we have m voxels in our sample, then, we have an m -dimensional space. The algorithm tries to select model parameters such that samples are most often assigned the correct labels.
Step 5. Test model. Once the model has been trained, we give the algorithm the testing data, which is provided without any labels.
The model categorizes each of these new samples based on where they fall in the representational space relative to the boundaries that were estimated from the training data (as in SVM learning) or relative to their neighbors in the training data (as in k-nearest neighbor classification; Table 3).
We then count the number of errors it made in its categorization and calculate the classification accuracy of that model. Although classification accuracy is the most commonly used measure of decoding success in MVPA of fMRI data, other methods, such as the area under the receiver operating characteristic (ROC) curve, can sometimes be preferable (Ling et al.). Next, the average classification accuracy across testing sets is compared to what would be expected due to random chance (e.g., 25% in our four-category example).
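One convenient way to formalize the comparison against chance is a label-permutation test, sketched below with scikit-learn's permutation_test_score on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=80, n_features=200,
                           n_informative=15, random_state=0)

# Refit the classifier many times with shuffled labels to build a null
# distribution of accuracies, then compare the true accuracy against it.
acc, null_accs, p_value = permutation_test_score(
    SVC(kernel="linear"), X, y,
    cv=StratifiedKFold(5), n_permutations=500, random_state=0)
print(f"accuracy={acc:.2f}, chance~{null_accs.mean():.2f}, p={p_value:.3f}")
```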
Step 3. Create RDMs. An RDM represents the relative differences between the stimuli or conditions: the cell corresponding to row i and column j is the difference (i.e., dissimilarity) between the response patterns elicited by conditions i and j. Step 3a. Neural RDM. We want a single estimate of how the region responds to each stimulus; therefore, we first obtain a single response pattern for each stimulus (rather than one response pattern per stimulus per run) by averaging the neural response patterns for each stimulus across runs. We then compute the correlation distance (i.e., 1 minus the Pearson correlation) between each pair of condition patterns. Theoretically, the higher the correlation distance, the more the brain region distinguishes between those two concepts.
Once calculated, these distance values are organized into an RDM. Note that this will result in a matrix that is symmetric across the diagonal, because the difference between a baby human and a baby dog is the same as that between a baby dog and a baby human. Note also that the diagonal will be all zeros, because each condition is perfectly correlated with itself and thus has a correlation distance of zero.
If comparing two RDMs, this symmetry and diagonal of zeros would artificially inflate the correlation between the full RDMs, so only the lower off-diagonal triangles of the RDMs are extracted for further analyses (Figure 4A).
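A compact sketch of building a neural RDM with correlation distance and extracting its lower triangle (NumPy/SciPy; the condition patterns are synthetic):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# One run-averaged pattern per condition (4 conditions x 288 voxels).
patterns = rng.normal(size=(4, 288))

# Correlation distance (1 - Pearson r) between every pair of conditions.
rdm = squareform(pdist(patterns, metric="correlation"))
print(rdm.round(2))          # symmetric, zeros on the diagonal

# Keep only the lower off-diagonal triangle for further analyses.
lower_triangle = rdm[np.tril_indices_from(rdm, k=-1)]
print(lower_triangle)        # 6 unique pairwise distances
```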
Step 3b. Non-neural RDMs. In order to determine what the structure of the neural RDM corresponds to, we can compare it with similarly prepared RDMs from participant data (e.g., behavioral ratings). In our example, we might want to test whether a brain region organizes faces by perceived age. To create a behavioral RDM based on perceived age for each participant, we could calculate the absolute difference between the participant's age ratings for each pair of conditions, organize these values into a matrix, and extract the lower off-diagonal triangle (Figure 4C).
Step 4, option 1. Compare neural and non-neural RDMs. Note that since behavioral and neural RDMs likely use different scales, using Spearman rather than Pearson correlations to determine how well they correspond can be beneficial, as this does not assume a linear relationship. If you have multiple predictors (e.g., RDMs for both perceived age and species), you can include them in a regression predicting the neural RDM, as sketched below. The beta associated with perceived age would reflect the extent to which age predicts the neural data over and above species.
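Both options are a few lines of code; in this sketch, neural, age_rdm, and species_rdm are hypothetical lower-triangle vectors like the one extracted above:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
neural = rng.random(6)          # lower triangle of the neural RDM
age_rdm = rng.random(6)         # |age difference| per condition pair
species_rdm = rng.integers(0, 2, 6).astype(float)  # same species? 0/1

# Option 1: Spearman correlation (no linearity assumption).
rho, p = spearmanr(neural, age_rdm)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

# Multiple regression: each beta indexes that model's unique contribution.
predictors = np.column_stack([np.ones(6), age_rdm, species_rdm])
betas, *_ = np.linalg.lstsq(predictors, neural, rcond=None)
print("intercept, beta_age, beta_species:", betas.round(2))
```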
If doing so, it is important to first ensure that the predictor RDMs are independent of one another, as, otherwise, their respective regression coefficients will be difficult to interpret. Step 4, option 2. Visualize RDMs. RDMs can also be used to visualize the structure of the data. When visualizing RDMs, each cell is often colored based on its value to visually indicate which conditions are similarly represented.
Techniques such as MDS can also be used to view the overall structure of the data. MDS plots data points in a low-dimensional space based on how similar they are: two stimuli that elicit similar response patterns are plotted close together, while two stimuli that elicit different response patterns are plotted further apart Figure 4B.
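scikit-learn's MDS can embed a precomputed RDM directly; a self-contained sketch with a synthetic RDM (condition labels are from our running example):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
rdm = squareform(pdist(rng.normal(size=(4, 288)), metric="correlation"))

# Embed the 4 conditions in 2D so similar conditions plot close together.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(rdm)
for label, (x, y) in zip(["baby human", "adult human",
                          "baby dog", "adult dog"], coords):
    print(f"{label:>11}: ({x:+.2f}, {y:+.2f})")
```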
Such visualizations can help identify how stimuli are organized in the brain (e.g., whether faces cluster by species, by age, or both). After you have completed the above steps, you are ready for significance testing. In many cases, this can be done in the same way as in univariate experiments: results from a searchlight analysis, for example, have a similar data structure to other statistical parametric maps (e.g., univariate contrast maps). Of course, the exact approach you use will depend on the specifics of your data. Since correlation coefficients and classification accuracy values are bounded, it can be appropriate to transform them (e.g., with the Fisher r-to-z transformation for correlations) before applying parametric tests.
The statistical significance of the results of RSA or decoding analyses can be assessed within each subject or across subjects, and these methods test fundamentally different questions. Within a subject, significance is often assessed via permutation testing: the relevant statistical test (e.g., the decoding analysis) is repeated many times with condition labels randomly shuffled, yielding a null distribution. The result is considered significant if it surpasses the critical value in this null distribution (e.g., the 95th percentile).
Testing the significance of MVPA results across subjects can be accomplished in much the same way that data from corresponding ROIs or statistical parametric maps are tested for significance across participants in univariate studies.
It should be noted that if voxel-wise results were obtained for each participant (e.g., from a searchlight analysis), the resulting maps must be in a common space before group-level tests are performed. Note that if using multiple comparison correction methods that require estimating smoothness (e.g., cluster-based corrections), […].
In this section, we will discuss the types of research questions that are particularly amenable to MVPA and consider how it can be used to answer them.
For clarity, we will continue to consider our example experiment in which participants considered human and dog faces of varying ages while undergoing fMRI. Many researchers are inherently interested in what the participant is currently thinking about or attending to (i.e., the contents of their mental states). In our example above, we used classification analysis to determine whether people are considering human or dog faces. If we can successfully train a predictive model to decode this information based on response patterns evoked in a given brain region, then there are likely fundamental differences in how human and dog faces (or some covariate) are represented in that region.
This can provide valuable insight into how the brain encodes such information. We can also examine how information is transformed as it travels through different brain regions. In the Benefits of MVPA section, we discussed a study that provided evidence that the left STS and mPFC represent emotion in terms of its abstract emotional value, independent of the modality through which the emotion is expressed (Peelen et al.).
For example, this approach can illustrate how information is transformed as it progresses from early sensory cortex (where neural population codes reflect low-level sensory properties, such as modality) to later stages of processing (where neural population codes reflect higher-level, more abstract categories, such as emotional content). This could allow us to see how the stimuli are represented at each neural processing stage.
Earlier, in the step-by-step instructions, we discussed how RSA may be used in our example to discover that a brain region clusters stimuli by age as well as by species and how to test this using explicit models. That is, RSA allows us to test what type of information a given brain region uses to organize state or stimulus representations.
Decoding analyses can be used in a fashion complementary to RSA. Using cross-classification, we can ask whether we represent age in the same way across species. Cross-classification involves training a model within one condition (e.g., human faces) and testing it within another (e.g., dog faces), as in the sketch below. If the model can reliably decode the age of dog faces after being trained on human faces, then there are likely consistent underlying patterns encoding age across species in this region.
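A minimal cross-classification sketch on synthetic data (the shared "age code" is injected by construction, purely for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per, n_vox = 40, 288
age = np.tile([0, 1], n_per // 2)          # 0 = baby, 1 = adult

# Toy patterns with an age signal in the same voxels for both species.
def make_patterns():
    X = rng.normal(size=(n_per, n_vox))
    X[age == 1, :20] += 0.8                # hypothetical shared age code
    return X

human_X, dog_X = make_patterns(), make_patterns()

# Train on human faces only; test on (never-seen) dog faces.
clf = SVC(kernel="linear").fit(human_X, age)
print("Cross-species decoding accuracy:", clf.score(dog_X, age))
```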
Successful cross-classification would suggest that people represent the age of human and dog faces similarly at this level of processing. Does everyone see and process the world in the same way? Just as with univariate analyses, individual differences may be integrated with any type of MVPA to better understand how individuals process information. That is, results from MVPA can be used to predict individual differences. For example, Ersner-Hershfield et al. […] examined the neural response patterns evoked when participants thought about their current and future selves. The similarity of these response patterns may reflect future self-continuity and thus predict retirement savings.
MVPA is a useful tool when studying individual differences because these differences may manifest in the distinctiveness of neural patterns, not just the overall response magnitude of a region. Although we have focused largely on the benefits of MVPA, like any data analytic technique, it carries important issues and potential pitfalls. For instance, in decoding analyses, it can be difficult to interpret anything about the model itself beyond the yes-or-no question of whether a region distinguishes between the stimuli (Carlson and Wardle). MVPA also introduces many more researcher degrees of freedom.
It is important to remember that MVPA is not simply a replacement of univariate analyses. In addition, it is important to consider that two conditions might evoke similar univariate responses but different multivoxel response patterns. This does not necessarily imply that these conditions have nothing in common psychologically or that they do not entail shared processing demands.
As such, both techniques may be used in a complementary manner. In this section, we describe issues that may be helpful to consider when planning and interpreting MVPA. In both MVPA and univariate analyses, it is often difficult to ascertain when the apparent neural encoding of stimulus characteristics reflects the computation or representation of those characteristics themselves and when it reflects systematic and perhaps subtle effects on processes that typically follow the computation of those characteristics.
For example, much previous research suggests that activity in parietal and premotor regions associated with planning and executing actions is evoked by viewing tools; whether this activity reflects the encoding of part of the tool concept itself (Mahon and Caramazza) or downstream processes that typically follow tool identification (e.g., preparing to act on the tool) is difficult to determine.
Thus, the same results may be interpreted by some authors as information encoding and by other authors as processes being affected by that information. In the same way, if a decoding analysis can distinguish between human and dog faces, it is unclear whether that brain region encodes the content of those stimuli differently (e.g., their visual features) or responds differently for some other reason. In such instances, it can be appropriate to use characterizations of fMRI data that capture how responses change over time, such as how multivoxel patterns ebb and flow (e.g., Chang et al.; Hyon et al.). The same methods used in MVPA can be used to analyze patterns of functional connectivity. As such, when analyzing patterns of functional connectivity to decode psychological processes or states, data can be easily aggregated across participants, which can substantially increase the amount of training data available for decoding analyses and, in turn, the ability of machine learning algorithms to learn generalizable distinctions between conditions.
As alluded to above, decoding analyses generally benefit from having more training data in which to learn distinctions between conditions, and one way to achieve this is by analyzing response patterns that can be well-aligned across participants.
This includes cases where patterns of functional connectivity, rather than multivoxel response patterns, are used for decoding (Richiardi et al.). In addition to the tendency for techniques that facilitate between-subject decoding to produce relatively high classification accuracies (due to the increase in the amount of training data available), between-subject decoding approaches could also have practical benefits in cases where researchers wish to predict things about new individuals.
That said, within-subject analyses may be preferable in cases where response patterns are thought to be idiosyncratic to particular participants, either because of between-subject heterogeneity in fine-scale functional brain organization (see the Comparison Across Individuals and Representational Spatial Scale sections) or because the stimuli in question connote meaning that is inherently specific to each participant (e.g., personally familiar faces).
While MVPA can detect information carried at a finer-grained spatial scale than most univariate fMRI analyses, it is still relatively coarse compared with methods that analyze individual neurons. Patterns of neuronal activity have been shown to carry a diversity of information (Georgopoulos et al.).
Thus, while we gain some nuanced signal in MVPA compared with univariate tests, we are still missing information carried at a much finer spatial scale. The relatively low spatial resolution of fMRI data can engender misleading results in the context of MVPA, potentially leading to false positives in some cases and false negatives in others. For example, neurophysiological studies in monkeys show that nearby but largely non-overlapping sets of neurons in the orbitofrontal cortex encode the value of social and non-social rewards (Watson and Platt). Given that many thousands of neurons comprise each voxel in a multivoxel pattern, using MVPA or univariate analyses on fMRI data to study such phenomena may lead researchers to erroneously conclude the presence of a common encoding scheme.
This could be an issue in any cases where distinct, but nearby or interdigitated, populations of neurons encode different kinds of information. On the other hand, analyzing multivoxel patterns, rather than multi-neuron patterns, can also systematically produce false negatives.