A simple statistical misunderstanding is leading many neuroscientists astray in their use of machine learning tools, according to a new paper in the Journal of Neuroscience Methods: Exceeding chance level by chance.
As the authors, French neuroscientists Etienne Combrisson and Karim Jerbi, describe the issue:
Machine learning techniques are increasingly used in neuroscience to classify brain signals. Decoding performance is reflected by how much the classification results depart from the rate achieved by purely random classification.
Suppose you record activity from my brain while I am looking at a series of images of people. Some of the people are male, some are female. You want to determine whether there is something about my brain activity (a feature or pattern) that’s different between those two classes of stimuli (male and female). Now suppose you find a pattern that allows you to ‘read my mind’ and determine whether I’m looking at a male or a female image, with 70% accuracy. Is that a good performance? Well, you might think: guessing at random, flipping the proverbial coin, we would only be right 50% of the time. 70% is much higher than 50%, so the method works!
Not so fast, say Combrisson and Jerbi:
In a two-class or four-class classification problem, the chance levels are thus 50% or 25% respectively. However, such thresholds hold for an infinite number of data samples but not for small data sets. While this limitation is widely recognized in the machine learning field, it is unfortunately sometimes still overlooked or ignored in the emerging field of brain signal classification […] while it will not come to anyone as a surprise that no study to date was able to acquire infinite data, it is intriguing how rarely brain signal classification studies acknowledge this limitation or take it into account.
The problem is that, intuitively, we expect random guessing to pick the correct choice, out of two choices, 50% of the time. But this assumption is not valid in machine learning unless we have an infinite sample size, which we don’t. The smaller the sample size, the more likely chance performance is to deviate from the ‘theoretical’ chance level (e.g. 50%).
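To put rough numbers on this, here is a minimal Python sketch. It assumes the simplest possible model, in which every trial is an independent coin flip with a 50% chance of being right. The function name `prob_at_least` is made up for this illustration and does not come from the paper.

```python
from math import comb

def prob_at_least(n_trials, min_correct, p_chance=0.5):
    """Exact probability that pure guessing gets at least min_correct trials right."""
    # Sum the binomial probabilities for every score from min_correct up to a perfect run.
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(min_correct, n_trials + 1)
    )

# How often does a coin-flipping "decoder" reach 70% accuracy or better?
for n_trials, min_correct in [(10, 7), (20, 14), (100, 70)]:
    p = prob_at_least(n_trials, min_correct)
    print(f"{n_trials:4d} trials: P(accuracy >= 70%) = {p:.5f}")
```

Under these assumptions, a guesser reaches 70% or better in roughly 17% of 10-trial experiments and roughly 6% of 20-trial experiments, while with 100 trials it almost never happens.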
Combrisson and Jerbi note that this problem is well known to statisticians and computer scientists. However, they say, it is often overlooked in neuroscience, especially among researchers using neuroimaging methods such as fMRI, EEG and MEG.
So how serious is this problem? To find out, the authors generated samples of random ‘brain activity’ data, arbitrarily split the samples into two ‘classes’, and used three popular machine learning tools to try to decode the classification. The methods were Linear Discriminant Analysis (LDA), the Naive Bayes (NB) classifier, and the Support Vector Machine (SVM). The MATLAB scripts for this are available here.
By design, there was no real signal in these data. It was all just noise – so the classifiers were working at chance performance.
However, Combrisson and Jerbi show that the observed chance performance regularly exceeds the theoretical level of 50% when the sample size is small. Essentially, the variability (standard deviation) of the observed correct classification rate shrinks as the sample size grows; for independent trials, it is inversely proportional to the square root of the sample size. Therefore, with smaller sample sizes, the chance that the chance performance level is (by chance) high increases. This was true of LDA, NB and SVM alike, and regardless of the type of cross-validation performed.
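The flavour of this experiment can be reproduced in a few lines of Python. This is only a sketch, not the authors' MATLAB code: it swaps in a bare-bones nearest-centroid classifier for LDA, NB and SVM, and the choices of 5 noise features, 5-fold cross-validation and 200 repetitions are arbitrary assumptions. The function names are hypothetical.

```python
import random
import statistics

def nearest_centroid_cv_accuracy(n_samples, n_features=5, n_folds=5, rng=random):
    """Cross-validated accuracy of a nearest-centroid classifier on pure noise."""
    # Gaussian noise features with arbitrary, balanced labels: no real signal by design.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    order = list(range(n_samples))
    rng.shuffle(order)

    correct = 0
    for fold in range(n_folds):
        test_idx = set(order[fold::n_folds])
        train_idx = [i for i in order if i not in test_idx]
        # Estimate one centroid per class from the training samples only.
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train_idx if y[i] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Assign each held-out sample to the closest centroid.
        for i in test_idx:
            dists = {
                label: sum((a - b) ** 2 for a, b in zip(X[i], c))
                for label, c in centroids.items()
            }
            predicted = min(dists, key=dists.get)
            correct += predicted == y[i]
    return correct / n_samples

def chance_accuracy_spread(n_samples, n_repeats=200, seed=0):
    """Mean, standard deviation and best accuracy over repeated noise-only datasets."""
    rng = random.Random(seed)
    accs = [nearest_centroid_cv_accuracy(n_samples, rng=rng) for _ in range(n_repeats)]
    return statistics.mean(accs), statistics.stdev(accs), max(accs)

for n in (20, 80, 320):
    mean_acc, sd_acc, best_acc = chance_accuracy_spread(n)
    print(f"n={n:3d}: mean={mean_acc:.3f}  sd={sd_acc:.3f}  best={best_acc:.3f}")
```

The mean accuracy should hover around 50% at every sample size, but the spread, and with it the best "lucky" run, should be visibly larger for the small datasets.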
The only solution, Combrisson and Jerbi say, is to forget theoretical chance performance, and instead evaluate machine learning results for statistical significance against sample-size-specific thresholds. They provide a helpful “look-up table” revealing the minimum performance that a classifier needs to achieve in order to statistically significantly exceed chance. This table offers both a yardstick by which to judge previous studies, and a guide for the future. Some neuroscientists who use machine learning may cringe at how high these figures are.
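Thresholds of this kind can be derived from the binomial distribution. The sketch below is one way to do it in Python, assuming independent trials, equally likely classes and a one-sided test; the function name `min_significant_correct` is hypothetical, and the values it prints are not guaranteed to match the paper's table entry for entry.

```python
from math import comb

def min_significant_correct(n_trials, n_classes=2, alpha=0.05):
    """Smallest number of correct trials k such that P(X >= k) <= alpha under guessing.

    Returns n_trials + 1 if even a perfect score would not be significant.
    """
    p = 1.0 / n_classes
    tail = 0.0
    threshold = n_trials + 1
    # Walk down from a perfect score, accumulating the upper tail probability.
    for k in range(n_trials, -1, -1):
        tail += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        if tail > alpha:
            break
        threshold = k
    return threshold

for n_trials in (10, 20, 100):
    k = min_significant_correct(n_trials)
    print(f"{n_trials:4d} trials, 2 classes: need {k} correct ({100 * k / n_trials:.0f}%)")
```

Under these assumptions, a two-class decoder tested on 20 trials needs 15 correct (75%) to beat chance at the 0.05 level, and with only 10 trials it needs 9 (90%). With 100 trials the bar drops to 59%.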
Combrisson E, & Jerbi K (2015). Exceeding chance level by chance: The caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy. Journal of Neuroscience Methods PMID: 25596422