Hi there. Advaita is dedicated to bringing you the most advanced, easiest-to-use bioinformatics tools out there. That includes educational materials designed to help you take advantage of all the powerful features we offer. Our last post about p-value correction factors was a bit confusing, so this blog post explains how each method works and helps you decide when to use each one.
We are lucky to have a few bioinformaticians around the office, including Dr. Sorin Draghici, our CEO and founder. If you don’t have a bioinformatics expert in-house, you might want to pick up his book. It’s full of useful information, and I think the best part about it is how easy it is to read. He makes it fun! For now, if you want to know more about getting the most from your analyses, read on…
A p-value represents the probability of observing a result at least as extreme as the one seen purely by random chance. For example, if 5 of the 100 differentially expressed (DE) genes in the dataset fall on pathway X, the over-enrichment p-value for pathway X is the probability that, from a randomly selected set of 100 genes in the dataset, 5 or more fall on pathway X. Significance is determined by setting a threshold, in many cases 0.05. If the p-value is less than 0.05, pathway X is considered significant because the chance of randomly observing a result at least this extreme is less than 5%.
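To make the pathway X example concrete, here is a minimal sketch of how such an over-enrichment p-value can be computed with a hypergeometric test. The pathway size (40 genes) and the dataset size (10,000 genes) are assumptions made up for illustration, since the example above does not specify them, and the function name is hypothetical rather than anything from our tools.

```python
from math import comb

def enrichment_p_value(k, n_de, pathway_size, n_genes):
    """Hypergeometric upper tail: probability that a random draw of n_de
    genes out of n_genes contains at least k of the pathway_size genes
    that sit on the pathway."""
    total = comb(n_genes, n_de)
    upper = min(n_de, pathway_size)
    tail = sum(
        comb(pathway_size, i) * comb(n_genes - pathway_size, n_de - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Assumed numbers: 5 of the 100 DE genes fall on a 40-gene pathway,
# in a dataset of 10,000 measured genes.
p = enrichment_p_value(k=5, n_de=100, pathway_size=40, n_genes=10_000)
print(f"over-enrichment p-value for pathway X: {p:.2e}")
```

With these assumed sizes, only 0.4 pathway genes would be expected among 100 random genes, so seeing 5 or more gives a p-value well below 0.05.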
This means that there is still a chance that the observation was in fact due to randomness and pathway X is not significant, what we would call a “false positive.” The chance of pathway X being a false positive is small, but when we perform this test multiple times, as we would for multiple pathways, the chance of reporting at least one false positive increases quickly. That is because, for m independent tests each run at a 0.05 threshold, the probability of reporting at least one false positive is 1 - (1 - 0.05)^m, which is roughly (and never more than) 0.05 × m when m is small. When this is done for hundreds of pathways, we are virtually guaranteed to have some pathways that appear to be significant just by chance. This is known as the “multiple comparisons problem,” and we tell you how to correct for it in the first section.
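A quick sketch of how fast this grows, assuming the tests are independent and every pathway is truly non-significant (the function name is just for illustration):

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more false positives across n_tests independent
    tests, each run at significance threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests in (1, 10, 100, 500):
    chance = prob_at_least_one_false_positive(0.05, n_tests)
    print(f"{n_tests:>4} tests: {chance:.3f}")
```

At 10 tests the chance is already about 40%, and by 100 tests it is above 99%.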
Enrichment tests are used in a number of settings, including pathway enrichment analysis [1] and gene ontology (GO) enrichment analysis. However, the GO has an additional structure that includes a hierarchical organization of its terms, as well as a “true path rule” that allows genes to be associated with entire paths through the ontology, rather than single terms [2]. Because of these additional properties, specific enrichment analysis methods (and associated multiple comparison strategies) have been developed for GO enrichment analysis. Two of these methods will be briefly discussed in the second section.
I. Methods of Correcting for Multiple Comparisons
General methods for multiple comparison corrections may be applied to any enrichment analysis. There are two strategies to limit the number of false positives across a large number of significance tests, and several methods have been developed for each strategy.
Strategy 1. Limit the probability of making a mistake (reporting a false positive) for each individual test
Strategy 2. Limit the rate of false positives, i.e. the proportion of false positive tests
In iPathwayGuide and iVariantGuide, we offer the most widely cited method for each strategy. Furthermore, the methods we chose provide a range of stringency so that you can choose what is appropriate for your data. Try it out now!
Bonferroni Correction
The Bonferroni correction is considered the most conservative method of correcting for multiple comparisons, meaning that it returns the fewest false positives. The drawback is that some truly meaningful events may not be reported as significant. The method holds each individual test to a stricter threshold: the chosen significance threshold divided by the number of tests. Equivalently, each raw p-value is multiplied by the number of tests before being compared with the original threshold [3,4]. This guarantees that the probability of generating at least one false positive across all tests is no greater than the chosen threshold. In other words, for a 5% significance threshold, the chance of reporting even one false positive is at most 5%. The more tests we run, the smaller the individual (raw) p-values must be for them to remain significant after the Bonferroni correction.
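Here is a minimal sketch of the Bonferroni adjustment in its usual textbook form: multiply each raw p-value by the number of tests and cap the result at 1. The five raw p-values are made-up numbers for illustration, and this is not meant as the exact code inside iPathwayGuide or iVariantGuide.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical raw p-values for five pathways
raw = [0.0001, 0.004, 0.02, 0.03, 0.2]
adjusted = bonferroni_adjust(raw)
significant = [p <= 0.05 for p in adjusted]

for r, a, s in zip(raw, adjusted, significant):
    print(f"raw={r:<7} adjusted={a:<7.4f} significant={s}")
```

Four of the five raw p-values are below 0.05, but only the two smallest survive the correction.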
False Discovery Rate
In contrast to Bonferroni, the false discovery rate (FDR) correction is one of the most lenient methods, allowing more true positives to be reported as significant, with the drawback that some false positives may also be reported as such. Developed by Benjamini and Hochberg, FDR correction guarantees that the expected proportion of false positives among the tests reported as significant will be no greater than the original significance threshold [5,6]. In other words, for a 5% significance threshold, FDR correction guarantees that, on average, no more than 5% of the tests reported as positive are false positives.
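For comparison, here is a sketch of the standard Benjamini-Hochberg step-up adjustment: sort the p-values, scale each by (number of tests / rank), and make sure the adjusted values never decrease as the raw values increase. The raw p-values are the same made-up numbers as in a Bonferroni-style example, and the function name is an illustrative assumption.

```python
def benjamini_hochberg_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n_tests = len(p_values)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # never larger than the one ranked just above it.
    for rank in range(n_tests, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n_tests / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five pathways
raw = [0.0001, 0.004, 0.02, 0.03, 0.2]
adjusted = benjamini_hochberg_adjust(raw)
significant = [p <= 0.05 for p in adjusted]

for r, a, s in zip(raw, adjusted, significant):
    print(f"raw={r:<7} adjusted={a:<7.4f} significant={s}")
```

With these numbers, four pathways remain significant at the 5% level, which illustrates why FDR is the more lenient of the two options.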
II. Multiple Comparisons in GO enrichment analysis
Due to the True Path Rule, genes associated with a GO term are also associated with its parent terms (for more on this, see Chapter 22 of Dr. Draghici’s book [7]). This means that simply performing an enrichment analysis for each GO term will count each gene many times, which is a serious problem (see Draghici, Chapter 24). Furthermore, testing the enrichment of all GO terms is not necessary and, due to the unavoidable multiple comparison curse, will increase the number of false positives reported. Luckily, one can leverage the structure and additional properties of GO in order to limit the number of tests performed, and therefore the number of comparisons one must correct for. In 2006, Alexa et al. [8] proposed two methods to accomplish this: “Elim” and “Weight.”
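To see why genes get counted many times, here is a small sketch of the True Path Rule on a toy hierarchy. The three-term chain is a simplification of the real GO graph, and the gene names and helper functions are hypothetical.

```python
def ancestors_of(term, parents):
    """All terms reachable by following parent links upward from term."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def propagate_annotations(direct, parents):
    """True Path Rule: a gene annotated to a term also counts for
    every ancestor of that term."""
    full = {term: set(genes) for term, genes in direct.items()}
    for term, genes in direct.items():
        for ancestor in ancestors_of(term, parents):
            full.setdefault(ancestor, set()).update(genes)
    return full

# Toy hierarchy (simplified; the real GO graph has intermediate terms)
parents = {
    "response to amphetamine": ["response to chemical"],
    "response to chemical": ["response to stimulus"],
}
direct = {
    "response to amphetamine": {"geneA", "geneB"},
    "response to chemical": {"geneC"},
    "response to stimulus": set(),
}
full = propagate_annotations(direct, parents)
for term, genes in full.items():
    print(term, sorted(genes))
```

In this toy case geneA and geneB end up in all three terms, so testing every term separately would count them three times.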
In iPathwayGuide and iVariantGuide, we offer both methods, each of which follows the same outline.
1) Decouple GO terms from one another
2) Perform significance tests
3) Correct for multiple comparisons
The Elim method assesses the significance of GO terms starting with the most specific terms first. The benefit of this approach is that it is easier to find specialized terms that are significant, e.g. "response to amphetamine" is more descriptive than "response to chemical." This approach provides a very nice custom cut through the GO hierarchy that “magically” identifies the lowest level of abstraction that contains the significant GO terms in the given experiment.
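The bottom-up idea can be sketched roughly as follows. This is a heavily simplified illustration of the Elim approach, not the published algorithm in full and not the code in iPathwayGuide or iVariantGuide. It assumes that terms are supplied most-specific-first, that a hypergeometric test is used, and that the genes of a significant term are simply removed from all of its ancestors. All names and data are made up.

```python
from math import comb

def hypergeom_tail(k, n_de, term_size, n_genes):
    """P(at least k of n_de randomly drawn genes belong to the term)."""
    upper = min(n_de, term_size)
    tail = sum(comb(term_size, i) * comb(n_genes - term_size, n_de - i)
               for i in range(k, upper + 1))
    return tail / comb(n_genes, n_de)

def ancestors_of(term, parents):
    """All terms reachable by following parent links upward from term."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def elim_sketch(term_genes, parents, specific_first, de_genes, all_genes,
                alpha=0.05):
    """Test terms from most to least specific. When a term is significant,
    remove its genes from every ancestor before the ancestor is tested."""
    removed = {term: set() for term in term_genes}
    p_values = {}
    for term in specific_first:
        genes = term_genes[term] - removed[term]
        k = len(genes & de_genes)
        p = hypergeom_tail(k, len(de_genes), len(genes), len(all_genes))
        p_values[term] = p
        if p < alpha:
            for ancestor in ancestors_of(term, parents):
                removed[ancestor] |= genes
    return p_values

# Toy data: 20 genes, 4 of them DE, all 4 on the specific term
all_genes = {f"g{i}" for i in range(20)}
de_genes = {"g0", "g1", "g2", "g3"}
term_genes = {
    "response to amphetamine": {"g0", "g1", "g2", "g3"},
    "response to chemical": {f"g{i}" for i in range(8)},  # includes child genes
}
parents = {"response to amphetamine": ["response to chemical"]}
order = ["response to amphetamine", "response to chemical"]

p_values = elim_sketch(term_genes, parents, order, de_genes, all_genes)
for term, p in p_values.items():
    print(f"{term}: p = {p:.4f}")
```

In this toy case a plain per-term test would flag both terms, because the parent inherits all four DE genes. After the child's genes are eliminated, the parent has no DE genes left and only the more specific term is reported.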
Given a set of related GO terms, the Weight method is designed to identify the term that best represents the genes of interest, regardless of where the term falls in the hierarchy. This approach is less stringent than Elim, capturing more true positives with the drawback of including additional false positives.
iPathwayGuide and iVariantGuide are the only tools to provide these advanced correction factors to help you minimize false positives. Try them today for FREE and see what is truly significant in your data.
1. Khatri, P., Sirota, M., & Butte, A. J. (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol, 8(2), e1002375.
2. Rhee, S. Y., Wood, V., Dolinski, K., & Draghici, S. (2008). Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7), 509-515.
3. Dunn, O. J. (1959). Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association, 54(287), 613-621.
4. Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52-64.
5. Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289-300.
6. Benjamini, Y. & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165-1188.
7. Drăghici, S. (2011). Statistics and data analysis for microarrays using R and Bioconductor. CRC Press.
8. Alexa, A., Rahnenführer, J., & Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22(13), 1600-1607.