
ABACBS (pron.ˈabəkəs) runs the Australian Bioinformatics Network


Don't correlate proportions!

In molecular bioscience, measurements of relative abundance are, well… abundant. However, appreciation of the need to analyse and interpret these data differently to measurements of absolute abundance is scarce.

Correlation is one of the workhorses of quantitative bioscience, but people do not always realise that it should not be used for data that carry only relative information.

This short video uses cookbooks, thrillers, and a bit of 3D geometry to explain why correlation is not OK for proportions, percentages and parts-per-million.

And, just in case you are wondering whether you are working with relative abundance data, ask yourself “Would doubling the amount of starting material double my measurements?”

If the answer is no, then you are working with relative information and should steer clear of correlation!
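The doubling question can be phrased as a tiny check. In this sketch, both assays are hypothetical stand-ins. `absolute_assay` returns readouts proportional to the amount of starting material. `fixed_depth_assay` loosely mimics a sequencing run with a fixed read budget: it returns the same total number of reads however much material went in, so its outputs carry only relative information.

```python
import math

# Hypothetical composition of a sample: the fraction of material from each feature.
COMPOSITION = {"feature_a": 0.2, "feature_b": 0.8}


def absolute_assay(amount, composition):
    """Hypothetical assay whose readout scales with the amount of starting material."""
    return {name: amount * frac for name, frac in composition.items()}


def fixed_depth_assay(amount, composition, depth=1_000_000):
    """Hypothetical assay that always returns `depth` reads split by composition,
    however much material went in (the `amount` argument is deliberately ignored)."""
    return {name: depth * frac for name, frac in composition.items()}


def doubles_with_input(assay, composition, amount=1.0):
    """The doubling test: does doubling the starting material double every measurement?"""
    single = assay(amount, composition)
    double = assay(2 * amount, composition)
    return all(math.isclose(double[name], 2 * single[name]) for name in composition)


for assay in (absolute_assay, fixed_depth_assay):
    verdict = "absolute" if doubles_with_input(assay, COMPOSITION) else "relative"
    print(f"{assay.__name__}: {verdict}")
```

Running this prints `absolute_assay: absolute` and then `fixed_depth_assay: relative`. Only the first assay's outputs would be safe to correlate directly.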




Reader Comments (4)

Do you have a tool for this, David? I'm sold, and it would be awesome to have a look at our methylation data, because I know for a fact it's all proportions: all methylation is measured in %!!

March 22, 2013 | Registered Commenter Nick Wong

No no no no no. There's nothing wrong with correlating relative abundance data, or proportional data.
The video presents an absurdity where one infers absolute numbers from correlations of proportionality. That is of course wrong, but that does not mean one cannot usefully correlate proportions. The point is in how you interpret any correlations, or lack of correlations, that you observe.

By the same token, linear correlations are not necessarily more meaningful than rank-order (Spearman) correlations. By all means, think about the tool you are using and what relationships are implied. Don't make the stupid leap the video refers to, inferring absolute trends from proportional trends, but don't throw away a tool just because somebody can imagine a way to use it incorrectly.

March 25, 2013 | Registered Commenter wade hines

"The video presents an absurdity where one infers absolute numbers from correlations of proportionality."
...well I think a lot of folks are doing that with omics data, especially transcriptomics and metagenomics.

I guess I just don't understand how to _interpret_ correlations of relative abundances.

It's true that many don't really think through all of the various normalization steps being applied to data.
Is the gross level of transcription higher in one state than in another?
Are we normalizing to total transcript, to a housekeeping subset of largely invariant transcripts, or to some particular reference set? These questions affect every type of analysis, including, for instance, ANOVAs. But that does not mean we can't use ANOVAs.
For just about any analysis, we quickly run into trouble if we are comparing two states that are substantially different. If we normalize to total transcript levels, a large increase in already-abundant transcripts can make a transcript whose functional level is unchanged appear smaller, in a statistically significant way. Every analysis needs to be thought through. This is one good reason we can't reliably outsource bioinformatics to BGI.

March 25, 2013 | Registered Commenter wade hines
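The normalization point in the comment above can be made concrete with toy numbers. In this sketch, all counts and transcript names are made up. A transcript of interest and a housekeeping transcript are identical in absolute terms across two hypothetical states, while an already-abundant transcript increases by 1,000 counts.

```python
import math

# Made-up absolute transcript counts in two hypothetical states.
# Only the highly abundant transcript changes between states.
state_1 = {"gene_of_interest": 100, "highly_abundant": 850, "housekeeping": 50}
state_2 = {"gene_of_interest": 100, "highly_abundant": 1850, "housekeeping": 50}


def normalize_to_total(counts, name):
    """Fraction of all transcripts in the sample that belong to `name`."""
    return counts[name] / sum(counts.values())


def normalize_to_reference(counts, name, reference="housekeeping"):
    """Abundance of `name` relative to a reference transcript assumed to be invariant."""
    return counts[name] / counts[reference]


for label, counts in (("state 1", state_1), ("state 2", state_2)):
    by_total = normalize_to_total(counts, "gene_of_interest")
    by_reference = normalize_to_reference(counts, "gene_of_interest")
    print(f"{label}: total-normalized = {by_total:.3f}, reference-normalized = {by_reference:.1f}")
```

Normalized to total transcript, the unchanged transcript appears to halve (0.100 to 0.050). Normalized to the housekeeping transcript, it stays at 2.0. The second answer is only right because the housekeeping transcript was constructed to be invariant here. With real data, that invariance is itself an assumption that needs checking.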