As part of the Statistical Society of Australia's National Access Grid Seminar Series, Professor Terry Speed will speak on "Removing unwanted variation: from principal components to random effects"
Time: Wednesday 26 June, 4.00pm AEST (2.00pm AWST)
Place (physical): AMSI Access Grid Room at the University of Melbourne
Place (online): If you want to participate in this seminar, please book an Access Grid Room that you are able to use at your local university/institution (a list is available at http://www.accessgrid.org/nodes)
RSVP: Please confirm your attendance by emailing Anne Nuguid (firstname.lastname@example.org), noting your name, email address, and the location of the AGR from which you will be participating
Ordinary least squares is a venerable tool for the analysis of scientific data, originating in the work of A.-M. Legendre and C. F. Gauss around 1800. Gauss used the method extensively in astronomy and geodesy. Generalized least squares is more recent, originating with A. C. Aitken in 1934, though weighted least squares was widely used long before that. At around the same time (1933) H. Hotelling introduced principal components analysis to psychology. Its modern form is the singular value decomposition. In 1907, motivated by social science, G. U. Yule presented a new notation and derived some identities for linear regression and correlation. Random effects models date back to astronomical work in the mid-19th century, but it was through the work of C. R. Henderson and others in animal science in the 1950s that their connexion with generalized least squares was firmly made.
These are the diverse origins of our story, which concerns the removal of unwanted variation in high-dimensional genomic and other "omic" data using negative controls. We start with a linear model that Gauss would recognize, with ordinary least squares in mind, but we add unobserved terms to deal with unwanted variation. A singular value decomposition, one of Yule's identities, and negative control measurements (here genes) permit the identification of our model. In a surprising twist, our initial solution turns out to be equivalent to a form of generalized least squares. This is the starting point for much of our recent work. In this talk I will try to explain how a rather eclectic mix of familiar statistical ideas can combine with equally familiar notions from biology (negative and positive controls) to give a useful new set of tools for omic data analysis. Other statisticians have come close to the same endpoint from different perspectives, including Bayesian, sparse linear and random effects models.
Terry Speed completed a BSc (Hons) in mathematics and statistics at the University of Melbourne (1965), and a PhD in mathematics at Monash University (1969). He held appointments at the University of Sheffield, U.K. (1969-73) and the University of Western Australia in Perth (1974-82), and he was with Australia's CSIRO between 1983 and 1987. In 1987 he moved to the Department of Statistics at the University of California at Berkeley (UCB), and has remained there ever since. In 1997 he took an appointment with the Walter and Eliza Hall Institute of Medical Research (WEHI) in Melbourne, Australia, and was 50:50 UCB:WEHI until 2009, when he became emeritus professor at UCB and full-time at WEHI, where he heads the Bioinformatics Division. His research interests lie in the application of statistics to genetics and genomics, and to related fields such as proteomics, metabolomics and epigenomics.