Introduction to Model I and Model II linear regressions
What are linear regressions?
Linear regression is a statistical method for determining the slope and intercept parameters for the equation of a line that “best fits” a set of data.
The most common method for determining the “best fit” is to run a line through the centroid of the data (see below) and adjust the slope of the line such that the sum of the squares of the offsets between the line and the data is a minimum. Thus, this is called the “least squares” technique.
The centroid is the point determined by the mean of the x-values and the mean of the y-values. Regardless of the model chosen, the line will pass through the centroid of the data. For weighted data sets, the line will pass through the weighted centroid.
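As a minimal sketch of the idea above, the following Python fits a line by least squares and checks that it passes through the centroid. It assumes unweighted data and offsets measured parallel to the Y-axis (the common Y-on-X case); the function name `fit_y_on_x` and the sample data are illustrative and not from the original text.

```python
from statistics import mean

def fit_y_on_x(x, y):
    """Least-squares fit of y = intercept + slope * x using vertical offsets."""
    x_bar, y_bar = mean(x), mean(y)  # the centroid (x_bar, y_bar)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through the centroid.
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up sample data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_y_on_x(x, y)
x_bar, y_bar = mean(x), mean(y)
print(slope, intercept)
print(intercept + slope * x_bar, y_bar)  # the fitted line evaluated at x_bar equals y_bar
```

For a weighted data set, the same structure would apply with weighted means and weighted sums in place of the plain ones.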
Among the various models, there are different methods for calculating the offsets between the line and the data points. Since many of these methods minimize the sum of the squares of the offsets, they are all called “least squares” techniques; because of this, the term “least squares” designates neither a specific nor a unique method.
How are Model I and Model II regressions different?
In the case of Model I regressions, the offsets are measured parallel to one of the axes. For example, in the regression of Y-on-X (the most common regression technique), this would be parallel to the Y-axis. So, we fit the line by minimizing the sum of the squares of the y-offsets. For the X-on-Y regression, we would use the x-offsets measured parallel to the X-axis.
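To illustrate that the two Model I regressions generally give different lines, here is a small sketch that fits both to the same data and expresses each slope in the same y-versus-x form. The function names and the sample data are assumptions made for the example.

```python
from statistics import mean

def centered_sums(x, y):
    """Return the sums of squares and cross-products about the centroid."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy

def slope_y_on_x(x, y):
    """Slope that minimizes the squared y-offsets (parallel to the Y-axis)."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx

def slope_x_on_y(x, y):
    """Slope that minimizes the squared x-offsets (parallel to the X-axis).

    The X-on-Y fit is x = c + d * y with d = sxy / syy; the value returned
    here is 1 / d so that it can be compared directly with the Y-on-X slope.
    """
    _, syy, sxy = centered_sums(x, y)
    return syy / sxy

# Made-up sample data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b_yx = slope_y_on_x(x, y)
b_xy = slope_x_on_y(x, y)
print(b_yx, b_xy)  # the two slopes agree only when the points are exactly collinear
```

Both lines still pass through the centroid; only the slope changes with the choice of offset direction.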
For Model II regressions, the offsets are measured along a line perpendicular (or normal) to the regression line. Thus, to use Pearson’s term, the line is fit by minimizing the sum of the squares of the normal deviates.
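The perpendicular-offset fit described above can be sketched with the closed-form slope of the major axis, which is the line that minimizes the sum of squared normal deviates. This is one Model II method among several, and the function names and data below are illustrative assumptions rather than the author's own implementation.

```python
from math import sqrt
from statistics import mean

def centered_sums(x, y):
    """Return the centroid and the sums of squares and cross-products about it."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, sxx, syy, sxy

def fit_major_axis(x, y):
    """Fit y = intercept + slope * x by minimizing squared perpendicular offsets."""
    x_bar, y_bar, sxx, syy, sxy = centered_sums(x, y)
    if sxy == 0:
        raise ValueError("this slope formula is undefined when sxy is zero")
    slope = (syy - sxx + sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y_bar - slope * x_bar  # the line passes through the centroid
    return slope, intercept

def perpendicular_ss(x, y, slope, intercept):
    """Sum of squared distances from the points to the line, measured normal to it."""
    vertical_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return vertical_ss / (1 + slope ** 2)

# Made-up sample data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ma_slope, ma_intercept = fit_major_axis(x, y)

# Y-on-X (Model I) line for comparison.
x_bar, y_bar, sxx, syy, sxy = centered_sums(x, y)
ols_slope = sxy / sxx
ols_intercept = y_bar - ols_slope * x_bar

print(perpendicular_ss(x, y, ma_slope, ma_intercept))
print(perpendicular_ss(x, y, ols_slope, ols_intercept))  # never smaller than the line above
```

Because the major axis treats X and Y symmetrically, swapping the roles of the two variables describes the same geometric line, which is not true of the Model I fits.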
Why are Model I and Model II regressions different?
In the case of Model I regressions, X is the INDEPENDENT variable and Y is the DEPENDENT variable: X is frequently controlled by the experimenter (or known very precisely) and Y varies in response to the changes in X. One assumes little or no error in X and all regression error is attributed to measurement or other error in Y. The equation tells us how Y varies in response to changes in X.
For Model II regressions, neither X nor Y is an INDEPENDENT variable; both are assumed to be DEPENDENT on some other parameter, which is often unknown. Neither is “controlled”, both are measured, and both include some error. We do not seek an equation of how Y varies in response to a change in X; rather, we look for how the two co-vary in time or space in response to some other variable or process. There are several possible Model II regressions. Which one is used depends upon the specifics of the case. See Ricker (1973) or Sokal and Rohlf (1995, pp. 541-549) for a discussion of which may apply. For convenience, I have also compiled some rules of thumb.