Why do linear models work?

People often do statistics using linear models, which assume that the thing you’re trying to predict (Y) is a linear function of some set of covariates (X). That’s an incredibly restrictive assumption; almost nothing is linear! It’s confusing, then, why linear models seem to work reasonably well in practice a lot of the time. Why is that?

The answer is, basically, Taylor’s Theorem. Even if the true relation Y = f(X) is not linear, f is reasonably well-approximated by a linear function around any point of interest, as long as you don’t get too far away from that point. And it turns out that for “many” things statisticians care about, the range of your data isn’t that large, so the Taylor approximation of linearity works reasonably well.
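Here is a minimal sketch of that idea, under made-up assumptions: the "true" function is taken to be exp(x) and the point of interest is 0, both chosen only for illustration. The names f, f_prime, linear_approx and abs_error are hypothetical helpers, not anything from a library.

```python
import math


def f(x):
    # Hypothetical nonlinear "true" relationship, chosen only for illustration.
    return math.exp(x)


def f_prime(x):
    # Derivative of the illustrative f above.
    return math.exp(x)


def linear_approx(x, a):
    """First-order Taylor approximation of f around the point a."""
    return f(a) + f_prime(a) * (x - a)


def abs_error(x, a=0.0):
    """Absolute gap between f and its linear approximation around a."""
    return abs(f(x) - linear_approx(x, a))


if __name__ == "__main__":
    # The gap is tiny close to a = 0 and grows as x moves away from it.
    for x in (0.1, 0.5, 2.0):
        print(f"x = {x}: |f(x) - linear approx| = {abs_error(x):.4f}")
```

If your data only covers a narrow band around the expansion point, the linear fit loses very little; over a wide band it loses a lot.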

More precisely, if you include polynomial terms up to order k in your linear model, then the approximation error is controlled by the magnitude of f’s (k+1)-th derivative over the range of your data. And it’s common in statistics to find that, even if the functions you’re trying to model are nonlinear, their higher-order derivatives decay to zero pretty quickly. You can think of this as encoding an assumption that the function must be “simple.”
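For reference, one standard way to write this down is Taylor's theorem with the Lagrange form of the remainder. This assumes f is (k+1) times differentiable between the expansion point and x; the symbols a (the expansion point) and ξ (an unknown intermediate point) are introduced here for illustration.

```latex
f(x) = \sum_{j=0}^{k} \frac{f^{(j)}(a)}{j!}\,(x-a)^{j} + R_k(x),
\qquad
R_k(x) = \frac{f^{(k+1)}(\xi)}{(k+1)!}\,(x-a)^{k+1}
\quad \text{for some } \xi \text{ between } a \text{ and } x.
```

So the error of the order-k fit is at most the largest value of |f^(k+1)| on the interval, times |x-a|^(k+1)/(k+1)!. It is small when either the data stays close to a or the (k+1)-th derivative is small.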

(Of course, the range of data is quite frequently large enough that you miss important things by assuming linearity. So the Taylor expansion explanation is more like “why linear models aren’t hopelessly inadequate” than “why linear models work really well.”)

Comments

Ben Kuhn

David Speyer comments by email:

Higher order Taylor series are still linear models! If I have inputs (x_1, y_1), (x_2, y_2), …, (x_N, y_N), and I am fitting a model y=ax^2+bx+c, this is still linear in (a,b,c), and I still find (a,b,c) by doing a least squares fit, which is a linear algebra technique. To get to a nonlinear situation, I would need the set of parameters I am choosing from to be something nonlinear, e.g., fitting y=ax^2+bx+c subject to an auxiliary condition like b^2-4ac=0. (Actually, there is a clever trick that will allow you to transform this one to a linear problem too!)
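To make the comment's point concrete, here is a rough sketch of fitting y=ax^2+bx+c with nothing but linear algebra. It solves the normal equations by hand in plain Python so it has no dependencies; in practice you would probably reach for a numerical library instead. The function names and the sample data are made up for illustration.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns [a, b, c]."""
    # Each design-matrix row is (x^2, x, 1): nonlinear in x, but the
    # model is a linear combination of these columns.
    X = [[x * x, x, 1.0] for x in xs]
    # Normal equations: (X^T X) beta = X^T y.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
           for i in range(3)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(3)]
    return solve_linear_system(XtX, Xty)


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [2 * x * x - 3 * x + 1 for x in xs]  # made-up noiseless data
    print(fit_quadratic(xs, ys))
```

The only place x^2 shows up is in building the columns of the design matrix. After that, finding (a,b,c) is the same linear solve you would do for an ordinary straight-line fit.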
