Friday, May 15, 2009

The End of My Stat Course

I had my statistics final on Monday afternoon. Overall, I felt pretty confident about my performance, though I stumbled momentarily over a problem about unbiased estimators (but eventually figured out that the expectation I needed to calculate was actually very easy) and choked on defining some distributions (e.g., I know that the ratio of two chi-square variables follows an F distribution, but how do the degrees of freedom work in this case? How is T defined?). I think I did a better job of finding the maximum likelihood estimator from a given pdf than I usually do. The problems that involved testing hypotheses and calculating test statistics were very straightforward (though it's easy to screw up the arithmetic involved). All of our grades in the course have been posted and I am getting an A (or what would be an A+).
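For reference, the standard textbook definitions behind the two questions in the parenthetical above can be written out as follows. The symbols U, V, Z, m, and n are notation introduced here for illustration; they are not taken from the exam.

```latex
% Assumed notation: U, V, Z are independent random variables; m, n are degrees of freedom.
\[
U \sim \chi^2_m,\quad V \sim \chi^2_n
\;\Longrightarrow\;
F = \frac{U/m}{V/n} \sim F_{m,\,n}
\]
\[
Z \sim N(0,1),\quad V \sim \chi^2_n
\;\Longrightarrow\;
T = \frac{Z}{\sqrt{V/n}} \sim t_n,
\qquad
T^2 \sim F_{1,\,n}
\]
```

So each chi-square variable is divided by its own degrees of freedom before the ratio is taken, and the numerator's degrees of freedom are listed first.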

Although I had been ready for the past week or so for the class to be over, that was mostly due to a desire to get something checked off my to-do list rather than being done with the material (as I was with differential equations). This is a good thing since I will be continuing to take stat courses seemingly forever.

I've appreciated the opportunity to take a fairly rigorous two-semester sequence in probability and statistics that was pretty serious about the mathematical basis of what we are doing. A lot of courses, even those that are calculus-based, do not get into enough of this mathematical background, and the formulas can appear to arise almost magically from nowhere. I can see, though, why many professors choose to skip over so much of this mathematical detail. One thing that is a bit surprising (I think) is how much math is necessary even to do the simplest things in stat. (For example, my first homework assignment in this class looked like a calculus assignment with the amount of integration involved.)

And the development of the rationale / motivation is time-consuming, so there is a trade-off between depth and breadth that leads many people to opt for coverage of a greater number of topics. Given that perhaps the majority of people taking undergraduate statistics (hell, even graduate statistics in many fields) will never need to derive anything themselves, but will only need to recognize what statistical test to apply to a given situation, it's tempting to skimp on this development. My professor clearly skimped on some things, but it seems like he did so less than many. The marketing professors I am working for have told me that I will be really happy to have this preparation when I take the stat sequence in my PhD course.

It's my working belief / feeling that a lot of people who like math think statistics is "boring" because they have never seen enough of the mathematical basis to actually understand anything (and I include those who have taken a stat course in college), but that if they could see where all the formulas are coming from, they would think it's more interesting. I recognize that this belief is probably a great example of naive realism (the idea that others would share my views if only they were exposed to the same things I am and were "rational" and "objective" in their evaluation). Robert, who has taken more statistics courses in his life than just about anyone without a PhD in statistics, does not subscribe to this view. Certainly my fellow classmates did not seem to engage in the excitement of discovery...or they were extraordinarily masterful at hiding their enthusiasm. However, I have found that the more I actually understand the subject, the more I like it. (At this point, of course, I understand very little.)

For instance, anyone who has taken a statistics course knows about the Central Limit Theorem and has taken advantage of it in doing statistical testing. I certainly have been exposed to it several times and came into this two-class stat sequence having a rough idea of what it's getting at. I knew it was a useful result. I had seen simulations demonstrating with sample data that it appears to work. But it didn't really impress me until we did a proof that I was able to follow step by step. Only then did I see how completely amazing it is that, as the sample size increases, the mean of a set of independent variables drawn from any distribution (with finite mean and variance) will itself be approximately normally distributed. (It helped that we had worked with so many different distributions with such different properties; this made the convergence of data from so many disparate distributions to a single distribution very surprising.)
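The kind of simulation mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions chosen here, not anything from the course: the exponential distribution with rate 1 (mean 1, standard deviation 1) stands in for "any distribution", and the function names are made up for this sketch. It draws many samples of size n, averages each one, and checks what share of the averages land within one standard error of the true mean.

```python
import random
import statistics


def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


def share_within_one_se(means, mu, sigma, n):
    """Fraction of sample means within one standard error (sigma / sqrt(n)) of mu."""
    se = sigma / n ** 0.5
    return sum(abs(m - mu) <= se for m in means) / len(means)


def draw_exponential(rng):
    # Assumed example distribution: exponential with rate 1
    # (mean 1, standard deviation 1, strongly right-skewed).
    return rng.expovariate(1.0)


rng = random.Random(0)  # fixed seed so reruns give the same numbers
shares = {}
for n in (1, 5, 50):
    means = sample_means(draw_exponential, n, 10_000, rng)
    shares[n] = share_within_one_se(means, mu=1.0, sigma=1.0, n=n)
    print(f"n={n:>2}: share within one standard error = {shares[n]:.3f}")
```

For a normal distribution, about 68% of the probability lies within one standard deviation of the mean. With n = 1 the exponential gives a noticeably larger share (roughly 86%), and as n grows the share should drift toward the normal figure, which is the convergence the theorem describes.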

I like this quote from Francis Galton about how awesome this is (1889):

"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the 'Law of Frequency of Error' [i.e. the Central Limit Theorem]. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of unreason."

While my appreciation for the Central Limit Theorem definitely falls well short of religious devotion, and we could argue that it would be an even nicer result if the normal distribution weren't such a pain in the ass to deal with, I do have to agree that this result "impresses my imagination."

1 comment:

Tam said...

I think your naively idealistic view is correct, at least as applied to myself. I love probability but think statistics is boring. But it seems obvious to me that if I were to understand the derivations of things in statistics, it would turn out to be probability, and thus interesting. QED.

I am just not interested in math that involves "formulas that arise from nowhere," which is how the statistics I've had so far (not that much) has been taught.

I am somewhat interested in things like what kind of tests apply in what circumstances and ways that statistics can go wrong or be misunderstood, but not enough to study it personally for those reasons.

But if I could come at statistics from a purely mathematical perspective I'm sure I would love it.