Saturday, March 29, 2014

Nate Silver's new venture FiveThirtyEight

I think I have a sense now for what we're going to see at FiveThirtyEight, based on what we've been seeing since its launch, and that's exemplified by two recent posts, one in the sports section, the other in the economics section.

In the first post, Neil Paine delves into player performance data, prompted by the "Wait...he's still in the league?" question. Using records for all position players active since 1973, he combines player age with three years' worth of wins-above-replacement (WAR) data to estimate a logit model of survival over five-year periods. The equation takes the following form:

S[Y(t+5)] = f(WAR(t), WAR(t-1), WAR(t-2), Age(t))
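
To make that functional form concrete, here is a minimal sketch of what a logit survival equation computes. It assumes the inputs enter linearly, which the post does not confirm, and the coefficients are invented purely for illustration; they are not Paine's estimates, which were not published.

```python
import math

def survival_probability(war_t, war_t1, war_t2, age, coefs):
    """Logit link: P(survive) = 1 / (1 + exp(-z)), with z linear in the inputs."""
    b0, b1, b2, b3, b4 = coefs
    z = b0 + b1 * war_t + b2 * war_t1 + b3 * war_t2 + b4 * age
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (intercept, WAR(t), WAR(t-1), WAR(t-2), Age(t)).
# These are made up for illustration only.
HYPOTHETICAL_COEFS = (4.0, 0.5, 0.3, 0.2, -0.2)

# Same recent WAR history, different ages.
p_young = survival_probability(2.0, 1.5, 1.0, 25, HYPOTHETICAL_COEFS)
p_old = survival_probability(2.0, 1.5, 1.0, 35, HYPOTHETICAL_COEFS)
print(f"age 25: {p_young:.2f}, age 35: {p_old:.2f}")
```

With a negative age coefficient, the older player gets the lower predicted survival probability. The actual signs and magnitudes are exactly what the post leaves out.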

He provides us with a list of 20 players who in fact have survived from 2009 into the 2014 season, but who had the lowest probabilities of survival (ranging from 3% to 26%).

In the second post, Ben Casselman looks at the declining labor force participation rate (LFPR) and attempts to determine how much of the decline is a result of the recent recession, how much is a consequence of the slow recovery, and how much is likely to be permanent. He presents a chart showing the projections he gets for the 2008-2014 period (which show a labor force participation rate that declines but stays generally above the actual LFPR). In a follow-up post, he elaborates on his conclusion, which is, essentially, that the LFPR has been declining in a way largely related to business cycle factors rather than to longer-term changes in the economy.

These are both interesting questions, in one way or another, to one group of people or another. I have professional research interests in career length in MLB, having looked at whether ethnicity affected career length and (separately) at whether being a union player representative affected career length. I've also done some work looking at the dramatic drops in teenage labor force participation and at the similarly striking increases in labor force participation of those age 65 and over. So I found their posts interesting.

And intensely frustrating.

In both cases, we are given very little information about the data sets (more, interestingly, in the baseball piece than in the labor force participation piece, where we know nothing about the time period or the variables used in the analysis). Neither Paine nor Casselman presents the actual statistical results, either in their posts or in a separate, linked document. So we know nothing about the statistical properties of their results (e.g., whether anything is statistically significant, or what the explanatory power of their analyses is). We know nothing about the magnitude of the effects of their explanatory variables. In Casselman's work, we don't know whether he estimated his regression in a way that let him determine whether (or how well) his results work in an out-of-sample period. In Paine's work, we don't know whether the number of "misses"--players predicted to be out of the game who are still in it--is larger than our expectations (subjective or statistical), nor does he tell us anything about players who were projected to survive but didn't.
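
As a rough illustration of the kind of check I have in mind for the "misses," here is a sketch of how one could compare the number of unlikely survivors with what the model itself implies. It assumes players' outcomes are independent given their predicted probabilities (a simplification), and the probabilities below are invented for the example; they are not taken from Paine's list.

```python
def survivor_count_distribution(probs):
    """dist[k] = P(exactly k players survive), assuming independent outcomes."""
    dist = [1.0]  # with zero players, P(0 survivors) = 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # this player drops out
            new[k + 1] += mass * p       # this player survives
        dist = new
    return dist

def expected_survivors(probs):
    """The expected number of survivors is the sum of the probabilities."""
    return sum(probs)

def prob_at_least(probs, k):
    """P(at least k survivors) under the model's own probabilities."""
    return sum(survivor_count_distribution(probs)[k:])

# Hypothetical predicted survival probabilities for four long-shot players.
hypothetical_probs = [0.03, 0.10, 0.26, 0.15]

print("expected survivors:", round(expected_survivors(hypothetical_probs), 2))
print("P(at least 2 survive):", round(prob_at_least(hypothetical_probs, 2), 3))
```

Run over every player the model scored as unlikely to survive, a comparison like this would tell us whether 20 surprising survivors is about what the model predicts or evidence that it is miscalibrated.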

In short, we're supposed to take the very partial, incomplete results presented and trust them. In my world (academic research), this approach would never be acceptable. Showing one's work is the essence of what we do, and it's the essence of why we expect people to accept our conclusions.

If FiveThirtyEight continues to present analysis in this way, I think I'll stop looking, and, when I do, I'll be thinking of it as "analysis" (scare quotes very much intended).


