It is pretty certain by now that we live in a data world: institutions, companies, cell phones… we are all constantly creating and storing data, even when we don’t know it. Things being as they are, it has come to a point where the problem is usually not obtaining the right data but discarding the wrong ones. Besides, apparently “people don’t read anymore”, let alone read data… So the other day I had another of those horrible moments where you think something kind of cool is going to take 5 minutes to accomplish. The idea seemed pretty simple, so I gave it a try. Today, 5 minutes (and 3 more nights) later, I’ve finished movieR.
The idea is simple: you have a bunch of data that represent some phenomenon (say, income per capita) for some individuals (say, a few countries) over a few years, and you want to see whether the distribution of the variable has changed over time and, if so, how.
MovieR is a small set of functions written in R that draws a series of kernel density plots (one for each period, plus intermediate ones to smooth the transitions) and pastes them into a movie using the free/libre software package dvd-slideshow. It takes a CSV that looks like this:
"y2000","y2001","y2002","y2003","y2004","y2005" "29392.85","30125.1","31396.76","32353.47","33898.93","34460.48" "30451.79","31075.53","32344.51","33678.8","35310.66","36520.62" ...
And gives out something like this:
This allows you to get a good sense of the behaviour of the dataset and some of its main properties at a glance. For example, the video above, which uses OECD data, displays the distribution of per capita income across a set of 59 cities around the world from 2000 to 2007; one can easily see that the distribution has moved rightwards over the years, implying higher income per capita for more people, as well as a flattening of the curve, that is, an equalizing effect across the cities. And, to please Mr. Jobs, all that without reading a single bit 🙂