It sucks to visualize public data

A while back, I made a graph overlaying the Senate party majority with the national debt.  While correlation isn’t sufficient for causation, I did learn a lot about the shape of the national debt, and that the relationship between political party and national debt is not as clear-cut as political commentators and pundits would like you to believe.

However, the entire experience of producing the graphs in that blog post left me dumbfounded at how tedious it was just to see the shape of the data.

First, finding the data was difficult.  Many of the Census Bureau and other government websites have confusingly bad user interfaces.  It took me a while to figure out how to find the number of single females by year and state.  Go ahead, try it.

Once you’ve found the data, you need to shape it into what you need for the visualization, which is difficult because data on the web is in a sad state of affairs.  Images, video, and text on the web are limited to a few popular formats.  Not only that, the actual format is abstracted away by the browser or Flash behind an image tag (and, soon, a video tag).  No widespread consensus exists for raw data, with the exception of a limited number of domains covered by microformats, KML, and their ilk.  Many publish data through HTML tables or CSV files, which come in a surprising variety of hard-to-parse formats.
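
To give a feel for the kind of shaping involved, here is a minimal Python sketch. The sample table is made up (it is not real census data), and the column names `Year`, `State`, and `Count`, along with the helpers `parse_count` and `load_table`, are assumptions for illustration rather than anything from a real dataset or library.

```python
import csv
import io
import re

# Made-up sample, not real census figures: a title line, a blank line,
# semicolon delimiters, thousands separators, a footnote marker, and "n/a".
SAMPLE = """Example table (made-up numbers)

Year;State;Count
2008;Ohio;1,234
2009;Ohio;1,301 (a)
2009;Utah;n/a
"""


def parse_count(cell):
    """Return the cell as an int, or None when it holds no number."""
    # Remove footnote markers such as "(a)", then thousands separators.
    cleaned = re.sub(r"\([^)]*\)", "", cell).replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


def load_table(raw_text, delimiter=";"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    # Title lines and blank lines yield fewer than two cells, so drop them.
    rows = [row for row in reader if len(row) > 1]
    header, body = rows[0], rows[1:]
    records = []
    for row in body:
        record = dict(zip(header, (cell.strip() for cell in row)))
        # Assumes the hypothetical "Year" and "Count" columns exist.
        record["Year"] = int(record["Year"])
        record["Count"] = parse_count(record["Count"])
        records.append(record)
    return records


records = load_table(SAMPLE)
for record in records:
    print(record)
```

Even this toy case needs special handling for the title line, the delimiter, the number formatting, and the missing value, and each real source tends to need its own variations on that cleanup.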

Then, it was tedious simply finding a free visualization tool that produces moderately good-looking graphs.  There is a plethora of JavaScript graphing libraries, not to mention Flash libraries like amcharts and flare, each with its own strengths, weaknesses, aesthetics, and ramp-up time.  It took a while to pick one that was suitable.

The whole thing was harder than it should have been, and if I haven’t conveyed how mad I am, just insert expletives above, wherever possible.

As a programmer, I can figure out how to find and graph the data, however painfully.  But it’s completely inaccessible to the regular, everyday person who uses the internet.  There’s a lot of public data out there, but people aren’t able to access it easily.  If it doesn’t show up in Google’s search results, that’s where they stop.

It’s not that people aren’t interested in that data; it’s that the data is completely inaccessible.  If blog posts and news articles about unemployment, STD rates, and gas prices are any indication, people want to know this kind of information.

Even more so, people want to be able to explore different aspects of this data, as with the New York Times’ interactive visualizations, by cutting and slicing the data as they see fit.  Oftentimes a statistic only makes sense when you can answer the question, “compared to what, and how?”  And people want to be able to explore it visually in a way that gives insight.

This is why I’m working on Graphbug: making public data more easily accessible through visualizations.  Some questions are best answered visually.

We want to make it easy to find the data you need, and as fluid as possible to move between different datasets to compare them, in the same way that Google Maps made it fluid to navigate a map.  Then, if you need to do more hardcore analysis, you can download the data and play with it however you like.

Of course, the topic is too broad at the moment.  We’d like to focus on particular datasets first, and though we picked the US Census to start, we’d like to hear from you what sorts of datasets you’d be interested in.  iPhone market share?  Number of earthquakes in each state by year?  The male-to-female ratio at colleges over the years?

So visit graphbug.com and let us know: give me feedback, or comment on whether you felt the same pain or have a need for this.  And if so, what are you currently using to solve this problem?

2 thoughts on “It sucks to visualize public data”

  1. Your vision for graphbug sounds similar to Wolfram|Alpha (disclaimer: I have no connection to WA; have only played around a bit with it). In particular, they talk about the need to “curate” data to make it programmatically useful; and of course they use the elaborate graphing capabilities of Mathematica.

  2. It is similar in the sense that you’re looking for data. However, once you get that graph, there’s no way to easily compare it to other datasets you have in mind. There’s no way to download it and slice it in different ways. They’re right that numeric and tabulated data is in a sad state of affairs on the web, hence their solution of curating it. But perhaps linked data is changing that.
