24 March 2013

Put it to The Test

I gave my brother's kids the They Might Be Giants album 'Here Comes Science' for Christmas. He told me today that the kids love it, and he highly recommends it for folks with young children. I'm sure lots of people know the old song "Why does the sun shine?", but did you know that they came out with a new one, "Why does the sun really shine?" First the sun was a mass of incandescent gas, and now it's a miasma of incandescent plasma... The Johns really keep up with their science.

I was listening to a few more of the songs on this album and I especially enjoyed this one: Put it to the Test. Who could argue with these lyrics?

"If there's a question bothering your brain that you think you know how to explain-- You need a test! Yea, think up a test. If somebody says they figured it out, and they're leaving any room for doubt, come up with a test! yeah! You need a test! Are you sure that thing is true? Or did someone just tell it to you? Come up with a test! Test it out!"

05 March 2013

Context for Big Data

As a part of the Internet Multi-Resolution Analysis (MRA) long program at IPAM a few years ago, I became very aware of the need for context when analyzing big data. Our most infamous example was the claim made by some that the internet was extremely vulnerable to targeted attacks. The reasoning in that paper suffered from two faults: one of logic, and another of missing context. The actual internet topology is very different from the one suggested in the paper, as was discussed in this article by some of the organizers of the MRA program.

A few weeks back, this blog entry from the New York Times also stressed the need for context. It tells a story of collecting sensor data on elevator and stair usage, where after a few days of collection the conclusion was that students use the stairs more at night. That seemed an interesting story until a security guard gave them some needed context: the elevators had been breaking down at night. So of course people were taking the stairs!
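To make the elevator story concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the trip counts are made up, and the `elevator_working` flag stands in for the security guard's context, which the sensors themselves never recorded.

```python
from collections import defaultdict

# Hypothetical, hand-made records (not real data):
# (period, elevator_working, stair_trips, elevator_trips)
records = [
    ("day", True, 30, 70),
    ("night", True, 9, 21),
    ("night", False, 40, 0),  # elevator broken: everyone takes the stairs
]

def stair_share(rows):
    """Fraction of all trips in `rows` that used the stairs."""
    stairs = sum(r[2] for r in rows)
    total = sum(r[2] + r[3] for r in rows)
    return stairs / total

def share_by_period(rows):
    """Stair share computed separately for each period (day, night)."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r[0]].append(r)
    return {period: stair_share(group) for period, group in grouped.items()}

# Naive view: ignore whether the elevator was working at all.
naive = share_by_period(records)

# With context: only compare hours when the elevator was actually available.
with_context = share_by_period([r for r in records if r[1]])

print("naive:", naive)
print("with context:", with_context)
```

With these made-up numbers, the naive night-time stair share is 0.7 against 0.3 during the day. Once the comparison is restricted to hours when the elevator worked, both are 0.3, and the "students prefer stairs at night" story disappears.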

Missing context and missing data can be as important as confounding factors in data collection, if not more so. As we see more and more data collected and analyzed for decision-making purposes, from government to corporations to industry, and in both the private and public domains, I believe that understanding the potential pitfalls of missing data and uncertainty will be central to actually getting good use out of that data.