13 November 2015

Chess game with the Devil

I had been meaning to read this article about Terry Tao for the last several months and I finally got around to it on a cozy Friday night at home. I really like it for the way it describes him as such a genial and friendly guy, and the story it tells about even Terence Tao being intimidated when he arrived at Princeton. My favorite quote, though, is this one about what it's like to be a mathematician:

"The true work of the mathematician is not experienced until the later parts of graduate school, when the student is challenged to create knowledge in the form of a novel proof. It is common to fill page after page with an attempt, the seasons turning, only to arrive precisely where you began, empty-handed — or to realize that a subtle flaw of logic doomed the whole enterprise from its outset. The steady state of mathematical research is to be completely stuck. It is a process that Charles Fefferman of Princeton, himself a onetime math prodigy turned Fields medalist, likens to ‘playing chess with the devil.’ The rules of the devil’s game are special, though: The devil is vastly superior at chess, but, Fefferman explained, you may take back as many moves as you like, and the devil may not. You play a first game, and, of course, ‘he crushes you.’ So you take back moves and try something different, and he crushes you again, ‘in much the same way.’ If you are sufficiently wily, you will eventually discover a move that forces the devil to shift strategy; you still lose, but — aha! — you have your first clue."

20 October 2015

Einstein memorial

Last weekend I was in DC and I visited the Einstein Memorial along with all the other national memorials on or near the National Mall. There are so many great quotes memorialized on these walls, but this is one of my favorites:

"The right to search for truth implies also a duty: one must not conceal any part of what one has recognized to be true."

23 September 2015

Probability in your profession

What does probabilistic terminology really mean when people use it in your field?

Here is my favorite (image from the linked site above, used without permission, but for educational purposes obviously):

In mine we say "with probability 1-delta" and that's exactly the probability we mean. Except don't calculate delta.

11 August 2015


I recently learned that when you win the Smale Prize you get a gömböc!

My favorite part of the gömböc is its relationship to a turtle's righting response: "The balancing properties of the gömböc are associated with the 'righting response', their ability to turn back when placed upside down, of shelled animals such as tortoises and beetles."

24 July 2015

the entry in which I admit I am reading The Phantom Tollbooth

"That's absurd," objected Milo, whose head was spinning from all the numbers and questions.

"That may be true," [the Dodecahedron] acknowledged, "but it's completely accurate, and as long as the answer is right, who cares if the question is wrong?"

20 April 2015

everyone gather round for a physics joke

Thanks to my friend Jim Hall and this reddit.

Heisenberg, Schrödinger, and Ohm are together in a car, driving down the road.

They get pulled over. Heisenberg is driving and the cop asks him "Do you know how fast you were going?" "No, but I know exactly where I am" Heisenberg replies. The cop says "You were doing 55 in a 35." Heisenberg throws up his hands and shouts "Great! Now I'm lost!"

The cop thinks this is suspicious and orders him to pop open the trunk. He checks it out and says "Do you know you have a dead cat back here?" "We do now, asshole!" shouts Schrödinger.

The cop moves to arrest them. Ohm resists.

24 March 2015

big data is very often bad data

In my research I study issues with big data that make them messy. Missing data, corrupted data, uncalibrated sensors, biased human participants, inaccurate information on where or when the measurement was taken, measuring X when you really wanted to know Y... etc. A lot of big data proponents act like it's easy peasy to milk all the information possible out of these datasets. It's not!

What better time to learn this lesson than during March Madness, via a winning prediction algorithm that does so well because it figured out which data were the best to use. This is called "variable selection" in statistics, and we try to do it in an automated way, but often (as was the case here) it's really domain expertise that allows one to figure out which data are the best for inference.
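For a rough idea of what "automated" variable selection can look like, here is a minimal Python sketch. It is only an illustration, not the method the March Madness algorithm used: it ranks candidate variables by the absolute value of their correlation with the outcome and keeps the top few. The function names, the column names, and the toy numbers are all made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_variables(columns, target, k):
    """Return the names of the k columns most correlated (in absolute value) with target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: one useful variable, one noisy one, one constant one.
columns = {
    "seed_difference": [1, 2, 3, 4, 5, 6],
    "jersey_number": [3, 1, 4, 1, 5, 9],
    "mascot_length": [2, 2, 2, 2, 2, 2],
}
target = [2, 4, 6, 8, 10, 12]

print(select_variables(columns, target, 1))
```

A ranking like this has no idea that a jersey number is meaningless; it will happily keep it if the numbers line up by chance, which is exactly where domain expertise comes in.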

One of my favorite lists of problems with big data can be found here, including the fact that "although big data is very good at detecting correlations, ... it never tells us which correlations are meaningful." Here is another nice article from last summer on the limitations of big data, as seen through OkCupid and Facebook user experiments. And of course there is my earlier blog post, which gives props to IEEE for discussing the same.

Every statistical inference procedure -- from simply calculating p-values, to predicting class labels with SVM, to estimating system dynamics with filters -- has assumptions that may or may not hold in practice. Understanding the implications of that is crucial to figuring out how to use big data.