20 May 2007

artificial intelligence

Tons of technologies around us use some form of artificial intelligence. Science fiction usually gives us the impression that AI is just a robot that can walk and talk like a human, but really AI is the practice of getting a machine to do something humans currently do, like making decisions, classifying objects, or ranking things by relevance. Mathematics, computer science, statistics and signal processing all play roles in the field, and we give it many names: statistical learning, machine learning, estimation, classification, and so on.

Let me give you some examples. Spam filters try to classify email as spam or not spam. Netflix or Amazon (and others) try to suggest new products based on what you and others like. Alarm systems decide when a home has been broken into and notify the police. GPS boxes for your car give you directions, then adjust to your own choices or mistakes and give you a new route to follow. Google tries to find the best website match for your search terms. Translation software tries to find the best match from a set of words in one language to a set of words in another.

For a long time, work on learning algorithms has focused on getting better performance out of a limited amount of "training data": data you have ahead of time for which you already know the right answer, such as how each item should be classified. So if Netflix has some data where you told them what movies you were actually interested in, then this is training data. You can use that information to teach your algorithm your preferences. Or you can hold some of it back as "testing data", to check whether the algorithm predicts movies that you actually do like.
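Here is a toy sketch of that training/testing idea in Python. Everything in it is made up for illustration: the ratings, the genre-based "algorithm", and the function names are my own assumptions, not anything Netflix actually does. The point is only that some of the known answers are kept out of training so they can be used to check the predictions afterwards.

```python
import random

# Hypothetical ratings: (genre, liked) pairs one viewer might have given.
ratings = [
    ("comedy", True), ("comedy", True), ("comedy", False), ("comedy", True),
    ("horror", False), ("horror", False), ("horror", True), ("horror", False),
    ("drama", True), ("drama", True), ("drama", True), ("drama", False),
]

def split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold back a fraction for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(examples):
    """Learn, per genre, whether the viewer liked it more often than not."""
    counts = {}
    for genre, liked in examples:
        likes, total = counts.get(genre, (0, 0))
        counts[genre] = (likes + int(liked), total + 1)
    return {genre: likes * 2 > total for genre, (likes, total) in counts.items()}

def accuracy(model, examples):
    """Fraction of examples where the model's guess matches the real answer."""
    correct = sum(model.get(genre, False) == liked for genre, liked in examples)
    return correct / len(examples)

train_set, test_set = split(ratings)
model = train(train_set)
print("accuracy on held-back data:", accuracy(model, test_set))
```

With only a dozen ratings the score will jump around depending on which examples happen to be held back, which is exactly the small-data problem.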

Now, however, Google is showing us that the real way to go is not to carefully improve the algorithm, but instead to give the algorithm a ridiculous amount of training data. As you increase the amount of training data, all of the algorithms do vastly better, way better than any new-and-improved AI algorithm does on a small set of training data. For example, Google is working on a translation service, and they are looking for every multilingual journalistic publication out there. Wherever they can find the same stories in two languages, Google's algorithms can try to learn how to translate between those two languages by comparing the two versions. You might wonder, what if some of the translations are wrong? Well, if there is enough data, then those incorrect translations will get lost in the heap, and the algorithm will still do well.
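To see why bad examples "get lost in the heap", here is a small simulation in Python. It is a toy under loud assumptions, not how Google's translation system works: I pretend a single word has one correct translation, that 40% of the examples found are wrong, and that the "algorithm" just picks whichever translation it saw most often. All the names and numbers are invented.

```python
import random
from collections import Counter

CORRECT = "cat"                 # assumed right translation of a made-up source word
WRONG = ["hat", "car", "cot"]   # hypothetical mistranslations
ERROR_RATE = 0.4                # assumed share of bad examples in the data

def collect_examples(n, rng):
    """Gather n example translations, some of them wrong."""
    return [rng.choice(WRONG) if rng.random() < ERROR_RATE else CORRECT
            for _ in range(n)]

def learn(examples):
    """The 'algorithm': pick the translation seen most often."""
    return Counter(examples).most_common(1)[0][0]

def success_rate(n, trials=200, seed=0):
    """How often learning from n noisy examples lands on the right answer."""
    rng = random.Random(seed)
    hits = sum(learn(collect_examples(n, rng)) == CORRECT for _ in range(trials))
    return hits / trials

for n in (1, 5, 50, 1000):
    print(n, "examples ->", success_rate(n))
```

The learner never gets smarter here. The only thing that changes between runs is how much data it sees, and with enough of it the wrong answers stop mattering.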

In my field of sensor networks, we are collecting data and hope to create technologies that can make all kinds of decisions for us, hopefully better decisions than we could make ourselves, because they incorporate both human knowledge and a vast resource of collected data. For example, in the Santa Monica Mountains nature preserve, rangers are collecting a lot of data using sensors. Because of it, they are better able to decide which properties to buy and add to the reserve for the best plant and animal preservation, what building codes to set for developing property nearby, and how to respond to developer requests.

(I don't know where I am going with this. But here you go.)
