Like all such events, the latest AOL disaster began innocently enough:
The goal of this collection is to provide a real query log based on users. It could be used for personalization, query reformulation or other type of search research.
AOL wanted to share its closely guarded, Google-powered search-term data with private researchers. So they did… on AOL’s web site.
Eleven million unique searches by 658,000 AOL users over 90 days. The treasure chest includes {Anonymized UserID, Query, QueryTime, ClickedRank, DestinationDomainUrl}.
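For anyone wondering what a row in that treasure chest looks like, here is a minimal sketch of the five-field record in Python. The field names come from the list above; the tab-separated layout, the timestamp format and the sample line are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchRecord:
    """One row of the log, using the five fields named in the release."""
    user_id: int                           # Anonymized UserID
    query: str                             # Query
    query_time: str                        # QueryTime, kept as raw text
    clicked_rank: Optional[int]            # ClickedRank, None if nothing was clicked
    destination_domain_url: Optional[str]  # DestinationDomainUrl, None if no click


def parse_line(line: str) -> SearchRecord:
    """Parse one line, ASSUMING tab-separated fields in the order listed above."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (5 - len(fields))  # pad rows that have no click data
    user_id, query, query_time, rank, url = fields[:5]
    return SearchRecord(
        user_id=int(user_id),
        query=query,
        query_time=query_time,
        clicked_rank=int(rank) if rank else None,
        destination_domain_url=url or None,
    )


# Made-up sample row, not taken from the real data.
sample = "1001\tring tones\t2006-03-01 10:15:00\t1\thttp://example.com\n"
record = parse_line(sample)
```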
Search Engine Optimizers, adwords buyers, pay-per-click spammers and class-action ambulance chasers are delighted.
Markus at Paradigm Shift predicted how the data could affect a single market:
This will probably have a huge effect for the ring tone market. All those big ring tone affiliates doing $200k+ a month are about to have some huge competition. At $12.00/signup bidding on many of these keywords gives you a ROI of 8:1 even at 20 cents a click.
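Markus does not show his work, so here is a rough back-of-the-envelope sketch in Python. It assumes "ROI of 8:1" means revenue divided by ad spend, which the quote does not actually say; under that reading, the three figures he gives imply a click-to-signup conversion rate.

```python
def implied_conversion_rate(payout_per_signup: float,
                            cost_per_click: float,
                            roi_ratio: float) -> float:
    """Conversion rate needed for revenue / ad spend to equal roi_ratio.

    ASSUMPTION: ROI here is (payout * conversion_rate) / cost_per_click.
    """
    return roi_ratio * cost_per_click / payout_per_signup


# Figures from the quote: $12.00 per signup, 20 cents per click, ROI of 8:1.
rate = implied_conversion_rate(12.00, 0.20, 8.0)
clicks_per_signup = 1 / rate
print(f"implied conversion rate: {rate:.1%}")
print(f"clicks per signup: {clicks_per_signup:.1f}")
```

If that reading is right, roughly one click in seven or eight would have to turn into a paid signup for the numbers to hold.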
He then crunched the data to charge that the widely reported success of MySpace is more a product of search engine optimizers than real people:
People are hitting profiles.myspace.com, music.myspace.com because they infest/mass spammed the search engines. No other social network even shows up in the data in any meaningful way. In 3 months 19321/365422 or 5.3% users landed on myspace because google indexed them so well. We can conclude that 5.3% of all users in the united states using Google.com are clicking through to myspace every 3 months.
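The percentage, at least, checks out. A quick sanity check in Python, using only the two counts from the quote (the variable names are just labels for illustration):

```python
# Counts as given in the quote above.
users_landing_on_myspace = 19321
users_in_sample = 365422

share = users_landing_on_myspace / users_in_sample
print(f"{share:.1%}")
```

Whether a sample of AOL searchers says anything about "all users in the united states using Google.com" is a separate question that the division cannot answer.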
Other folks are crunching the data for more nefarious purposes. Michael Arrington at TechCrunch thundered:
The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.
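To see why a random ID number is thin protection, consider this minimal Python sketch. It groups queries by the anonymized ID, which is all it takes to line up one person's searches side by side. The rows are invented for illustration and are not from the released data.

```python
from collections import defaultdict

# Hypothetical (user_id, query) pairs; every value here is made up.
rows = [
    (1001, "pizza delivery springfield"),
    (2002, "ring tones"),
    (1001, "jane q. example"),
    (1001, "dentists near elm street"),
]


def queries_by_user(pairs):
    """Collect every query under the ID that issued it, in log order."""
    grouped = defaultdict(list)
    for user_id, query in pairs:
        grouped[user_id].append(query)
    return dict(grouped)


profiles = queries_by_user(rows)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

A town, a name and a street, all filed under one number: the ID hides the username but not the trail.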
For some reason, Markus also found hundreds of searches from AOLers looking to kill themselves or commit murder.
Those, of course, would be the folks who tried to cancel AOL subscriptions… like my mother-in-law. Never, ever again.
For you worry-warts who think this sensitive data might fall into the wrong hands… You must think AOL researchers are idiots. There is ABSOLUTELY NOTHING TO WORRY ABOUT. AOL restricted the data right there on the web site:
This collection is distributed for non-commercial research use only. Any application of this collection for commercial purposes is STRICTLY PROHIBITED.
AOL researchers were so proud of their work, they asked that we specifically reference their effort. And so we have: G. Pass, A. Chowdhury, C. Torgeson, “A Picture of Search”, The First International Conference on Scalable Information Systems, Hong Kong, June, 2006.
Unfortunately, the dataset has been deleted. To follow up with Pass, Chowdhury or Torgeson, check the Alexandria, Virginia Yellow Pages under “Restaurants, Fast Food.”

2 comments
August 7th, 2006 at 4:08 pm
Thomas
is anyone still actually using AOL?
August 7th, 2006 at 4:31 pm
BJ
I’m ashamed to say I have a brother-in-law…