Friday, February 09, 2007

Search by "Intent"... not "Content"

I have been busy with some developments in my search product, though nothing interesting enough to write about. My team has just started putting one of our long-pending objectives into shape. We hope to release it by the end of February.
Now that I see some action in that direction, I am off to visualize and analyze my next aim: "search by intent". I have always felt that internet search, good as it looks today, is not even 10% of what it can be. There is a lot of intelligence that can be introduced. I am referring to behavioural intelligence here. Now what's that? Hmm... it is my term ;-) Well, it is the intelligence derived from user behaviour. A careful and patient analysis of your search dump will help you develop this intelligence better.
Coming back to search... if we can know the intention of the user, we can provide much better results. If we crack this problem and are able to convert CONTENT into INTENT, we would have solved half the problem. Many of the big search engines have started research along this line.
Yahoo has already launched the beta version of its search engine "Yahoo! Mindset". It provides a cool slider to rank results dynamically, ordering them by how informational or commercial they are.
I happened to stumble upon a blog which writes about Matt Cutts (Google) hinting that Google is doing scientific research in this field. I won't take it as a joke. We may soon see some intent-oriented search happening in Google.
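To make the slider idea concrete, here is a minimal sketch of how such a re-ranking could work. I don't know how Yahoo! Mindset does it internally; the `commercial` and `informational` scores, the `rerank` function and the example URLs are all my own assumptions for illustration.

```python
def rerank(results, slider):
    """Re-rank results for a slider position between 0.0 and 1.0.

    slider = 0.0 favours commercial pages, slider = 1.0 favours
    informational pages. The two per-result scores are assumed to be
    precomputed somewhere else (hypothetical fields, not a real API).
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0.0 and 1.0")

    def blended(result):
        # Linear blend of the two scores, weighted by the slider position.
        return ((1.0 - slider) * result["commercial"]
                + slider * result["informational"])

    return sorted(results, key=blended, reverse=True)


# Made-up result set for the query "digital camera".
results = [
    {"url": "shop.example/camera", "commercial": 0.9, "informational": 0.2},
    {"url": "wiki.example/camera", "commercial": 0.1, "informational": 0.95},
    {"url": "review.example/camera", "commercial": 0.6, "informational": 0.6},
]

shopping_order = [r["url"] for r in rerank(results, 0.0)]
research_order = [r["url"] for r in rerank(results, 1.0)]
```

With the slider at the shopping end the shop page comes first, and at the research end the reference page comes first. The review page sits in the middle either way, which is roughly what you would expect from a page that is a bit of both.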
A user's intention can be inferred and catered to through:
  • Search Dump - A proper analysis of the search dump will throw light on the "intent" behind every search. This study has to be done with respect to the results the user clicked on in the search result set.
  • Personalization - Proper tracking of user activities can open a whole new world of knowledge for search engines. If I visit a search engine every week and click on a set of links, the search engine should damn well know what I am looking for every time. And if you have the user's profile with you, what more can you ask for?
  • NLP (Natural Language Processing) - Natural language search is what every user is comfortable doing. The engine should be able to derive the intent of the user from the text typed into the search box.
  • Phrase Searches - The emphasis should be more on phrase searches. The more the user writes in the search box, the clearer his/her intent. Of course, the search engine should be geared up to handle the shrinking result set caused by long phrase searches. Here, the result-boosting algorithms come into play.
  • Wisdom of crowds - Now, this one is a much-talked-about concept out of the web 2.0 books. The entire trail of all user activities should be logged. We can infer a completely new user's intent by comparing his/her search with searches made by other users and establishing a pattern. If most users who search for "apache" click on pages dedicated to the Apache web server, we can assume that the new user is also interested in the Apache web server and not in Apache tribes. Here, the results pointing to the former can be ranked higher in the search. Of course, you still have to include all the results.
I had read somewhere a quote from Peter Norvig, director of research at Google, talking about how Google returns results for a search query:
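The "apache" example from the wisdom-of-crowds point can be sketched in a few lines. This is only a rough illustration under my own assumptions: the click log is a list of (query, clicked topic) pairs, every result carries a `topic` label and a `base_score`, and the `weight` of 0.5 is an arbitrary choice.

```python
from collections import Counter, defaultdict


def build_click_model(click_log):
    """Count, per query, how often users clicked results of each topic."""
    model = defaultdict(Counter)
    for query, topic in click_log:
        model[query.strip().lower()][topic] += 1
    return model


def boost_results(results, query, model, weight=0.5):
    """Rank results by base score plus a boost from the crowd's clicks.

    Nothing is filtered out: every result stays in the list, the
    crowd-preferred topic is simply moved up.
    """
    clicks = model.get(query.strip().lower(), Counter())
    total = sum(clicks.values())

    def score(result):
        # Share of past clicks for this query that went to this topic.
        share = clicks[result["topic"]] / total if total else 0.0
        return result["base_score"] + weight * share

    return sorted(results, key=score, reverse=True)


# Made-up log: 8 of 10 past "apache" searchers clicked web server pages.
click_log = [("apache", "web_server")] * 8 + [("apache", "tribe")] * 2
model = build_click_model(click_log)

results = [
    {"title": "Apache tribes", "topic": "tribe", "base_score": 0.55},
    {"title": "Apache HTTP Server", "topic": "web_server", "base_score": 0.50},
    {"title": "Apache helicopter", "topic": "helicopter", "base_score": 0.30},
]

ranked = [r["title"] for r in boost_results(results, "apache", model)]
```

Here the web server page moves to the top even though its keyword score was slightly lower, and the other two results are still returned. For a query with no click history the order simply falls back to the base scores.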
"We want to do a better job of understanding the user's intent and the content provider's intentions,"
and
"We mostly rely on matching keywords, but we'd like to get closer to matching the intent."
The same sentiments are echoed by Adam Sohn from Microsoft,
"If someone is searching for 'Jaguar', the smarts to distinguish between 'he's looking for a car' and 'a big cat in the jungle' - that's coming."
Here, the "intent" of the big players in the search business is clear.

A major chunk of the work in search engine development goes into collecting and analyzing large sets of data. Better log analyzers and information systems should be in place. Thorough research is carried out on this information set, and the outputs serve as inputs for the algorithms. A continuous analysis of user behaviour is important to monitor pattern deviations. Studies have revealed that phrase searches have become more common compared to searches done 6-8 years ago. No doubt, therefore, that n-grams are hot again.
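As a small taste of what a log analyzer might compute, here is a minimal sketch that counts word n-grams across a query log. The function names and the sample queries are hypothetical, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter


def query_ngrams(query, n):
    """Return the word n-grams of a single query, lower-cased."""
    tokens = query.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(queries, n=2, k=3):
    """Return the k most frequent word n-grams across all queries."""
    counts = Counter()
    for query in queries:
        counts.update(query_ngrams(query, n))
    return counts.most_common(k)


# Made-up slice of a query log.
query_log = [
    "apache web server install",
    "apache web server config",
    "cheap web server hosting",
]

top_two = top_ngrams(query_log, n=2, k=2)
```

On this tiny log "web server" comes out as the most common bigram, followed by "apache web". Frequent n-grams like these are candidates for being treated as phrases rather than as independent keywords.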

I have got a lot to do now. I am yet to have a good information system. There is a lot of tracking which still needs to be done. The more I think of it, the more I realize that the entire data management, right from tracking, is a different area. I, as a techie, won't be able to do it much justice. I'll be more interested in taking the output of these studies and converting it into complex algorithms. Hmm... till we have a separate team for these studies, tech has to manage it. Can't complain though... it is giving me much-needed exposure.