Information extraction has evolved through several stages. It began with Data Retrieval, where data is simply extracted from a database in response to a user query. Next came Text Retrieval, the fundamental basis of modern web search, in which results are returned as text from a heterogeneous database. Then came Information Retrieval (IR), in which, instead of blindly returning data from the database, an application uses IR algorithms to return meaningful information that is much more relevant to the user’s query, as search engines like Google do. Today we are in the era of Knowledge Retrieval. Knowledge retrieval means not only extracting information but also making sense of it in a way that matches the human cognitive process. A good example of such a system is Wolfram Alpha. Knowledge retrieval becomes all the more important when mining enormous amounts of data, such as the World Wide Web. Gone are the days when a simple text-based search was sufficient to return relevant results for a user query. Time is of the essence for the modern user, who has neither the time nor the will to carefully go through the returned result set and pick the best match. So it becomes the responsibility of the search engine itself to detect the context in which the user is searching. For example, suppose a user fires the query:
“harry potter walk-through”
A simple text-based search will return results based only on keywords like “harry potter”, so the results will contain links to Harry Potter movies and Harry Potter books. Those products will be ranked higher because far more pages mention Harry Potter as a book or a movie. Here lies the problem: a user searching for “harry potter walk-through” is looking for a walk-through of the Harry Potter computer game. If the search engine were intelligent enough to figure out the context of the query, namely that the keyword “walk-through” implies a video game and not movies or books, it would return more relevant results and deliver a much better user experience.
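To make the idea concrete, here is a minimal Python sketch of context detection from cue words. It is only an illustration under simplifying assumptions, not how any production search engine works: the `CATEGORY_CUES` table, its cue words, and the function names `normalize` and `detect_category` are all hypothetical, and a real system would learn such signals from data rather than hard-code them.

```python
import re

# Hypothetical cue words per product category (an assumption for this sketch).
CATEGORY_CUES = {
    "video game": {"walkthrough", "cheats", "gameplay"},
    "book": {"paperback", "hardcover", "author"},
    "movie": {"dvd", "trailer", "cast"},
}


def normalize(query):
    """Lowercase the query and split it into alphanumeric tokens."""
    q = query.lower()
    # Treat "walk-through" and "walk through" as the single token "walkthrough".
    q = re.sub(r"walk[\s-]+through", "walkthrough", q)
    return re.findall(r"[a-z0-9]+", q)


def detect_category(query):
    """Return the category whose cue words overlap the query most, or None."""
    tokens = set(normalize(query))
    scores = {cat: len(tokens & cues) for cat, cues in CATEGORY_CUES.items()}
    best = max(scores, key=scores.get)
    # No cue word matched: the query gives no hint about the category.
    return best if scores[best] > 0 else None


print(detect_category("harry potter walk-through"))
print(detect_category("harry potter"))
```

With this table, the first query is tagged as a video game, while the bare “harry potter” query yields no category, which is the point at which a keyword-only engine would fall back on whatever is most popular.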
This kind of intelligence becomes all the more important for a search engine in e-commerce, where the results shown for a query directly determine whether the user will buy something from the site, and therefore directly affect the site’s revenue. Moreover, in the present e-commerce market, where margins are thin and consumers have many options to choose from, a consumer who doesn’t find something relevant on the first search will not refine the query and search again; they will simply move on to another site. The consequences of showing irrelevant search results to your users are therefore grave, if not fatal. In this scenario, where we can’t afford to let visitors leave for another site just because they can’t find something relevant, Natural Language Processing (NLP) comes to our rescue.
At Unbxd, NLP helps us make sense of the user’s query and treat it not as a mere text string but as an indicator of the user’s intent. Knowledge retrieval can be made possible with NLP tools such as Named Entity Recognition, Automatic Summarization, and Cross-Language Retrieval. To build a state-of-the-art information system, one must extract as much meaning as possible from each keyword while preserving context. A good natural-language-based infrastructure provides the foundation for such a system, because it parses sentences thoroughly, extracts meaning from context, and is smart enough to differentiate between the parts of the query. In a nutshell, such a system falls under the L2L (Language to Logic) category: it parses and processes queries to arrive at a result set. To conclude, a search engine powered by NLP seeks to return information in a structured form, consistent with human cognitive processes, as opposed to simple lists of data items. That’s Wisdom Computing!
This post gave you a brief insight into why NLP techniques should be used to create state-of-the-art information systems. In my next post I will talk about some of the NLP techniques, such as Named Entity Recognition, that we use at Unbxd to increase customer happiness on e-commerce platforms, so stay tuned.