Why are synonyms important?
The central task of information retrieval (IR) in eCommerce is to find products that satisfy the user’s needs. The words in a concise product description might not cover the synonyms used across all dialects of a language. The user typing the query might also be unfamiliar with domain-specific terminology, so the words in the query might not match the words in the product description. The chances of this are high, and the result is that shoppers fail to find products that are actually in the catalog simply because the search system has poor synonym matching capabilities.
Why domain-specific synonyms?
Say Bob wants to gift Alice a glitter dress for Christmas. He visits an eCommerce site and types the query “glitter dress”. However, if every glitter dress in the catalog is described as a sequin dress (sequin being the formal fashion term for glitter), the site cannot fetch those results because there is *no match*, and Bob leaves unsatisfied, assuming that the glitter dress he wants is not available. That is a loss for both sides: the eCommerce site loses a valued customer, and Bob is unable to buy Alice her gift.
This is a typical case of terminology mismatch between the product description and the customer’s query, and it happens a lot. Dialects are another source of mismatch, and languages like English have many of them. “Mobile phone” and “cell phone” mean the same thing but belong to different dialects (British: mobile phone, American: cell phone). The product description might not cover all of these variants.
Now consider that a word like “apple” changes its meaning with the domain. In the technology domain, Apple refers to a technology company, while in the food industry an apple is a fruit. Cases like these are why the generated synonyms need to be domain-specific.
To overcome such scenarios, we follow a process called query expansion. “Query expansion (QE) is a process in Information Retrieval that consists of selecting and adding terms to the user’s query with the goal of minimizing query-document mismatch and thereby improving retrieval performance” (quoted from Wikipedia). For QE, we need to identify the domain-specific synonyms.
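To make the idea concrete, here is a minimal sketch of query expansion in Python. The synonym table, the function names, and the boolean query syntax are all assumptions made for illustration; they are not Unbxd's implementation or any particular search engine's API.

```python
from typing import Dict, List, Set

# Hypothetical domain-specific synonym table for a fashion store (illustrative only).
FASHION_SYNONYMS: Dict[str, Set[str]] = {
    "glitter": {"sequin"},
    "sequin": {"glitter"},
}


def expand_query(query: str, synonyms: Dict[str, Set[str]]) -> List[List[str]]:
    """For each query token, return the token followed by its synonyms (sorted)."""
    expanded = []
    for token in query.lower().split():
        alternatives = [token] + sorted(synonyms.get(token, set()) - {token})
        expanded.append(alternatives)
    return expanded


def to_boolean_query(expanded: List[List[str]]) -> str:
    """Render the expansion as a generic boolean query string (assumed syntax)."""
    groups = []
    for alternatives in expanded:
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(alternatives[0])
    return " AND ".join(groups)


expanded = expand_query("glitter dress", FASHION_SYNONYMS)
print(to_boolean_query(expanded))  # prints: (glitter OR sequin) AND dress
```

With this expansion, Bob's query “glitter dress” would also match products described as sequin dresses.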
How do we generate synonyms?
At Unbxd, we have broadly classified the synonym generation process into three categories:
Manually curated synonyms
This is quite straightforward. A skilled linguist or a community of contributors manually adds to the existing domain-specific synonyms. The quality of such curated synonyms is quite high, but the major drawback is that the process is resource- and time-intensive.
Language-specific synonyms from linguistic knowledge
There is vast public knowledge available for languages like English. For synonyms, we can leverage freely available lexical databases such as WordNet and ConceptNet. The main issue with this approach is that the synonyms available are generic (i.e., not domain-specific). Here at Unbxd, we filter for domain-specific synonyms using filtering algorithms based on clickstream data. Another major drawback is that, with new internet slang emerging every day, it is difficult for these lexical databases to stay up to date.
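The filtering algorithms themselves are not described here, so the following is only a sketch of one plausible approach: keep a generic candidate synonym only if shoppers who search for it click on much the same products as shoppers who search for the original term. The candidate list stands in for the output of a lexical database, and the click data, the Jaccard measure, and the threshold are all assumptions.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_by_clicks(
    term: str,
    candidates: List[str],
    clicks: Dict[str, Set[str]],
    threshold: float = 0.3,
) -> List[str]:
    """Keep candidates whose clicked-product sets overlap enough with the term's."""
    term_clicks = clicks.get(term, set())
    return [
        candidate
        for candidate in candidates
        if jaccard(term_clicks, clicks.get(candidate, set())) >= threshold
    ]


# Hypothetical generic candidates for "glitter" (stand-in for a lexical database lookup).
generic_candidates = ["sequin", "sparkle", "flash"]

# Hypothetical clickstream: query term -> IDs of products clicked after searching it.
clicked_products = {
    "glitter": {"p1", "p2", "p3"},
    "sequin": {"p2", "p3", "p4"},
    "sparkle": {"p9"},
    "flash": {"p7", "p8"},
}

print(filter_by_clicks("glitter", generic_candidates, clicked_products))  # prints: ['sequin']
```

In this toy data, only “sequin” shares clicked products with “glitter” (overlap 2 of 4, or 0.5), so the other generic candidates are dropped as irrelevant to the fashion domain.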
Mining synonyms from clickstream data
Given the high volume of clickstream data available, mining synonyms from this data turns out to be cheap and to yield high-quality results. If Bob is familiar with fashion terminology, then when he doesn’t find a glitter dress he will reformulate the query to *sequin* and retry. We leverage such scenarios to mine the collective intelligence in users’ reformulated searches and generate high-quality synonyms. This approach relies heavily on query chain analysis.
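A heavily simplified sketch of this kind of mining is shown below. It counts how often a query that produced no click is immediately followed, in the same session, by a different query that did produce a click, and keeps the pairs seen often enough. The session format, the click signal, and the `min_support` threshold are assumptions for illustration; real query chain analysis involves considerably more signals than this.

```python
from collections import Counter
from typing import List, Tuple

# A session is a time-ordered list of (query, led_to_click) pairs (assumed format).
Session = List[Tuple[str, bool]]


def mine_reformulations(
    sessions: List[Session], min_support: int = 2
) -> List[Tuple[Tuple[str, str], int]]:
    """Count failed-query -> successful-query reformulations across sessions."""
    counts: Counter = Counter()
    for session in sessions:
        # Look at each pair of consecutive queries within the session.
        for (first, first_clicked), (second, second_clicked) in zip(session, session[1:]):
            if first != second and not first_clicked and second_clicked:
                counts[(first, second)] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_support]


# Hypothetical sessions from a fashion store.
sessions = [
    [("glitter dress", False), ("sequin dress", True)],
    [("glitter dress", False), ("sequin dress", True)],
    [("glitter dress", False), ("red shoes", True)],
    [("sequin dress", True)],
]

print(mine_reformulations(sessions))
# prints: [(('glitter dress', 'sequin dress'), 2)]
```

The support threshold is what filters out one-off topic changes such as “glitter dress” followed by “red shoes”, leaving reformulations that many shoppers make independently.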
We have successfully generated and tested high-quality, reliable synonyms using the approaches discussed above in various domains such as auto parts, technology, jewelry, and fashion. For example, in the auto parts domain: “o2 compressor” and “oxygen compressor”; in an online fashion store: “sequin” and “glitter”.
In this way, we ensure that no online shopper leaves the eCommerce site unhappy and unsatisfied, and that every shopper finds what they came looking for. eCommerce sites can’t afford to leave money on the table just because their search couldn’t understand what the shopper meant by the search query.
If you have been a victim of poor synonym matching capabilities in your search solution, reach out to firstname.lastname@example.org and we will be happy to walk you through, in detail, how Unbxd technology simplifies query analysis for your eCommerce site.