Make Your Ecommerce Search Engine More Intelligent

How to Leverage Entity Extraction to Make Your Ecommerce Search Engine More Intelligent

In 2020, US eCommerce revenue was $431.6 billion, and Americans now buy at least one in every three items online. Most shoppers try to find what they're looking for through the search function on eCommerce stores. The question is: are those search engines intelligent enough to handle shoppers' queries? Entity extraction is one of many available search techniques, but one of the very few that work well for eCommerce site search. This blog explains how entity extraction makes an eCommerce search engine intelligent enough to handle human queries and make product discovery easy.

Are eCommerce search engines doing the best that they can?

Even the 50 top-grossing US eCommerce websites don't do a great job of supporting some common types of search queries, and return irrelevant products or zero results.
For example, when you search for “Bluetooth headphones from $100 to $200” on Amazon, the query returns two products, and one of them costs $249.
Even the most popular eCommerce websites fall short in other ways, too: they fail to support product names (even those clearly listed on the product pages), don't handle spelling mistakes, don't process synonyms (differences in wording or nomenclature), can't grasp themes or subjective qualifiers (keywords such as winter dresses, cheap, or in fashion), and can't interpret symbols and abbreviations (such as a query for “feet” when the site uses “ft”).

Types of search queries

Let’s look at how the largest eCommerce companies handle some simple types of search queries.

  1. Product Type Searches
  2. Feature Searches
  3. Subjective Searches
  4. Spelling Mistakes or Nomenclature Differences

Product type searches

A query that names a specific type of product. A search for “30 in laptop” on Amazon shows “30-inch” laptop tables in the first two results, and a search for a 16-inch laptop brings up laptop cases.

Feature searches

A query that involves a feature of the product.
When you search for a “cheap red evening gown” on eBay, the second result is a black dress, even though the listings are sorted by best match and the site has many other red dresses to show.

Subjective searches

A query that involves a subjective preference of the customer. For example, when you search for a “high-quality sofa” on Best Buy, the top result is an unreviewed sofa, and the second result is a water cooler.

Spelling mistakes or nomenclature differences

When the query has a spelling mistake or uses a different form of a word, most websites show zero results. For example, a search for “handwash” returns a page saying there is no such product, even though the seller has many hand washes, which it displays only when you search with the two-word spelling “hand wash.”
Whether the shopper mentions a specific feature, product type, or preference, or simply misspells a word, she either gets unrelated results or the site comes back empty-handed with no products.

Why aren’t even the largest websites able to show relevant results?

Here’s one reason: as the number of searches has grown, so has the variety of ways users search. Shoppers from different geographies have their own requirements, nomenclature, and subjective descriptions that they put in their queries, resulting in a plethora of different queries for the same product.
The enormous size of today’s product catalogs also makes it hard for websites to categorize products and label them with every possible keyword. Amazon alone sells almost 480 million products in the US.
The combination of vast datasets, many types of search queries, and huge numbers of products makes matching a user’s requirement to a relevant set of products extremely complex.

How to improve query results?

To improve query results, search engines must become intelligent and adaptable enough to understand the intent behind a user’s query. Rather than reading the words literally, they must associate each word in the query with an intent, forming a meaningful phrase from which to show appropriate results.

Many of today’s search engines leverage machine learning models and natural language processing (NLP) techniques to optimize search results. Among these NLP methods, entity extraction, a key technique that employs context-sensitivity, can improve search results significantly.

What is Entity Extraction?

Entity extraction, or Named-Entity Recognition (NER), scans search queries to identify words or phrases and classify them into predefined categories, such as names of people, brands, products, locations, styles, colors, quantities, monetary values, percentages, and many other features. Many of these categories represent real-world objects, and some of them (such as brands, people, and locations) are typically named by proper nouns.
Let’s consider the search query “latest black plaid sweater dress.”
Here the product attributes, or features, are latest, black, plaid, and sweater dress. The output of NER may be “latest black_color plaid_pattern sweater_dress_category_type,” where color, pattern, and category_type are the predefined categories and black, plaid, and sweater dress are their values, as shown in the diagram below.
In this process, the NER algorithm extracted the entities black, plaid, and sweater dress and put them into their respective categories.
As another example, in the query “Calvin Klein shoes,” the NER model may identify “Calvin Klein” as a brand name and “shoes” as a product type.

Similarly, in the query “brown shoe polish,” “shoe polish” should be extracted as a single compound entity: the product type. If it isn’t recognized as a compound token, the results may contain shoes, nail polish, or anything else that matches the individual keywords.
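To make the compound-entity point concrete, here is a minimal sketch of a dictionary-based longest-match tagger. It is not an NER model and not Unbxd's method; `ENTITY_DICTIONARY`, `tag_query`, and `annotate` are hypothetical names, and the dictionary entries are made up. It shows why trying multi-word phrases such as “shoe polish” before single words keeps “shoe” and “polish” from being matched separately, and it renders the output in the same `value_type` style used above.

```python
from typing import List, Optional, Tuple

# Hypothetical attribute dictionary: phrase -> entity type.
# A real system would learn these from catalog and clickstream data.
ENTITY_DICTIONARY = {
    "black": "color",
    "brown": "color",
    "plaid": "pattern",
    "sweater dress": "category_type",
    "shoe polish": "category_type",
    "shoes": "category_type",
    "calvin klein": "brand",
}

MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in ENTITY_DICTIONARY)


def tag_query(query: str) -> List[Tuple[str, Optional[str]]]:
    """Greedy longest-match tagging: prefer multi-word phrases over single words."""
    tokens = query.lower().split()
    tagged: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so "shoe polish" wins over "shoe".
        for length in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + length])
            if phrase in ENTITY_DICTIONARY:
                tagged.append((phrase, ENTITY_DICTIONARY[phrase]))
                i += length
                break
        else:
            # No dictionary phrase starts here: keep the word untagged.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


def annotate(query: str) -> str:
    """Render tags as value_type, e.g. 'black_color' or 'sweater_dress_category_type'."""
    parts = []
    for phrase, label in tag_query(query):
        word = phrase.replace(" ", "_")
        parts.append(f"{word}_{label}" if label else word)
    return " ".join(parts)


print(annotate("latest black plaid sweater dress"))
# latest black_color plaid_pattern sweater_dress_category_type
print(tag_query("brown shoe polish"))
# [('brown', 'color'), ('shoe polish', 'category_type')]
```

A fixed dictionary cannot keep up with new products and phrasings, which is why the rest of this post focuses on learning such tags from data.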

Entity extraction plays a key role in identifying the phrases and avoiding possible irrelevant results for the end shopper.

NER is the first step in the search pipeline. Using historical data, the entity extraction model determines the significance of each word in a search query to understand the user’s intent with respect to a specific product catalog. Further algorithms are then applied to the query.
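As a hypothetical illustration of how later stages might consume NER output, the sketch below turns extracted entities into structured attribute filters over a tiny, made-up catalog. `CATALOG` and `search_with_entities` are illustrative names, not part of any real product.

```python
from typing import Dict, List

# Hypothetical mini-catalog; real catalogs hold millions of products.
CATALOG = [
    {"title": "Calvin Klein leather loafers", "brand": "Calvin Klein", "category_type": "shoes"},
    {"title": "Calvin Klein cotton t-shirt", "brand": "Calvin Klein", "category_type": "t-shirts"},
    {"title": "Nike running shoes", "brand": "Nike", "category_type": "shoes"},
]


def search_with_entities(entities: Dict[str, str], catalog: List[dict]) -> List[str]:
    """Return titles of products whose attributes match every extracted entity."""
    return [
        product["title"]
        for product in catalog
        if all(product.get(field) == value for field, value in entities.items())
    ]


# Entities an NER step might return for the query "Calvin Klein shoes".
entities = {"brand": "Calvin Klein", "category_type": "shoes"}
print(search_with_entities(entities, CATALOG))  # ['Calvin Klein leather loafers']
```

A strict filter is the simplest possible choice; a production engine would more likely use the entities to boost or re-rank results, so that an imperfect tag does not empty the results page.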

Why should you consider NER?

At least 10 to 20 percent of all search queries return zero products. This loss in search queries implies a minimum of 20 percent revenue loss. And as we have seen, apart from zero-result queries, many queries have low recall, which pushes customers to leave the website without buying the product.

Given today’s vast amounts of data and user queries, making eCommerce search engines more effective is an extremely challenging problem. eCommerce businesses must explore entity extraction as a way to increase the relevance of their search results.

Challenges of training machine learning models for eCommerce

First, a quick recap of entity extraction. Consider the sentence “Cindy bought two Levi’s jeans last week.” From this input, we can label the entities:

Cindy [person] bought [action] two [quantity] Levi’s [brand] jeans [category] last week [time].

Another example: in the query “black leather jacket,” the recognized “named entities” are color_name, material, and category_type, and black, leather, and jacket are their respective values.

Today, training and deploying ML models for eCommerce search involves three major challenges:
1. Challenges with getting data
2. Generating high quality labeled data
3. Optimizing algorithms and architecture to deploy ML models and deliver business results

Challenges with getting data

Data is the backbone of named-entity recognition (NER). Over several years, Unbxd has aggregated massive amounts of eCommerce clickstream and user behavior data. Our approach starts with how we built the Unbxd data layer: we crawl openly available content such as Wikipedia, ConceptNet, and social media, and we combine it with a special data set built by scanning 100K sites. (This data set is so large that we call it the world catalog!)

Generating high quality labeled data

Training NER models requires high-quality labeled data, which we generate from user behavior data and catalog data. The resulting NER-based machine learning models can understand the entities in a search query, and this understanding can be combined with existing implementations to give shoppers a richer experience.

The more relevant data we provide, the better the algorithms become. So the first step was to generate high-quality labeled data from historical user behavior data and product catalogs. But collecting, labeling, and maintaining the huge amounts of historical data that eCommerce enterprises need to train entity extraction algorithms is one of the biggest challenges.

In the past, although we used many models to run and optimize our entity extraction modules, generating the labeled data set was largely manual. There was debate about quality, and about whether algorithms could produce a labeled data set as good as a human-labeled one.

In 2018, we introduced intelligent tagging models to generate labeled data for different domains such as fashion and accessories, home and living, electronics, and mass merchants. The algorithm-driven models reached 94% accuracy, compared to 97% for human-labeled models: a minor degradation in quality. However, these models have set us on the path to truly scalable entity extraction models, from both a training and a testing perspective.

Further, we see continuous improvement in the accuracy of algorithm-driven models and believe that these models will soon surpass the accuracy of human-driven models.

Currently, to generate labeled data, we combine historical clickstream data, category-level product catalog data, a set of derived parameters, and site-level configuration data into a statistical model that predicts labels for search queries with a certain confidence.

We discard entities whose confidence falls below a threshold and keep only the high-confidence tokens and search queries.
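The statistical model above is described only at a high level. The sketch below is a deliberately simplified, hypothetical stand-in, not Unbxd's model: for each query token it counts which catalog attribute of the clicked product contains that token, treats the winning type's share of votes as the confidence, and drops labels under an assumed threshold. `CLICKS`, `CONFIDENCE_THRESHOLD`, and `label_tokens` are illustrative names, and the data is made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

# Hypothetical clickstream: (search query, attributes of the product that was clicked).
CLICKS = [
    ("black plaid dress", {"color": "black", "pattern": "plaid", "category_type": "dress"}),
    ("black dress", {"color": "black", "category_type": "dress"}),
    ("black jeans", {"color": "black", "category_type": "jeans"}),
    ("plaid shirt", {"pattern": "plaid", "category_type": "shirt"}),
    ("dress black", {"brand": "Black Label", "category_type": "dress"}),
    ("navy coat", {"color": "navy", "category_type": "coat"}),
    ("navy shorts", {"brand": "Old Navy", "category_type": "shorts"}),
]

CONFIDENCE_THRESHOLD = 0.7  # assumed value; a real system would tune it


def label_tokens(clicks) -> Dict[str, Tuple[str, float]]:
    """Vote on an entity type for each query token, then drop low-confidence labels."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for query, attributes in clicks:
        for token in query.lower().split():
            for attr_type, attr_value in attributes.items():
                # A vote means: the clicked product has this token in this attribute.
                if token in attr_value.lower().split():
                    votes[token][attr_type] += 1
    labels = {}
    for token, counter in votes.items():
        best_type, best_count = counter.most_common(1)[0]
        confidence = best_count / sum(counter.values())
        if confidence >= CONFIDENCE_THRESHOLD:
            labels[token] = (best_type, round(confidence, 2))
    return labels


print(label_tokens(CLICKS))
```

Here “black” is labeled a color with confidence 0.75 (three color votes, one brand vote), while “navy” splits evenly between color and brand and is discarded.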

This labeled data contains all the “named entities” (concepts) that you need to extract from a search query, along with the possible values of those entities, based on historical search queries and product descriptions.

Once we have labeled the data set, the NER-based machine learning models are trained on this data set.

Optimizing algorithms and architecture to deploy ML models and deliver business results

Creating machine learning models that understand queries more intelligently is one thing; making those models available to deliver more relevant search results is another.

Our system consists of two major modules: the Entity Extraction API, which serves the “entities” extracted from a search query to the front end, and the Model-Learning System, which evolves the learned model using clickstream data, derived parameters, and the product catalog.

The two components are:

  1. Entity Extraction API: The Entity Extraction API takes the query and a client key as input and returns the entities for that search query with a certain confidence. Because the models evolve continuously, each API host fetches the latest machine learning model from the storage layer.
  2. Model-Learning System: This module trains models on the pre-built labeled data sets for the general use case. The output model is made available via the API.
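A minimal in-process sketch of the two-module design, assuming a simple dictionary stands in for the storage layer. `ModelStore`, `EntityExtractionAPI`, `toy_model`, and the `min_confidence` cutoff are all hypothetical; the point is only that the model-learning side publishes models per client key and the API looks up the latest one on every request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str, float]  # (phrase, entity type, confidence)
Model = Callable[[str], List[Entity]]


@dataclass
class ModelStore:
    """Hypothetical storage layer holding the latest trained model per client key."""
    models: Dict[str, Model] = field(default_factory=dict)

    def publish(self, client_key: str, model: Model) -> None:
        # Called by the model-learning system whenever it finishes retraining.
        self.models[client_key] = model

    def latest(self, client_key: str) -> Model:
        return self.models[client_key]


class EntityExtractionAPI:
    def __init__(self, store: ModelStore, min_confidence: float = 0.5):
        self.store = store
        self.min_confidence = min_confidence  # assumed cutoff, not from the source

    def extract(self, query: str, client_key: str) -> List[Entity]:
        # Look up the model on every call so a newly published model takes effect at once.
        model = self.store.latest(client_key)
        return [entity for entity in model(query) if entity[2] >= self.min_confidence]


def toy_model(query: str) -> List[Entity]:
    """Stand-in for a trained NER model with fixed, made-up scores."""
    known = {"lenovo": ("brand", 0.92), "mouse": ("category_type", 0.88), "wireless": ("feature", 0.41)}
    return [(token, *known[token]) for token in query.lower().split() if token in known]


store = ModelStore()
store.publish("client-123", toy_model)
api = EntityExtractionAPI(store)
print(api.extract("Lenovo wireless mouse", "client-123"))
# [('lenovo', 'brand', 0.92), ('mouse', 'category_type', 0.88)]
```

Separating the store from the API keeps training and serving independent: retraining only has to publish a new model, and no API host needs to restart.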

Recall and Precision of NER Models

Recall is the ratio of “the number of relevant products retrieved” to “the number of relevant products in the catalog.” Precision is the ratio of “the number of relevant products retrieved” to “the total number of products retrieved.” In other words, high precision means that most of the results an algorithm returns are relevant, while high recall means that it returns most of the relevant results. For the search query “blue jeans,” if the catalog has 100 relevant products and the search retrieves 160 products, 80 of which are relevant, then recall is 80/100 = 0.8 and precision is 80/160 = 0.5. Rather than reporting recall and precision separately, most models in the information retrieval domain are evaluated by their harmonic mean, the F1 score.
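The “blue jeans” arithmetic, including the F1 score the paragraph mentions but does not compute, as a short sketch (the function name is illustrative):

```python
def precision_recall_f1(relevant_retrieved: int, total_retrieved: int, total_relevant: int):
    """Compute precision, recall, and their harmonic mean (F1)."""
    precision = relevant_retrieved / total_retrieved
    recall = relevant_retrieved / total_relevant
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(relevant_retrieved=80, total_retrieved=160, total_relevant=100)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.3f}")  # precision=0.50 recall=0.80 f1=0.615
```

Because F1 is a harmonic mean, it sits closer to the weaker of the two scores (0.615 here, below the arithmetic mean of 0.65), so a model cannot score well by maximizing only recall or only precision.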

NER Models

Our goal is always to use more data points from the product catalog, search queries, and clickstream data to generate NER tags for a client’s historical queries in a specific domain, and then to generalize this understanding through a machine-learned model. Given new search queries, the model can then make accurate tag predictions for the phrases in each query. Simply put, NER models solve a sequence labeling problem: given training data in the form of input sequences [X1, X2, …, Xn] and their corresponding labels [Y1, Y2, …, Yn], we learn a model that, for a new input sequence [X'1, X'2, …, X'm], outputs the predicted labels [Y'1, Y'2, …, Y'm]. A unique challenge in eCommerce is that search queries are short and less structured than web documents. (This is a major reason why out-of-the-box NLP solutions are available for web and document search, but not for eCommerce.)
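One common way to give every input token exactly one label is the BIO scheme (B = beginning of an entity, I = inside it, O = outside any entity). The post does not say which encoding Unbxd uses, so treat this as an assumption; `spans_to_bio` is a hypothetical helper that turns entity spans over query tokens into such a label sequence.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, label)


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert entity spans over query tokens into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = "latest black plaid sweater dress".split()
spans = [(1, 2, "color"), (2, 3, "pattern"), (3, 5, "category_type")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
# [('latest', 'O'), ('black', 'B-color'), ('plaid', 'B-pattern'),
#  ('sweater', 'B-category_type'), ('dress', 'I-category_type')]
```

The input and output sequences have the same length, which is what makes this sequence labeling; the B/I distinction is what lets a model mark “sweater dress” as one compound entity.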

NER Models for eCommerce

As noted above, we generally wanted to use clickstream and/or catalog data to generate the training data. We tried multiple approaches to building the model, and here are some that we found effective for NER:


spaCy NER model

This NER model uses a transition-based model coupled with convolutional neural networks (CNNs). The input query passes through multiple states, and at each state a decision is made to generate the label for that state. Because it is a neural network-based solution, the model takes significant time to train, and its performance was lower than that of the other approaches. The model is built with spaCy, the free, open-source Python NLP library. For a sample query such as “Lenovo mouse,” spaCy would predict “Lenovo” as the “brand” or “company” and “mouse” as the “product.”
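spaCy's training examples describe entities as character offsets: in spaCy v3, `Example.from_dict(doc, {"entities": [(start, end, label)]})` accepts that shape. The hypothetical helper below converts token-level query labels into it, so labeled queries could feed a spaCy NER model; the `BRAND` and `PRODUCT` label names are assumptions, and spaCy itself is not imported. As a simplification, each labeled token becomes its own span; merging multi-word entities is left out.

```python
from typing import Dict, List, Tuple


def to_char_offsets(tokens: List[str], labels: List[str]) -> Tuple[str, Dict[str, list]]:
    """Join tokens with single spaces; return (text, {"entities": [(start, end, label)]})."""
    text = " ".join(tokens)
    entities = []
    cursor = 0
    for token, label in zip(tokens, labels):
        start, end = cursor, cursor + len(token)
        if label != "O":  # "O" marks a token that is not part of any entity
            entities.append((start, end, label))
        cursor = end + 1  # skip the single space separator
    return text, {"entities": entities}


text, annotations = to_char_offsets(["Lenovo", "mouse"], ["BRAND", "PRODUCT"])
print(text, annotations)  # Lenovo mouse {'entities': [(0, 6, 'BRAND'), (7, 12, 'PRODUCT')]}
```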

Stanford CoreNLP NER tagger

This Java implementation of NER uses Conditional Random Fields (CRFs), which is why it is also known as CRFClassifier. The model maximizes the conditional probability of the tag sequence for the entire query, as learned from the training data. Because CRF models maximize this conditional probability over the whole sequence, their recall is low unless the training data sets are huge.
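What “tagging the entire query” means can be seen in the decoding step of a linear-chain model, the Viterbi algorithm, which finds the highest-scoring tag sequence rather than the best tag for each word in isolation. This is only an illustration, not Stanford's CRFClassifier: the emission and transition scores below are hypothetical, hand-set numbers standing in for learned CRF weights, and training is not shown.

```python
from typing import Dict, List, Tuple

TAGS = ["brand", "product", "O"]

# Hypothetical hand-set scores; a trained CRF would learn these as feature weights.
EMISSION: Dict[Tuple[str, str], float] = {
    ("apple", "brand"): 1.0, ("apple", "product"): 1.2,  # alone, "apple" leans product
    ("watch", "product"): 2.0, ("watch", "brand"): -1.0,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("brand", "product"): 1.5,    # a brand is often followed by a product
    ("product", "product"): -1.0,
}


def emit(token: str, tag: str) -> float:
    return EMISSION.get((token, tag), 0.0)


def trans(prev_tag: str, tag: str) -> float:
    return TRANSITION.get((prev_tag, tag), 0.0)


def viterbi(tokens: List[str]) -> List[str]:
    """Return the tag sequence with the highest total score for the whole query."""
    # best[i][tag] = (score of the best path ending in `tag` at position i, previous tag)
    best = [{tag: (emit(tokens[0], tag), None) for tag in TAGS}]
    for token in tokens[1:]:
        column = {}
        for tag in TAGS:
            prev_tag, score = max(
                ((p, best[-1][p][0] + trans(p, tag)) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[tag] = (score + emit(token, tag), prev_tag)
        best.append(column)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["apple"]))           # ['product']
print(viterbi(["apple", "watch"]))  # ['brand', 'product']
```

On its own, “apple” scores slightly higher as a product, but in “apple watch” the brand-to-product transition makes brand, product the best sequence overall.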

Stanford MaxEnt POS Tagger 

This model uses a maximum entropy-based tagger and is similar to the CRF. Although it tags queries more liberally (hence “maximum entropy”), which causes some biased tagging, it has high recall. Other models we have tested include Hidden Markov Models and SVM-based models.

Apart from experimenting with different models, we also assessed various implementations of each model and chose the one that gave the best results. For example, for the Stanford MaxEnt Tagger, Unbxd chose the Apache OpenNLP library over the Stanford NLP library because of its more developer-friendly interface.

NER model A/B testing results for a few of our customers:

Initial results have shown a significant relative conversion uplift of around 4.83% (conversion rate rising from 2.69% to 2.82%) over our current models.
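The 4.83% figure is a relative uplift, not an absolute one. Assuming 2.69% is the conversion rate with the current models and 2.82% with the new ones, the arithmetic is:

```python
def relative_uplift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant over the control, as a percentage."""
    return (variant_rate - control_rate) / control_rate * 100


print(f"{relative_uplift(2.69, 2.82):.2f}%")  # 4.83%
```

In absolute terms, the conversion rate rose by 0.13 percentage points.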


Unbxd has tested various NER models to find the best ones and optimize them for specific domains. We have found that a one-size-fits-all approach does not work: domains differ, and what works in fashion may not work in electronics, so experimentation is key to finding what works best for our customers.
Unbxd works relentlessly to help eCommerce websites optimize their search engines.

If you have any questions or want to understand how you can leverage entity extraction to deliver more relevant search results, please reach out to me here.
