How to Leverage Entity Extraction to Make Your Ecommerce Search Engine More Intelligent

Online Shopping and Global ecommerce

In 2017, US ecommerce revenue was $447B, or 36% of total retail revenue. That means more than one in every three retail dollars was spent online. Not surprisingly, the most common way shoppers try to find what they’re looking for is search.

Are ecommerce search engines doing the best that they can?

However, even the top 50 grossing US ecommerce websites don’t do a great job supporting some common types of search queries, resulting in irrelevant products or zero results.

For example, when you search for “Bluetooth headphones from $100 to $200” on Amazon, the query returns two products, and one of them costs $249.

Even the most popular ecommerce websites fall short in other ways: they fail to match product names that are clearly listed on the product pages, don’t handle spelling mistakes, don’t process synonyms or nomenclature differences, can’t grasp themes or subjective qualifiers (keywords such as winter dresses, cheap, or in fashion), and can’t reconcile symbols and abbreviations (such as feet when the site uses ft).

Let’s look at some simple search query types at the largest ecommerce companies.

Product type searches: The query “30 in laptop” on Amazon returns “30 inch” laptop tables as the first two results. Similarly, a search for a 16 inch laptop brings up laptop cases.

Feature searches: These queries name a feature of the product. A search for a “cheap red evening gown” on eBay shows a black dress in the second position, even though the website sorts listings by best match and has many red dresses to show.

Subjective searches: These queries express a subjective preference of the customer. For example, when you search for a “high-quality sofa” on Best Buy, the top result is an unreviewed sofa, and the second result is a water cooler.

Spelling mistakes or nomenclature differences: When the query has a spelling mistake, most websites show zero results. For example, when you search for “handwash” on BathAndBodyWorks.com, the result page says that there is no such product, although the seller has many hand washes, which it displays only when you search with the spelling “hand wash.”

Whether the shopper mentions a specific feature, product type, or preference, or enters a slightly incorrect spelling by mistake, she either gets unrelated results or the site comes back empty-handed with no products.

Why aren’t even the largest websites able to show relevant results?

Here’s one reason: as the number of searches has increased, so have the myriad ways in which users search. Shoppers from different geographies bring their own requirements, nomenclature, and subjective descriptions to the query, resulting in a plethora of search queries for the same product.

The enormous size of today’s product catalogs also makes it challenging for websites to categorize products and label them with all possible keywords. Amazon alone sells almost 480 million products in the US.

The combination of vast datasets, a wide variety of search queries, and a large number of products makes it extremely complex to match a user requirement exactly with a relevant set of products.

How to improve query results

To improve query results, search engines have to become more intelligent and adaptable to understand the intent behind a user’s query. Rather than simply reading the words literally, search engines must associate each query word with an intent, thus forming a meaningful phrase from the query to show appropriate results.

Many of today’s search engines leverage machine learning models and natural language processing (NLP) techniques to optimize search results. Among these NLP methods, entity extraction, a key technique that employs context sensitivity, can improve search results significantly.

What is Entity Extraction?

Entity extraction, or Named-Entity Recognition (NER), scans search queries to identify and classify words or phrases into predefined categories, such as names of people, brands, products, locations, styles, colors, quantities, monetary values, percentages, and many other features. These predefined categories mostly represent real-world objects, many of which are described by proper nouns.
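
To make one of these categories concrete, here is a minimal sketch of extracting a monetary range from a query such as the “Bluetooth headphones from $100 to $200” example earlier in this post. It uses a hand-written regular expression rather than a trained NER model; the pattern `PRICE_RANGE` and the function `extract_price_range` are hypothetical names chosen for illustration, and the sample prices are made up.

```python
import re

# Hypothetical pattern: matches "from $100 to $200", "$100 to $200" or "$100 - $200".
PRICE_RANGE = re.compile(
    r"(?:from\s+)?\$(\d+(?:\.\d+)?)\s*(?:to|-)\s*\$(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_price_range(query):
    """Return (remaining_text, (low, high)), or (query, None) if no range is found."""
    match = PRICE_RANGE.search(query)
    if match is None:
        return query, None
    low, high = float(match.group(1)), float(match.group(2))
    if low > high:
        # Tolerate ranges typed in reverse order.
        low, high = high, low
    # Remove the matched range and tidy up leftover whitespace.
    remaining = query[:match.start()] + query[match.end():]
    remaining = re.sub(r"\s+", " ", remaining).strip()
    return remaining, (low, high)


text, price = extract_price_range("Bluetooth headphones from $100 to $200")

# Made-up prices: only those inside the extracted range should survive.
sample_prices = [129.0, 249.0, 199.0]
in_range = [p for p in sample_prices if price[0] <= p <= price[1]]
```

Once the range is separated from the rest of the query, the remaining text can be matched against product names while the range acts as a hard filter, so a $249 product would not appear for a $100 to $200 request.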

Let’s consider the search query “latest black plaid sweater dress.”

Here the product attributes or features are latest, black, plaid, and sweater dress. The output of NER may be “latest black_color plaid_pattern sweater_dress_category_type”, where color, pattern, and category_type are the predefined categories, and black, plaid, and sweater dress are their values, as shown in the diagram below.

In this process, the NER algorithm extracted the entities black, plaid, and sweater dress and put them into their respective categories.

As another example, in the query “Calvin Klein shoes,” the NER model may identify “Calvin Klein” as a brand name and “shoes” as a product type.
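
The tagging described above can be sketched with a simple dictionary lookup. This is only an illustration under stated assumptions: production NER systems typically rely on trained statistical models, and the `GAZETTEER` dictionary, `tag_query`, and `render` below are hypothetical names with hand-picked entries.

```python
# Hypothetical gazetteer mapping known phrases to their predefined category.
GAZETTEER = {
    "black": "color",
    "plaid": "pattern",
    "sweater dress": "category_type",
    "calvin klein": "brand",
    "shoes": "category_type",
}
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in GAZETTEER)


def tag_query(query):
    """Return (phrase, category) pairs; category is None for unrecognized words."""
    tokens = query.lower().split()
    tagged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word entities win.
        for size in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + size])
            if phrase in GAZETTEER:
                tagged.append((phrase, GAZETTEER[phrase]))
                i += size
                break
        else:
            # No known phrase starts here; keep the word untagged.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


def render(tagged):
    """Format tagged phrases in the value_category style used in this post."""
    parts = []
    for phrase, category in tagged:
        word = phrase.replace(" ", "_")
        parts.append(f"{word}_{category}" if category else word)
    return " ".join(parts)


dress = render(tag_query("latest black plaid sweater dress"))
shoes = tag_query("Calvin Klein shoes")
```

Here `dress` reproduces the tagged string from the sweater dress example, and `shoes` pairs “calvin klein” with brand and “shoes” with category_type. The word “latest” stays untagged because the toy dictionary has no entry for it.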

Similarly, in the query “brown shoe polish,” shoe polish should be extracted as one compound entity, which is the product type here. If the entity isn’t recognized as a compound token, the results may contain shoes, nail polish, or anything else that matches an individual keyword. Entity extraction plays a key role in identifying such phrases and avoiding irrelevant results for the end shopper.
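
The difference can be seen on a toy catalog. The sketch below compares naive single-keyword matching with matching that treats a known compound as one unit. The catalog titles, the compound list, and both function names are assumptions made up for this illustration, not a description of any particular search engine.

```python
# Made-up product titles for illustration.
CATALOG = [
    "brown leather shoes",
    "brown shoe polish",
    "red nail polish",
    "black shoe polish",
]

# Hypothetical list of compound product types known to the system.
COMPOUNDS = ["shoe polish", "nail polish"]


def match_any_keyword(query, titles):
    """Naive matching: keep a title if it shares any single word with the query."""
    words = set(query.lower().split())
    return [t for t in titles if words & set(t.lower().split())]


def match_with_compound(query, titles, compounds):
    """Require every compound found in the query to appear intact in the title.

    Plain substring checks are used for brevity; a real system would match
    on token boundaries. If the query has no known compound, all titles pass.
    """
    q = query.lower()
    required = [c for c in compounds if c in q]
    return [t for t in titles if all(c in t.lower() for c in required)]


naive = match_any_keyword("brown shoe polish", CATALOG)
compound = match_with_compound("brown shoe polish", CATALOG, COMPOUNDS)
```

With keyword matching, all four titles come back, including the leather shoes and the nail polish. With the compound recognized, only the two shoe polishes remain; the color brown would then be handled as a separate entity to narrow or rank those results.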

NER is the initial step in the search algorithm. The entity extraction model uses historical data points to determine the significance of each word in a search query, and thereby the user’s intent, with respect to a specific product catalog. Further algorithms are then applied to the query.
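
This post does not specify which algorithms follow NER. As one plausible illustration, the sketch below turns already-extracted entities into structured filters and applies them to a small in-memory catalog. The `Product` fields, `build_filters`, `apply_filters`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    brand: str
    color: str
    category_type: str


def build_filters(entities):
    """Turn (value, category) pairs into a {category: value} filter.

    Words that NER left untagged (category None) are skipped.
    """
    return {category: value for value, category in entities if category is not None}


def apply_filters(products, filters):
    """Keep products whose attributes equal every filter value."""
    return [
        p for p in products
        # getattr defaults to None so an unknown category simply matches nothing.
        if all(getattr(p, field, None) == value for field, value in filters.items())
    ]


# Made-up catalog rows.
catalog = [
    Product("CK running shoes", "calvin klein", "white", "shoes"),
    Product("CK slim jeans", "calvin klein", "blue", "jeans"),
    Product("Trail shoes", "acme", "black", "shoes"),
]

# Assumed NER output for the query "Calvin Klein shoes".
entities = [("calvin klein", "brand"), ("shoes", "category_type")]

filters = build_filters(entities)
results = apply_filters(catalog, filters)
```

Because the brand and the product type are matched against their own fields, the jeans from the same brand and the shoes from another brand are both excluded, which plain keyword matching on titles would not guarantee.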

Why should you consider NER?

At least 10-20 percent of all search queries return zero products. These “lost in search” queries imply a revenue loss of at least 20 percent. And as we have seen, beyond zero-result queries, many queries have low recall. Low recall pushes customers to leave the website without buying the product.

This blog was written to introduce entity extraction to a business audience. In part two, you can read about the innovations Unbxd brought to this area in 2018.

 
