Adopting Elasticsearch to drive ecommerce search, recommendations and personalisation

Tuesday, 05 January 2021

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is a JSON document store built upon the Apache Lucene search engine and can be used with the language and platform of your choice.

Microsoft SQL Server Full-Text Search is no longer being actively developed, whereas Elasticsearch is backed by a whole community of developers delivering regular updates, enhancements and support. As a result, Elasticsearch is now being adopted as the industry standard for building fast, full-text search functionality. It is ideally suited for use with ecommerce sites and can now be configured as the search provider for tradeit.


Why choose Elasticsearch?


Speed

One of the main advantages of Elasticsearch is speed, particularly when faced with large data sets such as a large number of products in an ecommerce store. Slow response times deliver a poor user experience and cause higher bounce rates. Elasticsearch achieves fast search responses because it searches an index instead of the text directly, and it can return sub-second responses on huge datasets, giving it near real-time performance.
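
To illustrate what searching an index rather than the text means, here is a minimal sketch of an inverted index in plain Python. It is a simplified illustration of the idea, not Elasticsearch's actual implementation, and the sample products and function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(products):
    """Map each lowercased word to the set of product ids whose name contains it."""
    index = defaultdict(set)
    for product_id, name in products.items():
        for word in name.lower().split():
            index[word].add(product_id)
    return index


def search(index, term):
    """Look the term up in the index instead of scanning every product name."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample catalogue
products = {1: "Orange candle", 2: "Leather handbag", 3: "Orange handbag"}
index = build_inverted_index(products)
print(search(index, "orange"))  # [1, 3]
```

The lookup cost depends on the search term rather than on the size of the catalogue, which is why an index stays fast as the number of products grows.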

In our tests, Elasticsearch was at least twice as fast as tradeit's default SQL search. The actual time spent executing a search will, however, depend on the number of products indexed and the number of results returned.


Scalability

Elasticsearch is built to scale. It distributes the processing load across the nodes in a cluster and balances the load between them, so it can be scaled easily across servers. Elasticsearch will run fine on a single machine but can be scaled across hundreds of servers and contain petabytes of information. Adding nodes to a cluster is almost entirely automatic and pain free, so scaling is easy.


Enhanced Queries & Relevancy

Elasticsearch uses JSON as the serialisation format for documents and is supported by various programming languages. This allows you to construct complex queries and fine tune them to help deliver the relevant results you want from a search. It provides a way of ranking and grouping those results, and provides aggregations which can explore trends and patterns in the data.


Flexibility

Elasticsearch supports all commonly used data types, including text, numbers and dates. It also supports more complex types such as objects, geo data types, nested types, arrays and many others.


Stability & Reliability

Running from a cluster of dedicated servers not only provides scalability to whatever size you need but also means that Elasticsearch is very robust. Should there be any issue with a server, failover is automatic, ensuring your search remains operational. Data is automatically replicated to prevent any loss in case of server failure.


“We've seen a 25% increase in revenue from search, taking an additional £45,000 in sales within the first month of using Elasticsearch.”
Ecommerce Manager, F.Hinds


Functional enhancements of Elasticsearch


Alongside the high-level benefits of using Elasticsearch, there are a number of functional benefits too. Below is a list of the default functions available with Elasticsearch in tradeit.



Fuzzy Search

Keyword matches now cater for fuzziness on search passes, allowing for spelling mistakes or mistyping, so a user who enters a term incorrectly still sees the results for the intended term. For example:

  • “oramge” = “orange” (changing a character)
  • “handdbag” = “handbag” (removing a character)
  • “candel” = “candle” (transposing adjacent characters)
  • “bir” = “bird” (adding a character)

Keyword matches are also extended to include language stemmers, so the same set of products is returned whichever of the related words is included in the search. For example, searching for 'swimming', 'swimmer' or 'swimmers' would return the same set of results by using the stem 'swim'.

Keyword matching will also remove 'stop words'. If words like 'to', 'the', 'i' or 'and' are included in the search term, they are removed so that only the remaining words are matched, which drives more accurate results. For example, searching for 'the product' or 'product' would return exactly the same results.
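
Each of the four typo examples in this section is one "edit" away from the intended word. As a minimal sketch of how such edits can be counted, the plain Python function below computes an edit distance that allows changing, removing, adding and transposing characters. It is an illustration only and not the code Elasticsearch runs internally; the function name is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Count the single-character edits (change, remove, add, swap adjacent) between a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a character from a
                d[i][j - 1] + 1,         # add a character to a
                d[i - 1][j - 1] + cost,  # change a character
            )
            # Transposing two adjacent characters counts as a single edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for typed, intended in [("oramge", "orange"), ("handdbag", "handbag"),
                        ("candel", "candle"), ("bir", "bird")]:
    print(typed, intended, edit_distance(typed, intended))
```

Every pair above has a distance of 1, so a fuzzy search that allows one edit would treat the mistyped term as a match for the intended keyword.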



Fuzziness per search pass

Elasticsearch enables the level of fuzziness to be configured for each search pass. It can be turned off, set to automatic (where Elasticsearch determines the number of characters that can be changed from the length of the search term; this is the current standard), or set to a specific number of characters that can be changed. This can be used to promote more exact matches by weighting a search pass without fuzziness higher than a search pass with fuzziness applied.
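
As a rough sketch of what these three options could look like as Elasticsearch queries, the example below builds one `match` query per search pass. The `name` field, the boost values and the helper names are assumptions for illustration and do not describe tradeit's actual configuration. The `auto_fuzziness` helper mirrors Elasticsearch's default "AUTO" behaviour: terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters allow one edit, and longer terms allow two.

```python
def auto_fuzziness(term: str) -> int:
    """Number of edits allowed by Elasticsearch's default "AUTO" fuzziness."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def search_pass(field: str, term: str, fuzziness, boost: float) -> dict:
    """Build a match query for one search pass (field name and boost are illustrative)."""
    return {"match": {field: {"query": term, "fuzziness": fuzziness, "boost": boost}}}


term = "handdbag"
passes = [
    search_pass("name", term, 0, 3.0),       # fuzziness off, weighted highest
    search_pass("name", term, 1, 1.5),       # exactly one changed character allowed
    search_pass("name", term, "AUTO", 1.0),  # automatic, based on term length
]
print(auto_fuzziness(term))  # 2
```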



Exact Phrase Matching

By using Elasticsearch, exact matches are boosted to ensure they rank above fuzzy matches. The amount they are boosted is configurable via tradeit.
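
One possible way to express this in Elasticsearch's query DSL is to combine a boosted `match_phrase` clause with a fuzzy `match` clause. This is a sketch under assumptions: the `name` field, the default boost of 5.0 and the function name are hypothetical, and tradeit may build its queries differently.

```python
def phrase_boosted_query(field: str, term: str, phrase_boost: float = 5.0) -> dict:
    """Rank exact phrase matches above fuzzy matches by boosting the phrase clause."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase match, scored higher via the boost
                    {"match_phrase": {field: {"query": term, "boost": phrase_boost}}},
                    # Fuzzy keyword match, scored normally
                    {"match": {field: {"query": term, "fuzziness": "AUTO"}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


query = phrase_boosted_query("name", "leather handbag")
```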



Partial Matches Within Keywords

Elasticsearch has the ability to match search terms anywhere within keywords, rather than just at the start, so a partial keyword search for “berry” would also match “blackberry”, for example. The keyword match position can be configured within tradeit's manage search passes screen.
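
The difference between the two match positions can be sketched as follows. The `prefix` and `wildcard` queries are standard Elasticsearch query types, but choosing them here is an assumption for illustration; the field name and helper names are hypothetical, and an n-gram analyser is a common alternative because leading wildcards can be slow on large indexes.

```python
def keyword_query(field: str, term: str, position: str = "anywhere") -> dict:
    """Build a query that matches the term at the start of a keyword or anywhere in it."""
    if position == "start":
        return {"prefix": {field: {"value": term}}}
    return {"wildcard": {field: {"value": f"*{term}*"}}}


def matches(keyword: str, term: str, position: str) -> bool:
    """Plain Python equivalent of the two match positions, for illustration."""
    return keyword.startswith(term) if position == "start" else term in keyword


print(matches("blackberry", "berry", "start"))     # False
print(matches("blackberry", "berry", "anywhere"))  # True
```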



Synonyms

In Elasticsearch you can define a list of synonyms, which can be managed via tradeit's admin. As synonyms differ in relevance across industry sectors and product sets, there is no set of default synonyms that is applicable to all. Merchants would need to provide their own list of synonyms relevant to their particular data set. For example, a search for 'Laptop' would return results for all 'Laptops' as well as 'Macbooks', 'Notebooks' and 'Netbooks' if the correct synonyms are set up.
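
For readers curious how a synonym list is expressed in Elasticsearch itself, the sketch below builds index settings that use the `synonym` token filter. The filter name, analyser name and helper function are hypothetical; in tradeit the list is managed through the admin rather than by hand.

```python
def synonym_rule(words):
    """Join equivalent words into one comma-separated synonym rule."""
    return ", ".join(word.lower() for word in words)


synonym_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "product_synonyms": {  # hypothetical filter name
                    "type": "synonym",
                    "synonyms": [synonym_rule(["Laptop", "Macbook", "Notebook", "Netbook"])],
                }
            },
            "analyzer": {
                "product_search": {  # hypothetical analyser name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_synonyms"],
                }
            },
        }
    }
}
print(synonym_settings["settings"]["analysis"]["filter"]["product_synonyms"]["synonyms"])
```

The rule is written in the singular on the assumption that a stemmer handles plural forms such as 'Laptops'.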



Multiple Sort Fields

Elasticsearch allows merchants to configure two-dimensional sorting, which combines options such as product rank and in stock. Users then see in-stock products first, listed by rank (which is automatically calculated in tradeit based on up to 6 key metrics), followed by out-of-stock products, also listed by rank.
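
A two-level sort of this kind can be sketched as an Elasticsearch `sort` clause, with the same ordering reproduced in plain Python so the result is easy to check. The field names `in_stock` and `product_rank` are hypothetical, as is the assumption that a higher rank value is better.

```python
# Elasticsearch sort clause: in-stock products first, then by rank within each group
sort_clause = [
    {"in_stock": {"order": "desc"}},
    {"product_rank": {"order": "desc"}},
]

# Hypothetical sample data
products = [
    {"name": "A", "in_stock": False, "product_rank": 90},
    {"name": "B", "in_stock": True, "product_rank": 40},
    {"name": "C", "in_stock": True, "product_rank": 75},
    {"name": "D", "in_stock": False, "product_rank": 60},
]

# The same ordering in plain Python: sort on both fields, highest first
ordered = sorted(products, key=lambda p: (p["in_stock"], p["product_rank"]), reverse=True)
print([p["name"] for p in ordered])  # ['C', 'B', 'A', 'D']
```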



In Stock Sort

An additional sort option field of ‘In Stock’ has been enabled to allow merchants to display items that are in stock ahead of those that aren't when they apply that sort option.



Boost Product By Metric

Using the Elasticsearch rank feature query, merchants can boost products within listing and search results based on a metric. This is configured by creating a set of rules, in the same way as component rules and conditions. When any of those rules are matched, the merchant can choose to boost the results by a chosen metric. The metrics that can be used to boost products include Number of Orders, Number of Baskets, Number of Page Views, Number of Page Reviews, Average Review Rating and Sales Value. Each metric boosts a product according to the product's rank within that metric, so the higher a product ranks in each metric, the higher it ranks in the search results and product listing. How much a metric rank boosts an item can also be adjusted by applying a weighting.

For example, 'For all customers, boost in stock products', or 'When viewing the monthly offers category, boost products with the highest sales price'.

NOTE: Metrics calculated at runtime would not be able to boost listing and search results.
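
The sketch below shows the general shape of an Elasticsearch `rank_feature` query with a weighting applied through `boost`. It assumes the metric is stored in a field mapped as the `rank_feature` type, which also reflects why the metric has to be in the index rather than calculated at runtime. The field names, weight and pivot are hypothetical, and the rules that decide when tradeit applies a boost are not shown.

```python
def saturation(value: float, pivot: float) -> float:
    """Elasticsearch's saturation function: value / (value + pivot), always below 1."""
    return value / (value + pivot)


def metric_boost_query(term: str, metric_field: str, weight: float, pivot: float) -> dict:
    """Match on product name and boost the score by an indexed metric."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": term}}],
                "should": [
                    {
                        "rank_feature": {
                            "field": metric_field,  # must be mapped as rank_feature
                            "boost": weight,        # the weighting applied to the metric
                            "saturation": {"pivot": pivot},
                        }
                    }
                ],
            }
        }
    }


query = metric_boost_query("handbag", "number_of_orders", weight=2.0, pivot=50)
print(saturation(50, 50))  # 0.5
```

With saturation, a product whose metric equals the pivot receives half of the maximum boost, and very large values level off rather than dominating the ranking.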



Configurable & Custom enhancements of Elasticsearch


Alongside the default functions, a second phase of features can be implemented, with consultation to decide on custom settings, configuration and supporting implementation changes.



Aggregated Multiple Search Passes

Elasticsearch can execute multiple search passes in a single request and provide a set of results for each search pass. With this enabled, a search returns results based on all search passes, not just the first search pass that returned results. Combined with weighted search passes, this can deliver a larger set of search results to the user while still keeping the more exact matches towards the top.
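
Elasticsearch's multi search endpoint (`_msearch`) is one way to send several searches in a single request. Whether tradeit uses this endpoint is an assumption; the sketch below only shows how such a request body is laid out, as newline-delimited JSON with a header line followed by a query line for each search pass. The index name, field names and function name are hypothetical.

```python
import json


def build_msearch_body(index: str, queries: list) -> str:
    """Build a newline-delimited JSON body: one header line and one body line per search."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": query}))
    # The multi search API requires the body to end with a newline
    return "\n".join(lines) + "\n"


pass_queries = [
    {"match": {"name": {"query": "handbag"}}},                               # exact pass
    {"match": {"description": {"query": "handbag", "fuzziness": "AUTO"}}},  # fuzzy pass
]
body = build_msearch_body("products", pass_queries)
print(body)
```

The response contains one result set per search, in the same order as the searches were sent, which is what allows the results of every pass to be combined.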



Weighted Search Passes

When aggregated search passes are enabled, you can weight the importance of each pass to promote more exact matches and to promote matches on fields that are deemed more important (e.g. product name). For example, a search pass configured as a keyword match on product name can be given a higher weighting than a keyword match on product description, so any items where the product name matches the search term are promoted higher in the results than items where only the description matches.
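
Field weighting of this kind can be sketched with Elasticsearch's `multi_match` query, where `field^weight` multiplies the score contribution of that field. The field names, weights and function name below are hypothetical and are not tradeit's defaults.

```python
def weighted_fields_query(term: str, weights: dict) -> dict:
    """Search several fields at once, weighting each field's contribution to the score."""
    fields = [f"{field}^{weight}" for field, weight in weights.items()]
    return {"query": {"multi_match": {"query": term, "fields": fields}}}


# Product name matches count three times as much as description matches
query = weighted_fields_query("hair brush", {"name": 3, "description": 1})
print(query["query"]["multi_match"]["fields"])  # ['name^3', 'description^1']
```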



Boost Products By Product Group

Any products in boosted product groups will always appear at the top of the search results and product listing when the configured sort option is selected. This is ideal for pushing groups like new products, in season products or particular brands. When multiple boosted product groups are configured for a sort option, the order of the product groups determines which products appear first. For example, where new and in season product groups are configured in that order, "new, in season" products will appear first in search results, then "new, out-of-season" products, then "old, in season" products and finally "old, out-of-season" products.
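
The ordering described above can be reproduced with a small sort key in plain Python. This is an illustration of the ordering logic only, with hypothetical group names and sample data, and not tradeit's implementation.

```python
def boosted_sort_key(product: dict, boosted_groups: list) -> tuple:
    """Members of earlier groups sort first; False (a member) sorts before True."""
    return tuple(group not in product["groups"] for group in boosted_groups)


# Hypothetical sample data, deliberately listed in the reverse of the expected order
products = [
    {"name": "old, out-of-season", "groups": set()},
    {"name": "old, in season", "groups": {"in_season"}},
    {"name": "new, out-of-season", "groups": {"new"}},
    {"name": "new, in season", "groups": {"new", "in_season"}},
]

ordered = sorted(products, key=lambda p: boosted_sort_key(p, ["new", "in_season"]))
for product in ordered:
    print(product["name"])
```

Because Python's sort is stable, products with the same group membership keep their existing relative order, so the chosen sort option still applies within each group.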



Suggested Search Terms & Categories

The Suggested Search Terms component can be introduced to show both suggested search terms and suggested categories based on the search term the user has entered. The suggested terms are keywords that match products in the index. If only one suggested category is displayed and the search is submitted, the user is redirected to the category page instead of the search results.



'More Like This' Product Metric

Using Elasticsearch's 'More Like This' query we can power a new metric that automatically lists products similar to the product being viewed, and which can be limited to a channel or category. The products are determined by how much their name and/or description match the given product. In very simplistic terms, let's say we have a product that is a hair brush. We may ask for all other products with 'hair brush' in their 'product name' and 'description' fields to be returned, limited to the 10 closest matches.
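
A minimal sketch of such a request using Elasticsearch's `more_like_this` query is shown below. The index name, field names, category filter and function name are hypothetical, and the two `min_*` settings are lowered on the assumption that product names and descriptions are short.

```python
from typing import Optional


def more_like_this_query(index: str, product_id: str,
                         category: Optional[str] = None, size: int = 10) -> dict:
    """Find products whose name and description resemble the given product."""
    mlt = {
        "more_like_this": {
            "fields": ["name", "description"],
            "like": [{"_index": index, "_id": product_id}],
            "min_term_freq": 1,  # short texts rarely repeat a term
            "min_doc_freq": 1,   # allow terms that appear in few products
        }
    }
    bool_query = {"must": [mlt]}
    if category is not None:
        # Optionally limit the similar products to one category
        bool_query["filter"] = [{"term": {"category": category}}]
    return {"size": size, "query": {"bool": bool_query}}


query = more_like_this_query("products", "hair-brush-123", category="hair-care")
```

The `size` value limits the response to the 10 closest matches, as in the hair brush example.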



Speak to us to learn more about how we can configure Elasticsearch as the search provider for tradeit on your ecommerce site.
