Adopting Elasticsearch to drive ecommerce search, recommendations and personalisation

Friday, 25 February 2022

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is a JSON document store built upon the Apache Lucene search engine and can be used with the language and platform of your choice.

With Microsoft SQL Server Full Text Search no longer being actively developed, and with a whole community of developers delivering regular updates, enhancements and support for Elasticsearch, it is now widely adopted as an industry standard for building fast, full-text search functionality. Elasticsearch is well suited to ecommerce sites and can now be configured as the search provider for tradeit.


Why choose Elasticsearch?


Speed

One of the main advantages of Elasticsearch is speed, particularly when faced with large data sets such as a large number of products in an ecommerce store. Slow response times deliver a poor user experience and cause higher bounce rates, but Elasticsearch can achieve fast search responses because it searches an inverted index rather than scanning the text directly. It can return sub-second responses on huge datasets, giving it near real-time performance.

In our tests the results showed that Elasticsearch was at least twice as fast as tradeit's default SQL search. The actual time spent executing the search will however depend on the number of products to index and the subsequent number of results.


Scalability

Elasticsearch is built to scale. Distributing the processing load across multiple nodes in a cluster allows it to scale across servers and balance the load between them. It will run on a single machine but can be scaled across hundreds of servers and hold petabytes of data. Adding nodes to a cluster is largely automatic, so scaling is straightforward.


Enhanced Queries & Relevancy

Elasticsearch uses JSON as the serialisation format for documents, and JSON is supported by most programming languages. This allows you to construct complex queries and fine-tune them to deliver the relevant results you want from a search. It provides a way of ranking and grouping those results, and provides aggregations which can explore trends and patterns in the data.


Flexibility

Elasticsearch can support all commonly-used data types including Text, Numbers, and Dates. It also supports more complex types such as objects, geo data types, nested types, arrays and many others.


Stability & Reliability

Running from a cluster of dedicated servers not only provides scalability to whatever size you need but also means that Elasticsearch is very robust. Should there be any issues with a server, there is an automatic failover, ensuring your search is continually operational in the event of an issue. Data is automatically replicated to prevent any loss in case of server failure.



Functional enhancements of Elasticsearch


Alongside the high-level benefits of using Elasticsearch, there are a number of functional benefits too. Below is a list of the default functions available with Elasticsearch in tradeit.


Standard Features


Fuzzy search

Keyword matches now cater for fuzziness on search passes allowing for spelling mistakes or mistyping, so if the user enters the term incorrectly the same results would be returned. For example:

  • “oramge” = “orange” (changing a character)
  • “handdbag” = “handbag” (removing a character)
  • “candel” = “candle” (transposing adjacent characters)
  • “bir” = “bird” (adding a character)
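
Each of the four examples above differs from the intended word by a single edit. The following is a minimal sketch of that idea, assuming an edit-distance measure that counts insertions, deletions, substitutions and swaps of adjacent characters, which is the kind of measure Elasticsearch's fuzzy matching is based on. The function name is illustrative and is not part of tradeit or Elasticsearch.

```python
def edit_distance(a: str, b: str) -> int:
    """Number of single-character edits (insert, delete, substitute,
    or swap two adjacent characters) needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # adjacent transposition, e.g. "el" <-> "le"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


examples = [("oramge", "orange"), ("handdbag", "handbag"),
            ("candel", "candle"), ("bir", "bird")]
distances = [edit_distance(typed, intended) for typed, intended in examples]
```

Every pair in `examples` is one edit apart, which is why a fuzzy search pass can treat the mistyped term as a match.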

Keyword matches are also extended to include language stemmers, so the same set of products would be returned whichever of the related words was included in the search. For example, searching for 'swimming', 'swimmer' or 'swimmers' would return the same set of results by using the stem "swim".

Keyword matching will also remove 'stop words', so if words like 'to', 'the', 'i' and 'and' were included in the search term, those words would be removed and only the remaining words matched, to drive more accurate results. For example, searching for 'the product' or 'product' would return exactly the same results.
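
As a rough illustration of how stemming and stop-word removal are typically configured in Elasticsearch, the sketch below builds index settings with a custom analyzer. The names `product_stop`, `english_stemmer` and `product_text` are hypothetical, and the stop-word list simply reuses the words mentioned above; tradeit's actual configuration may differ.

```python
# Index settings expressed as a Python dict (the JSON body Elasticsearch expects).
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical stop filter using the example words from the text
                "product_stop": {
                    "type": "stop",
                    "stopwords": ["to", "the", "i", "and"],
                },
                # Built-in stemmer token filter, set to English
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                # Hypothetical analyzer name; filters run in the order listed
                "product_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_stop", "english_stemmer"],
                }
            },
        }
    }
}
```

Lowercasing runs first so that stop words are matched regardless of how the user typed them.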



Exact phrase matching

By using Elasticsearch, exact matches are boosted to ensure they rank above fuzzy matches. The amount they are boosted is configurable via tradeit.
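
One common way to rank exact phrases above fuzzy matches is to combine a boosted `match_phrase` clause with a fuzzy `match` clause. The sketch below assumes a field called `name` and a boost of 2.0, both purely illustrative; it is not necessarily how tradeit builds its queries.

```python
def exact_boosted_query(term: str, exact_boost: float = 2.0, field: str = "name") -> dict:
    """Build a query where an exact phrase match scores higher than a fuzzy match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase: contributes extra score via the boost
                    {"match_phrase": {field: {"query": term, "boost": exact_boost}}},
                    # Fuzzy fallback: still matches misspelt terms
                    {"match": {field: {"query": term, "fuzziness": "AUTO"}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


query = exact_boosted_query("red handbag", exact_boost=3.0)
```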



Search results explainer

Upon administrator login, merchants can see why products rank where they do against any search term, with the score shown against each criterion that defines the ranking in each instance. This is a great tool for examining and adjusting their search strategy: they can see the results and the reasons behind them, then tweak as they see fit.




Configurable Features


Pin products considered ‘NEW’

A dynamic product segment called 'NEW' can be set up with a single rule that populates it with products whose created date (when they were added or imported to tradeit) falls within the last 30 days*. When returning products on the search results or product listing page, any product in the 'NEW' product segment will be pinned above products that are not considered new.

* The time frame to determine what is classified as new can be adjusted via the admin system in tradeit.



Boost/moderate product by metric

Using the Elasticsearch rank feature query, merchants can boost products within listing and search results based on a metric. This is configurable by creating a set of rules (same as component rules and conditions). When any of those rules are matched, the merchant can choose to boost the results by a chosen metric. The metrics that can be used to boost products include Number of Orders, Number of Baskets, Number of Page Views, Number of Page Reviews, Average Review Rating and Sales Value. The metrics will boost the rank of the products based on the rank of the product within that metric, so the higher a product is in each metric then the higher the rank of the product in the search results and product listing. How much that metric rank will boost the item can also be adjusted by applying a weighting.

For example, 'for all customers, boost in stock products', or 'When viewing the monthly offers category, boost products with the highest sales price'.

NOTE: Metrics calculated at runtime would not be able to boost listing and search results.
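
A minimal sketch of how a rank feature clause can be added to a search, assuming the metric is stored in the index in a field mapped as `rank_feature`. That requirement for indexed values is consistent with the note above about runtime metrics. The field name `order_count` and the helper name are hypothetical.

```python
def boost_by_metric(base_query: dict, metric_field: str, weight: float) -> dict:
    """Wrap a base query so that a stored metric adds to each product's score."""
    return {
        "query": {
            "bool": {
                # The base query decides which products match
                "must": [base_query],
                # The rank_feature clause only affects scoring; weight scales its effect
                "should": [{"rank_feature": {"field": metric_field, "boost": weight}}],
            }
        }
    }


boosted = boost_by_metric({"match": {"name": "handbag"}}, "order_count", weight=1.5)
```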



Boost/moderate rank by product segment

Alongside being able to boost products by metrics, merchants can boost or moderate an entire product segment’s ranking. Product segments can be defined by the merchant and contain any items they want. For example, this could be a brand, new products, in stock products, a category of products or any items the merchant wishes to group together.



In stock sort

An additional sort option field of ‘In Stock’ has been enabled to allow merchants to display items that are in stock ahead of those that aren't when they apply that sort option.



Multiple sort fields

Elasticsearch allows merchants to configure two-dimensional sorting, which enables them to combine options like product rank and in stock. Users will then see in stock products first, listed by rank (which is automatically calculated in tradeit based on up to 6 key metrics), followed by out of stock products, also listed by rank.
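
The ordering described above can be sketched as a two-key sort. The sample products and the field names `in_stock` and `rank` are made up for illustration; the `sort_clause` shows the general shape of an equivalent Elasticsearch sort, assuming both fields are indexed.

```python
products = [
    {"name": "A", "rank": 90, "in_stock": False},
    {"name": "B", "rank": 70, "in_stock": True},
    {"name": "C", "rank": 80, "in_stock": True},
    {"name": "D", "rank": 60, "in_stock": False},
]

# First key: in stock before out of stock. Second key: higher rank first.
ordered = sorted(products, key=lambda p: (not p["in_stock"], -p["rank"]))
names = [p["name"] for p in ordered]

# The same idea as an Elasticsearch sort: applied left to right.
sort_clause = [{"in_stock": {"order": "desc"}}, {"rank": {"order": "desc"}}]
```

Product A has the highest rank overall but is out of stock, so it is listed after the in stock products C and B.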



Fuzziness per search pass

Elasticsearch enables the level of fuzziness to be made configurable for each search pass including turning it off, setting it to be automatic (where it determines the number of characters that can be changed depending on the length of the search term - this is the current standard), or specifying the number of characters that can be changed. This can be used to promote more exact matches by configuring a search pass without fuzziness as being weighted higher than a search pass with it applied.
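
To make the three options concrete, the sketch below builds a `match` clause for a search pass with fuzziness off, automatic, or fixed. The `auto_edits` helper mirrors Elasticsearch's default `AUTO` thresholds (no edits for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms). The function names and the field name are illustrative only.

```python
from typing import Union


def auto_edits(term: str) -> int:
    """Edits allowed by the default AUTO setting for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def pass_query(term: str, field: str, fuzziness: Union[str, int, None]) -> dict:
    """Build one search pass: None = fuzziness off, "AUTO", or a fixed edit count."""
    body = {"query": term}
    if fuzziness is not None:
        body["fuzziness"] = fuzziness
    return {"match": {field: body}}


exact_pass = pass_query("candle", "name", None)
auto_pass = pass_query("candle", "name", "AUTO")
fixed_pass = pass_query("candle", "name", 1)
```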



Partial matches within keywords

Elasticsearch has the ability to match search terms anywhere within keywords, rather than just at the start. A partial keyword search for “berry” would therefore match “blackberry”, which a start-only match would not. The keyword match position can be configured within tradeit's manage search passes screen.
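
The difference between matching at the start of a keyword and matching anywhere within it can be shown with a small sketch. The helpers below are illustrative, and the `wildcard` query is only one of several ways to express an "anywhere" match in Elasticsearch (n-gram analysis is another); the field name `keywords` is an assumption.

```python
def matches_at_start(term: str, keyword: str) -> bool:
    """True only when the keyword begins with the search term."""
    return keyword.lower().startswith(term.lower())


def matches_anywhere(term: str, keyword: str) -> bool:
    """True when the search term appears at any position in the keyword."""
    return term.lower() in keyword.lower()


def anywhere_query(term: str, field: str = "keywords") -> dict:
    """A wildcard query matching the term at any position in the field value."""
    return {"wildcard": {field: {"value": f"*{term.lower()}*"}}}


start_only = matches_at_start("berry", "blackberry")
anywhere = matches_anywhere("berry", "blackberry")
```

Leading wildcards can be slow on large indexes, which is one reason the match position is worth making configurable per search pass.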



Joint field look-up

This enables merchants to combine multiple search fields against any search pass to ensure that they can output the most accurate matches. For instance, 'Product Name' + 'Colour'.
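
A sketch of one way to search several fields as if they were a single field, using Elasticsearch's `multi_match` query with the `cross_fields` type. The field names `name` and `colour` stand in for 'Product Name' and 'Colour' and are assumptions.

```python
def joint_field_query(term: str, fields=("name", "colour")) -> dict:
    """Require every search word to appear in at least one of the combined fields."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": list(fields),
                "type": "cross_fields",  # treat the fields as one combined field
                "operator": "and",       # every word must match somewhere
            }
        }
    }


query = joint_field_query("red handbag")
```

With this shape, a search for "red handbag" can match a product whose name contains "handbag" and whose colour is "red".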



Synonyms

In Elasticsearch you can define a list of synonyms, which can be managed via tradeit's admin. As synonyms differ in relevance across industry sectors and product sets, there is no default set of synonyms that is applicable to all. Merchants would need to provide their own list of synonyms relevant to their particular data set. For example, a search for 'Laptop' would return results for all 'Laptops' as well as 'Macbooks', 'Notebooks' and 'Netbooks' if the correct synonyms are set up.
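
As an illustration, a synonym list like the laptop example can be expressed as an Elasticsearch synonym token filter. The helper, filter and analyzer names below are hypothetical, and the sketch does not show how tradeit's admin stores or applies the list.

```python
def synonym_rule(words) -> str:
    """Join equivalent words into one comma-separated synonym rule."""
    return ", ".join(word.lower() for word in words)


laptop_rule = synonym_rule(["Laptop", "Macbook", "Notebook", "Netbook"])

synonym_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter name; each rule lists interchangeable words
                "product_synonyms": {"type": "synonym_graph", "synonyms": [laptop_rule]},
            },
            "analyzer": {
                # Hypothetical analyzer applied to search terms
                "product_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_synonyms"],
                }
            },
        }
    }
}
```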




Custom Implementation


Alongside the default and configurable functions, some further features are also available but may require additional development of components and interface.



Aggregated multiple search passes

Elasticsearch can run multiple search passes in a single request and return a set of results for each pass. With this enabled, a search returns results from all search passes, not just the first pass that returned results. Combined with weighted search passes, this can give the user a larger set of search results while still keeping the more exact matches towards the top.
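
Elasticsearch's multi search API accepts several searches in one request as newline-delimited JSON, with a header line and a body line per search. The sketch below builds such a request body for a list of search passes; the index name `products` and the two example passes are assumptions, and tradeit may combine its passes differently.

```python
import json


def msearch_body(index: str, pass_queries: list) -> str:
    """Build a newline-delimited body: one header line and one query line per pass."""
    lines = []
    for query in pass_queries:
        lines.append(json.dumps({"index": index}))  # header: which index to search
        lines.append(json.dumps({"query": query}))  # body: the pass itself
    return "\n".join(lines) + "\n"  # the body must end with a newline


passes = [
    {"match": {"name": "handbag"}},
    {"match": {"description": "handbag"}},
]
body = msearch_body("products", passes)
```

The response contains one result set per pass, in the same order as the request.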



Weighted search passes

When aggregated search passes are enabled, you can weight the importance of each pass to promote more exact matches and to promote matches on fields that are deemed more important (e.g. product name). For example, a search pass can be configured as a keyword match on product name with a higher weighting than a keyword match on product description, meaning any item where the product name matches the search term will be promoted higher in the results than one where only the description matches.
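
A minimal sketch of weighting, assuming each pass becomes one clause in a `bool` query and that a percentage weight maps to an Elasticsearch `boost` (200% becomes 2.0). That mapping and the field names are assumptions made for illustration.

```python
def weighted_passes(term: str, passes: list) -> dict:
    """passes is a list of (field, weight_percent) pairs, one per search pass."""
    should = [
        {"match": {field: {"query": term, "boost": percent / 100}}}
        for field, percent in passes
    ]
    # A product needs to match at least one pass; matching several adds to its score
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = weighted_passes("hair brush", [("name", 200), ("description", 50)])
```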



Pin products by custom product segment

As part of the default installation, the dynamic segment for NEW products is created (see above); however, the same functionality can be used to pin products by any custom product segment the merchant wishes. Segments can be configured manually by selecting individual products, or by pulling in products already assigned to certain categories. Product segments can also be automated via rules, or based on product attribute values.

Where multiple segments are configured, the order of those segments can be sorted, which affects the priority in which they are applied. For example, if there are two pinned custom product segments like “NEW” and “In Season” (in that order), the “NEW” products will display first, followed by the “In Season” products. However, where a merchant uses multiple segments, they are also combined, so in this example the products would display as follows:

NEW + In Season
NEW + not In Season
not NEW + In Season
not NEW + not In Season
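
The combined ordering above falls out naturally from sorting on one flag per segment, in priority order. The sketch below uses made-up products to illustrate the idea; it is not tradeit's implementation.

```python
# Segments in priority order, as configured by the merchant
segments = ["NEW", "In Season"]

products = [
    {"name": "scarf", "segments": set()},
    {"name": "sandals", "segments": {"In Season"}},
    {"name": "jacket", "segments": {"NEW"}},
    {"name": "sunhat", "segments": {"NEW", "In Season"}},
]


def pin_key(product: dict) -> tuple:
    """One flag per segment: False (in the segment) sorts before True (not in it)."""
    return tuple(segment not in product["segments"] for segment in segments)


ordered = [p["name"] for p in sorted(products, key=pin_key)]
```

Because the NEW flag comes first in the key, every NEW product is listed before any product that is not NEW, and In Season only decides the order within each of those two groups.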



Suggested search terms & categories

The Suggested Search Terms component can be introduced which includes both suggested search terms and categories based on the search term the user has entered. The suggested terms will be keywords which will match products and will show suggestions based on indexed products, rather than previous users' searches, meaning results are delivered immediately and with more logic - our search tool doesn't rely on a history of previous searches to build its intelligence.

Suggested categories can also be displayed based on the search term. These can appear alongside product and content matches, directly in the search results. If only one suggested category is displayed and the search is submitted the user will be redirected to the category page instead of the search results.



'More like this' product metric

Using Elasticsearch's 'More Like This' query we can power a new metric that automatically lists products similar to the product being viewed, and which can be limited to a channel or category. The products are determined by how closely their name and/or description match those of the given product. In very simplistic terms, let's say we have a product that is a hair brush: we may ask for all other products with 'hair brush' in their 'product name' and 'description' fields to be returned, limited to the 10 closest matches.
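
A sketch of what such a request might look like, using Elasticsearch's `more_like_this` query against name and description and returning the 10 closest matches. The index name, field names, product id and the optional category filter are illustrative assumptions.

```python
def similar_products_query(product_id: str, index: str = "products",
                           category: str = None, size: int = 10) -> dict:
    """Find products whose name/description resemble those of the given product."""
    more_like_this = {
        "more_like_this": {
            "fields": ["name", "description"],
            # Compare against a document already in the index
            "like": [{"_index": index, "_id": product_id}],
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
    bool_query = {"must": [more_like_this]}
    if category is not None:
        # Optional restriction, e.g. to a single category
        bool_query["filter"] = [{"term": {"category": category}}]
    return {"size": size, "query": {"bool": bool_query}}


query = similar_products_query("hair-brush-001", category="haircare")
```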



Content search

Elasticsearch can index both product and content pages meaning any non-product pages like articles (blog) will also appear alongside any products or categories in the search results, based on the user’s search. Changes will need to be made to the search flyout/dropdown and the search results page, to ensure this is displayed properly to the user.



Split content results

Blog articles and other page content can be treated differently and output independently in the search results. This requires a small bit of front end styling based on the merchant's requirements.



Boosted search passes

In some scenarios a merchant may wish to pin products to the top of their search results based on specific criteria they have defined, such as NEW or IN STOCK. These would then be prioritised ahead of any other matches. However, whilst it may suit the merchant to try and sell things they currently have available, that’s not always ideal for the user, as it may return items that are in stock but not exactly what they’re looking for ahead of the item they actually want, if that item isn’t currently in stock. To manage this situation tradeit features Boosted Search Passes, meaning that two sets of search passes can be created and then prioritised, so that group 1 is always carried out before group 2. When this is combined with weighted, aggregated search passes it can be extremely effective for returning both more accurate and specifically merchandised results.

For example, a merchant could set up and group their search passes as follows:

Group 1 – Exact match on product code
Match against: Keyword - Full - Product Code - not fuzzy - weight: 200%
Match against: Keyword - Partial - Product Code - not fuzzy - weight: 150%

Group 2 – Pin items that are in stock
Match against: Keyword - Partial - Product Code - fuzzy - weight: 200%
Match against: Keyword - Full - Product Name - fuzzy - weight: 150%
Match against: Keyword - Partial - Product Name - fuzzy - weight: 100%
Match against: Keyword - Full - Product Description - not fuzzy - weight: 50%

Within group 2, products can be pinned to the top based on whether they are in stock or not, however they will ALWAYS appear below group 1 which is showing exact product code matches above any other ruling.
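
The effect of grouping can be sketched as merging result lists in group order: everything found by group 1 stays on top, and group 2 only contributes products not already listed. The product codes below are made up, and the sketch assumes each group's results are already ordered within the group.

```python
def merge_groups(group_results: list) -> list:
    """Concatenate per-group results in priority order, skipping duplicates."""
    seen = set()
    merged = []
    for results in group_results:
        for product_id in results:
            if product_id not in seen:
                seen.add(product_id)
                merged.append(product_id)
    return merged


group_1 = ["P100"]                  # exact product code match
group_2 = ["P200", "P100", "P300"]  # fuzzy matches, in stock items pinned first
final_order = merge_groups([group_1, group_2])
```

P100 stays first because group 1 found it, even though group 2 would have ranked the in stock product P200 above it.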


Further Customisations

Once a customer has implemented Elasticsearch, there is further scope to enhance the capabilities based on any particular business requirements, particularly where solutions rely heavily on data or are potentially very time-consuming.

For example, a merchant may have several stores each holding stock, as well as stock held in their main warehouse, but would like to prioritise the display of warehouse products first, as fulfilment is quicker and easier.





Speak to us to learn more about how we can configure Elasticsearch as the search provider for tradeit on your ecommerce site.