Evolving from SQL Full Text Search to Elasticsearch in tradeit
Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is a JSON document store built upon the Apache Lucene search engine and can be used with the language and platform of your choice.
Microsoft SQL Server Full Text Search is no longer being actively developed, whereas Elasticsearch has a whole community of developers behind it delivering regular updates, enhancements and support. As a result, Elasticsearch is now being adopted as the industry standard for building fast, full-text search functionality. It is ideally suited to ecommerce sites with a large number of products, and it can now be configured as the search provider for tradeit.
Why choose Elasticsearch over SQL?
One of the main advantages of Elasticsearch is speed, particularly when faced with large data sets such as a large number of products in an ecommerce store. Slow response times deliver a poor user experience and cause higher bounce rates, but Elasticsearch can achieve fast search responses because it searches an index instead of the text directly. The index is built ahead of time and frequently used filters are cached, so much of the work is done before a search is run; when something is actually searched for, it is found (or not) very quickly. Elasticsearch can return sub-second responses on huge datasets, giving it near real time performance.
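The idea of searching an index rather than the text itself can be illustrated with a toy inverted index. This is only a sketch of the general technique, not how Elasticsearch or tradeit implements it; the product data and function names are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical product names keyed by product id.
products = {1: "Red Shoes", 2: "Red Shorts", 3: "Blue Shoes"}
index = build_inverted_index(products)
print(search(index, "red shoes"))
```

A lookup touches only the entries for the searched terms, which is why the cost of a query does not grow with the total amount of text in the same way a scan would.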
Elasticsearch is built to scale. Distributing the processing load across multiple nodes allows Elasticsearch to be easily scaled across servers and to balance the load between the nodes in a cluster. Elasticsearch will run on a single machine but can be scaled across hundreds of servers and hold petabytes of information. Adding nodes to a cluster is almost entirely automatic and pain free, so scaling is easy.
Enhanced Queries & Relevancy
Elasticsearch uses JSON as the serialisation format for documents and is supported by various programming languages. This allows you to construct complex queries and fine tune them to help deliver the relevant results you want from a search. It provides a way of ranking and grouping those results, and provides aggregations which can explore trends and patterns in the data.
Elasticsearch can support all commonly-used data types including Text, Numbers, and Dates. It also supports more complex types such as objects, geo data types, nested types, arrays and many others.
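As a rough sketch of how these data types might be declared, the snippet below builds an index mapping as a Python dictionary. The field names are hypothetical and are not taken from tradeit; the type names (`text`, `keyword`, `float`, `date`, `geo_point`, `nested`) are standard Elasticsearch field types.

```python
import json

def product_mapping():
    """Build an example mapping body; all field names are illustrative assumptions."""
    return {
        "mappings": {
            "properties": {
                # Full-text field with an exact-value sub-field for sorting.
                "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
                "price": {"type": "float"},
                "created_at": {"type": "date"},
                "warehouse_location": {"type": "geo_point"},
                # Nested keeps each variant's fields together when querying.
                "variants": {
                    "type": "nested",
                    "properties": {
                        "sku": {"type": "keyword"},
                        "colour": {"type": "keyword"},
                    },
                },
            }
        }
    }

print(json.dumps(product_mapping(), indent=2))
```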
Stability & Reliability
Running from a cluster of dedicated servers not only provides scalability to whatever size you need but also means that Elasticsearch is very robust. Should there be any issues with a server, there is an automatic failover, ensuring your search is continually operational in the event of an issue. Data is automatically replicated to prevent any loss in case of server failure.
Functional enhancements of Elasticsearch over SQL Full Text Search?
Alongside the high-level benefits of using Elasticsearch, there are a number of functional benefits too. Let's examine some of the new functionality available in Elasticsearch over what's in SQL, see what's currently available with Elasticsearch in tradeit, and what is being delivered in the near future.
Using SQL, suggested categories appear in the search fly-out only where the category name matches the search term. Elasticsearch greatly improves this through the introduction of fuzziness, synonyms, and analyzers.
Recommended Search Terms
Using SQL, suggested search terms appear in the search fly-out where the search term entered by the customer partially matches a previously used search term that returned results. This can be useful for well-established sites that have built up months or years of user search data, but for new sites with little or no data it doesn't really assist the user.
By using Elasticsearch instead, fuzzy suggestions based on indexed products, rather than previous search terms, can be returned meaning results are delivered immediately and with more logic than relying on previous user searches.
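One way such product-based suggestions could be requested is with a `match_bool_prefix` query, which treats the last word as a prefix and can apply fuzziness to the earlier words. This is a sketch under assumptions: the `name` field and the choice of query type are illustrative, and tradeit may build its suggestions differently.

```python
def suggestion_query(typed_text, size=5):
    """Build a search body for as-you-type suggestions from product names.

    'name' is a hypothetical field; fuzziness applies to all terms except
    the final one, which is matched as a prefix.
    """
    return {
        "size": size,
        "_source": ["name"],
        "query": {
            "match_bool_prefix": {
                "name": {"query": typed_text, "fuzziness": "AUTO"}
            }
        },
    }

print(suggestion_query("hiar bru"))
```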
'More Like This' Component
Using Elasticsearch's 'More Like This' query we can power a new metric that automatically lists products similar to the product being viewed. The simplest way consists of asking for other products that are similar to the one provided using tf-idf (term frequency-inverse document frequency), a numerical statistic that reflects how important a word is to a document. The higher the tf-idf, the more 'alike' it is to the product being viewed. In very simplistic terms, let's say we have a product that is a hair brush: we may ask for all other products with 'hair brush' in their 'product name' and 'description' fields, limited to the 10 closest matches, to be returned.
This significantly reduces the configuration required by the merchant to manually define products that are similar to each other. The flexibility of Elasticsearch also means there are numerous selectable parameters which can help merchants hone the results of their users' searches, all of which can be controlled by the admin system in tradeit.
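The hair brush example could be expressed as a `more_like_this` query along the following lines. The index name, field names and parameter values are assumptions for illustration; the parameters tradeit exposes in its admin system may differ.

```python
def more_like_this_query(index_name, product_id, size=10):
    """Build a search body asking for products similar to an indexed product.

    'name' and 'description' are hypothetical field names.
    """
    return {
        "size": size,  # limit to the closest matches
        "query": {
            "more_like_this": {
                "fields": ["name", "description"],
                # Use an already indexed product as the example document.
                "like": [{"_index": index_name, "_id": product_id}],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        },
    }

print(more_like_this_query("products", "hair-brush-001"))
```

Tunable parameters such as `min_term_freq` and `max_query_terms` are examples of the kind of settings a merchant might adjust to hone the results.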
Using the Elasticsearch rank feature query, merchants can boost or promote products within listing and search results based on a metric. This is configurable by creating a set of rules (same as component rules and conditions). When any of those rules are matched, the merchant can choose to boost the results by a chosen metric.
For example, 'for all customers, boost in stock products', or 'When viewing the monthly offers category, boost products with the highest sales price'.
NOTE: Metrics calculated at runtime would not be able to boost listing and search results.
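A boost of this kind might be sketched as a `bool` query that combines the customer's search with a `rank_feature` clause. The field `sales_score` is hypothetical and would need to be mapped as a `rank_feature` field and populated at index time, which is consistent with the note above about runtime metrics.

```python
def boosted_search(search_term, metric_field, boost=2.0):
    """Build a search body that boosts results by an indexed metric.

    'name' and the metric field are illustrative; the metric must be stored
    in the index as a rank_feature field with positive values.
    """
    return {
        "query": {
            "bool": {
                # The customer's search decides which products match.
                "must": [{"match": {"name": search_term}}],
                # The metric only influences ranking, not matching.
                "should": [
                    {"rank_feature": {"field": metric_field, "boost": boost}}
                ],
            }
        }
    }

print(boosted_search("hair brush", "sales_score"))
```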
Pin Products By Metric
This allows merchants to pin products to the top of the product listing or search results. It is configurable by the merchant by creating a set of rules (same as component rules & conditions). When any of those rules are matched, the merchant can choose to pin specific products, or choose to pin a metric of products, to the listing or results.
For example, 'When the customer has a product in the basket, pin products inspired by their basket to the search results'.
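Elasticsearch offers a `pinned` query that promotes chosen documents above the organic results, and pinning could be sketched with it as below. Whether tradeit uses this exact query is an assumption, and the product ids and `name` field are illustrative.

```python
def pinned_search(search_term, pinned_ids):
    """Build a search body that places the given product ids first.

    The ids would come from a merchant rule or a metric; here they are
    simply passed in by the caller.
    """
    return {
        "query": {
            "pinned": {
                "ids": [str(i) for i in pinned_ids],
                # Everything else is ranked by the normal search.
                "organic": {"match": {"name": search_term}},
            }
        }
    }

print(pinned_search("shampoo", [42, 7]))
```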
Exact Phrase Matching
By using Elasticsearch, exact matches are boosted to ensure they rank above fuzzy matches. The amount they are boosted is configurable via tradeit.
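A minimal sketch of this ranking behaviour, assuming a single hypothetical `name` field, combines a boosted `match_phrase` clause with a fuzzy `match` clause. The default boost value here is arbitrary and stands in for whatever is configured in tradeit.

```python
def exact_over_fuzzy(search_term, exact_boost=5.0):
    """Build a search body where exact phrase matches outrank fuzzy matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase: scores higher because of the boost.
                    {"match_phrase": {"name": {"query": search_term, "boost": exact_boost}}},
                    # Fuzzy fallback: tolerates small spelling differences.
                    {"match": {"name": {"query": search_term, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }

print(exact_over_fuzzy("red shoes"))
```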
Partial Phrase Matching
Unlike SQL Full Text Search, Elasticsearch supports partial phrase matching, meaning results can be returned from a partial search. For example, a search term of 'Red Sho' would return matches of 'Red Shoes' and 'Red Shorts'. The same search in SQL Full Text Search would return no results.
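The 'Red Sho' example maps naturally onto a `match_phrase_prefix` query, which matches the words in order and treats the last one as a prefix. The `name` field is an assumption, and this is one possible way to achieve partial phrase matching rather than a description of tradeit's implementation.

```python
def partial_phrase_query(partial_text, field="name"):
    """Build a search body that matches a phrase whose last word is incomplete."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": partial_text}
            }
        }
    }

print(partial_phrase_query("Red Sho"))
```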
Keyword matches now cater for fuzziness on search passes allowing for spelling mistakes or mistyping, so if the user enters the term incorrectly the same results would be returned.
Keyword matches are also extended to include language inflections so the same set of products would be returned whichever of the related words was included in the search. For example searching for 'swimming', 'swimmer' or 'swimmers' would return the same set of results.
Keyword matching will also remove 'stop words', so if words like 'to', 'the', 'i' or 'and' were included in the search term, those words would be removed and only the remaining words matched. For example, searching for 'the product' or 'product' would return exactly the same results.
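Inflection handling and stop word removal are typically configured through an analyzer. The sketch below builds index settings for a custom analyzer using the built-in `lowercase`, `stop` and `porter_stem` token filters; the analyzer name is hypothetical, and which words end up grouped together depends on the stemmer chosen, so treat this as an illustration rather than the configuration tradeit ships.

```python
def keyword_analysis_settings(analyzer_name="product_text"):
    """Build index settings for an analyzer that lowercases, drops stop
    words and stems the remaining terms. The analyzer name is illustrative."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Order matters: lowercase before stop words and stemming.
                        "filter": ["lowercase", "stop", "porter_stem"],
                    }
                }
            }
        }
    }

print(keyword_analysis_settings())
```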
Partial Keyword Matching
Elasticsearch has the ability to match wildcards at the start, middle or end of search term keywords, whilst SQL Full Text Search is restricted to only matching wildcards at the end of a search term.
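As an illustration, a `wildcard` query accepts `*` anywhere in the pattern. The field name `name.keyword` is an assumption, and patterns that begin with a wildcard can be slow on large indexes, so this is a sketch of the capability rather than a recommended default.

```python
def wildcard_query(pattern, field="name.keyword"):
    """Build a search body for a wildcard pattern such as '*brush*'."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}

# Wildcards at the start, middle and end of the keyword.
for pattern in ("*brush", "hair*brush", "hair*"):
    print(wildcard_query(pattern))
```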
Elasticsearch enables the level of fuzziness to be made configurable for each search pass including turning it off, setting it to be automatic (where it determines the number of characters that can be changed depending on the length of the search term - this is the current standard), or specifying the number of characters that can be changed.
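The three options can be sketched as follows. The helper `auto_max_edits` mirrors the documented default thresholds of Elasticsearch's `AUTO` setting (no edits for terms of one or two characters, one edit for three to five, two edits for longer terms); the function names and the `name` field are illustrative only.

```python
def auto_max_edits(term):
    """Number of character edits allowed under the default AUTO setting."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2

def fuzzy_match(text, fuzziness="AUTO", field="name"):
    """Build a match query; fuzziness may be 0 (off), 1, 2 or 'AUTO'."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}

print(auto_max_edits("shoe"), auto_max_edits("trainers"))
print(fuzzy_match("trainers"))             # automatic
print(fuzzy_match("trainers", fuzziness=0))  # turned off
print(fuzzy_match("trainers", fuzziness=1))  # fixed number of edits
```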
Field Level Configuration & Documented Indexed Fields
The type of match (keyword/partial) can be defined for each field within a search pass. Combined with the ability to weight fields in a search, this means the rank of a match can depend on whether it was an exact or a partial match. For example, a search pass could include a phrase match on product code, so that it must match exactly and is weighted more heavily, alongside keyword matches on product name and description that return partial matches with a lower weight.
In Elasticsearch you can define a list of synonyms, which can be managed via tradeit's admin. As synonyms differ in relevance across industry sectors and product sets, there is no set of default synonyms that is applicable to all. Merchants would need to provide their own list of synonyms relevant to their particular data set. For example, a search for 'Laptop' would return results for all 'Laptops' as well as 'Macbooks', 'Notebooks' and 'Netbooks' if the correct synonyms are set up.
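A search pass of this shape might look like the sketch below, with a heavily weighted exact phrase match on product code and lower weighted partial matches on name and description. The field names and boost values are hypothetical and stand in for whatever is configured per field.

```python
def field_level_pass(search_term):
    """Build one search pass with a different match type and weight per field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase on product code, weighted highest.
                    {"match_phrase": {"code": {"query": search_term, "boost": 5.0}}},
                    # Partial matches on descriptive fields, weighted lower.
                    {"match_phrase_prefix": {"name": {"query": search_term, "boost": 2.0}}},
                    {"match_phrase_prefix": {"description": {"query": search_term}}},
                ]
            }
        }
    }

print(field_level_pass("AB-1234"))
```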
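The laptop example could be set up with a `synonym` token filter, sketched here as index settings built from a merchant-supplied list. The filter and analyzer names are hypothetical, and how tradeit's admin stores and applies synonyms is not specified in this article.

```python
def synonym_settings(synonym_groups):
    """Build index settings with a synonym filter.

    Each group is a list of equivalent terms, e.g.
    ['laptop', 'macbook', 'notebook', 'netbook'].
    """
    rules = [", ".join(group) for group in synonym_groups]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "merchant_synonyms": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "product_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the rules match regardless of case.
                        "filter": ["lowercase", "merchant_synonyms"],
                    }
                },
            }
        }
    }

print(synonym_settings([["laptop", "macbook", "notebook", "netbook"]]))
```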
Multiple Search Passes In One
Unlike SQL, Elasticsearch can support multiple search passes in one, so all search passes could be executed in one request to Elasticsearch and it will provide a set of results for each search pass.
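Elasticsearch's multi search API (`_msearch`) accepts several searches in one request as newline-delimited JSON, with a header line followed by a body line for each search. The sketch below only builds that request body; the index name and the two example passes are assumptions.

```python
import json

def build_msearch_body(index_name, passes):
    """Serialise several search passes into one newline-delimited body.

    Each pass contributes a header line and a query line; the body must
    end with a newline.
    """
    lines = []
    for query in passes:
        lines.append(json.dumps({"index": index_name}))
        lines.append(json.dumps({"query": query}))
    return "\n".join(lines) + "\n"

passes = [
    {"match_phrase": {"name": "red shoes"}},                       # exact pass
    {"match": {"name": {"query": "red shoes", "fuzziness": "AUTO"}}},  # fuzzy pass
]
print(build_msearch_body("products", passes))
```

The response contains one result set per pass, in the same order as the passes were sent.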
Multiple Sort Fields
Elasticsearch can apply multiple levels of sorting by specifying multiple fields to sort by.
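Multi-level sorting is expressed as an ordered list in the `sort` section of a search. In this sketch the fields `in_stock`, `price` and `name.keyword` are hypothetical; each later entry is only used to break ties in the entries before it.

```python
def sorted_search(search_term, sort_levels):
    """Build a search body with several sort levels.

    sort_levels is a list of (field, order) pairs applied in sequence.
    """
    return {
        "query": {"match": {"name": search_term}},
        "sort": [{field: {"order": order}} for field, order in sort_levels],
    }

print(sorted_search("shoes", [("in_stock", "desc"), ("price", "asc"), ("name.keyword", "asc")]))
```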
Speak to us to learn more about how we can configure Elasticsearch as the search provider for tradeit on your ecommerce site.