See part 1 and part 2 for an overview of our system and how we scale our indexing. Originally I was planning a separate post each for global queries and related posts queries, but it was hard to break them into two, which contributed to me taking forever to write this one.
Two types of queries run on the WordPress.com Elasticsearch index: global queries across all posts and local queries that search posts within a single blog. Over 90% of our queries (23 million/day) are local, and the remainder are global (2 million/day).
In this post we’ll show some performance data for global and local queries, and discuss the tradeoffs we made to improve their performance.
An Aside about Query Testing
We used JMeter for most of our query testing. Since we are comparing global and local queries, it was important to run a wide variety of queries and to run them against mostly full indices (though sometimes ongoing development made that impractical).
Generally we ran real user queries randomly sampled from our search logs. For local queries we ran related posts style queries with the posts pseudo-randomly selected from the index. The queries needed a fair bit of hand curation due to errors in our sampling techniques and the occasional bad query type that slipped through. Generating a good mix of queries is a combination of art and deciding that things are good enough.
A Few Mapping Details
I’ve already written a post on how we map WordPress posts to ES documents (and wpes-lib is on github), and there’s a description of how we handle multi-lingual data. Mostly those details are irrelevant for scaling our queries, but two of them matter here:
- All post documents are child documents of blog documents. The blog documents record meta information about the blog.
- Child documents are always stored on the same shard as their parent document because they use the same routing value. This provides us with a huge optimization: local searches can specify the routing on the query so it gets executed on a single node and accesses only a single shard (see the indexing sketch below).
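To make that concrete, here's a rough sketch of what the parent/child setup looks like at index time. Index and field names are simplified; the real mappings live in wpes-lib.

```
# index and field names simplified; real mappings are in wpes-lib
PUT global-0m-10m
{
  "mappings": {
    "blog": {},
    "post": {
      "_parent": { "type": "blog" }
    }
  }
}

# Index the parent blog document, then a post as its child.
# The parent id doubles as the routing value, so both land on the same shard.
PUT global-0m-10m/blog/12345
{ "lang": "en", "spam": false, "private": false, "mature": false }

PUT global-0m-10m/post/3?parent=12345
{ "blog_id": 12345, "post_id": 3, "title": "The Best", "content": "This is the best post in the world." }
```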
The Global Query
The original plan for our global queries was to use the parent blog documents to keep track of some meta information about each blog that determines whether the blog's posts are globally searchable (whether they are publicly accessible, have mature content, are spam, etc). By using the blog document we'd be able to update a single document rather than reindexing all the posts on the blog. Then we'd use a has_parent filter on all global queries.
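A sketch of what those filtered global queries would have looked like (the field names on the blog document are illustrative):

```
# field names on the parent blog document are illustrative
POST global-*/post/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": { "fields": ["content", "title"], "query": "Can I haz query?" }
      },
      "filter": {
        "has_parent": {
          "parent_type": "blog",
          "filter": {
            "and": [
              { "term": { "public": true } },
              { "term": { "spam": false } },
              { "term": { "mature": false } }
            ]
          }
        }
      }
    }
  }
}
```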
Unfortunately we found that parent/child filters did not scale well for our purposes. All parent document ids had to be loaded into memory (about 5.3 GB of data per node for our 60 million blog documents). We had plenty of RAM available for ES (30 GB) and the id cache seemed to hold steady, but we still saw much slower queries and higher server load when using the has_parent filter. Here's the data from the final parent/child performance test we ran (note this was tested on a 16 node cluster):
Using has_parent | JMeter threads | Reqs/Sec | Median Latency(ms) | Mean Latency(ms) | Max Latency(ms) | CPU % | Load |
---|---|---|---|---|---|---|---|
yes | 2 | 5.3 | 538 | 365 | 2400 | 45 | 12 |
no | 2 | 10 | 338 | 190 | 1500 | 30 | 8 |
yes | 4 | 7.6 | 503 | 514 | 2500 | 75 | 22 |
no | 4 | 17.5 | 207 | 222 | 2100 | 50 | 15 |
Avoiding the has_parent filter saved us a lot of servers and gave us a lot more headroom, even though it means we have to bulk reindex posts more often.
In the end our global query looks something like:
```
POST global-*/post/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "fields": ["content", "title"],
          "query": "Can I haz query?"
        }
      },
      "filter": {
        "and": [
          { "term": { "lang": "en" } },
          { "and": [
            { "term": { "spam": false } },
            { "term": { "private": false } },
            { "term": { "mature": false } }
          ] }
        ]
      }
    }
  }
}
```
One last side note on ES global queries because it cannot be said strongly enough: FILTERS ARE EPIC. If you don’t understand why I am shouting this at you, then you need to go read about and understand Bitsets and how filters are cached.
Sidenote: after re-reading that post I realized we may be able to improve our performance a bit by switching from the and filters shown above to bool filters. This is why blogging is good. This change cut our median query time in half.
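For reference, here is roughly the same query with the nested and filters collapsed into a single bool filter:

```
# same filters as above, expressed as a single bool filter
POST global-*/post/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": { "fields": ["content", "title"], "query": "Can I haz query?" }
      },
      "filter": {
        "bool": {
          "must": [
            { "term": { "lang": "en" } },
            { "term": { "spam": false } },
            { "term": { "private": false } },
            { "term": { "mature": false } }
          ]
        }
      }
    }
  }
}
```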
Global Query Performance With Increasing Numbers of Shards
Global queries require gathering results from all shards and then combining them to give the final result. A search for 10 results across 10 shards requires running the query on each shard and then combining those 100 results to get the final 10. More shards means more processing across the cluster, but also more results that need to be combined. This gets even more interesting when you start paging results: getting results 90-100 of a search across ten shards requires combining 1,000 results to satisfy the search.
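As a concrete example, a query for the tenth page of results makes every shard return its top from + size hits (100 here) before the coordinating node merges them down to the 10 that actually get returned:

```
# each of the 10 shards returns its top 100 (from + size) hits before merging
POST global-*/post/_search
{
  "from": 90,
  "size": 10,
  "query": {
    "multi_match": { "fields": ["content", "title"], "query": "Can I haz query?" }
  }
}
```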
We did some testing of our query performance across a few million blogs as we varied the number of shards using a fixed number of JMeter threads.
Shards/Index | Requests/sec | Median(ms) | Mean(ms) |
---|---|---|---|
5 | 320 | 191 | 328 |
10 | 320 | 216 | 291 |
25 | 289 | 345 | 332 |
50 | 183 | 467 | 531 |
There’s a pretty clear relationship between query latency and the number of shards. This result pushed us to try to minimize the total number of shards in our cluster. We can probably also improve query performance by increasing our replication.
The Local Query
Most of our local queries are used for finding related posts. We run a combination of mlt_query and multi_match queries that send the text of the current post and find posts that are similar. For a post with the title “The Best” and content of “This is the best post in the world”, the query would look like:
```
POST global-0m-10m/post/_search?routing=12345
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "The Best This is the best post in the world.",
          "fields": ["mlt_content"]
        }
      },
      "filter": {
        "and": [
          { "term": { "blog_id": 12345 } },
          { "not": { "term": { "post_id": 3 } } }
        ]
      }
    }
  }
}
```
Looks simple, but there are a lot of interesting optimizations to discuss.
Routing
All of our local queries use the search routing parameter to limit the search to a single shard. Organizing our indices and shards so that an entire blog will always be in a single shard is one of the most important optimizations in our system. Without it we would not be able to scale and handle millions of queries because we would be wasting a lot of cycles searching shards that had no or very few documents that were actually related to our search.
Query Type
In the above example, we use the multi_match query. If the content is longer (more than 100 words) we use the mlt_query instead. We originally used mlt_query for all queries to keep them fast and ensure good relevancy. However, using mlt_query does not guarantee that the resulting query will actually have any terms in the final search query; a lot depends on the frequency of the terms in the post and their frequency in the index. Switching to multi_match for short content gave us a big improvement in the relevancy of our results (as measured by click through rate).
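So for longer posts the multi_match in the earlier example gets swapped for the mlt_query (more_like_this in the query DSL). A sketch, with placeholder text and tuning values that we'd expect to vary by use case:

```
# like_text is a placeholder for the full post text; tuning values are illustrative
POST global-0m-10m/post/_search?routing=12345
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": ["mlt_content"],
          "like_text": "...the full text of the current post goes here...",
          "min_term_freq": 1,
          "max_query_terms": 25
        }
      },
      "filter": {
        "and": [
          { "term": { "blog_id": 12345 } },
          { "not": { "term": { "post_id": 3 } } }
        ]
      }
    }
  }
}
```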
MLT API
We started building related posts by using the MLT API to run the query. In that case we only send the document id to ES and trust ES to get the post, analyze it, and construct the appropriate mlt_query for it. This worked pretty well, but it did not scale as well as building our own query on the client. Switching to mlt_query gave us a 10x improvement in the number of requests we could handle and reduced the query response time.
Operation | Requests/sec | Median(ms) | Mean(ms) |
---|---|---|---|
MLT API | 150 | 270 | 306 |
mlt_query | 1200 | 77 | 167 |
From what I could tell the big bottleneck was getting the original document. Of course this change moved a lot of processing off of the ES cluster and onto the web hosts, but web hosts are easier to scale than ES nodes (or at least that is a problem our systems team is very good at).
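For comparison, the MLT API version of a related posts search was essentially a one-liner: you hand ES the document id and it fetches the post and builds the more-like-this query server-side. Something like this, with the parameters from memory, so treat them as illustrative:

```
# parameters are illustrative; ES fetches post 3 and builds the more-like-this query itself
GET global-0m-10m/post/3/_mlt?mlt_fields=mlt_content&min_term_freq=1&max_query_terms=25
```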
mlt_content Field
The query goes to a single field called mlt_content rather than to separate title and content fields. Searching fewer fields gives us a significant performance boost, and it helps us match words that occur in different fields in different posts. The fairly new multi_match cross_fields option could probably help here, but I assume it would not be as performant as a single field.
For a while we were also storing the mlt_content field, but recent work has determined that storing the field did not speed up the mlt_query queries.
The history of how we ended up using mlt_content is also instructive. We started using the mlt_content field, and storing it, while we were still on the MLT API. Originally we were using the post title and content fields, which were getting extracted from the document's _source. Switching to a stored mlt_content field reduced the average time to fetch a document before building the query from about 500ms to 100ms. In the end that was not enough of a performance boost for our application, but it is worth looking into for anyone using the MLT API.
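A minimal sketch of a mapping that builds such a combined field with copy_to. Our real mappings are in wpes-lib and differ in the details, so treat this as one way to do it rather than exactly what we run:

```
# one way to populate mlt_content; the actual mapping in wpes-lib differs in details
PUT global-0m-10m/post/_mapping
{
  "post": {
    "properties": {
      "title":       { "type": "string", "copy_to": "mlt_content" },
      "content":     { "type": "string", "copy_to": "mlt_content" },
      "mlt_content": { "type": "string", "store": false }
    }
  }
}
```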
Improving Relevancy with Rescoring
We’ve run a couple of tests to improve the relevancy of our related posts. Our strategy has mostly been to use the mlt_query/multi_match queries as the base query and then use rescoring to adjust the results. For instance, we built a query that slightly reranks our top 50 results based on the overlap between the users who liked the current post and the users who liked each candidate post. Running this with the rescoring option had almost no impact on query performance and yet gave a nice bump to the click through rate of our related posts.
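A sketch of what that rescoring looks like. The liker_ids field and the weights are purely illustrative, not our actual schema; the point is that only the top 50 hits per shard get re-scored:

```
# liker_ids is a hypothetical field; weights and values are illustrative
POST global-0m-10m/post/_search?routing=12345
{
  "query": {
    "multi_match": {
      "query": "The Best This is the best post in the world.",
      "fields": ["mlt_content"]
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "terms": { "liker_ids": [101, 202, 303] }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 0.5
    }
  }
}
```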
Shard Size and Local Query Performance
While global queries perform best with a small number of larger shards, local queries are fastest when the shard is smaller. Here’s some data we collected comparing the number of shards to routed query speed when running mlt_query searches:
Shards/Index | Requests/sec | Median(ms) | Mean(ms) | Max(ms) |
---|---|---|---|---|
10 | 1200 | 77 | 167 | 5900 |
30 | 1600 | 67 | 112 | 1100 |
Less data in each shard (more total shards) significantly reduces both the number of slow queries and how slow they are.
Final Trade Off of Shard Size and Query Performance
Based on the global and local query results above, we decided to go with 25 shards per index as a tradeoff that gives decent performance for both query types. This was one of the trickier decisions, but it worked reasonably well for us for a while. About six months after making it, though, we decided that we were ending up with too many slow queries (queries taking longer than 1 second).
I guess that’s my cue to tease the next post in this (possibly) never-ending series: rebuilding our indices to hold 6-7x as many documents while reducing slow queries and speeding up queries in general. We went from the 99th percentile of our queries taking 1.7 seconds down to 800ms, and the median dropped from 180ms to 50ms. All these improvements while going from 800 million docs in our cluster to 5.5 billion. Details coming up in my next post.