Sorting Aggregations Based On Score In Elasticsearch

Recently, I was asked whether it is possible to sort the aggregations returned in an Elasticsearch response by score or, more specifically, to sort the buckets of an aggregation (in this case, a terms aggregation) by the top score in each bucket. As an example, take the following genres aggregation,

    "aggs" : {
        "genres" : {
            "terms" : { "field" : "genre" }
        }
    }

and its response,

{
    ...
    "aggregations" : {
        "genres" : {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets" : [
                {
                    "key" : "electronic",
                    "doc_count" : 6
                },
                {
                    "key" : "rock",
                    "doc_count" : 3
                },
                {
                    "key" : "jazz",
                    "doc_count" : 2
                }
            ]
        }
    }
}

The genres needed to be sorted by the highest score among the documents falling into each genre (or bucket). A cursory glance over the terms aggregation documentation suggested that the order parameter can be based only on the bucket's document count (_count) or its key (_key, previously _term), not on the score (yet).

Part of the problem was tracking the most relevant document (i.e. the highest score) in each bucket. The top_hits aggregation does precisely that and is in fact intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket. This was close to what I wanted. Great!

Putting this into action, I updated the aggregation clause with a top_hits sub-aggregator, nested inside the genres aggregation,

    "aggs" : {
        "genres" : {
            "terms" : { "field" : "genre" },
            "aggs" : {
                "top_genre_hits" : {
                    "top_hits" : {}
                }
            }
        }
    }

which returns the following response,

{
  "aggregations" : {
    "genres" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 697,
      "buckets" : [
        {
          "key" : "electronic",
          "doc_count" : 760,
          "top_genre_hits" : {
            "hits" : {
              "total" : 760,
              "max_score" : 67.01266,
              "hits" : [
              ...
              ]
            }
          }
        }
      ]
    }
  }
}

This felt like a good enough solution, but Elasticsearch's documentation mentions that,

At the moment the max (or min) aggregator is needed to make sure the buckets from the terms aggregator are ordered according to the score ...

And so, I further improved the aggregation to include a max aggregator on the _score field, alongside top_hits,

    "aggs" : {
        "genres" : {
            "terms" : { "field" : "genre" },
            "aggs" : {
                "top_genre_hits" : {
                    "top_hits" : {}
                },
                "top_hit" : {
                    "max" : {
                        "script" : {
                            "source" : "_score"
                        }
                    }
                }
            }
        }
    }

which returns a response containing the max score for each bucket in a separate field,

{
  "aggregations" : {
    "genres" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 697,
      "buckets" : [
        {
          "key" : "electronic",
          "doc_count" : 760,
          "top_genre_hits" : {
            "hits" : {
              "total" : 760,
              "max_score" : 67.01266,
              "hits" : [
              ...
              ]
            }
          },
          "top_hit" : {
            "value" : 74.02320861816406
          }
        }
      ]
    }
  }
}

Note the top_hit.value field, which holds the max score for each bucket. Now that I have access to the top score of each bucket, ideally I'd like to sort the buckets on top_hit.value and be done with it. However, as far as I can tell, as of now it can't be used in the order option of the terms aggregator. And so, the sorting has to be handled at the application level instead.
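The application-level sort could look something like the following. This is only a minimal sketch in Python, assuming the response has already been parsed into a dict; the function name sort_buckets_by_top_score and the scores in the sample data are made up for illustration, while genres and top_hit are the aggregation names used above.

```python
from typing import Any, Dict, List


def sort_buckets_by_top_score(
    response: Dict[str, Any],
    agg_name: str = "genres",
    score_agg: str = "top_hit",
) -> List[Dict[str, Any]]:
    """Return the buckets of `agg_name`, highest max score first.

    Hypothetical helper: it only reorders the buckets that are
    already present in the parsed response.
    """
    buckets = response["aggregations"][agg_name]["buckets"]

    def top_score(bucket: Dict[str, Any]) -> float:
        # The max aggregation exposes its result under "value";
        # treat a missing or null value as the lowest possible score.
        value = bucket.get(score_agg, {}).get("value")
        return value if value is not None else float("-inf")

    return sorted(buckets, key=top_score, reverse=True)


# Sample data shaped like the response above; the scores are invented.
response = {
    "aggregations": {
        "genres": {
            "buckets": [
                {"key": "electronic", "doc_count": 6, "top_hit": {"value": 3.2}},
                {"key": "rock", "doc_count": 3, "top_hit": {"value": 7.5}},
                {"key": "jazz", "doc_count": 2, "top_hit": {"value": 5.1}},
            ]
        }
    }
}

sorted_keys = [b["key"] for b in sort_buckets_by_top_score(response)]
print(sorted_keys)  # → ['rock', 'jazz', 'electronic']
```

Keep in mind that this only reorders the buckets Elasticsearch actually returned. The non-zero sum_other_doc_count in the response above means some genres were cut off, so the size parameter of the terms aggregation may need to be raised for the ordering to cover every genre.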
