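Besides search, Elasticsearch also provides statistical analysis of data: through the aggregations API you can build complex analytical queries. Each type of aggregation has its own purpose and output, and to make them easier to understand they are usually grouped into three categories: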
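Bucket aggregations: each bucket is associated with a key and a document criterion. A bucket aggregation returns a list of buckets, each one being the set of documents that satisfy its criterion.
Metric aggregations: compute metrics (for example an average or a sum) over a set of documents.
Pipeline aggregations: aggregate further over the output of other aggregations or their metrics.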
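As a rough illustration of how the three categories chain together, here is a local Python sketch over made-up flight records. It is not Elasticsearch's implementation. Documents are first grouped into buckets, then a metric is computed per bucket, and finally a pipeline step aggregates those per-bucket metric values, similar in spirit to the avg_bucket pipeline aggregation.

```python
from collections import defaultdict

# Hypothetical flight records used only for this illustration.
flights = [
    {"DestCountry": "IT", "AvgTicketPrice": 400.0},
    {"DestCountry": "IT", "AvgTicketPrice": 600.0},
    {"DestCountry": "US", "AvgTicketPrice": 900.0},
    {"DestCountry": "AU", "AvgTicketPrice": 1200.0},
    {"DestCountry": "AU", "AvgTicketPrice": 800.0},
]

# Bucket: group documents by a key, like a terms aggregation.
buckets = defaultdict(list)
for f in flights:
    buckets[f["DestCountry"]].append(f)

# Metric: compute one number from each bucket's documents, like avg.
avg_price = {
    country: sum(d["AvgTicketPrice"] for d in docs) / len(docs)
    for country, docs in buckets.items()
}

# Pipeline: aggregate the metric outputs of the buckets, like avg_bucket.
avg_of_avgs = sum(avg_price.values()) / len(avg_price)

print(avg_price)    # {'IT': 500.0, 'US': 900.0, 'AU': 1000.0}
print(avg_of_avgs)  # 800.0
```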
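A bucket is similar to a group in a database GROUP BY: documents that meet a condition are put into the same group. Elasticsearch provides many kinds of bucket aggregations, for example range, geo-based ones such as geo_distance, sampler, and terms.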
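Below are a few practical examples.
Terms aggregation. The following request searches the kibana_sample_data_flights index and aggregates the documents by DestCountry in an aggregation named flight_dest. "size": 5 limits the output to the top 5 buckets, that is, the five destination countries with the most flights.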
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "DestCountry",
        "size": 5
      }
    }
  }
}
In the response, the document hits come first, and the flight_dest aggregation appears at the end.
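To make the terms semantics concrete, here is a minimal local sketch in Python. It is not how Elasticsearch implements it. The sketch counts documents per distinct field value and keeps the `size` most frequent values, mirroring `"terms": {"field": "DestCountry", "size": 5}`. The helper name `terms_agg` and the sample records are hypothetical, and the sketch assumes a single-valued field.

One caveat applies on a multi-shard index. Elasticsearch merges per-shard top lists there, so its counts can be approximate, whereas this single-list sketch is exact.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per distinct value of a single-valued field; keep the top `size`."""
    # Documents without the field do not contribute a term.
    counts = Counter(doc[field] for doc in docs if doc.get(field) is not None)
    # Order by doc_count descending, breaking ties by key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
    # Documents whose term did not make the top list, like sum_other_doc_count.
    other = sum(c for _, c in ordered[size:])
    return {"buckets": buckets, "sum_other_doc_count": other}

# Hypothetical flight records; only DestCountry matters here.
flights = [{"DestCountry": c} for c in
           ["IT", "US", "IT", "CN", "US", "IT", "AU", "JP", "CA", "CA"]]
result = terms_agg(flights, "DestCountry", size=5)
print(result)
```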
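Range aggregation. The following request splits AvgTicketPrice into three tiers: below 500, from 500 to 1000, and 1000 and above. Each range includes its from value and excludes its to value.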
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          { "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}
Query result:
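The [from, to) boundary rule can be checked with a small local Python sketch. The prices are made up, and `range_agg` is a hypothetical helper, not an Elasticsearch API. The key strings imitate the usual shape of numeric range keys in the response, such as `*-500.0`. Note that a value equal to 500 lands in the second tier, not the first.

```python
def range_agg(values, ranges):
    """Count values per (from, to) pair; None means unbounded on that side."""
    buckets = []
    for lo, hi in ranges:
        # from is inclusive, to is exclusive; a value is counted in every
        # range it falls into, so overlapping ranges would double-count it.
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": count})
    return buckets

# Hypothetical ticket prices, including values exactly on the boundaries.
prices = [120.5, 499.99, 500, 750, 999.99, 1000, 1199.0]
tiers = [(None, 500), (500, 1000), (1000, None)]
buckets = range_agg(prices, tiers)
print(buckets)
```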
The key of each bucket in the result can also be given a custom name, for example:
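One way to do this is sketched below, under stated assumptions. The range aggregation accepts a per-range `key`, and with `"keyed": true` the response returns `buckets` as an object indexed by those keys instead of an array. The request body is built as a Python dict so that it can be sent with any HTTP client to `GET /kibana_sample_data_flights/_search`. The labels `cheap`, `medium` and `expensive` are hypothetical, and so is the added `"size": 0`, which only suppresses the document hits.

```python
import json

# Hypothetical labels for the three price tiers: (key, from, to).
tiers = [("cheap", None, 500), ("medium", 500, 1000), ("expensive", 1000, None)]

ranges = []
for key, lo, hi in tiers:
    r = {"key": key}      # custom bucket key shown in the response
    if lo is not None:
        r["from"] = lo    # inclusive lower bound
    if hi is not None:
        r["to"] = hi      # exclusive upper bound
    ranges.append(r)

body = {
    "size": 0,  # hypothetical addition: return no hits, only the aggregation
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "keyed": True,  # buckets become an object keyed by "key"
                "ranges": ranges,
            }
        }
    },
}

print(json.dumps(body, indent=2))
```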
Query flights whose destination is IT and group them into the three ticket-price tiers:
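In Elasticsearch this is typically done by putting a `term` query on DestCountry in the `query` section next to the range aggregation. That works because aggregations run only over the documents the query matches. The local Python sketch below shows that two-step effect on made-up records, and `tier_counts` is a hypothetical helper.

```python
def tier_counts(prices, ranges):
    """Count prices per (from, to) range using [from, to) bounds; None means unbounded."""
    return [
        sum(1 for p in prices if (lo is None or p >= lo) and (hi is None or p < hi))
        for lo, hi in ranges
    ]

# Hypothetical flight records.
flights = [
    {"DestCountry": "IT", "AvgTicketPrice": 320.0},
    {"DestCountry": "IT", "AvgTicketPrice": 640.0},
    {"DestCountry": "IT", "AvgTicketPrice": 1150.0},
    {"DestCountry": "IT", "AvgTicketPrice": 880.0},
    {"DestCountry": "US", "AvgTicketPrice": 210.0},
    {"DestCountry": "CN", "AvgTicketPrice": 1020.0},
]

# Step 1, the query: keep only documents whose DestCountry is exactly "IT".
it_prices = [f["AvgTicketPrice"] for f in flights if f["DestCountry"] == "IT"]

# Step 2, the aggregation: bucket only the matched documents by price tier.
tiers = [(None, 500), (500, 1000), (1000, None)]
counts = tier_counts(it_prices, tiers)
print(counts)  # [1, 2, 1]
```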
Date range aggregation. Buckets documents by ranges of a date field:
GET /user_info_2/_search
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "update_date",
        "ranges": [
          { "to": "2020-05-01 00:00:00" },
          { "from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00" },
          { "from": "2020-08-02 00:00:00" }
        ]
      }
    }
  }
}
Query result:
{
  "took": 9,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      { "_index": "user_info_2", "_type": "_doc", "_id": "8", "_score": 1, "_source": { "age": "20", "update_date": "2020-05-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "9", "_score": 1, "_source": { "name": "赵六", "update_date": "2020-08-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "10", "_score": 1, "_source": { "age": null, "update_date": "2020-11-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "name": "李四", "age": 29, "address": "中国南京市建邺区", "tel": "13901234568", "update_date": "2020-01-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "1", "_score": 1, "_source": { "update_date": "2020-01-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "3", "_score": 1, "_source": { "name": "王五", "age": 30, "address": "中国北京市朝阳区", "tel": "13901234567", "update_date": "2020-03-01 00:00:00" } }
    ]
  },
  "aggregations": {
    "range": {
      "buckets": [
        { "key": "*-2020-05-01 00:00:00", "to": 1588291200000, "to_as_string": "2020-05-01 00:00:00", "doc_count": 3 },
        { "key": "2020-05-02 00:00:00-2020-08-01 00:00:00", "from": 1588377600000, "from_as_string": "2020-05-02 00:00:00", "to": 1596240000000, "to_as_string": "2020-08-01 00:00:00", "doc_count": 0 },
        { "key": "2020-08-02 00:00:00-*", "from": 1596326400000, "from_as_string": "2020-08-02 00:00:00", "doc_count": 1 }
      ]
    }
  }
}
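The bucket counts 3, 0 and 1 can be reproduced locally. Below is a Python sketch over the six update_date values from the hits above, with hypothetical helper names. It assumes the date strings are interpreted as UTC, which is consistent with the epoch values in the response (1588291200000 ms is 2020-05-01 00:00:00 UTC). The sketch also shows why the documents dated 2020-05-01 and 2020-08-01 fall into no bucket: each sits exactly on an exclusive `to` bound, and the ranges leave a one-day gap before the next `from`.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # same shape as the update_date strings in the index

def to_millis(s):
    """Parse a date string as UTC (an assumption) and return epoch milliseconds."""
    dt = datetime.strptime(s, FMT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def bounds(r):
    """Return (from, to) in epoch millis; a missing bound is None (open-ended)."""
    lo = to_millis(r["from"]) if "from" in r else None
    hi = to_millis(r["to"]) if "to" in r else None
    return lo, hi

def contains(lo, hi, m):
    # from is inclusive, to is exclusive
    return (lo is None or m >= lo) and (hi is None or m < hi)

# update_date of the hits above, in order: ids 8, 9, 10, 2, 1, 3.
dates = [
    "2020-05-01 00:00:00", "2020-08-01 00:00:00", "2020-11-01 00:00:00",
    "2020-01-01 00:00:00", "2020-01-01 00:00:00", "2020-03-01 00:00:00",
]
ranges = [
    {"to": "2020-05-01 00:00:00"},
    {"from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00"},
    {"from": "2020-08-02 00:00:00"},
]

counts = [sum(1 for d in dates if contains(*bounds(r), to_millis(d))) for r in ranges]
unbucketed = [d for d in dates
              if not any(contains(*bounds(r), to_millis(d)) for r in ranges)]
print(counts)      # [3, 0, 1]
print(unbucketed)  # ['2020-05-01 00:00:00', '2020-08-01 00:00:00']
```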
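Filter aggregation. Runs sub-aggregations over the subset of documents that match a filter condition.
The following request narrows the documents to those whose DestCountry is AU and computes the average DistanceMiles over them.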
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_Miles": {
      "filter": {
        "term": { "DestCountry": "AU" }
      },
      "aggs": {
        "avg_miles": {
          "avg": { "field": "DistanceMiles" }
        }
      }
    }
  }
}
The result is as follows:
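The filter-then-metric nesting can be sketched locally in Python. The records are made up, and `filter_avg` is a hypothetical helper. The returned shape imitates the filter aggregation's output, a `doc_count` for the filtered subset plus the nested `avg_miles` value. Elasticsearch reports an average over no values as null, and None plays that role here.

```python
def filter_avg(docs, filter_field, filter_value, metric_field, sub_name="avg_miles"):
    """Keep docs whose filter_field equals filter_value, then average metric_field over them."""
    subset = [d for d in docs if d.get(filter_field) == filter_value]
    values = [d[metric_field] for d in subset if d.get(metric_field) is not None]
    avg = sum(values) / len(values) if values else None  # None stands in for null
    return {"doc_count": len(subset), sub_name: {"value": avg}}

# Hypothetical flight records.
flights = [
    {"DestCountry": "AU", "DistanceMiles": 8000.0},
    {"DestCountry": "AU", "DistanceMiles": 6000.0},
    {"DestCountry": "US", "DistanceMiles": 3000.0},
    {"DestCountry": "AU", "DistanceMiles": 7000.0},
]
result = filter_avg(flights, "DestCountry", "AU", "DistanceMiles")
print(result)  # {'doc_count': 3, 'avg_miles': {'value': 7000.0}}
```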
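Missing aggregation. Counts the documents that have no value for a field; a field whose value is null also counts as missing.
In the user_info_2 index, count the documents that are missing age: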
GET /user_info_2/_search
{
  "aggs": {
    "without_age": {
      "missing": { "field": "age" }
    }
  }
}
The count is 2: one document has no age field at all, and the other has age set to null.
{
  "took": 10,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      { "_index": "user_info_2", "_type": "_doc", "_id": "9", "_score": 1, "_source": { "name": "赵六" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "8", "_score": 1, "_source": { "age": "20" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "10", "_score": 1, "_source": { "age": null } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "name": "李四", "age": 29, "address": "中国南京市建邺区", "tel": "13901234568" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "1", "_score": 1, "_source": { "name": "张三", "age": 28, "address": "中国南京市鼓楼区", "tel": "13901234567" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "3", "_score": 1, "_source": { "name": "王五", "age": 30, "address": "中国北京市朝阳区", "tel": "13901234567" } }
    ]
  },
  "aggregations": {
    "without_age": { "doc_count": 2 }
  }
}
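The count of 2 can be reproduced with a local Python sketch over the hits above. Each dict merges the hit's `_id` with the relevant `_source` fields, and `missing_agg` is a hypothetical helper. The sketch assumes the age mapping defines no `null_value`. If one were set, an explicit null would be indexed as that value and would no longer count as missing. Note also that id 8 stores age as the string "20", which still counts as a value.

```python
def missing_agg(docs, field):
    """Count documents where `field` is absent or explicitly null."""
    return {"doc_count": sum(1 for d in docs if d.get(field) is None)}

# _id plus the relevant _source fields of the six hits above.
docs = [
    {"_id": "9", "name": "赵六"},
    {"_id": "8", "age": "20"},
    {"_id": "10", "age": None},
    {"_id": "2", "age": 29},
    {"_id": "1", "age": 28},
    {"_id": "3", "age": 30},
]
result = missing_agg(docs, "age")
missing_ids = [d["_id"] for d in docs if d.get("age") is None]
print(result, missing_ids)  # {'doc_count': 2} ['9', '10']
```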
Histogram aggregation. Buckets numeric values into fixed-width intervals and counts the documents in each interval:
GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "test": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100
      }
    }
  }
}
The query result is as follows:
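Elasticsearch's reference describes the histogram bucket key as floor((value - offset) / interval) * interval + offset. Below is a local Python sketch of that rule with made-up prices, using `histogram_agg` as a hypothetical helper. The sketch also fills the empty buckets between the lowest and highest keys with doc_count 0, which matches the aggregation's default min_doc_count of 0. Integer intervals keep the key arithmetic exact here.

```python
import math
from collections import Counter

def histogram_agg(values, interval, offset=0):
    """Bucket numeric values into fixed-width intervals, filling empty buckets with 0."""
    counts = Counter()
    for v in values:
        # bucket key = floor((value - offset) / interval) * interval + offset
        key = math.floor((v - offset) / interval) * interval + offset
        counts[key] += 1
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    n = int(round((hi - lo) / interval))
    # Ascending keys from the lowest to the highest bucket, empty ones included.
    return [{"key": lo + i * interval, "doc_count": counts.get(lo + i * interval, 0)}
            for i in range(n + 1)]

# Hypothetical ticket prices.
prices = [120.5, 180.0, 349.99, 355.0, 399.0, 510.0]
buckets = histogram_agg(prices, 100)
print(buckets)
```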
Reposted from: http://felrb.baihongyu.com/