Learning Elasticsearch: Aggregation Queries with Bucket Aggregations
Published: 2019-05-11


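Preface

Beyond search, Elasticsearch also provides statistical analysis of data. Its APIs let you build complex queries over your data, and each type of aggregation has its own purpose and output. To make them easier to understand, aggregations are usually divided into three broad categories.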

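The three categories of aggregations

Bucket aggregations: each bucket is associated with a key and a document criterion. A bucket aggregation returns a list of buckets, each holding the set of documents that meet its criterion.

Metric aggregations: compute metrics over a set of documents.

Pipeline aggregations: aggregate the output of other aggregations and their associated metrics a second time.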
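To make the three categories concrete, here is a small Python sketch that builds one request body using all of them. The aggregation names (by_country, avg_price, max_avg_price) are made up for illustration, and the field names are borrowed from the Kibana sample flight data used later in this post. It only builds and prints the body, it does not contact a cluster.

```python
import json

# Request body combining the three aggregation categories.
# Aggregation names (by_country, avg_price, max_avg_price) are illustrative.
body = {
    "size": 0,  # skip the hits, return only aggregation results
    "aggs": {
        # Bucket aggregation: one bucket per destination country
        "by_country": {
            "terms": {"field": "DestCountry", "size": 5},
            "aggs": {
                # Metric aggregation: computed over the documents in each bucket
                "avg_price": {"avg": {"field": "AvgTicketPrice"}}
            },
        },
        # Pipeline aggregation: works on the output of the aggregations above
        "max_avg_price": {
            "max_bucket": {"buckets_path": "by_country>avg_price"}
        },
    },
}

print(json.dumps(body, indent=2))
```

The printed JSON can be pasted under a GET /kibana_sample_data_flights/_search line in the Kibana console.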

Bucket Aggregations

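A bucket is similar to a GROUP BY in a database: documents that meet the same criterion are placed in the same group. Elasticsearch provides many kinds of bucket aggregations, such as range, geo (for example geo_distance), sampler, and terms.

Let's look at a few practical examples.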

Term Aggregation

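The following request searches the documents in the kibana_sample_data_flights index and aggregates them by DestCountry. The aggregation is named flight_dest, and only the top 5 buckets are returned.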

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "DestCountry",
        "size": 5
      }
    }
  }
}

The response lists the matching documents first, followed by the flight_dest results.

[Screenshot: Term Aggregation query result]
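What the terms aggregation computes can be imitated locally in a few lines of Python. This is only a sketch on made-up documents: the real aggregation runs per shard, and its counts can be approximate on large indices.

```python
from collections import Counter

def terms_buckets(docs, field, size):
    """Group docs by the value of `field` and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]

# Made-up sample documents, not real flight data
flights = [
    {"DestCountry": "IT"}, {"DestCountry": "US"}, {"DestCountry": "IT"},
    {"DestCountry": "CN"}, {"DestCountry": "IT"}, {"DestCountry": "US"},
]

top = terms_buckets(flights, "DestCountry", size=2)
print(top)  # [{'key': 'IT', 'doc_count': 3}, {'key': 'US', 'doc_count': 2}]
```

As with "size": 5 in the request, the size argument limits how many buckets come back, not how many documents are searched.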

Range Aggregation

Split the documents into three tiers by AvgTicketPrice: below 500, from 500 up to (but not including) 1000, and 1000 or above.

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "AvgTicketPrice",
        "ranges": [
          { "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}

Query result

[Screenshot: Range Aggregation query result]
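The boundary handling is the part of a range aggregation that most often surprises people: from is inclusive and to is exclusive. The following Python sketch imitates that rule on a handful of made-up prices, so a price of exactly 500 or 1000 lands in the upper of the two neighbouring tiers.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"from": r.get("from"), "to": r.get("to"), "doc_count": count})
    return buckets

# Same three ranges as in the request above
ranges = [{"to": 500}, {"from": 500, "to": 1000}, {"from": 1000}]
prices = [120.0, 499.99, 500.0, 730.5, 1000.0, 1180.0]  # made-up prices

result = range_buckets(prices, ranges)
print([b["doc_count"] for b in result])  # [2, 2, 2]
```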

The keys in the aggregation result can also be given custom names, for example:

[Screenshot: range aggregation with custom keys]
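The screenshot from the original post is not available, so here is a minimal sketch of what such a request body might look like. Each range accepts an optional key that replaces the generated bucket name; the key names used here (cheap, average, expensive) are my own and not taken from the original.

```python
import json

# Range aggregation with a custom "key" per range.
# The key names are illustrative assumptions.
body = {
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    {"key": "cheap", "to": 500},
                    {"key": "average", "from": 500, "to": 1000},
                    {"key": "expensive", "from": 1000},
                ],
            }
        }
    }
}

print(json.dumps(body, indent=2))
```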

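Query the flights whose destination is IT and group them into the three price tiers:

[Screenshot: range aggregation restricted to DestCountry IT]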
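One plausible way to express this, assuming the original used a top-level term query (it may equally have used a filter aggregation), is to put the restriction in query and keep the range aggregation unchanged. The aggregation then only sees the documents the query matched.

```python
import json

# Assumed reconstruction: restrict the search to DestCountry == "IT",
# then bucket the matching flights by ticket price.
body = {
    "query": {"term": {"DestCountry": "IT"}},
    "aggs": {
        "price_ranges": {
            "range": {
                "field": "AvgTicketPrice",
                "ranges": [
                    {"to": 500},
                    {"from": 500, "to": 1000},
                    {"from": 1000},
                ],
            }
        }
    },
}

print(json.dumps(body, indent=2))
```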

Date Range Aggregation

An aggregation based on date ranges.

GET /user_info_2/_search
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "update_date",
        "ranges": [
          { "to": "2020-05-01 00:00:00" },
          { "from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00" },
          { "from": "2020-08-02 00:00:00" }
        ]
      }
    }
  }
}

Query result

{
  "took": 9,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      { "_index": "user_info_2", "_type": "_doc", "_id": "8", "_score": 1, "_source": { "age": "20", "update_date": "2020-05-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "9", "_score": 1, "_source": { "name": "赵六", "update_date": "2020-08-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "10", "_score": 1, "_source": { "age": null, "update_date": "2020-11-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "name": "李四", "age": 29, "address": "中国南京市建邺区", "tel": "13901234568", "update_date": "2020-01-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "1", "_score": 1, "_source": { "update_date": "2020-01-01 00:00:00" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "3", "_score": 1, "_source": { "name": "王五", "age": 30, "address": "中国北京市朝阳区", "tel": "13901234567", "update_date": "2020-03-01 00:00:00" } }
    ]
  },
  "aggregations": {
    "range": {
      "buckets": [
        { "key": "*-2020-05-01 00:00:00", "to": 1588291200000, "to_as_string": "2020-05-01 00:00:00", "doc_count": 3 },
        { "key": "2020-05-02 00:00:00-2020-08-01 00:00:00", "from": 1588377600000, "from_as_string": "2020-05-02 00:00:00", "to": 1596240000000, "to_as_string": "2020-08-01 00:00:00", "doc_count": 0 },
        { "key": "2020-08-02 00:00:00-*", "from": 1596326400000, "from_as_string": "2020-08-02 00:00:00", "doc_count": 1 }
      ]
    }
  }
}
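The bucket counts in this response add up to 4, although the index holds 6 documents. The following Python sketch replays the bucketing on the update_date values from the response, assuming the same rule as for numeric ranges (from inclusive, to exclusive) and ignoring time zones. Documents 8 and 9 sit exactly on an exclusive to boundary, and the one-day gaps between the ranges mean no other bucket picks them up.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def parse(text):
    return datetime.strptime(text, FMT)

def date_range_counts(dates, ranges):
    """Count dates per range; `from` is inclusive, `to` is exclusive."""
    counts = []
    for r in ranges:
        lo = parse(r["from"]) if "from" in r else datetime.min
        hi = parse(r["to"]) if "to" in r else datetime.max
        counts.append(sum(1 for d in dates if lo <= parse(d) < hi))
    return counts

# update_date values from the response above, keyed by document _id
update_dates = {
    "8": "2020-05-01 00:00:00", "9": "2020-08-01 00:00:00",
    "10": "2020-11-01 00:00:00", "2": "2020-01-01 00:00:00",
    "1": "2020-01-01 00:00:00", "3": "2020-03-01 00:00:00",
}
ranges = [
    {"to": "2020-05-01 00:00:00"},
    {"from": "2020-05-02 00:00:00", "to": "2020-08-01 00:00:00"},
    {"from": "2020-08-02 00:00:00"},
]

counts = date_range_counts(list(update_dates.values()), ranges)
print(counts)  # [3, 0, 1]
```

If every document should fall into some bucket, the ranges need to share their boundaries, for example by ending the first range and starting the second at the same timestamp.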

Filter Aggregation

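Runs an aggregation over the result set that remains after applying a filter.

The following request aggregates over the documents whose DestCountry is AU and computes the average of DistanceMiles.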

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "flight_Miles": {
      "filter": {
        "term": {
          "DestCountry": "AU"
        }
      },
      "aggs": {
        "avg_miles": {
          "avg": {
            "field": "DistanceMiles"
          }
        }
      }
    }
  }
}

The result is as follows:

[Screenshot: Filter Aggregation query result]
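The two steps, filtering first and then running a metric over what is left, can be sketched in plain Python. The documents below are made up, and the returned dictionary only imitates the general shape of the flight_Miles part of the response.

```python
def filter_avg(docs, term_field, term_value, avg_field):
    """Keep docs where term_field == term_value, then average avg_field."""
    matched = [d for d in docs if d.get(term_field) == term_value]
    values = [d[avg_field] for d in matched if avg_field in d]
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matched), "avg_miles": {"value": avg}}

# Made-up documents, not real flight data
flights = [
    {"DestCountry": "AU", "DistanceMiles": 9000.0},
    {"DestCountry": "AU", "DistanceMiles": 7000.0},
    {"DestCountry": "IT", "DistanceMiles": 500.0},
]

result = filter_avg(flights, "DestCountry", "AU", "DistanceMiles")
print(result)  # {'doc_count': 2, 'avg_miles': {'value': 8000.0}}
```

Unlike a top-level query, the filter here applies only to this aggregation, so the hits in the response are not restricted.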

Missing Aggregation

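Counts the documents that lack a field. A field whose value is null also counts as missing.

In the user_info_2 index, find the number of documents that are missing age.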

GET /user_info_2/_search
{
  "aggs": {
    "without_age": {
      "missing": {
        "field": "age"
      }
    }
  }
}

The count is 2: one document has no age field, and one has age set to null.

{
  "took": 10,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      { "_index": "user_info_2", "_type": "_doc", "_id": "9", "_score": 1, "_source": { "name": "赵六" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "8", "_score": 1, "_source": { "age": "20" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "10", "_score": 1, "_source": { "age": null } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "name": "李四", "age": 29, "address": "中国南京市建邺区", "tel": "13901234568" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "1", "_score": 1, "_source": { "name": "张三", "age": 28, "address": "中国南京市鼓楼区", "tel": "13901234567" } },
      { "_index": "user_info_2", "_type": "_doc", "_id": "3", "_score": 1, "_source": { "name": "王五", "age": 30, "address": "中国北京市朝阳区", "tel": "13901234567" } }
    ]
  },
  "aggregations": {
    "without_age": {
      "doc_count": 2
    }
  }
}
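The same count can be reproduced locally from the _source values in the response. In this sketch the documents are reduced to the name and age fields, and a document counts as missing when the field is absent or explicitly null (None in Python).

```python
def missing_count(docs, field):
    """Count docs where `field` is absent or set to null (None)."""
    return sum(1 for d in docs if d.get(field) is None)

# _source values from the response above, reduced to name and age
sources = [
    {"name": "赵六"},             # _id 9: no age field at all
    {"age": "20"},                # _id 8
    {"age": None},                # _id 10: age is explicitly null
    {"name": "李四", "age": 29},  # _id 2
    {"name": "张三", "age": 28},  # _id 1
    {"name": "王五", "age": 30},  # _id 3
]

without_age = missing_count(sources, "age")
print(without_age)  # 2
```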

Histogram Aggregation

A histogram aggregation counts documents per fixed-width interval.

GET /kibana_sample_data_flights/_search
{
  "aggs": {
    "test": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100
      }
    }
  }
}

The query result is as follows:

[Screenshot: Histogram Aggregation query result]
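Each value is assigned to the bucket whose key is the value rounded down to a multiple of the interval. The Python sketch below imitates this on made-up prices, assuming no offset, and also emits the empty buckets between the smallest and largest key, which is what the histogram aggregation does by default.

```python
import math
from collections import Counter

def histogram_buckets(values, interval):
    """Bucket key = floor(value / interval) * interval; gaps get doc_count 0."""
    counts = Counter(math.floor(v / interval) for v in values)
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    # Counter returns 0 for indices that never occurred, which fills the gaps
    return [{"key": i * interval, "doc_count": counts[i]} for i in range(lo, hi + 1)]

prices = [120.0, 180.5, 310.0, 499.99, 500.0]  # made-up prices
buckets = histogram_buckets(prices, 100)
print([(b["key"], b["doc_count"]) for b in buckets])
# [(100, 2), (200, 0), (300, 1), (400, 1), (500, 1)]
```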

Reposted from: http://felrb.baihongyu.com/
