ElasticSearch Getting Started Tutorial (based on 7.9.0)

#ElasticSearch Getting Started Tutorial

Installation

Installing ElasticSearch (Linux example)

Download Elasticsearch:

  curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.0-linux-x86_64.tar.gz

Extract the archive:

  tar -xvf elasticsearch-7.9.0-linux-x86_64.tar.gz

Start
```
  cd elasticsearch-7.9.0/bin
  ./elasticsearch
```
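ES can also be started as a background (daemon) process. Note that this command is run from the elasticsearch-7.9.0 directory, not from bin: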
```
  ./bin/elasticsearch -d -p pid
```
After a successful start, the process ID is written to the file named pid.

Check the running status

  curl -X GET "localhost:9200/_cat/health?v&pretty"
  epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
  1599100417 02:33:37  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%

A status of green means ES is running normally and the cluster is healthy.
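When the health check is scripted rather than read by eye, the column layout above can be turned into a dictionary. The following is a minimal Python sketch, assuming the whitespace-separated layout of the sample output; the helper name parse_cat_table is made up for this illustration, and the sample text is copied from the output above.

```python
def parse_cat_table(text):
    """Parse the plain-text table printed by a _cat API called with ?v.

    Assumes every cell is non-empty and contains no spaces, which holds
    for the _cat/health sample above.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample copied from the _cat/health output shown above.
sample = """
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1599100417 02:33:37  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%
"""

rows = parse_cat_table(sample)
print(rows[0]["cluster"], rows[0]["status"])
```

The cat APIs also accept format=json as a query parameter, which avoids text parsing altogether when a script is the consumer.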

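Stop
- If ES is running in the foreground, press Ctrl + C to stop it.
- If ES is running in the background, look up its process ID (for example in the pid file written at startup) and stop it with kill <pid>.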
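The background case can be scripted. This is a small Python sketch, assuming the node was started with -p pid so that a file named pid holds the process ID; the function names are hypothetical, and the example only reads a pid file rather than signalling a real process.

```python
import os
import signal


def read_pid(pid_file):
    """Read the process ID that Elasticsearch wrote at startup."""
    with open(pid_file, encoding="ascii") as handle:
        return int(handle.read().strip())


def stop_elasticsearch(pid_file="pid"):
    """Send SIGTERM (the signal plain `kill` sends) to the recorded process."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGTERM)
    return pid


# Demonstration with a throwaway file; stop_elasticsearch() is not called here.
demo_file = "demo_es.pid"
with open(demo_file, "w", encoding="ascii") as handle:
    handle.write("12345\n")
demo_pid = read_pid(demo_file)
os.remove(demo_file)
print(demo_pid)
```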

Installing the IK analyzer

Download

Go to https://github.com/medcl/elasticsearch-analysis-ik/releases and download the analyzer release that matches your ES version.

Install

  # run from the directory that contains both elasticsearch-7.9.0 and the downloaded zip
  mkdir -p elasticsearch-7.9.0/plugins/ik
  unzip elasticsearch-analysis-ik-7.9.0.zip -d elasticsearch-7.9.0/plugins/ik

Restart ES.

Verify the installation:

  curl 'localhost:9200/_cat/plugins?v&pretty'
  name              component   version
  jenkins.csiit.net analysis-ik 7.8.0

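The analysis-ik entry shows that the plugin is installed. The plugin version has to match the ES version; the sample output above appears to come from a 7.8.0 installation, so with ES 7.9.0 expect the version column to read 7.9.0.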

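Basic Usage

Basic Concepts

Near Real Time (NRT)

ES is a near-real-time search platform: there is a short delay (about one second by default) between indexing a document and that document becoming searchable.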

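Cluster

A cluster is a collection of one or more machines (nodes) that together hold all of your data and provide federated indexing and search across all nodes. A cluster is identified by a unique cluster name (elasticsearch by default). The cluster name matters: a node can join a cluster only if its configured cluster name matches that cluster's name.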

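Node

A node is a single server in the cluster; it stores data and takes part in the cluster's indexing and search. Like a cluster, a node is identified by a unique node name. In 7.x the default node name is the machine's hostname (older releases generated a name from a random UUID at startup), and you can set any node name you like in place of the default. The name is important for administration: it is how you tell which nodes on the network correspond to which nodes in the cluster.

A node joins a particular cluster by configuring the cluster name. By default every node joins the cluster named elasticsearch, so if you start several nodes on one network and they are able to discover each other, they automatically form a single cluster named elasticsearch.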

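Index

An index is a collection of documents that share similar characteristics. For example, you might have one index for customer data, one for the product catalog, and another for orders. An index is identified by its name (which must be all lowercase), and that name is used whenever you index, search, update, or delete documents in it.

A cluster can hold as many indices as you need.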

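Type / Mapping

A type (mapping type) used to be a logical partition of an index, with semantics entirely up to the user; it typically described a set of documents sharing the same fields. For example, a blog platform that kept all its data in one index could define one type for users, one for blog posts, and one for comments. That model applies to versions before 6.x only. From 6.x on, an index can hold a single type, and in 7.x types are deprecated: every document is addressed under the fixed name _doc, and the index has one mapping that describes its fields.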

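Document

A document is the basic unit of information that can be indexed. For example, one document can represent a user, another a product, and another an order. Documents are expressed in JSON.

You can store as many documents as you like in an index. Before 7.x a document also had to be assigned to a mapping type within its index; in 7.x you create the index, define its mapping, and then index documents into it through the _doc endpoint.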

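Shards and Replicas

An index can hold so much data that it exceeds the hardware limits of a single node. For example, an index with billions of documents taking up around 1 TB of disk may not be a good fit for one node (although it is possible), because searches served from that single node may be very slow.

To solve this, ES can subdivide an index into multiple pieces called shards and spread them across machines. You specify the number of shards when you create the index. Each shard is independent and fully functional, and can be hosted on any node in the cluster.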

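Sharding matters for two reasons:

- It lets you scale horizontally and grow capacity.
- It lets operations be distributed and run in parallel across shards (potentially on multiple nodes), which increases performance and throughput.

How shards are distributed, and how their documents are gathered back into the response to a search request, is handled entirely by ES and is transparent to the user.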

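Replication matters for two reasons as well:

- It provides high availability in case a node or shard becomes unavailable. For this reason, a replica shard is never allocated to the same node as its primary shard.
- It scales search throughput, because searches can run on replica shards in parallel, much as sharding does above.

To summarize, each index can be split into multiple shards and can have zero or more replicas. Once replication is configured, each index has primary shards and replica shards (copies of the primaries). Both the number of shards and the number of replicas can be set when the index is created. After creation, the number of replicas can be changed dynamically, but the number of primary shards cannot.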

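By default, each index in 7.x gets 1 primary shard and 1 replica (releases before 7.0 defaulted to 5 primary shards and 1 replica). With two nodes in the cluster, a 7.x index with default settings therefore has 1 primary shard plus 1 replica shard, 2 shards in total; under the old defaults it would have 5 primaries plus 5 replicas, 10 shards in total.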
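The arithmetic behind those totals is simple enough to sketch in Python. The function name is an illustrative choice, and "replicas" here means the number of copies per primary shard, as in the index settings.

```python
def total_shards(primaries, replicas):
    """Total shard copies of one index: every primary plus its replicas."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need at least 1 primary and a non-negative replica count")
    return primaries * (1 + replicas)


print(total_shards(1, 1))  # 7.x defaults
print(total_shards(5, 1))  # defaults before 7.0
```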

Viewing and Creating Indices

An ElasticSearch index can be compared to a MySQL table (before 6.x it was closer to a database; from 6.x on it is closer to a table).

List indices

  curl -XGET 'localhost:9200/_cat/indices?v&pretty'
  health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

No indices exist yet.

Create an index

  curl -XPUT 'localhost:9200/mj2_corp_info?pretty'

  {
    "acknowledged" : true,
    "shards_acknowledged" : true,
    "index" : "mj2_corp_info"
  }

The output shows that the index has been created. List the indices again:

  curl 'localhost:9200/_cat/indices?v&pretty'
  health status index         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
  yellow open   mj2_corp_info NtZXt8V4QV-kRtbd9BkBAA   1   1          0            0       208b           208b

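The newly created mj2_corp_info index is now listed. Its status is yellow because its replica shard is unassigned: ES creates one replica for each index by default, but a replica cannot be placed on the same node as its primary, and this cluster has only one node. The primary shard is active, so the index is still fully usable.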
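On a single-node development setup, one way to reach green is to set the replica count to 0 through the update index settings API. The Python sketch below only builds the request and does not send it; the helper name and the base URL default are assumptions for illustration.

```python
import json
import urllib.request


def replica_settings_request(index, replicas, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = replica_settings_request("mj2_corp_info", 0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a running node: urllib.request.urlopen(req)
```

Dropping replicas removes redundancy, so this only makes sense where losing the single node is acceptable.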

Delete an index

  curl -XDELETE 'localhost:9200/mj2_corp_info?pretty'
  {
    "acknowledged" : true
  }

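Viewing and Creating Mappings

A mapping in ES defines the fields of an index and their types, so it can be loosely compared to a table definition in MySQL (the two still differ considerably; see the official documentation). Before 6.x one index could hold multiple mappings; from 6.x on, an index can have only one.

View the mapping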

  curl localhost:9200/mj2_corp_info/_mapping?pretty

  {
    "mj2_corp_info" : {
      "mappings" : { }
    }
  }

No mapping has been defined yet.

Create the mapping

  curl -XPOST http://localhost:9200/mj2_corp_info/_mapping -H 'Content-Type:application/json' -d'
  {
          "properties": {
              "id":{
                  "type":"long"
              },
              "address": {
                  "type": "text",
                  "analyzer": "ik_max_word",
                  "search_analyzer": "ik_smart"
              },
              "build_state": {
                  "type":"integer"
              },
              "corp_type": {
                  "type": "integer"
              },
              "chargeman": {
                  "type":"keyword"
              },
              "name":{
                  "type":"text",
                  "analyzer": "ik_max_word",
                  "search_analyzer": "ik_smart"
              },
              "coalmine_code":{
                  "type":"keyword"
              }
          }
  }'

  {"acknowledged":true}

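A response containing acknowledged: true means the mapping has been created. For the field types ES supports, see the mapping types section of the official documentation.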
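The two text fields in this mapping repeat the same analyzer pair, so when the body is generated by a program it can be built from a small helper. This is a Python sketch under that assumption; ik_text_field and build_corp_mapping are hypothetical names, while the field names and types are the ones used in the request above.

```python
import json


def ik_text_field():
    """A text field indexed with ik_max_word and searched with ik_smart."""
    return {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}


def build_corp_mapping():
    """Return the same mapping body that the curl request above sends."""
    return {
        "properties": {
            "id": {"type": "long"},
            "address": ik_text_field(),
            "build_state": {"type": "integer"},
            "corp_type": {"type": "integer"},
            "chargeman": {"type": "keyword"},
            "name": ik_text_field(),
            "coalmine_code": {"type": "keyword"},
        }
    }


mapping = build_corp_mapping()
print(json.dumps(mapping, indent=2))
```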

View the mapping again:

  curl localhost:9200/mj2_corp_info/_mapping?pretty
  {
    "mj2_corp_info" : {
      "mappings" : {
        "properties" : {
          "address" : {
            "type" : "text",
            "analyzer" : "ik_max_word",
            "search_analyzer" : "ik_smart"
          },
          "build_state" : {
            "type" : "integer"
          },
          "chargeman" : {
            "type" : "keyword"
          },
          "coalmine_code" : {
            "type" : "keyword"
          },
          "corp_type" : {
            "type" : "integer"
          },
          "id" : {
            "type" : "long"
          },
          "name" : {
            "type" : "text",
            "analyzer" : "ik_max_word",
            "search_analyzer" : "ik_smart"
          }
        }
      }
    }
  }

Indexing Documents

Add a document

  curl -XPUT -H 'Content-Type:application/json' 'http://localhost:9200/mj2_corp_info/_create/1' -d '
   {
      "id":77,
      "address":"赤峰市元宝山区元宝山镇",
      "build_state":3,
      "corp_type":1,
      "chargeman":"祝文东",
      "name":"内蒙古平庄煤业集团有限责任公司元宝山露天煤矿",
      "coalmine_code":"150403B0012000310033"
   }
  '

  {"_index":"mj2_corp_info","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

Get a document by ID

  curl 'localhost:9200/mj2_corp_info/_doc/1?pretty'
  {
    "_index" : "mj2_corp_info",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "found" : true,
    "_source" : {
      "id" : 77,
      "address" : "赤峰市元宝山区元宝山镇",
      "build_state" : 3,
      "corp_type" : 1,
      "chargeman" : "祝文东",
      "name" : "内蒙古平庄煤业集团有限责任公司元宝山露天煤矿",
      "coalmine_code" : "150403B0012000310033"
    }
  }

Add multiple documents (bulk)

  curl -XPUT -H 'Content-Type:application/json' 'http://localhost:9200/mj2_corp_info/_bulk?pretty' -d '
   {"index": {"_index":"mj2_corp_info"}}
   {"id":146,"address":"鄂尔多斯市准格尔旗纳日松镇","build_state":3,"corp_type":3,"chargeman":"马军","name":"内蒙古汇能煤电集团羊市塔煤炭有限责任公司二矿","coalmine_code":"150622B0016000200108"}
   {"index": {"_index":"mj2_corp_info"}}
   {"id":223,"address":"内蒙古自治区鄂尔多斯市准格尔旗大路镇","build_state":3,"corp_type":3,"chargeman":"段心灵","name":"内蒙古三维资源集团小鱼沟煤炭有限公司","coalmine_code":"150622B0016000200030"}
   {"index": {"_index":"mj2_corp_info"}}
   {"id":229,"address":"准格尔旗纳日松镇","build_state":3,"corp_type":3,"chargeman":"马军","name":"内蒙古汇能煤电集团羊市塔煤炭有限责任公司一矿","coalmine_code":"150622B0016000200017"}
  '

  {
    "took" : 173,
    "errors" : false,
    "items" : [
      {
        "index" : {
          "_index" : "mj2_corp_info",
          "_type" : "_doc",
          "_id" : "WL_ZUnQBQ1l86axHOlC2",
          "_version" : 1,
          "result" : "created",
          "_shards" : {
            "total" : 2,
            "successful" : 1,
            "failed" : 0
          },
          "_seq_no" : 1,
          "_primary_term" : 1,
          "status" : 201
        }
      },
      {
        "index" : {
          "_index" : "mj2_corp_info",
          "_type" : "_doc",
          "_id" : "Wb_ZUnQBQ1l86axHOlC8",
          "_version" : 1,
          "result" : "created",
          "_shards" : {
            "total" : 2,
            "successful" : 1,
            "failed" : 0
          },
          "_seq_no" : 2,
          "_primary_term" : 1,
          "status" : 201
        }
      },
      {
        "index" : {
          "_index" : "mj2_corp_info",
          "_type" : "_doc",
          "_id" : "Wr_ZUnQBQ1l86axHOlC8",
          "_version" : 1,
          "result" : "created",
          "_shards" : {
            "total" : 2,
            "successful" : 1,
            "failed" : 0
          },
          "_seq_no" : 3,
          "_primary_term" : 1,
          "status" : 201
        }
      }
    ]
  }

The bulk API can also read its payload from a file:

  curl -XPUT -H 'Content-Type:application/json' 'http://localhost:9200/mj2_corp_info/_bulk?pretty' --data-binary '@corp_info.json'

For the full bulk syntax, see the bulk operations section of the official documentation.
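A file such as corp_info.json has to be newline-delimited JSON: one action line followed by one document line per record, with a newline after the last line. The following Python sketch generates such a payload; build_bulk_body is a hypothetical helper, and the two records are shortened copies of documents from the request above.

```python
import json


def build_bulk_body(index_name, docs):
    """Return a _bulk payload: an action line plus a source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # ensure_ascii=False keeps the Chinese text readable in the file.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": 146, "address": "鄂尔多斯市准格尔旗纳日松镇", "build_state": 3, "corp_type": 3},
    {"id": 229, "address": "准格尔旗纳日松镇", "build_state": 3, "corp_type": 3},
]

body = build_bulk_body("mj2_corp_info", docs)
print(body, end="")
```

Writing body to a file in UTF-8 gives something that can be passed to curl with --data-binary, which, unlike -d, preserves the newlines when reading from a file.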

Searching Documents

Query by a single field

Query by coal mine code

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d ' 
  {
      "query":{
          "match":{
              "coalmine_code":"152921B0016000110366"
          }
      }
  }
  '    

 {
   "took" : 6,
   "timed_out" : false,
   "_shards" : {
     "total" : 1,
     "successful" : 1,
     "skipped" : 0,
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 1,
       "relation" : "eq"
     },
     "max_score" : 5.8309045,
     "hits" : [
       {
         "_index" : "mj2_corp_info",
         "_type" : "_doc",
         "_id" : "ab_mUnQBQ1l86axHBVCp",
         "_score" : 5.8309045,
         "_source" : {
           "corp_type" : "3",
           "address" : "内蒙古自治区阿拉善左旗温都尔勒图镇",
           "build_state" : "4",
           "coalmine_code" : "152921B0016000110366",
           "chargeman" : "王成滨",
           "name" : "阿拉善左旗青岭煤炭有限责任公司煤矿",
           "id" : 1031
         }
       }
     ]
   }
 }
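Client code usually needs only the hit count and the stored documents from a response like this. Here is a minimal Python sketch, assuming the 7.x response shape shown above (hits.total is an object with value and relation); extract_hits is an illustrative name, and the sample is a trimmed copy of the response above.

```python
def extract_hits(response):
    """Return (total, sources) from a 7.x _search response body."""
    hits = response["hits"]
    total = hits["total"]["value"]
    sources = [hit["_source"] for hit in hits["hits"]]
    return total, sources


# Trimmed copy of the search response shown above.
response = {
    "took": 6,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 5.8309045,
        "hits": [
            {
                "_index": "mj2_corp_info",
                "_id": "ab_mUnQBQ1l86axHBVCp",
                "_score": 5.8309045,
                "_source": {
                    "coalmine_code": "152921B0016000110366",
                    "name": "阿拉善左旗青岭煤炭有限责任公司煤矿",
                    "id": 1031,
                },
            }
        ],
    },
}

total, sources = extract_hits(response)
print(total, sources[0]["name"])
```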

Query by coal mine address

  # Find coal mines whose address contains the keyword "鄂尔多斯"
  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d ' 
  {
      "query":{
          "match":{
              "address":"鄂尔多斯"
          }
      }
  }
  '

Query by the coal mine's original ID

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d ' 
  {
      "query":{
          "match":{
              "id": 1422
          }
      }
  }
  '

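Compound Queries

ES distinguishes two kinds of search clauses: queries and filters. A query, as used in the examples so far, computes a relevance score for every matching document by default and sorts the results by that score. A filter only selects the documents that match; it computes no score, and its results can be cached. Considering performance alone, filters are therefore faster than queries.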

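bool compound query

A bool query combines multiple leaf queries through the following clauses:

- must: the clause must match for a document to be returned, and it contributes to the relevance score (_score).
- filter: the clause must match for a document to be returned, but it runs as a filter: it does not affect the score and only removes documents that do not match.
- should: the clause is optional. A document that matches it scores higher; one that does not is still returned and is not penalized. If the bool query has no must or filter clause, at least one should clause has to match.
- must_not: documents that match the clause are excluded. It is the opposite of filter and also runs as a filter, so it does not affect the score.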
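To make the four clauses concrete, here is a small Python sketch that assembles a bool query body from optional clause lists. The helper name bool_query is hypothetical; the field names and values are the ones used throughout this tutorial.

```python
import json


def bool_query(must=None, filter_=None, should=None, must_not=None):
    """Build a search body with a bool query, omitting empty clauses."""
    clauses = {"must": must, "filter": filter_, "should": should, "must_not": must_not}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


# address must match (scored), corp_type = 1 (filtered), build_state != 3 (excluded)
query = bool_query(
    must=[{"match": {"address": "鄂尔多斯"}}],
    filter_=[{"term": {"corp_type": 1}}],
    must_not=[{"term": {"build_state": 3}}],
)
print(json.dumps(query, ensure_ascii=False, indent=2))
```

Putting exact-value conditions such as corp_type under filter rather than must keeps them out of scoring, which matches the performance note above.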

Examples

Find documents whose address contains "鄂尔多斯" and whose build_state is 3.

SQL: where address like '%鄂尔多斯%' and build_state = 3

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
  {   
      "from":5,
      "size": 20,
      "query":{
          "bool":{
              "filter":[
                  {"term": {"address": "鄂尔多斯"}},
                  {"term": {"build_state": 3}}
              ]
          }
      }
  }
  '

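The from and size parameters control pagination: from is the zero-based offset of the first hit to return and size is the maximum number of hits, so the request above skips the first 5 hits and returns up to 20.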
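Converting a page number into from and size is a one-line calculation. The Python sketch below assumes 1-based page numbers and the default index.max_result_window of 10000; page_params is an illustrative name.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def page_params(page, page_size):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default result window")
    return {"from": offset, "size": page_size}


print(page_params(1, 20))
print(page_params(2, 5))
```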

An alternative form:

 curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
 {   
   "from":5,
   "size": 5,
   "query":{
       "bool":{
           "must":[
               {"match": {"address": "鄂尔多斯"}},
               {"term": {"build_state": 3}}
           ]
       }
   }
 }
 '

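Difference between match and term: match analyzes (tokenizes) the input before searching, whereas term looks up the input exactly as given. On an analyzed text field such as address, a term query therefore matches only when the input is identical to one of the tokens the analyzer produced at index time.

Find documents whose address contains "鄂尔多斯", whose build_state is not 3, and whose corp_type is 1.

SQL: where address like '%鄂尔多斯%' and build_state != 3 and corp_type = 1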

   curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
   {   
     "from":5,
     "size": 5,
     "query":{
         "bool":{
             "must":[
                 {"match": {"address": "鄂尔多斯"}},
                 {"match": {"corp_type": 1}}
             ],
             "must_not": [
                  {"term": {"build_state": 3}}
             ]
         }
     }
   }
   '

Or:

   curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
   {   
     "from":5,
     "size": 5,
     "query":{
         "bool":{
             "must":[
                 {"match": {"address": "鄂尔多斯"}}
             ],
             "must_not": [
                  {"term": {"build_state": 3}}
             ],
             "filter": [
                  {"term": {"corp_type": 1}}
             ]
         }
     }
   }
   '

Find documents whose address contains "鄂尔多斯", whose build_state is 1, 2, or 3, and whose corp_type is 1 or 3.

SQL: where address like '%鄂尔多斯%' and build_state in (1,2,3) and corp_type in (1,3)

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
   {   
     "from":5,
     "size": 5,
     "query":{
         "bool":{
             "must":[
                 {"match": {"address": "鄂尔多斯"}},
                 {"terms": {"corp_type": [1,3]}},
                 {"terms": {"build_state": [1,2,3]}}
             ]
         }
     }
   }
   '

Find documents whose address contains "鄂尔多斯", whose build_state is not 1, 2, or 3, and whose corp_type is 1 or 3, sorted by id in descending order.

SQL: where address like '%鄂尔多斯%' and build_state not in (1,2,3) and corp_type in (1,3)

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
   {   
     "from":5,
     "size": 5,
     "query":{
         "bool":{
             "must":[
                 {"match": {"address": "鄂尔多斯"}},
                 {"terms": {"corp_type": [1,3]}}
             ],
             "must_not": [
                  {"terms": {"build_state": [1,2,3]}}
             ]
         }
     },
     "sort": [
          {"id": "desc"}               
     ]
   }
   '

Aggregations

Count the coal mines whose address contains "鄂尔多斯", grouped by build_state.

SQL: select build_state, count(*) from mj2_corp_info where address like '%鄂尔多斯%' group by build_state

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
  {   
      "query":{
          "bool":{
              "must":[
                  {"term": {"address": "鄂尔多斯"}}
              ]
          }
      },
      "aggs": {
          "build_state_count": {
              "terms": {"field": "build_state"}
          }
      }
  }
  '

Find the minimum and maximum id for each build_state.

SQL: select build_state, min(id),max(id) from mj2_corp_info group by build_state

  curl -XGET -H 'Content-Type:application/json' 'localhost:9200/mj2_corp_info/_search?pretty' -d '
  {   
      "aggs": {
          "build_state_count": {
              "terms": {"field": "build_state"},
              "aggs": {
                  "min_id": {
                      "min": {"field": "id"}
                  },
                  "max_id": {
                      "max": {"field": "id"}
                  }
              }
          }
      }
  }
  '
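For readers who think in terms of GROUP BY, the nested aggregation above does the following: a terms aggregation puts documents into one bucket per build_state, and the min and max sub-aggregations are computed inside each bucket. This Python sketch reproduces that logic in memory; it is not how ES executes the aggregation, the function name is hypothetical, and the four records are the id and build_state values of the documents indexed earlier.

```python
from collections import defaultdict


def terms_with_min_max(docs, group_field, value_field):
    """Group docs by group_field, then compute count, min and max per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "min_id": min(values), "max_id": max(values)}
        for key, values in buckets.items()
    }


# id and build_state of the four documents indexed earlier in this tutorial.
docs = [
    {"id": 77, "build_state": 3},
    {"id": 146, "build_state": 3},
    {"id": 223, "build_state": 3},
    {"id": 229, "build_state": 3},
]

result = terms_with_min_max(docs, "build_state", "id")
print(result)
```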

SQL Query Support

ElasticSearch also supports queries written in SQL, for example:

curl -XGET -H 'Content-Type: application/json' 'localhost:9200/_sql?pretty' -d '
 {
    "query": "select * from mj2_corp_info where build_state in (1,2,3)",
    "fetch_size": 5,
    "filter": {
        "term": {"address": "鄂尔多斯"}
    }
 }
 '

Or with an aggregation:

curl -XGET -H 'Content-Type: application/json' 'localhost:9200/_sql?pretty' -d '
{
   "query": "select build_state, min(id),max(id) from mj2_corp_info group by build_state",
   "fetch_size": 5
}
'

For the complete SQL documentation, see X-Pack SQL Access.

For all of the ES REST APIs, see the official documentation.
