Reposted from: https://blog.csdn.net/gwd1154978352/article/details/83343933
Elasticsearch comes with many built-in analyzers, such as standard (the standard analyzer), english (English analysis), and chinese (Chinese analysis). The default is standard.
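As a quick sketch, any of these built-in analyzers can be tried against the cluster-level _analyze API without creating an index; the english analyzer, for instance, lowercases, removes English stop words, and stems each token:

GET /_analyze
{
  "analyzer": "english",
  "text": "a dog is in the house"
}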
The standard analyzer is composed of:

standard tokenizer: splits text on word boundaries
standard token filter: does nothing
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it, and so on
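These components can also be exercised individually by passing a tokenizer and token filters directly to _analyze. A minimal sketch, with the stop filter enabled explicitly since it is off by default:

GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "A Dog is in the House"
}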
Modifying the analyzer settings

Enable the english stop-words token filter:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Test code for the standard analyzer:
GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

Result:
{ "tokens": [ { "token": "a", "start_offset": 0, "end_offset": 1, "type": "<ALPHANUM>", "position": 0 }, { "token": "dog", "start_offset": 2, "end_offset": 5, "type": "<ALPHANUM>", "position": 1 }, { "token": "is", "start_offset": 6, "end_offset": 8, "type": "<ALPHANUM>", "position": 2 }, { "token": "in", "start_offset": 9, "end_offset": 11, "type": "<ALPHANUM>", "position": 3 }, { "token": "the", "start_offset": 12, "end_offset": 15, "type": "<ALPHANUM>", "position": 4 }, { "token": "house", "start_offset": 16, "end_offset": 21, "type": "<ALPHANUM>", "position": 5 } ] }设置的英文分词测试代码
GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}

Result:
{ "tokens": [ { "token": "dog", "start_offset": 2, "end_offset": 5, "type": "<ALPHANUM>", "position": 1 }, { "token": "house", "start_offset": 16, "end_offset": 21, "type": "<ALPHANUM>", "position": 5 } ] }内容解析
Test code:
GET /my_index/_analyze { "text": "tom&jerry are a friend in the house, <a>, HAHA!!", "analyzer": "my_analyzer" }测试结果
{ "tokens": [ { "token": "tomandjerry", "start_offset": 0, "end_offset": 9, "type": "<ALPHANUM>", "position": 0 }, { "token": "are", "start_offset": 10, "end_offset": 13, "type": "<ALPHANUM>", "position": 1 }, { "token": "friend", "start_offset": 16, "end_offset": 22, "type": "<ALPHANUM>", "position": 3 }, { "token": "in", "start_offset": 23, "end_offset": 25, "type": "<ALPHANUM>", "position": 4 }, { "token": "house", "start_offset": 30, "end_offset": 35, "type": "<ALPHANUM>", "position": 6 }, { "token": "HAHA", "start_offset": 42, "end_offset": 46, "type": "<ALPHANUM>", "position": 7 } ] }type中的使用
PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
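Once the mapping is in place, the field-level analyzer can be checked through the same _analyze API by naming the field instead of an analyzer; a small sketch:

GET /my_index/_analyze
{
  "field": "content",
  "text": "tom&jerry are a friend in the house"
}

Any document indexed into my_type will then have its content field analyzed with my_analyzer at index time, and the same analyzer is applied to queries against that field unless a separate search_analyzer is configured.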