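Autocomplete
When the user types characters into the search box, the page should suggest search terms related to that input, as shown in the figure:

This feature, which suggests complete entries based on the letters the user has typed, is autocomplete. Because the suggestions have to be inferred from pinyin letters, a pinyin analyzer is needed.
It can be tested as follows: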
```json
POST /_analyze
{
  "text": "如家酒店还不错",
  "analyzer": "pinyin"
}
```
Result:

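0. Custom analyzer
The default pinyin analyzer converts each Chinese character to pinyin separately, whereas we want each term to produce one group of pinyin. The pinyin analyzer therefore needs to be customized.
An analyzer in Elasticsearch consists of three parts:
- character filters: process the text before the tokenizer, for example by deleting or replacing characters
- tokenizer: cuts the text into terms according to some rule, for example keyword (no splitting) or ik_smart
- token filters (tokenizer filter): further process the terms output by the tokenizer, for example case conversion, synonym handling, or pinyin conversion
During analysis, a document passes through these three parts in order: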
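A minimal sketch of that order, written in plain Java rather than against any Elasticsearch API. The class and method names are made up for illustration, whitespace splitting stands in for a real tokenizer such as ik_smart, and lowercasing stands in for a real token filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Illustration only: mimics the order char filter -> tokenizer -> token filter.
// None of these names are Elasticsearch classes.
class AnalyzerPipelineSketch {

    static List<String> analyze(String text,
                                UnaryOperator<String> charFilter,
                                Function<String, List<String>> tokenizer,
                                UnaryOperator<String> tokenFilter) {
        String filtered = charFilter.apply(text);        // 1. character filter sees the raw text
        List<String> terms = tokenizer.apply(filtered);  // 2. tokenizer cuts the text into terms
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            result.add(tokenFilter.apply(term));         // 3. token filter post-processes each term
        }
        return result;
    }

    static List<String> demo(String text) {
        return analyze(
                text,
                s -> s.replace(":)", "_happy_").replace(":(", "_sad_"), // replace emoticons
                s -> List.of(s.trim().split("\\s+")),                   // stand-in tokenizer
                s -> s.toLowerCase(Locale.ROOT));                       // stand-in token filter
    }

    public static void main(String[] args) {
        System.out.println(demo("Nice Hotel :)"));
    }
}
```

Running `main` prints `[nice, hotel, _happy_]`. Real token filters may also add or drop terms (the pinyin filter adds some), which this one-to-one sketch does not model.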

The syntax for declaring a custom analyzer is as follows:
```json
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": ["emoticons"],
          "tokenizer": "ik_max_word",
          "filter": ["py"]
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
```
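To make the effect of the `py` filter options more concrete, here is a toy model in plain Java. It is not the plugin's implementation: the four-character lookup table is hypothetical, the order of the emitted terms is not meant to match the plugin, and `limit_first_letter_length` and `none_chinese_pinyin_tokenize` are not modeled. The `keepFirstLetter` parameter assumes the plugin's `keep_first_letter` option, which the configuration above leaves at its default. The source is kept ASCII-only, so the term 如家 is written as `\u5982\u5bb6`.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the pinyin token filter options; not the plugin's code.
class PinyinFilterSketch {

    // Hypothetical lookup table: ru, jia, jiu, dian (just enough for the example).
    static final Map<Character, String> PINYIN = Map.of(
            '\u5982', "ru", '\u5bb6', "jia", '\u9152', "jiu", '\u5e97', "dian");

    static Set<String> tokens(String term,
                              boolean keepFullPinyin,
                              boolean keepJoinedFullPinyin,
                              boolean keepOriginal,
                              boolean keepFirstLetter) {
        Set<String> out = new LinkedHashSet<>();   // a set drops duplicated terms
        StringBuilder joined = new StringBuilder();
        StringBuilder initials = new StringBuilder();
        for (char c : term.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue;                          // characters outside the toy table are skipped
            }
            if (keepFullPinyin) {
                out.add(py);                       // one term per character, e.g. "ru", "jia"
            }
            joined.append(py);
            initials.append(py.charAt(0));
        }
        if (keepOriginal) {
            out.add(term);                         // the untouched term
        }
        if (keepJoinedFullPinyin && joined.length() > 0) {
            out.add(joined.toString());            // whole-term pinyin, e.g. "rujia"
        }
        if (keepFirstLetter && initials.length() > 0) {
            out.add(initials.toString());          // initials, e.g. "rj"
        }
        return out;
    }

    public static void main(String[] args) {
        // Settings as in the "py" filter above: no per-character pinyin, joined pinyin, original kept.
        System.out.println(tokens("\u5982\u5bb6", false, true, true, true));
    }
}
```

With the settings from the configuration, the term yields the original text, `rujia`, and `rj`. Switching `keepFullPinyin` on and the joined form off gives the per-character behavior that the default pinyin analyzer was described as having.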
Test:

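Summary:
| Question | Explanation |
|---|---|
| How to use the pinyin analyzer | Download the plugin → unzip it into the plugins directory of ES → restart |
| How to define a custom analyzer | Configure analyzer, char_filter, and filter under settings.analysis when creating the index |
| What to watch out for with the pinyin analyzer | To avoid matching homophones, do not use the pinyin analyzer at search time |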
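The homophone problem can be illustrated with a tiny in-memory inverted index. This is a sketch under stated assumptions, not Elasticsearch code: the two sample terms are 狮子 (lion) and 虱子 (louse), both read "shizi", and the `withPinyin` helper hard-codes that reading instead of calling a real pinyin library. The source is ASCII-only, so the terms appear as Unicode escapes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index showing why the pinyin analyzer is a poor search-time choice.
class HomophoneSearchSketch {

    static final String LION = "\u72ee\u5b50";   // read "shizi", means lion
    static final String LOUSE = "\u8671\u5b50";  // read "shizi", means louse

    // Stand-in for an analyzer with the pinyin filter: original term plus its pinyin.
    // Hard-coded for the two sample terms above.
    static Set<String> withPinyin(String term) {
        return Set.of(term, "shizi");
    }

    // Stand-in for a plain analyzer such as ik_smart: the term only.
    static Set<String> plain(String term) {
        return Set.of(term);
    }

    // Index time: every document is analyzed WITH pinyin.
    static Map<String, Set<String>> buildIndex(List<String> docs) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String doc : docs) {
            for (String term : withPinyin(doc)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc);
            }
        }
        return index;
    }

    // Query time: look up each query term and merge the matching documents.
    static Set<String> search(Map<String, Set<String>> index, Set<String> queryTerms) {
        Set<String> hits = new TreeSet<>();
        for (String term : queryTerms) {
            hits.addAll(index.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = buildIndex(List.of(LION, LOUSE));
        System.out.println(search(index, withPinyin(LION)).size()); // pinyin at search time
        System.out.println(search(index, plain(LION)).size());      // plain analyzer at search time
    }
}
```

Searching for the lion term with the pinyin stand-in also produces the query term `shizi`, so both documents match. With the plain stand-in only the lion document matches, while a user who types `shizi` directly still finds both.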
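Two scenarios where analyzers are used
Analyzers are mainly used in two scenarios:
- Index time: when a document is inserted, text fields are analyzed and written to the inverted index; only the analyzer setting on the field is consulted
- Query time: the query input is analyzed before the inverted index is searched; search_analyzer takes precedence, then analyzer, and only then the ES default
To use different analyzers for indexing and querying, add the search_analyzer parameter to the field.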
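The precedence described above can be written down as two small functions. This is a simplified sketch that follows only the order given in the text. Elasticsearch also consults a per-query analyzer and index-level default settings, which are left out here, and the method names are made up for illustration.

```java
// Simplified model of which analyzer name applies to a text field.
// A null argument means "not configured on the field".
class AnalyzerResolutionSketch {

    // Elasticsearch falls back to the standard analyzer when nothing is configured.
    static final String DEFAULT_ANALYZER = "standard";

    // Index time: only the field's analyzer setting matters.
    static String indexAnalyzer(String analyzer) {
        return analyzer != null ? analyzer : DEFAULT_ANALYZER;
    }

    // Query time: search_analyzer first, then analyzer, then the default.
    static String searchAnalyzer(String searchAnalyzer, String analyzer) {
        if (searchAnalyzer != null) {
            return searchAnalyzer;
        }
        return indexAnalyzer(analyzer);
    }

    public static void main(String[] args) {
        System.out.println(indexAnalyzer("my_analyzer"));
        System.out.println(searchAnalyzer("ik_smart", "my_analyzer"));
        System.out.println(searchAnalyzer(null, "my_analyzer"));
    }
}
```

For a field declared with analyzer `my_analyzer` and search_analyzer `ik_smart`, indexing uses `my_analyzer` and searching uses `ik_smart`. Without search_analyzer, both sides use `my_analyzer`.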
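1. Completion suggester query
Elasticsearch provides the Completion Suggester query to implement autocomplete. It matches terms that begin with the user's input and returns them. Field constraints:
- Fields that take part in completion queries must be of type completion
- The field content is usually an array of completion entries
For example, take an index like this: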
```json
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}
```
Then insert the following data:
```json
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}
```
The query DSL is as follows:
```json
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s",
      "completion": {
        "field": "title",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}
```
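As a rough model of what such a query matches, the following plain-Java sketch does a naive prefix scan over the three sample documents. It is not how Elasticsearch implements the suggester, and it returns matches in document order rather than by score. The case-insensitive comparison is an assumption standing in for the lowercasing done by the `simple` analyzer, which completion fields use by default.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Naive in-memory model of a prefix-based completion query.
class CompletionSketch {

    // The three documents inserted above, each holding its "title" entries.
    static final List<List<String>> SAMPLE = List.of(
            List.of("Sony", "WH-1000XM3"),
            List.of("SK-II", "PITERA"),
            List.of("Nintendo", "switch"));

    static List<String> suggest(List<List<String>> docs, String prefix,
                                boolean skipDuplicates, int size) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        // A LinkedHashSet keeps first-seen order while dropping repeated entries.
        Collection<String> hits = skipDuplicates ? new LinkedHashSet<>() : new ArrayList<>();
        for (List<String> entries : docs) {
            for (String entry : entries) {
                if (entry.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                    hits.add(entry);
                }
            }
        }
        List<String> all = new ArrayList<>(hits);
        // Keep at most `size` results, like the "size" parameter of the query.
        return all.size() > size ? new ArrayList<>(all.subList(0, size)) : all;
    }

    public static void main(String[] args) {
        System.out.println(suggest(SAMPLE, "s", true, 10));
    }
}
```

For the prefix `s` the sketch returns `[Sony, SK-II, switch]`: one entry from each document begins with that letter once case is ignored.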
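2. Implementing autocomplete for the hotel search box
The hotel index does not yet have a pinyin analyzer configured, so its configuration has to change. Existing field mappings cannot be changed in place, so the index must be deleted and re-created. A suggestion completion field also needs to be added, holding values such as brand, business, and city.
The following steps are required:
- Modify the structure of the hotel index and configure a custom pinyin analyzer
- Change the name and all fields to use the custom analyzer
- Add a suggestion field of type completion that uses the custom analyzer
- Add a suggestion field to the HotelDoc class, containing brand and business
- Re-import the data into the hotel index
2.1. Modifying the hotel mapping
The code is as follows: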
```json
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
```
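2.2. Modifying the HotelDoc entity
Add a suggestion field of type List<String> to HotelDoc and fill it with completion entries. The constructor below uses brand and business, splitting business on "/" when it holds several values:
The code is as follows: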
```java
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Build the completion entries from brand and business
        if (this.business.contains("/")) {
            // business holds several values separated by "/", so split it
            String[] arr = this.business.split("/");
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
```
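The constructor above assumes that business is never null. If the data could contain missing values, the same logic can be pulled into a small helper that tolerates them. The following is one possible variant, not part of the original project: the class name and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Null-tolerant variant of the suggestion-building logic.
class SuggestionBuilderSketch {

    static List<String> build(String brand, String business) {
        List<String> suggestion = new ArrayList<>();
        if (brand != null && !brand.isBlank()) {
            suggestion.add(brand);
        }
        if (business != null) {
            // split("/") returns the whole string as one element when there is no "/"
            for (String part : business.split("/")) {
                if (!part.isBlank()) {
                    suggestion.add(part.trim());
                }
            }
        }
        return suggestion;
    }

    public static void main(String[] args) {
        System.out.println(build("Hilton", "Bund/Lujiazui"));
        System.out.println(build("Hilton", null));
    }
}
```

Because `split` already handles the single-value case, the separate `contains("/")` branch is no longer needed, and the result is always a resizable list.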
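2.3. Re-importing
Run the data import again; the new hotel documents will contain the suggestion field:

2.4. Java API for the completion suggester query
An example of the Java API for the completion suggester query is as follows (the getSuggestions implementation in section 2.5 uses the same calls):

The code for parsing the completion results is as follows: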
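The suggest part of a response is nested as suggestion name → entries → options → text, so parsing comes down to two nested loops. The sketch below models that shape with hypothetical stand-in classes so that it runs without the Elasticsearch client. In the real client the corresponding types are CompletionSuggestion, its Entry, and Entry.Option, as used in section 2.5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the response objects; not the Elasticsearch client classes.
class SuggestResponseSketch {

    static final class Option {
        final String text;                 // one completed entry, e.g. "Sony"
        Option(String text) {
            this.text = text;
        }
    }

    static final class Entry {
        final String text;                 // the prefix that was searched
        final List<Option> options;        // the completions found for that prefix
        Entry(String text, List<Option> options) {
            this.text = text;
            this.options = options;
        }
    }

    // Walk entries, then options, and collect each option's text.
    static List<String> optionTexts(List<Entry> entries) {
        List<String> texts = new ArrayList<>();
        for (Entry entry : entries) {
            for (Option option : entry.options) {
                texts.add(option.text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Entry> suggestion = List.of(new Entry("s",
                List.of(new Option("Sony"), new Option("SK-II"), new Option("switch"))));
        System.out.println(optionTexts(suggestion));
    }
}
```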

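2.5. Implementing search box autocomplete
When the user types in the input box, the front end sends an Ajax request:

The return value is a collection of completion entries, of type List<String>.
1) Add a new endpoint to HotelController in the cn.itcast.hotel.web package to receive the new request: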
```java
@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}
```
2) Add the method to IHotelService:
```java
List<String> getSuggestions(String prefix);
```
3) Implement the method in HotelService:
```java
@Override
public List<String> getSuggestions(String prefix) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Build the completion suggester query on the suggestion field
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix(prefix)
                        .skipDuplicates(true)
                        .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        // 4.1 Look up the suggestion by the name used in the request
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2 Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3 Collect the text of each option
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
```