mysql 千万级 模糊查询优化 :全文检索
mysql 千万级 模糊查询优化 :全文检索
模糊查询首先会想到使用Like
但是 使用的是 Like %语言%
会使索引失效看,在数据体量到百万千万时候 Like查询 效率就会非常低
下面是一个千万级数据库查询:
mysql> select count(*) from book;
+----------+
| count(*) |
+----------+
| 10134213 |
+----------+
1 row in set (11.77 sec)
mysql> select * from book where name like "%剧本%";
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
| isbn | name | type | hot | num | status | create_time | update_time |
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
| 110-110-122-31 | 曹禺剧本选 | Fake | 0 | 5 | 0 | 2021-04-29 14:19:43 | 2021-04-29 14:19:43 |
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
1 row in set (15.08 sec)
mysql> explain select * from book where name like "%剧本%";
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | book | NULL | ALL | NULL | NULL | NULL | NULL | 10355497 | 11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
可以考虑全文检索,来优化查询:
MySQL 5.7.6 之前,全文索引只支持英文全文索引,不支持中文全文索引,需要利用分词器把中文段落预处理拆分成单词,然后存入数据库。
从 MySQL 5.7.6 开始,MySQL内置了ngram全文解析器,用来支持中文、日文、韩文分词。
ngram
ngram就是一段文字里面连续的n个字的序列。
ngram全文解析器能够对文本进行分词,每个单词是连续的n个字的序列。
例如,用ngram全文解析器对“恭喜发财”进行分词:
n=1: "恭", "喜", "发", "财"
n=2: "恭喜", "喜发", "发财"
n=3: "恭喜发", "喜发财"
n=4: "恭喜发财"
MySQL 中使用全局变量 ngram_token_size 来配置 ngram 中 n 的大小,它的取值范围是1到10,默认值是 2。通常ngram_token_size设置为要查询的单词的最小字数。如果需要搜索单字,就要把ngram_token_size设置为1。在默认值是2的情况下,搜索单字是得不到任何结果的。因为中文单词最少是两个汉字,推荐使用默认值2。
方法一
在修改MySQL配置文件 my.ini 中,末尾增加一行 ngram_token_size 的参数设置:ngram_token_size=2
方法二
使用启动命令 mysqld 时,传参如下:mysqld --ngram_token_size=2
创建全文索引
alter table `book` add fulltext index fulltext_name(`name`) WITH PARSER ngram;
查询
mysql> select * from book where name like "%脚本%";
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| isbn | name | type | hot | num | status | create_time | update_time |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| 110-120-119-15 | Lua脚本 | 编程 | 4 | 5 | 0 | 2021-04-01 00:00:00 | 2021-04-01 00:00:00 |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
1 row in set (15.30 sec)
mysql> SELECT * FROM book WHERE MATCH (`name`) AGAINST ("脚本");
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| isbn | name | type | hot | num | status | create_time | update_time |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| 110-120-119-15 | Lua脚本 | 编程 | 4 | 5 | 0 | 2021-04-01 00:00:00 | 2021-04-01 00:00:00 |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
1 row in set (0.01 sec)
mysql> explain SELECT * FROM book WHERE MATCH (`name`) AGAINST ("脚本");
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
| 1 | SIMPLE | book | NULL | fulltext | ft_index | ft_index | 0 | const | 1 | 100.00 | Using where; Ft_hints: sorted |
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
1 row in set, 1 warning (0.01 sec)