mysql 千万级 模糊查询优化 :全文检索

模糊查询首先会想到使用Like

但是 使用的是 Like %语言% 会使索引失效看,在数据体量到百万千万时候 Like查询 效率就会非常低

下面是一个千万级数据库查询:

mysql> select count(*) from book;
+----------+
| count(*) |
+----------+
| 10134213 |
+----------+
1 row in set (11.77 sec)
mysql> select * from book where name like "%剧本%";
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
| isbn           | name            | type | hot  | num  | status | create_time         | update_time         |
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
| 110-110-122-31 | 曹禺剧本选      | Fake |    0 |    5 |      0 | 2021-04-29 14:19:43 | 2021-04-29 14:19:43 |
+----------------+-----------------+------+------+------+--------+---------------------+---------------------+
1 row in set (15.08 sec)

mysql> explain select * from book where name like "%剧本%";
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
|  1 | SIMPLE      | book  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10355497 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

可以考虑全文检索,来优化查询:

MySQL 5.7.6 之前,全文索引只支持英文全文索引,不支持中文全文索引,需要利用分词器把中文段落预处理拆分成单词,然后存入数据库。
从 MySQL 5.7.6 开始,MySQL内置了ngram全文解析器,用来支持中文、日文、韩文分词。

ngram

ngram就是一段文字里面连续的n个字的序列。
ngram全文解析器能够对文本进行分词,每个单词是连续的n个字的序列。
例如,用ngram全文解析器对“恭喜发财”进行分词:

n=1: "恭", "喜", "发", "财" 
n=2: "恭喜", "喜发", "发财" 
n=3: "恭喜发", "喜发财" 
n=4: "恭喜发财"

MySQL 中使用全局变量 ngram_token_size 来配置 ngram 中 n 的大小,它的取值范围是1到10,默认值是 2。通常ngram_token_size设置为要查询的单词的最小字数。如果需要搜索单字,就要把ngram_token_size设置为1。在默认值是2的情况下,搜索单字是得不到任何结果的。因为中文单词最少是两个汉字,推荐使用默认值2。

  • 方法一
    在修改MySQL配置文件 my.ini 中,末尾增加一行 ngram_token_size 的参数设置:

    ngram_token_size=2
  • 方法二
    使用启动命令 mysqld 时,传参如下:

    mysqld --ngram_token_size=2

创建全文索引

alter table `book` add fulltext index fulltext_name(`name`) WITH PARSER ngram; 

查询

mysql> select * from book where name like "%脚本%";
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| isbn           | name      | type   | hot  | num  | status | create_time         | update_time         |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| 110-120-119-15 | Lua脚本   | 编程   |    4 |    5 |      0 | 2021-04-01 00:00:00 | 2021-04-01 00:00:00 |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
1 row in set (15.30 sec)

mysql> SELECT * FROM book WHERE MATCH (`name`) AGAINST ("脚本");
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| isbn           | name      | type   | hot  | num  | status | create_time         | update_time         |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
| 110-120-119-15 | Lua脚本   | 编程   |    4 |    5 |      0 | 2021-04-01 00:00:00 | 2021-04-01 00:00:00 |
+----------------+-----------+--------+------+------+--------+---------------------+---------------------+
1 row in set (0.01 sec)

mysql> explain SELECT * FROM book WHERE MATCH (`name`) AGAINST ("脚本");
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
| id | select_type | table | partitions | type     | possible_keys | key      | key_len | ref   | rows | filtered | Extra                         |
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
|  1 | SIMPLE      | book  | NULL       | fulltext | ft_index      | ft_index | 0       | const |    1 |   100.00 | Using where; Ft_hints: sorted |
+----+-------------+-------+------------+----------+---------------+----------+---------+-------+------+----------+-------------------------------+
1 row in set, 1 warning (0.01 sec)
文章目录