首页
搜索 搜索
当前位置:关注 > 正文

焦点快播:MySQL Execution Plan--DISTINCT语句优化

2023-04-22 09:42:14 博客园

问题描述

在很多业务场景中业务需要过滤掉重复数据,对于MySQL数据库可以有多种SQL写法能实现这种需求,如:

  • 使用DISTINCT,如:


    (相关资料图)

    SELECT DISTINCT username FROM hotel_owner WHERE username IN ("user001","user002");
  • 使用GROUP BY,如:

    SELECT username FROM hotel_owner WHERE username IN ("user001","user002")GROUP BY username;
  • 使用LIMIT 1,如:

    SELECT username FROM(    SELECT username    FROM hotel_owner     WHERE username = "user001"    LIMIT 1) AS T1UNION ALLSELECT username FROM(    SELECT username    FROM hotel_owner     WHERE username = "user002"    LIMIT 1) AS T2
  • 使用EXIST,如:

    SELECT username FROM (    SELECT "user001" AS username    UNION ALL    SELECT "user002" AS username) AS T1WHERE EXISTS(    SELECT username     FROM hotel_owner AS T2    WHERE T1.username = T1.username)
  • 使用临时变量、使用公共表达式+rownumber(MYSQL 8.0)等其他

当前hotel_owner表上有索引idx_username(username),针对上面两个用户的数据量为:

mysql> SELECT username,count(1) AS usercount    -> FROM hotel_owner    -> WHERE username IN ("user001","user002")    -> GROUP BY username;+-------------+-----------+| username    | usercount |+-------------+-----------+| user002 |     16455 || user001 |     18718 |+-------------+-----------+2 rows in set (0.02 sec)

上面4种SQL都能得到相同的执行结果,但查询性能相差50倍以上。

问题原因

MySQL Server架构可分为MySQL网络连接层、MySQL服务层、MySQL存储引擎层三层:

  • MySQL网络连接层,负责处理客户端请求连接。
  • MySQL服务层,负责解析SQL语句生成直接计划,由查询执行引擎与存储引擎层进行交互处理,将处理结果返回给客户端。
  • MySQL存储引擎层,负责MySQL中数据的存储与提取,与底层系统文件进行交互。MySQL存储引擎是插件式的,服务器中的查询执行引擎通过接口与存储引擎进行通信,接口屏蔽了不同存储引擎之间的差异 。

MySQL查询处理流程如下:

由于MySQL架构的分层设计和不同存储引擎内部实现的差异性,MySQL服务层的查询优化器无法针对某个存储引擎进行定制开发,导致MySQL查询优化器在某些场景下无法生成"相对更优"的执行计划,需要研发人员"使用查询提示"或"改写SQL语句"来改变SQL语句的执行计划和提示SQL语句的执行效率。

通过MySQL内部工具profiling能清楚得到上面四种SQL语句的实际执行耗时:

*************************** 1. row ***************************Query_ID: 1Duration: 0.02456375   Query: SELECT DISTINCT usernameFROM hotel_ownerWHERE username IN ("user001","user002")*************************** 2. row ***************************Query_ID: 2Duration: 0.02770700   Query: SELECT usernameFROM hotel_ownerWHERE username IN ("user001","user002")GROUP BY username*************************** 3. row ***************************Query_ID: 3Duration: 0.00054050   Query: SELECT usernameFROM(    SELECT username    FROM hotel_owner    WHERE username = "user001"    LIMIT 1) AS T1UNION ALLSELECT usernameFROM(    SELECT username    FROM hotel_owner    WHERE username = "user002"    LIMIT 1) AS T2*************************** 4. row ***************************Query_ID: 4Duration: 0.00083600   Query: SELECT usernameFROM (    SELECT "user001" AS username    UNION ALL    SELECT "user002" AS username) AS T1WHERE EXISTS(    SELECT username    FROM hotel_owner AS T2    WHERE T1.username = T1.username)

DISTINCT方式和GROUP BY方式耗时接近,耗时分别为24ms和27ms。

LIMIT 1方式和EXISTS方式耗时接近,耗时分别为0.5毫秒和0.8ms。

其中DISTINCT方式的执行计划和执行成本明细为:

mysql> DESC SELECT DISTINCT username    -> FROM hotel_owner    -> WHERE username IN ("user001","user002") \G*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: hotel_owner   partitions: NULL         type: rangepossible_keys: idx_seq_usr,idx_username          key: idx_username      key_len: 152          ref: NULL         rows: 66282     filtered: 100.00        Extra: Using where; Using index1 row in set, 1 warning (0.00 sec)mysql> SHOW PROFILE CPU,BLOCK IO,SWAPS FOR QUERY 1;+----------------------+----------+----------+------------+--------------+---------------+-------+| Status               | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | Swaps |+----------------------+----------+----------+------------+--------------+---------------+-------+| starting             | 0.000065 |     NULL |       NULL |         NULL |          NULL |  NULL || checking permissions | 0.000006 |     NULL |       NULL |         NULL |          NULL |  NULL || Opening tables       | 0.000017 |     NULL |       NULL |         NULL |          NULL |  NULL || init                 | 0.000024 |     NULL |       NULL |         NULL |          NULL |  NULL || System lock          | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL || optimizing           | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL || statistics           | 0.000158 |     NULL |       NULL |         NULL |          NULL |  NULL || preparing            | 0.000018 |     NULL |       NULL |         NULL |          NULL |  NULL || Sorting result       | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.024214 |     NULL |       NULL |         NULL |          NULL |  NULL || end                  | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL || query end            | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL || closing tables       | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || freeing items        | 0.000021 |     NULL |       NULL |         NULL |          NULL |  NULL || cleaning up          | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |+----------------------+----------+----------+------------+--------------+---------------+-------+16 rows in set, 1 warning (0.00 sec)

LIMIT 1方式的执行成本明细为:

mysql> DESC SELECT username    -> FROM(    ->     SELECT username    ->     FROM hotel_owner    ->     WHERE username = "user001"    ->     LIMIT 1    -> ) AS T1    -> UNION ALL    -> SELECT username    -> FROM(    ->     SELECT username    ->     FROM hotel_owner    ->     WHERE username = "user002"    ->     LIMIT 1    -> ) AS T2 \G*************************** 1. row ***************************           id: 1  select_type: PRIMARY        table:    partitions: NULL         type: systempossible_keys: NULL          key: NULL      key_len: NULL          ref: NULL         rows: 1     filtered: 100.00        Extra: NULL*************************** 2. row ***************************           id: 2  select_type: DERIVED        table: hotel_owner   partitions: NULL         type: refpossible_keys: idx_username          key: idx_username      key_len: 152          ref: const         rows: 34788     filtered: 100.00        Extra: Using index*************************** 3. row ***************************           id: 3  select_type: UNION        table:    partitions: NULL         type: systempossible_keys: NULL          key: NULL      key_len: NULL          ref: NULL         rows: 1     filtered: 100.00        Extra: NULL*************************** 4. row ***************************           id: 4  select_type: DERIVED        table: hotel_owner   partitions: NULL         type: refpossible_keys: idx_username          key: idx_username      key_len: 152          ref: const         rows: 31494     filtered: 100.00        Extra: Using index4 rows in set, 1 warning (0.00 sec)mysql> SHOW PROFILE CPU,BLOCK IO,SWAPS FOR QUERY 3;+----------------------+----------+----------+------------+--------------+---------------+-------+| Status               | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | Swaps |+----------------------+----------+----------+------------+--------------+---------------+-------+| starting             | 0.000099 |     NULL |       NULL |         NULL |          NULL |  NULL || checking permissions | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL || checking permissions | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || Opening tables       | 0.000060 |     NULL |       NULL |         NULL |          NULL |  NULL || init                 | 0.000066 |     NULL |       NULL |         NULL |          NULL |  NULL || System lock          | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL || optimizing           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || optimizing           | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL || statistics           | 0.000083 |     NULL |       NULL |         NULL |          NULL |  NULL || preparing            | 0.000018 |     NULL |       NULL |         NULL |          NULL |  NULL || executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000027 |     NULL |       NULL |         NULL |          NULL |  NULL || statistics           | 0.000005 |     NULL |       NULL |         NULL |          NULL |  NULL || preparing            | 0.000005 |     NULL |       NULL |         NULL |          NULL |  NULL || optimizing           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || optimizing           | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL || statistics           | 0.000044 |     NULL |       NULL |         NULL |          NULL |  NULL || preparing            | 0.000006 |     NULL |       NULL |         NULL |          NULL |  NULL || executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000020 |     NULL |       NULL |         NULL |          NULL |  NULL || statistics           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL || preparing            | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || executing            | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL || executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || end                  | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || query end            | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL || removing tmp table   | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || query end            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || closing tables       | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || removing tmp table   | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL || closing tables       | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL || removing tmp table   | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL || closing tables       | 0.000009 |     NULL |       NULL |         NULL |          NULL |  NULL || freeing items        | 0.000021 |     NULL |       NULL |         NULL |          NULL |  NULL || cleaning up          | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |+----------------------+----------+----------+------------+--------------+---------------+-------+37 rows in set, 1 warning (0.00 sec)

DISTINCT方式和LIMIT 1方式都使用索引,其中最大耗时差异在Sending data部分。

DISTINCT方式的Sending data部分耗时:

| Sending data         | 0.024214 |     NULL |       NULL |         NULL |          NULL |  NULL |

LIMIT 1方式的Sending data部分耗时:

| Sending data         | 0.000027 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000020 |     NULL |       NULL |         NULL |          NULL |  NULL || Sending data         | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |

差异原因:

  • DISTINCT方式需要扫描所有满足WHERE条件的16455+18718条记录并将这些记录返回到MySQL Server层,由MySQL Server层负责数据去重处理(DISTINCT)并返回给客户端。
  • LIMIT 1方式针对每个子查询仅需要扫描到第1条满足WHERE条件的记录并将这些记录返回到MySQL Server层,由MySQL Server层负责数据合并(UNION ALL)并返回给客户端。

当满足WHERE条件的记录较少时,无论使用上述4种SQL种的任意1种方式都能快速返回结果,但随着满足WHERE条件的记录增多时,需要结合实际的业务需求和数据分布来编写"高效SQL"。

优化建议

在编写SQL语句时,不仅需要根据业务需求编写"正确SQL",还需要根据"实际数据分布"编写"高效SQL"。