用SQL高效的取每组最大记录

给定以下用户登录的记录表,为user表的子表,主键id自增,user_id为索引,表中有7k多条数据,每个user_id有10条数据

1
2
3
4
5
6
7
CREATE TABLE `user_signin_record`  (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`active_at` datetime NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `IDX_user_signin_record_user_id`(`user_id`) USING BTREE
);

如何用SQL高效的取user_id为1、2、3、4、5,每个用户对应的最近的登录记录?以下是4种解法及其变种,以及对应的性能测试

GROUP BY … HAVING + SUB QUERY 写法

1
2
3
4
5
SELECT a.* 
FROM user_signin_record AS a
GROUP BY a.id
HAVING a.id = (SELECT MAX(id) FROM user_signin_record WHERE user_id = a.user_id)
AND user_id IN (1,2,3,4,5);
  • 实际执行时间:1.918s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY a index PRIMARY,IDX_user_signin_record_user_id PRIMARY 8 7747 100.00 Using where
2 DEPENDENT SUBQUERY user_signin_record ref IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 func 112 100.00 Using index

变种写法:

1
2
3
4
SELECT a.*
FROM user_signin_record AS a
WHERE id = (SELECT MAX(id) FROM user_signin_record WHERE user_id = a.user_id)
AND user_id IN (1,2,3,4,5);

  • 实际执行时间:1.951s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY a ALL IDX_user_signin_record_user_id 7747 37.28 Using where
2 DEPENDENT SUBQUERY user_signin_record ref IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 sample.a.user_id 112 100.00 Using index

NOT EXISTS + SUB QUERY 写法

1
2
3
4
SELECT a.*
FROM user_signin_record AS a
WHERE NOT EXISTS (SELECT 1 FROM user_signin_record WHERE user_id = a.user_id AND id > a.id)
AND user_id IN (1,2,3,4,5);
  • 实际执行时间:0.842s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY a ALL IDX_user_signin_record_user_id 7747 37.28 Using where
2 DEPENDENT SUBQUERY user_signin_record ref PRIMARY,IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 sample.a.user_id 112 33.33 Using where; Using index

SUB QUERY + COUNT 写法

1
2
3
4
SELECT a.*
FROM user_signin_record AS a
WHERE 1 > (SELECT COUNT(*) FROM user_signin_record WHERE user_id = a.user_id AND id > a.id)
AND user_id IN (1,2,3,4,5);

将这里的1换成数字N,你将得到一个分组取每组TOP N的SQL,这也是另一种常见的使用场景

  • 实际执行时间:1.718s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY a ALL IDX_user_signin_record_user_id 7747 37.28 Using where
2 DEPENDENT SUBQUERY user_signin_record ref PRIMARY,IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 sample.a.user_id 112 33.33 Using where; Using index

JOIN 写法

1
2
3
4
5
SELECT a.*
FROM user_signin_record AS a
JOIN (SELECT MAX(id) AS id, user_id FROM user_signin_record GROUP BY user_id) AS b
ON a.id = b.id
WHERE a.user_id IN (1,2,3,4,5);
  • 实际执行时间:0.024s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY ALL 70 100.00 Using where
1 PRIMARY a eq_ref PRIMARY,IDX_user_signin_record_user_id PRIMARY 8 b.id 1 37.28 Using where
2 DERIVED user_signin_record range IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 70 100.00 Using index for group-by

变种写法:

1
2
3
4
5
SELECT a.*
FROM user_signin_record AS a,
(SELECT user_id, MAX(id) AS id FROM user_signin_record GROUP BY user_id) AS b
WHERE a.id = b.id
AND a.user_id IN (1,2,3,4,5);

  • 实际执行时间:0.024s
  • Explain结果:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY ALL 70 100.00 Using where
1 PRIMARY a eq_ref PRIMARY,IDX_user_signin_record_user_id PRIMARY 8 b.id 1 37.28 Using where
2 DERIVED user_signin_record range IDX_user_signin_record_user_id IDX_user_signin_record_user_id 9 70 100.00 Using index for group-by

结论

本文中JOIN写法的性能表现最好,不代表在所有场景下都是这样,因为它受多种因素影响,MySQL版本(比如本例使用的mysql版本是5.7.20),表规模,表中数据分布的不同(本例中的分布还算比较均匀),可能导致完全不同的结论。因此最好的方法是,实测一下。当然,如果你不在乎性能,建议选SUB QUERY + COUNT 写法,因为它可以快速扩展为取TOP N的逻辑,满足未来可能出现的需求。