表 t

uid	status
1	1
2	2
1	2
1	3
2	3

查询只存在 status=1 的用户，这两种写法有什么区别吗

SELECT * FROM t where `status` = 1 and uid not IN (

SELECT uid FROM  t as t1 WHERE `status` in (2,3) and t.uid = t1.uid
)

SELECT * FROM t where `status` = 1 and uid not IN (

SELECT uid FROM  t as t1 WHERE `status` in (2,3) 
)

场景是查历史用户，即 status != 2,3 的，以上是简化的模型，两个字段都有索引

上面的多了一个子查询和外表的条件，在表数据量大的情况下性能谁更好，leader 说第一种查询效率会更高，因为它只在满足外表的条件下进行了过滤。或者有没有其他性能更好的写法？

8 条回复 • 2023-04-24 14:20:15 +08:00

kkwa56188

2023-04-23 21:16:53 +08:00

不是做 Mysql 的, 感觉是 1 比较快.
其次用 in (value1, value2) 通常是很慢的, 索引不一定管用.

我会换成:
SELECT * FROM t where t.status = 1
and not exists ( SELECT uid FROM t as t1 WHERE status = 2 and t1.uid = t.uid )
and not exists ( SELECT uid FROM t as t2 WHERE t2.status = 3 and t2.uid = t.uid )

joetse

2023-04-23 23:03:51 +08:00 via iPhone

为什么不用 bitmap?

akira

2023-04-23 23:20:47 +08:00

status 区分度这么低，再来个 not in ，性能不好上来吧。试试这么考虑呢，虽然是有点取巧了

select uid, min(status) `mins`, max(status) `maxs`
from t
group by uid
having `mins` = 1 and `maxs` = 1

512357301

2023-04-23 23:29:12 +08:00 via Android

用 group by 呢？
select
uid,
sum(if(status = 1,1,0)) one_cnt,
sum(if(status = 2 or status = 3,1,0)) other_cnt
from t
group by uid
having one_cnt > 0 and other_cnt = 0

没在机器上测试，反正我下意识的写法是这么写，楼主可以测试下。

PS：对楼主和楼上各位的子查询写法感觉很别扭，我一直用不惯子查询嵌套，更习惯 left join ，不过 mysql 出身的貌似都这么写，存在即是合理，我不争辩。

xuanbg

2023-04-24 09:49:51 +08:00

我会写成这样：
with have23 as (SELECT uid FROM t as t1 WHERE `status` in (2,3))
select * from t left join have23 h on h.uid = t.uid where h.uid is null;

Grocker

2023-04-24 09:52:54 +08:00

@akira 实际场景是有 4 种状态，ABCD ，存在 AB 且没有 CD 状态的用户

Grocker

2023-04-24 09:53:54 +08:00

@xuanbg 忘了说了是 mysql5.7

akira

2023-04-24 14:20:15 +08:00

@Grocker 那就分别对 4 个状态做统计，思路是一样的

mysql 查询历史用户的疑问？

表 t