Author Archives: rrdba

Group中的最大值

问题描述

某年某月某日某MySQL DBA的Q群中,说出一道面试题,求根据列A聚集(Aggregate)后B列中最大(或最小)值所对应的C列,为了提高难度,要求不要用关联和子查询。

场景举例

求用户最后登录IP:表login_log有三个字段(user, login_time, login_ip),记录用户登录log,求每个用户最后一次登录的IP,即按user聚集后最大的login_time所对应的login_ip。

解决方案

数据准备

我用的是MariaDB,在使用上,跟MySQL区别不大,命令提示符是MariaDB [test]>,test是数据库名。

执行以下SQL:

MariaDB [test]> create table login_log (user int not null, login_time datetime, login_ip varchar(15) ); 
Query OK, 0 rows affected (0.02 sec)

MariaDB [test]> desc login_log;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| user       | int(11)     | NO   |     | NULL    |       |
| login_time | datetime    | YES  |     | NULL    |       |
| login_ip   | varchar(15) | YES  |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

MariaDB [test]> insert into login_log 
 (user, login_time, login_ip) values 
 (1, '2013-01-01', '192.168.1.110'), 
 (1, '2013-06-05', '192.168.1.119'), 
 (2, '2013-01-01', '192.168.1.112'), 
 (2, '2013-06-06', '192.168.1.120'); 
 uery OK, 4 rows affected (0.00 sec)
 Records: 4  Duplicates: 0  Warnings: 0

 MariaDB [test]> select * from login_log;
+------+---------------------+---------------+
| user | login_time          | login_ip      |
+------+---------------------+---------------+
|    1 | 2013-01-01 00:00:00 | 192.168.1.110 |
|    1 | 2013-06-05 00:00:00 | 192.168.1.119 |
|    2 | 2013-01-01 00:00:00 | 192.168.1.112 |
|    2 | 2013-06-06 00:00:00 | 192.168.1.120 |
+------+---------------------+---------------+
4 rows in set (0.00 sec)

错误的做法

MariaDB [test]> select user, max(login_time), login_ip from login_log group by user;
+------+---------------------+---------------+
| user | max(login_time)     | login_ip      |
+------+---------------------+---------------+
|    1 | 2013-06-05 00:00:00 | 192.168.1.110 |
|    2 | 2013-06-06 00:00:00 | 192.168.1.112 |
+------+---------------------+---------------+
2 rows in set (0.02 sec)

没有出现我们想要的结果,严格来讲,这个SQL是不合格的,没有在group by中列出的列是不能出现在select list中,只是MySQL默认的sql_mode ONLY_FULL_GROUP_BY 没有打开,允许可以这么写而,不然

select max(login_time), min(login_time), login_ip from login_log group by user;

这个SQL你期望login_ip是最大值还是最小值呢?

正确的做法

MariaDB [test]> select user, substring_index(group_concat(login_ip order by login_time desc),',',1) last_ip from login_log group by 1;
+------+---------------+
| user | last_ip       |
+------+---------------+
|    1 | 192.168.1.119 |
|    2 | 192.168.1.120 |
+------+---------------+
2 rows in set (0.02 sec)

看起来不错,确实是我们所想要的数据。这条sql使用了group_concat函数将所有的login_ip用’,’拼成一个字符串,并按最后登录时间排倒序,最后用substring_index将第一个’,’前的字符串提取出来。

函数解释

group_concat

一个聚集函数,可以把某个列或某几列中的数据连接到一起,可以根据其它字段排序,可以指定连接符号。

MariaDB [test]> select user, group_concat(login_ip order by login_time desc) all_ip from login_log group by 1;
+------+-----------------------------+
| user | all_ip                      |
+------+-----------------------------+
|    1 | 192.168.1.119,192.168.1.110 |
|    2 | 192.168.1.120,192.168.1.112 |
+------+-----------------------------+
2 rows in set (0.01 sec)

MariaDB [test]> select user, group_concat(login_ip order by login_time desc SEPARATOR '##') all_ip from login_log group by 1;
+------+------------------------------+
| user | all_ip                       |
+------+------------------------------+
|    1 | 192.168.1.119##192.168.1.110 |
|    2 | 192.168.1.120##192.168.1.112 |
+------+------------------------------+
2 rows in set (0.00 sec)

substring_index

一个子字符串提取函数,有三个参数,用法是substring_index(str,delim,count),提取str中由第count个分割符delim前的字符串。如果count为负数,则从右往左截取。

MariaDB [test]> select substring_index('a,b,c,d', ',', 2);/*第2个,前的字符串。*/
+------------------------------------+
| substring_index('a,b,c,d', ',', 2) |
+------------------------------------+
| a,b                                |
+------------------------------------+
1 row in set (0.00 sec)

MariaDB [test]> select substring_index('user@renren.com', '@', -1);/*从右往左第1个@后的字符串。*/
+---------------------------------------------+
| substring_index('user@renren.com', '@', -1) |
+---------------------------------------------+
| renren.com                                  |
+---------------------------------------------+
1 row in set (0.00 sec)

扩展阅读

1 2