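I'm analyzing nginx logs and have already processed the data, as required, into the form [(requesturl1, responsetime1), (requesturl2, responsetime2), ...].
Now I want to count how many times each url appears, compute the avg responsetime per url, and sort the result.
All I can come up with is nested for loops, which feels clunky. Is there a better way? Thanks!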
a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
uniqs = list(set([x[0] for x in a]))
res = {}
for i in uniqs:
    count = 0
    sumtime = 0
    for j in a:
        if i == j[0]:
            count = count + 1
            sumtime = sumtime + j[1]
    res[i] = [count, sumtime]
lists = []
for i in res.keys():
    lists.append((i, res[i][0], res[i][1] / res[i][0]))
print(sorted(lists, key=lambda x: x[1], reverse=True))
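For comparison, the nested loop above rescans the whole list once per distinct url. A minimal single-pass sketch, assuming the same `a` as in the post (the names `totals` and `stats` are illustrative, not from the thread), could look like this:

```python
from collections import defaultdict

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]

# url -> [count, total_time], filled in one pass over the data
totals = defaultdict(lambda: [0, 0])
for url, response_time in a:
    totals[url][0] += 1
    totals[url][1] += response_time

# (url, count, average), most frequent first
stats = sorted(
    ((url, count, total / count) for url, (count, total) in totals.items()),
    key=lambda row: row[1],
    reverse=True,
)
print(stats)
```

This keeps only two numbers per url in memory, which may matter if the log is large.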
1
chenstack 2019-02-25 21:42:02 +08:00 1
Trimmed it down by a few lines:
from itertools import groupby
from operator import itemgetter

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
a = sorted(a, key=itemgetter(0))  # groupby only merges adjacent items, so sort first
lists = []
for key, group in groupby(a, itemgetter(0)):
    time_list = [item[1] for item in group]
    lists.append((key, len(time_list), sum(time_list) / len(time_list)))
print(sorted(lists, key=itemgetter(1), reverse=True))
2
zzz686970 2019-02-25 22:25:43 +08:00 1
```py
import collections

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
res = collections.defaultdict(list)
x, y = zip(*a)
for i in range(len(x)):
    res[x[i]] += y[i],  # the trailing comma makes a 1-tuple, so += appends y[i]
print(sorted([(key, len(value), sum(value) / len(value)) for key, value in res.items()],
             key=lambda x: x[1], reverse=True))
```
Gave collections a try.
3
necomancer 2019-02-26 13:26:26 +08:00 1
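import pandas as pd

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
d = pd.DataFrame(a)
# get_values() was removed in pandas 1.0; to_numpy() is its replacement.
# numeric_only=True skips the string key column, which newer pandas no longer drops silently.
[(key, group.mean(numeric_only=True).to_numpy()) for key, group in d.groupby(0)]

Out: [('a', array([9.])), ('b', array([3.])), ('c', array([7.])), ('d', array([2.]))]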
4
chenstack 2019-02-26 14:33:26 +08:00 1
Looks like it can be even shorter, though this needs Python 3:
from itertools import groupby
from operator import itemgetter

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
print(sorted(
    [(key, len(group), sum(item[1] for item in group) / len(group))
     for key, (*group,) in groupby(sorted(a, key=itemgetter(0)), itemgetter(0))],
    key=itemgetter(1), reverse=True))
5
coolloves OP @necomancer A question: the pd version doesn't include the count. How would I get that? I'm looking into pandas as well.
6
necomancer 2019-02-26 16:06:31 +08:00
@coolloves Isn't groupby already the counting step? What kind of count are you trying to get? pandas also has the unique and nunique methods.
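To make the counting question concrete, here is a small sketch of a few ways pandas can count rows per url. The column names `url` and `response_time` are assumptions added for readability; the thread itself uses the default integer column labels.

```python
import pandas as pd

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
d = pd.DataFrame(a, columns=["url", "response_time"])  # illustrative column names

# Rows per url: the "count" the question asks about
counts = d.groupby("url").size()

# The same counts, already sorted from most to least frequent
counts_sorted = d["url"].value_counts()

# nunique() answers a different question: how many distinct urls there are
distinct_urls = d["url"].nunique()

print(counts)
print(counts_sorted)
print(distinct_urls)
```

Note that `unique` and `nunique` describe the set of distinct values, while `size` and `value_counts` give the per-url occurrence counts.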
7
princelai 2019-02-28 10:41:39 +08:00 1
```
import pandas as pd

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
df = pd.DataFrame(a)
df.groupby(0).agg(["count", "mean"])
```
Out[5]:
      1
  count mean
0
a     1    9
b     2    3
c     2    7
d     1    2
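The original question also asked for sorting, which the `agg` call above does not do by itself. A possible extension, assuming named columns (`url` and `response_time` are illustrative names, not from the thread), is to select the time column before aggregating and then sort by count:

```python
import pandas as pd

a = [["d", 2], ["c", 5], ["a", 9], ["b", 4], ["b", 2], ["c", 9]]
df = pd.DataFrame(a, columns=["url", "response_time"])  # illustrative column names

summary = (
    df.groupby("url")["response_time"]      # one group of response times per url
    .agg(["count", "mean"])                 # occurrences and average per url
    .sort_values("count", ascending=False)  # most frequent urls first
)
print(summary)
```

Selecting the single column first yields flat `count` and `mean` column labels instead of the two-level header shown in the output above.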