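I want to extract users' access records from my website logs, excluding search-engine traffic. What command should I use?
Python handles this well.
Here is what a typical Apache log looks like: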
- - [20/Nov/2011:01:10:35 +0100] "GET / HTTP/" 200 259653
- - [20/Nov/2011:01:10:49 +0100] "GET / HTTP/" 304 153
- - [20/Nov/2011:01:10:50 +0100] "GET /2008/1/23/no HTTP/" 404 472
- - [20/Nov/2011:01:10:50 +0100] "GET / HTTP/"
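To make the structure of these lines explicit, here is a minimal sketch that splits one line into its fields with a regular expression. The names `LINE_RE` and `parse_line` are hypothetical, and the pattern assumes the layout shown in the sample above (timestamp in brackets, quoted request, status code, response size); a line that lacks the status and size, like the last sample line, is simply skipped.

```python
import re

# Assumed layout: [timestamp] "METHOD path HTTP/version" status size
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>.*?) HTTP/[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None

# One of the sample lines shown above
sample = '- - [20/Nov/2011:01:10:50 +0100] "GET /2008/1/23/no HTTP/" 404 472'
print(parse_line(sample))
```

All values come back as strings, so convert `status` and `size` with `int()` if you need to compare or sum them.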
First, open the log file:
with open('/var/log/apache2/') as f:
    for line in f:
        pass  # handle each log line here
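Then extract the access records. The snippet below counts requests per URI and prints the five most requested ones. Note that it counts every request; on its own it does not exclude search-engine crawlers.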
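import re
from collections import defaultdict
from heapq import nlargest

with open('log.txt') as f:
    count = defaultdict(int)
    for line in f:
        # Capture the request path between the HTTP method and "HTTP/"
        match = re.search(r' "\w+ (.*?) HTTP/', line)
        if match is None:
            continue
        # Drop the query string so /a?x=1 and /a?x=2 count as the same URI
        uri = match.group(1).split('?')[0]
        count[uri] = count[uri] + 1

# The five most requested URIs together with their hit counts
most_common = nlargest(5, count.items(), key=lambda x: x[1])
print(most_common)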
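The question also asks to leave out search engines. One possible approach, sketched here under assumptions, is to skip lines whose User-Agent looks like a crawler. This assumes the log is written in Apache's combined format, where the User-Agent is the last quoted field; the sample lines shown earlier do not include it, so check your own log first. The keyword list, the helper names `is_search_engine` and `count_user_requests`, and the `SAMPLE` lines are all illustrative, not taken from a real log.

```python
import re
from collections import Counter

# Illustrative keyword list (an assumption); extend it for the crawlers in your logs.
BOT_KEYWORDS = ("googlebot", "bingbot", "baiduspider", "yandexbot")

# Assumes combined log format: the User-Agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')
REQUEST_RE = re.compile(r' "\w+ (.*?) HTTP/')

def is_search_engine(line):
    """Return True if the line's User-Agent contains a known crawler keyword."""
    match = UA_RE.search(line)
    if match is None:
        return False  # no trailing quoted field, so nothing to judge by
    agent = match.group(1).lower()
    return any(keyword in agent for keyword in BOT_KEYWORDS)

def count_user_requests(lines):
    """Count requests per URI, skipping lines that look like crawler traffic."""
    count = Counter()
    for line in lines:
        if is_search_engine(line):
            continue
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        count[match.group(1).split('?')[0]] += 1
    return count

# Made-up sample lines in combined format, for demonstration only
SAMPLE = [
    '- - [20/Nov/2011:01:10:35 +0100] "GET / HTTP/1.1" 200 259653 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '- - [20/Nov/2011:01:10:49 +0100] "GET / HTTP/1.1" 304 153 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '- - [20/Nov/2011:01:10:50 +0100] "GET /2008/1/23/no?x=1 HTTP/1.1" 404 472 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]
print(count_user_requests(SAMPLE).most_common(5))
```

To run it on a real file, pass the open file object instead of `SAMPLE`, for example `count_user_requests(f)` inside a `with open(...) as f:` block. Keyword matching only catches crawlers that identify themselves, so treat the result as an approximation.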