了解其他网络请求模块

# 了解其他网络请求模块

# 目标

知道tornadohttp、twistid.web.client、aiohttp、grequests的简单用法

# 这一小节带领大家了解一下除requests、urllib爬虫常用网络请求模块以外的其他模块：tornadohttp、twistid.web.client、aiohttp、grequests

# 1. grequests

pypi模块网站：https://pypi.org/project/grequests/ (opens new window) grequests（异步）利用 requests和gevent库，做了一个简单封装，用法如下：

import grequests

urls = [
    'http://www.heroku.com',
    'http://python-tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://fakedomain/',
    'http://kennethreitz.com'
]

rs= (grequests.get(u) for u in urls) 
for resp in grequests.map(rs): # grequests.map(rs)返回由response对象组成的列表
    print(resp.content) # 用法与requests模块相同

1
2
3
4
5
6
7
8
9
10
11
12
13
14

# 2. aiohttp

aiohttp则是基于asyncio实现的HTTP框架，可以用来发送网络请求并获取响应。一般常见以python3.5+的 async、await等关键字配合使用

# 2.1 基本用法：

import aiohttp
async with aiohttp.get('https://github.com') as r:
        await r.text(encoding='utf-8')
        # await resp.read() # 也可以选择不编码，如读取图像

1
2
3
4

# 2.2 设置timeout

import aiohttp
with aiohttp.Timeout(0.001):
    async with aiohttp.get('https://github.com') as r:
        await r.text()

1
2
3
4

# 2.3 使用session

这里要引入一个类，aiohttp.ClientSession. 首先要建立一个session对象，然后用该session对象去打开网页。session可以进行多项操作，比如post, get, put, head等等，如下面所示：

import aiohttp
async with aiohttp.ClientSession() as session:
    async with session.get('https://api.github.com/events') as resp:
    # async with session.post('http://httpbin.org/post', data=b'data') as resp: # 发送post请求
        print(resp.status)
        print(await resp.text())

1
2
3
4
5
6

# 2.4 自定义headers

将headers放于session.get/post的选项中即可。注意headers接收的是一个字典

url = 'https://api.github.com/some/endpoint'
headers = {'content-type': 'application/json'}
await session.get(url, headers=headers)

1
2
3

# 2.5 使用代理

要实现这个功能，需要在生产session对象的过程中做一些修改。

conn = aiohttp.ProxyConnector(proxy="http://192.168.0.1:8113")  # 代理ip
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as resp:
    print(resp.status)

1
2
3
4

如果代理需要认证，则需要再加一个proxy_auth选项。

conn = aiohttp.ProxyConnector(
    proxy="http://192.168.0.1:8113",
    proxy_auth=aiohttp.BasicAuth('username', 'password')
)
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as r:
    assert r.status == 200

1
2
3
4
5
6
7

# 2.6 自定义cookie

同样是在session中做修改。

url = 'http://httpbin.org/cookies'
cookies_dict = {} # 自定义cookie
async with ClientSession(cookies_dict) as session: # 在此处使用自定义的cookie字典
    async with session.get(url) as resp:
        assert await resp.json() == {"cookies": cookies_dict} # 验证

1
2
3
4
5

# 3. tornado.httpclient

tornado(龙卷风)也是一个异步的httpWeb框架，它的httpclient模块可以发送网络请求，用起来像这个样子：

def handle_request(response):
    if response.error:
        print "Error:", response.error
    else:
        print response.body

http_client = AsyncHTTPClient()
http_client.fetch("http://www.google.com/", handle_request)

1
2
3
4
5
6
7
8

具体使用方法参考tornado文档：http://tornado-zh.readthedocs.io/zh/latest/httpclient.html (opens new window)

# 4. twistid.web.client发送请求

scrapy就是基于twistid.web.client发送请求获取响应的

from twisted.internet import reactor
from twisted.web.client import getPage
import urllib.parse

num = 0

a = []

def one_done(arg):
    global num
    print(type(arg))
    print(arg.decode())
    a.append(arg)
    num += 1
    if num == 3:
        reactor.stop()

cookies = {
    b'123': b'654'
}
post_data = urllib.parse.urlencode({'check_data': 'adf'})
post_data = bytes(post_data, encoding='utf8')
headers = {b'Content-Type': b'application/x-www-form-urlencoded'}
for i in range(3):
    response = getPage(bytes('http://dig.chouti.com/login', encoding='utf8'),
                       method=bytes('POST', encoding='utf8'),
                       postdata=post_data,
                       headers=headers,
                       cookies=cookies)
    response.addBoth(one_done)

reactor.run()

print(a)
# 要注意的是，postdata这个字典是直接转换为字符串然后转换为bytes，headers和cookies只是将键和值转换为bytes类型了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

太麻烦了！没关系，我们只是作为了解。平时还是用requests！

# 小结

本小结重点
- 知道tornadohttp、twistidhttp、aiohttp、grequests的简单用法

编辑

← sanic、quart类Flask的异步框架介绍通过Fiddler进行手机抓包→