- Title(EN): asyncio and asyncio.Semaphore in Python 3.8+
- Author: dog2
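A program I wrote a while ago started failing under Python 3.8. It turned out that the coroutine library asyncio had changed yet again with the 3.8 upgrade. The way asyncio.Semaphore is used is different in the new version, so this post is a quick note on the new style.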
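A minimal sketch of that style, assuming the underlying issue is that the semaphore should be created inside the event loop that asyncio.run() starts rather than at module level. The names `worker` and `main` are made up for illustration, and `asyncio.sleep` stands in for real I/O:

```python
import asyncio


async def worker(sem, i):
    # Hypothetical stand-in for real I/O; only the semaphore usage matters here.
    async with sem:
        await asyncio.sleep(0.01)
        return i * 2


async def main(concurrency=2):
    # Create the semaphore inside the running loop, not at module level.
    sem = asyncio.Semaphore(concurrency)
    # gather() returns the results in the order the coroutines were passed in.
    return await asyncio.gather(*(worker(sem, i) for i in range(5)))


results = asyncio.run(main())
print(results)
```

Here the semaphore is simply passed as an argument, which is the plainest way to share it between coroutines.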
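About the crawler code: it uses httpx, an HTTP library with async support, to scrape a few pages, and asyncio.Semaphore to cap the number of concurrent requests. Under Python 3.8, asyncio.Semaphore is paired here with a context variable, contextvars.ContextVar: the semaphore is created inside the coroutine that asyncio.run() drives and stored in the ContextVar so that every task can fetch it.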
    import httpx
    import asyncio
    from contextvars import ContextVar


    def crawl(concurrency=3):
        # Context variable that carries the shared semaphore to every task.
        context = ContextVar("concurrent")
        URL_BASE = 'https://github.com/topics?page='

        async def crawl_one(i):
            sem = context.get()
            # At most `concurrency` requests are in flight at any time.
            async with sem:
                async with httpx.AsyncClient() as client:
                    r = await client.get(f"{URL_BASE}{i}")
                    return len(r.text)

        async def crawl_all():
            # Create the semaphore inside the running loop, before the tasks exist.
            context.set(asyncio.Semaphore(concurrency))
            tasks = [asyncio.create_task(crawl_one(i)) for i in range(1, 30)]
            done, pending = await asyncio.wait(tasks)
            return done

        tasks_done = asyncio.run(crawl_all())
        # `done` is a set, so results are not in page order.
        return [t.result() for t in tasks_done]


    if __name__ == '__main__':
        for r in crawl(concurrency=1):
            print(r)
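To check that the semaphore really caps the number of in-flight requests without hitting GitHub, the same ContextVar pattern can be sketched with a fake request. Everything here is an assumption for illustration: `fake_fetch`, `run_all`, `sem_var` and the `stats` dict are hypothetical names, and `asyncio.sleep` replaces `client.get`. The sketch also swaps `asyncio.wait` for `asyncio.gather`, which keeps results in submission order.

```python
import asyncio
from contextvars import ContextVar

# Same role as `context` in the crawler: holds the shared semaphore.
sem_var = ContextVar("concurrent")


async def fake_fetch(i, stats):
    # Hypothetical stand-in for `client.get(...)`; sleeps instead of doing I/O.
    async with sem_var.get():
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1
        return i


async def run_all(concurrency, n):
    # Set the semaphore before creating tasks: each task copies the current context.
    sem_var.set(asyncio.Semaphore(concurrency))
    stats = {"active": 0, "peak": 0}
    # gather() returns results in submission order, unlike the set from asyncio.wait().
    results = await asyncio.gather(*(fake_fetch(i, stats) for i in range(n)))
    return results, stats["peak"]


results, peak = asyncio.run(run_all(concurrency=3, n=10))
print(results, peak)
```

With `concurrency=3` the recorded peak is 3 even though ten tasks are started at once, and the results come back as 0 through 9 in order.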
References:
- Python 协程模块 asyncio 使用指南 (A Guide to Using asyncio, Python's Coroutine Module)