Downloading PDF files with the requests library and urlretrieve
- Indexed: 2025-08-07 13:39
1. The code:
import time  # time.sleep, used to pause between downloads
from urllib.request import urlretrieve  # downloads a URL straight to a local file

import requests  # HTTP client used to fetch the listing page
from lxml import etree  # etree module: HTML parsing with XPath support


def gethtml():  # fetch the listing page and download every file it links to
    url = "http://www.gov.cn/zhengce/pdfFile/downloadFile.htm"  # page that lists the files
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
    }  # browser-like request headers
    response = requests.get(url, headers=headers)  # GET the page with those headers
    response.encoding = response.apparent_encoding  # use the encoding detected from the page body
    html = response.text  # decoded text of the response
    html = etree.HTML(html)  # parse into an element tree; lxml repairs malformed markup
    result = html.xpath('//tbody/tr')  # every table row on the page
    urllist = []  # collects the absolute download URLs
    for info in result:  # one row at a time
        try:
            # take the last href in the row's second cell and prefix the site root
            urllist.append("http://www.gov.cn" + info.xpath('./td[2]/a/@href')[-1])
        except IndexError:  # the row has no link in its second cell
            continue  # skip it
    # print(urllist)
    for downurl in urllist:  # download each URL in turn
        # save into E://IT/PYthon/PYTHON试验/gov/, named after the last segment of the URL
        urlretrieve(downurl, "E://IT/PYthon/PYTHON试验/gov/" + downurl.split("/")[-1])
        print("E://IT/PYthon/PYTHON试验/gov/" + downurl.split("/")[-1] + "下载成功")  # "下载成功" = "downloaded successfully"
        time.sleep(0.1)  # pause 0.1 s after each download


gethtml()  # run the download
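The listing builds each download URL by string concatenation and each file name with split("/")[-1]. That works when every href is site-relative and carries no query string, which the listing appears to assume. A minimal sketch of a more tolerant variant, using only the standard library, follows. The helper names absolute_url and file_name are hypothetical and are not part of the original code.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urljoin, urlsplit

# Page URL taken from the listing above.
BASE_URL = "http://www.gov.cn/zhengce/pdfFile/downloadFile.htm"


def absolute_url(href, base=BASE_URL):
    """Resolve an href found on the page against the page URL.

    Handles site-relative ("/a/b.pdf"), page-relative ("b.pdf")
    and already-absolute hrefs alike.
    """
    return urljoin(base, href)


def file_name(url):
    """Return the last path segment of url, ignoring any query string."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    if not name:
        raise ValueError(f"no file name in {url!r}")
    return name
```

With these helpers, the loop body could call urlretrieve(absolute_url(href), folder + file_name(absolute_url(href))), where folder is the target directory string used in the listing.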
2. Output of running the code ("下载成功" means "downloaded successfully"):
E://IT/PYthon/PYTHON试验/gov/PDF_ALL.zip下载成功
E://IT/PYthon/PYTHON试验/gov/2020_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2019_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2018_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2017_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2016_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2015_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2014_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2013_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2012_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2011_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2010_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2009_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2008_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2007_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2006_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2005_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2004_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2003_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2002_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2001_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2000_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1999_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1998_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1997_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1996_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1995_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1994muLu.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1994_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1993_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1992muLu.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1992_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1991_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1990_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1989_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1988_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1987_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1986_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1985_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1984_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1983_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1982_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1981_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1980_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1979_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1978_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1973_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1971_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1970_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1969_PDF.pdf下载成功
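In the run above the files are fetched by urlretrieve, which issues its own request and does not send the User-Agent header that the script passes to requests.get. If the server required that header, one option would be to download with requests as well. The sketch below is one possible way to do that, not the author's method: save_chunks and download are hypothetical names, and the timeout and chunk size are arbitrary choices.

```python
import os
from pathlib import Path


def save_chunks(chunks, dest):
    """Write an iterable of byte chunks to dest and return the byte count."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    part = dest.with_name(dest.name + ".part")  # temporary name while writing
    written = 0
    with open(part, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += fh.write(chunk)
    os.replace(part, dest)  # rename only after every chunk was written
    return written


def download(url, dest, headers=None, timeout=30):
    """Stream url to dest with requests, sending the given headers."""
    # Imported here so save_chunks can be used without requests installed.
    import requests

    with requests.get(url, headers=headers, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
        return save_chunks(response.iter_content(chunk_size=8192), dest)
```

Writing to a ".part" file first means an interrupted download never leaves a truncated file under the final name.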
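3. The code and its output:
[Figure: screenshot of the code and its console output]
The files finally saved to the local machine:
[Figure: screenshot of the download folder]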
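The script prints its success message whenever urlretrieve returns without raising, so it is worth checking that the saved files really are PDF or ZIP files rather than, say, an HTML error page saved under a .pdf name. A minimal sketch of such a check compares each file's first bytes with the signature usual for its extension. The function names are hypothetical, and the ZIP signature used here is that of a non-empty archive.

```python
from pathlib import Path

# Leading bytes of the two file types that appear in the output.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".zip": b"PK\x03\x04",  # local file header of a non-empty ZIP archive
}


def looks_valid(path):
    """Return True if the file starts with the signature expected for its suffix."""
    path = Path(path)
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        return False  # unknown extension: treat as unverified
    with open(path, "rb") as fh:
        return fh.read(len(expected)) == expected


def check_folder(folder):
    """Map each file name in folder to the result of looks_valid."""
    return {
        p.name: looks_valid(p)
        for p in sorted(Path(folder).iterdir())
        if p.is_file()
    }
```

Calling check_folder on the download directory returns a dictionary such as {"2020_PDF.pdf": True, ...}; any False entry points at a file to download again.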