Fetching Web Page Source Code with Python

Time: 2026-05-01 03:28:04

1. Import the urllib library (its request module) with an import statement:

import urllib.request

2. Open a web page with urllib.request.urlopen:

file = urllib.request.urlopen("http://www.baidu.com")

After this step, the variable file holds the response object returned for the page. The page content itself has not been read yet.
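Strictly speaking, urlopen() returns a response object rather than the page text. The minimal sketch below opens a URL inside a with block, which closes the connection automatically, and reports a few properties of that object. The helper name open_page and the 10-second timeout are illustrative assumptions, not part of the original article.

```python
import urllib.request


def open_page(url: str, timeout: float = 10.0) -> dict:
    """Open url and return a few properties of the response object (hypothetical helper)."""
    # urlopen() returns a file-like response object; the body is not read here.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return {
            "url": resp.geturl(),      # final URL, after any redirects
            "status": resp.getcode(),  # HTTP status code; None for non-HTTP schemes such as data:
            "content_type": resp.headers.get("Content-Type"),
        }
```

For example, open_page("http://www.baidu.com") should report the final URL, the HTTP status code (typically 200 on success) and the Content-Type header sent by the server.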

3. Read the content with the read method, which returns the page as bytes:

data = file.read()
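Because read() returns bytes, step 4 opens the file in binary mode. If you need the page as text instead, one possible approach is to decode it with the charset declared in the Content-Type header. This sketch assumes the server declares that charset and falls back to UTF-8 when it does not. The helper name fetch_text is hypothetical.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 10.0, fallback: str = "utf-8") -> str:
    """Fetch url and decode the body to str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # raw page content as bytes
        # Charset from the Content-Type header, e.g. "text/html; charset=utf-8";
        # None if the header has no charset parameter.
        charset = resp.headers.get_content_charset() or fallback
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")
```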

4. After reading the content, save it to a file with ordinary file operations:

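# "wb" opens the file in binary mode, because data is bytes;
# the with block closes the file automatically
with open("date.html", "wb") as f:
    f.write(data)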

5. The content is now saved in date.html. The file's contents are shown in the figure.
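Instead of opening date.html in a browser or editor, you can also check the saved file from Python. The small helper below returns the first characters of the file. The name preview_file and the assumption that the page is UTF-8 encoded are both illustrative choices.

```python
def preview_file(path: str, n_chars: int = 200, encoding: str = "utf-8") -> str:
    """Return the first n_chars characters of a saved page (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()
    # Bytes that are not valid in the assumed encoding become U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")[:n_chars]
```

For example, print(preview_file("date.html")) shows the beginning of the saved HTML.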

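6. The code above reads the content first and then saves the source with file operations. Alternatively, a function from the module itself can save the page directly: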

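# urlretrieve returns a (local filename, response headers) tuple
filename, headers = urllib.request.urlretrieve("http://www.baidu.com", filename="2.html")
# clean up temporary files that earlier urlretrieve() calls may have left behind
urllib.request.urlcleanup()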
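urlretrieve() also accepts an optional reporthook callback. It calls this callback with the block number, the block size and the total size as the download progresses. The sketch below uses the callback to print progress. The download helper and its output format are illustrative assumptions. Note that the Python documentation lists urlretrieve() among the legacy interfaces ported from Python 2 that might become deprecated in the future.

```python
import urllib.request


def download(url: str, filename: str, show_progress: bool = True):
    """Save url to filename with urlretrieve(); return (path, headers). Hypothetical helper."""

    def report(block_num: int, block_size: int, total_size: int) -> None:
        # Called once before the first block and then after each block is written.
        # total_size is -1 when the response has no Content-Length header.
        if show_progress and total_size > 0:
            done = min(block_num * block_size, total_size)
            print(f"\r{done}/{total_size} bytes", end="", flush=True)

    try:
        path, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=report)
    finally:
        # Remove temporary files that earlier urlretrieve() calls may have left behind.
        urllib.request.urlcleanup()
    if show_progress:
        print()  # finish the progress line
    return path, headers
```

Usage: download("http://www.baidu.com", "2.html").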

7. Complete code:

import urllib.request

# Method 1: read the page, then write the bytes to a file
with urllib.request.urlopen("http://www.baidu.com") as file:
    data = file.read()
with open("date.html", "wb") as f:
    f.write(data)

# Method 2: save the web page directly to a local file
filename, headers = urllib.request.urlretrieve("http://www.baidu.com", filename="2.html")
urllib.request.urlcleanup()

These are two different ways to fetch the content of a given web page and save it locally.
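In practice a fetch can fail, for example because the host is unreachable or the server returns an error status. Some sites may also respond differently to urllib's default User-Agent. The sketch below extends the first method with a timeout, a custom User-Agent header and basic error handling. The helper name fetch_and_save and the User-Agent string are placeholders chosen for illustration.

```python
import urllib.error
import urllib.request
from typing import Optional

# Placeholder User-Agent string (an assumption); replace it as needed.
DEFAULT_UA = "Mozilla/5.0 (compatible; example-fetcher/0.1)"


def fetch_and_save(url: str, path: str, timeout: float = 10.0,
                   user_agent: str = DEFAULT_UA) -> Optional[int]:
    """Download url and write the raw bytes to path (hypothetical helper).

    Returns the number of bytes written, or None if the request failed.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = resp.read()
    except urllib.error.HTTPError as e:
        # The server answered with an error status such as 404 or 500.
        # HTTPError subclasses URLError, so it must be caught first.
        print(f"HTTP error {e.code}: {e.reason}")
        return None
    except urllib.error.URLError as e:
        # DNS failure, refused connection, unsupported URL scheme, ...
        print(f"Failed to open {url}: {e.reason}")
        return None
    # Only write the file once the whole body has been read successfully.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

For example, fetch_and_save("http://www.baidu.com", "date.html") would return the number of bytes written, or None if the request failed.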