[python] Dangdang category page crawler practice
君哥
2 years ago
import csv
import os
import random
import time

import requests
from lxml import etree

# Create the output directory if it does not exist yet.
os.makedirs('dangdang', exist_ok=True)

# Pages 1 to 9 of the category listing.
allUrl = ['https://category.dangdang.com/pg{}-cp01.54.92.02.00.00.html'.format(i) for i in range(1, 10)]

# utf-8-sig writes a BOM so that Excel detects the encoding of the CSV.
with open('dangdang/dangdang.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['Book title', 'Listing date', 'Publisher', 'Price'])
    for count, url in enumerate(allUrl, start=1):
        print(url)
        response = requests.get(url)
        response.encoding = 'GB2312'
        # Parse the decoded page into an element tree.
        html = etree.HTML(response.text)
        # Each <li> under the component_59 list is one book.
        allli = html.xpath('//*[@id="component_59"]/li')
        print('Page {}: collection started'.format(count))
        for li in allli:
            book_name = li.xpath('./a/@title')[0]
            book_time = li.xpath('./p[5]/span[2]/text()')
            book_pub = li.xpath('./p[5]/span[3]/a/text()')
            book_price = li.xpath('./p[3]/span[1]/text()')
            # xpath() returns a list; fall back to a placeholder when it is empty.
            if book_time:
                book_time = book_time[0].replace('/', '')
            else:
                book_time = 'N/A'
            if book_pub:
                book_pub = book_pub[0]
            else:
                book_pub = 'N/A'
            if book_price:
                book_price = book_price[0].strip('¥')
            else:
                book_price = 0
            rowInfo = (book_name, book_time, book_pub, book_price)
            print(rowInfo)
            writer.writerow(rowInfo)
        print('Page {}: collection finished!'.format(count))
        # Pause for a random 3 to 10 seconds between pages.
        time.sleep(random.randint(3, 10))
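The script repeats the same "take the first XPath result, otherwise use a placeholder" check for three fields. Below is a minimal sketch of how that check could be factored out. The helper names `first_or_default` and `build_row`, the `'N/A'` placeholder, and the sample values are assumptions for illustration only; they are not part of the original script. The helpers work on plain lists of strings, which is the shape that `li.xpath(...)` returns for `text()` and attribute queries.

```python
def first_or_default(values, default, clean=None):
    """Return the first item of an XPath result list, or `default` if it is empty.

    `clean` is an optional function applied only to a value that was found,
    so the placeholder itself is never modified.
    """
    if not values:
        return default
    value = values[0]
    return clean(value) if clean else value


def build_row(names, times, pubs, prices):
    """Build one CSV row from the four raw XPath result lists (hypothetical helper)."""
    return (
        names[0],  # like the script, this assumes the title is always present
        first_or_default(times, 'N/A', lambda s: s.replace('/', '')),
        first_or_default(pubs, 'N/A'),
        first_or_default(prices, 0, lambda s: s.strip('¥')),
    )


# Made-up sample values standing in for li.xpath(...) results.
row = build_row(['Sample Book'], [], [], ['¥59.00'])
print(row)
```

Inside the inner loop this would reduce the three `if`/`else` blocks to a single `build_row(...)` call, with each `li.xpath(...)` result passed in unchanged.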

