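Saving scraped data to MySQL is almost exactly the same as saving it to an Excel file. For how to create the Scrapy project and do the basic configuration, have a look at these links first: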
https://blog.csdn.net/Di77HaoWenMing/article/details/108905515
https://blog.csdn.net/Di77HaoWenMing/article/details/108914692

After that, you only need to add one new class to pipelines.py. Below, I added a new class called SaveMysql:
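
    import pandas as pd
    import pymysql


    class YxqPipeline:
        def process_item(self, item, spider):
            # item['filmname'] holds the scraped film names
            filmname = item['filmname']
            # The column header '电影名称' means "film title"
            mydata = pd.DataFrame({'电影名称': filmname})
            mydata.to_excel('E:/filmname.xlsx', index=False)
            return item


    class SaveMysql:
        def process_item(self, item, spider):
            # Connect to the local MySQL database named demo
            conn = pymysql.connect(host='localhost', user='root', password='123456',
                                   database='demo', charset='utf8')
            cur = conn.cursor()
            # Insert one row into the filmname table per scraped name
            sql = 'insert into filmname(name) values (%s)'
            cur.executemany(sql, item['filmname'])
            conn.commit()
            cur.close()
            conn.close()
            return item

If you are still unsure how to set up MySQL, or how to write the data into it once the crawl is finished, see the links below: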
https://blog.csdn.net/Di77HaoWenMing/article/details/108876851
https://blog.csdn.net/Di77HaoWenMing/article/details/108835436

Finally, register the new class in settings.py. Without this, Scrapy will not call the newly added SaveMysql class:
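
    ITEM_PIPELINES = {
        'yxq.pipelines.YxqPipeline': 300,
        'yxq.pipelines.SaveMysql': 301
    }

The numbers set the order: pipelines with lower values run first, so each item goes through YxqPipeline and then SaveMysql. Run the spider, and you're done!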
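
One thing worth noting about the SaveMysql class: it opens and closes a MySQL connection every time process_item is called. Scrapy pipelines can also define open_spider and close_spider, which run once at the start and end of the crawl, so the connection can be reused. The following is only a rough sketch of that idea, not the code I actually ran. The class name SaveMysqlReuseConn and the connect argument are made up for illustration (the argument lets you try the class without a running MySQL server), and the connection settings are simply copied from above.

```python
class SaveMysqlReuseConn:
    """Sketch: same insert as SaveMysql, but one connection for the whole crawl."""

    def __init__(self, connect=None):
        # `connect` is a hypothetical hook: a zero-argument callable that
        # returns a DB connection. Scrapy builds the pipeline with no
        # arguments, so the default (None) means "use pymysql".
        self._connect = connect
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        if self._connect is None:
            import pymysql  # imported here so the hook above needs no driver

            def default_connect():
                # Same connection settings as the SaveMysql class
                return pymysql.connect(host='localhost', user='root',
                                       password='123456', database='demo',
                                       charset='utf8')

            self._connect = default_connect
        self.conn = self._connect()

    def process_item(self, item, spider):
        # One single-element tuple per row to insert
        rows = [(name,) for name in item['filmname']]
        with self.conn.cursor() as cur:
            cur.executemany('insert into filmname(name) values (%s)', rows)
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        if self.conn is not None:
            self.conn.close()
```

If you want to try it, it would be registered in ITEM_PIPELINES the same way as SaveMysql, for example as 'yxq.pipelines.SaveMysqlReuseConn': 301, assuming the class lives in the same pipelines.py.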