Scraping Shanbay's Must-Know Python Vocabulary with Python

    Technology  2024-07-26

    1. Requirements Analysis

    Target URL: http://www.shanbay.com/wordlist/110521/232414/

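    Requirement:

    Collect every Python vocabulary entry on the page, represent each entry as a dictionary (the word and its definition), and store the resulting data.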
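The requirement above asks for the data to be stored but does not name a format. As a minimal sketch, assuming JSON is acceptable, the list of word dictionaries could be written out and read back as shown here. The helper names `save_words` and `load_words`, the file name, and the sample entries are all hypothetical; only the `name` and `content` keys follow the record layout used in this article.

```python
import json
from pathlib import Path


def save_words(words, path):
    """Write the list of word dicts to `path` as UTF-8 JSON (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese definitions human-readable in the file
    Path(path).write_text(
        json.dumps(words, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_words(path):
    """Read the list back, mainly to check the round trip (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Made-up sample entries, only to show the shape of the data
    sample = [
        {"name": "print", "content": "vt. 打印"},
        {"name": "list", "content": "n. 列表"},
    ]
    save_words(sample, "python_words.json")
    print(load_words("python_words.json"))
```

JSON is chosen here only because it maps directly onto a list of dictionaries; a CSV file or a database would serve the same requirement.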

    2. Implementation

    # Import packages
    from urllib import request
    from lxml import etree

    # Vocabulary list: one dict per word
    words = []


    def shanbei(page):
        # Note: `page` is accepted but not used; the URL below is fetched as-is.
        url = "http://www.shanbay.com/wordlist/110521/232414/"
        print(url)
        rsp = request.urlopen(url)
        html = rsp.read()
        # Parse the HTML
        html = etree.HTML(html)
        tr_list = html.xpath("//tr")
        # Iterate over the tr elements; each tr holds one word and its definition
        for tr in tr_list:
            # Look up the word and its definition
            word = {}
            strong = tr.xpath('.//strong')
            if len(strong):
                # strip() removes surrounding whitespace;
                # `or ""` guards against an element with no text
                name = (strong[0].text or "").strip()
                word['name'] = name
            # Find the definition of the word
            td_content = tr.xpath('./td[@class="span10"]')
            if len(td_content):
                content = (td_content[0].text or "").strip()
                word['content'] = content
            print(word)
            if word != {}:
                words.append(word)


    if __name__ == '__main__':
        shanbei(2)
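The code above calls `request.urlopen(url)` with no timeout and with urllib's default headers. Some sites reject the default urllib User-Agent or respond slowly; whether Shanbay does is not stated here, so the following is only a sketch of a more defensive fetch step. The names `build_request`, `fetch_html`, and `DEFAULT_HEADERS`, the User-Agent string, and the 10-second timeout are illustrative assumptions.

```python
from urllib import request

# Arbitrary illustrative value, not something the site is known to require
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; wordlist-example)"}


def build_request(url, headers=None):
    """Return a urllib Request that carries explicit headers."""
    return request.Request(url, headers=headers or DEFAULT_HEADERS)


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with request.urlopen(build_request(url), timeout=timeout) as rsp:
            return rsp.read()
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses
        print(f"failed to fetch {url}: {exc}")
        return None
```

With this in place, the parsing step would run only when `fetch_html(url)` returns something other than `None`, and the returned bytes can be passed to `etree.HTML` exactly as before.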

    3. Output
