Python Full Series / Stage 15: Python Crawler Development / Crawler Basics (old), post #222


The console did not print the HTML of the Baidu page.
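The project itself is not included on this page, so the cause cannot be confirmed from here. As a minimal, hedged sketch using only the standard library's `urllib.request`: one possible reason a basic crawler gets little or no usable HTML back from Baidu is that the request carries no browser-like `User-Agent` header, and another is printing raw bytes instead of decoded text. This is an assumption, not a diagnosis of this project; the helper names `build_request` and `fetch_html` and the User-Agent string are hypothetical.

```python
import urllib.request

# Hypothetical browser-like User-Agent string (assumption: any realistic one works)
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download `url` and return the response body decoded to text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `print(fetch_html("https://www.baidu.com"))`. If nothing appears at all, it is also worth checking that the script actually calls `print` on the result.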


Python Full Series / Stage 15: Python Crawler Development / Mobile Crawler Development, post #223

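import scrapy


class DoubanSpider(scrapy.Spider):
    # `scrapy crawl <name>` looks the spider up by this attribute,
    # so it must be unique within the project
    name = 'douban'
    allowed_domains = ['douban.com']
    start_urls = ['https://movie.douban.com/top250?start=50&filter=']

    def parse(self, response):
        # Movie titles and ratings on the current Top 250 page
        names = response.xpath('//div[@class="hd"]/a/span[1]/text()').getall()
        scores = response.xpath('//span[@class="rating_num"]/text()').getall()

        # Save the data: yield one item per (name, score) pair
        for name, score in zip(names, scores):
            yield {
                'name': name,
                'score': score,
            }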

Teacher, why doesn't my generator produce any output?

Here is the terminal output:

(spider_env) D:\Pycharm开发环境\scrapy04\scrapy04>scrapy crawl douban2 -o douban2.json -t json
D:\Python_env\spider_env\lib\site-packages\scrapy\commands\__init__.py:131: ScrapyDeprecationWarning: The -t command line option is deprecated in favor of specifying the output format within the output URI. See the documentation of the -o and -O options for more information.
  feeds = feed_process_params_from_cli(
D:\Python_env\spider_env\lib\site-packages\scrapy\spiderloader.py:37: UserWarning: There are several spiders with the same name:

  DoubanSpider named 'douban' (in scrapy04.spiders.douban)

  DoubanSpider named 'douban' (in scrapy04.spiders.douban2)

  This can cause unexpected behavior.
  warnings.warn(
2022-09-23 10:17:11 [scrapy.utils.log] INFO: Scrapy 2.6.2 started (bot: scrapy04)
2022-09-23 10:17:11 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 2.0.1, Twisted 22.8.0, Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)], pyOpenSSL 22.0.0 (OpenSSL 3.0.5 5 Jul 2022), cryptography 38.0.1, Platform Windows-10-10.0.22000-SP0

Traceback (most recent call last):
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\spiderloader.py", line 75, in load
    return self._spiders[spider_name]
KeyError: 'douban2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Python_env\spider_env\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\cmdline.py", line 154, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\cmdline.py", line 109, in _run_print_help
    func(*a, **kw)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\cmdline.py", line 162, in _run_command
    cmd.run(args, opts)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
    crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\crawler.py", line 204, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\crawler.py", line 237, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\crawler.py", line 312, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "D:\Python_env\spider_env\lib\site-packages\scrapy\spiderloader.py", line 77, in load
    raise KeyError(f"Spider not found: {spider_name}")
KeyError: 'Spider not found: douban2'
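The output above suggests the problem is not the generator: `scrapy crawl` looks spiders up by their `name` attribute, and the UserWarning shows that both `scrapy04.spiders.douban` and `scrapy04.spiders.douban2` declare `name = 'douban'`, so nothing is registered under `douban2` and the lookup fails before `parse` ever runs. Giving the spider in douban2.py its own `name = 'douban2'` (or running `scrapy crawl douban`) should resolve the KeyError. Separately, `-t json` can simply be dropped, since `-o douban2.json` infers JSON from the file extension (`-O` overwrites the file instead of appending). The sketch below is a simplified, hypothetical model of that name lookup, not Scrapy's actual source; `build_spider_index` and `load` are illustrative names.

```python
from collections import defaultdict


def build_spider_index(spiders):
    """Map spider names to modules, roughly as `scrapy crawl NAME` looks them up.

    `spiders` is an iterable of (module_path, spider_name) pairs.
    Returns (index, duplicates): `index` maps each name to the module that
    was registered last; `duplicates` lists every name defined more than once.
    """
    index = {}
    found = defaultdict(list)
    for module_path, name in spiders:
        found[name].append(module_path)
        # In this model, a later duplicate silently replaces the earlier one
        index[name] = module_path
    duplicates = {name: mods for name, mods in found.items() if len(mods) > 1}
    return index, duplicates


def load(index, spider_name):
    """Look a spider up by name, raising a chained KeyError when it is missing."""
    try:
        return index[spider_name]
    except KeyError:
        raise KeyError(f"Spider not found: {spider_name}")
```

Feeding it the two modules from the warning reports `'douban'` as a duplicate, and `load(index, "douban2")` raises `KeyError: 'Spider not found: douban2'`, because no module declares that name.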

