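Members can ask questions here, and the 百战程序员 instructors answer every one.

Q&As that are helpful to everyone are marked "Recommended".
After finishing a lesson, browse the questions other students have asked; it will help you learn more thoroughly.

To date, students have asked 133,940 questions in total.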

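from pymongo import MongoClient


class MongoDomoPipeline(object):
    def open_spider(self, spider):
        # Scrapy calls open_spider() when the spider starts. A method named
        # open_mondo() is never called, so self.collection was never set.
        self.client = MongoClient()  # instantiate the client (defaults to localhost:27017)
        self.db = self.client.movie
        self.collection = self.db.collection

    def process_item(self, item, spider):
        # insert() was removed in PyMongo 4; insert_one() expects a dict
        self.collection.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider() when the spider finishes
        self.client.close()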

Teacher, my code raises this error: AttributeError: 'MongoDomoPipeline' object has no attribute 'collection'
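
A likely cause can be sketched without Scrapy or MongoDB. Scrapy calls pipeline hooks by fixed names (`open_spider`, `process_item`, `close_spider`), so a setup method with any other name never runs and the attribute it would create does not exist. In the sketch below, `FakeRunner` is a hypothetical stand-in for the framework and a plain list stands in for the Mongo collection; both are assumptions for illustration only.

```python
class FakeRunner:
    """Hypothetical stand-in for a framework that calls hooks by fixed names."""

    def run(self, pipeline, items):
        # Only a method named exactly open_spider is ever invoked.
        if hasattr(pipeline, "open_spider"):
            pipeline.open_spider(spider=None)
        for item in items:
            pipeline.process_item(item, spider=None)


class MisnamedPipeline:
    def open_mondo(self, spider):  # wrong name: the runner never calls this
        self.collection = []

    def process_item(self, item, spider):
        self.collection.append(item)  # fails: collection was never created
        return item


class CorrectPipeline:
    def open_spider(self, spider):  # the name the runner looks for
        self.collection = []

    def process_item(self, item, spider):
        self.collection.append(item)
        return item


runner = FakeRunner()

good = CorrectPipeline()
runner.run(good, [{"name": "a"}])

try:
    runner.run(MisnamedPipeline(), [{"name": "a"}])
    message = ""
except AttributeError as exc:
    message = str(exc)

print(good.collection)
print(message)
```

Renaming the setup method to `open_spider` (and the teardown method to `close_spider`) is what lets the framework find it.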

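Python Full Series / Stage 16: Python Crawler Development / Crawler Data Storage, post 736

Teacher, why do I keep getting this result when I access the page?

Python Full Series / Stage 16: Python Crawler Development / Crawler Basics (Old), post 737

Teacher, I am running into the same situation:

Paused in debugger

Python Full Series / Stage 16: Python Crawler Development / Crawler Basics (Old), post 738

Teacher, it ran fine yesterday, so why won't it run today? What happened?

Python Full Series / Stage 16: Python Crawler Development / Crawler Anti-Anti-Crawling, post 739

Has the website switched to a customized AES? Standard AES decryption cannot decrypt the data.

Python Full Series / Stage 16: Python Crawler Development / Crawler Anti-Anti-Crawling, post 740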

    name = soup.select('h1.name')   #[0].text.strip()
    print(name)

Is this a problem with the web page? h1.name can be found in the page, but it cannot be read: what gets printed is an empty list.
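
One way to narrow this down is to check whether the element exists in the HTML the program actually downloaded, because browser developer tools show the page after JavaScript has run. The sketch below uses the standard-library `html.parser` instead of BeautifulSoup so that it has no dependencies. `H1NameFinder`, `find_h1_names`, and both HTML strings are made-up names and samples, not taken from the course.

```python
from html.parser import HTMLParser


class H1NameFinder(HTMLParser):
    """Collects the text of every <h1> whose class list contains 'name'."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "name" in classes:
            self._inside = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.names[-1] += data


def find_h1_names(html):
    finder = H1NameFinder()
    finder.feed(html)
    finder.close()
    return [name.strip() for name in finder.names]


# Sample 1: the heading is present in the raw HTML.
static_html = '<html><body><h1 class="name"> Demo Movie </h1></body></html>'
# Sample 2: the heading would only be created later by JavaScript.
js_html = '<html><body><div id="app"></div><script>render()</script></body></html>'

print(find_h1_names(static_html))
print(find_h1_names(js_html))
```

If the downloaded HTML resembles the second sample, the selector is fine and the empty list comes from the content of the response, so printing or saving the response text is a reasonable next step.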

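Python Full Series / Stage 16: Python Crawler Development / Crawler Basics (Old), post 741

Teacher, how do I fix this? It works when I replace the user with the one in the picture, but not with the one written in the code.

Python Full Series / Stage 16: Python Crawler Development / Crawler Basics (Old), post 742

So, for example, can VIP video content be crawled without a Tencent VIP membership? Does that count as hacking or as crawling?

Python Full Series / Stage 16: Python Crawler Development / Crawler Anti-Anti-Crawling, post 743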

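Installing a whl file reports that pip has not been updated

After I upgraded pip, the same problem still appears: it says pip has not been updated. Why? As a result, the whl installation fails.

Python Full Series / Stage 16: Python Crawler Development / Mobile Crawler Development, post 745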

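Teacher, I typed the code following the video; only the URL is different. I used the Baidu address with some random digits added. What gets printed is an error object, <urlopen error [Errno 11001] getaddrinfo failed>, not an HTTP error message, and print(e.reason) prints this: [Errno 11001] getaddrinfo failed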
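
This output is what a failed host-name lookup typically looks like. `urlopen` raises `URLError` for connection-level failures such as DNS resolution, and raises `HTTPError` (a subclass of `URLError`) only when a server actually answered with an error status. A minimal sketch of telling the two apart follows; `describe_error` is a hypothetical helper name, and the exceptions are constructed by hand so that no network access is needed.

```python
from urllib.error import HTTPError, URLError


def describe_error(exc):
    """Summarise a urllib exception; check HTTPError first, it subclasses URLError."""
    if isinstance(exc, HTTPError):
        # A server responded, but with an error status code.
        return f"HTTP error {exc.code}: {exc.reason}"
    if isinstance(exc, URLError):
        # No HTTP response at all: DNS failure, refused connection, and so on.
        return f"Connection-level error: {exc.reason}"
    raise TypeError("not a urllib error")


# Hand-built examples standing in for what urlopen() would raise.
dns_failure = URLError(OSError(11001, "getaddrinfo failed"))
not_found = HTTPError("http://example.com/missing", 404, "Not Found", None, None)

print(describe_error(dns_failure))
print(describe_error(not_found))
```

To see an HTTP status error instead, the host name has to stay valid and only the path should be changed to something that does not exist.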

Python Full Series / Stage 16: Python Crawler Development / Using the Scrapy Framework (Old), post 746

Do all websites use quote? If not, how can I tell which of these conversion functions exist and which one applies?
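
Percent-encoding is part of the URL format itself rather than something each site picks, and Python's `urllib.parse` groups the related functions together. A short sketch of the common ones, using 斗罗大陆 as a sample search word; the parameter names `word` and `pn` are only illustrative.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, urlencode

word = "斗罗大陆"

# quote / unquote: encode and decode a single value (UTF-8 by default).
encoded = quote(word)
decoded = unquote(encoded)

# urlencode / parse_qs: build and parse a whole query string.
query = urlencode({"word": word, "pn": 0})
params = parse_qs(query)

# quote keeps spaces as %20, quote_plus turns them into '+'.
spaced = (quote("a b"), quote_plus("a b"))

print(encoded)
print(decoded)
print(query)
print(params)
print(spaced)
```

Some sites apply extra transformations on top of this (for example Base64 or their own encryption), which usually has to be worked out by comparing the value in the address bar or network panel with the original text.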

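Python Full Series / Stage 16: Python Crawler Development / Crawler Basics (Old), post 748

The problem is as follows:

When I crawl the next page with XPath, nothing is displayed. The XPath is written correctly, and I cannot tell where the mistake is. Please advise.

douluo.py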

import scrapy


class DouluoSpider(scrapy.Spider):
    name = 'douluo'
    allowed_domains = ['baidu.com']
    start_urls = ['https://image.baidu.com/search/detail?ct=503316480&z=0&ipn=d&word=%E6%96%97%E7%BD%97%E5%A4%A7%E9%99%86&step_word=&hs=0&pn=0&spn=0&di=83380&pi=0&rn=1&tn=baiduimagedetail&is=0%2C0&istype=0&ie=utf-8&oe=utf-8&in=&cl=2&lm=-1&st=undefined&cs=1017836848%2C1501428868&os=3786179136%2C2901592361&simid=3481113337%2C309418197&adpicid=0&lpn=0&ln=1606&fr=&fmq=1615969790890_R&fm=&ic=undefined&s=undefined&hd=undefined&latest=undefined&copyright=undefined&se=&sme=&tab=0&width=undefined&height=undefined&face=undefined&ist=&jit=&cg=&bdtype=0&oriquery=&objurl=https%3A%2F%2Fgimg2.baidu.com%2Fimage_search%2Fsrc%3Dhttp%3A%2F%2Fimage.uc.cn%2Fs%2Fwemedia%2Fs%2Fupload%2F2019%2Fcf7fb507a5b57be658415dc028a11f9c.jpg%26refer%3Dhttp%3A%2F%2Fimage.uc.cn%26app%3D2002%26size%3Df9999%2C10000%26q%3Da80%26n%3D0%26g%3D0n%26fmt%3Djpeg%3Fsec%3D1618564439%26t%3D96bcfc5d23d8645386a36aeedb907c06&fromurl=ippr_z2C%24qAzdH3FAzdH3Fv5g_z%26e3Br6v7sp76j_z%26e3BvgAzdH3FwAzdH3Fgjof-k8ud88aw9k8wmumlv8l9cubjbb0mvvb1_z%26e3Bip4s%3Fpyrj%3D%25El%25la%25AA%25Ec%25AC%25AC%25Ec%25b8%25An%26t1%3Dk8ud88aw9k8wmumlv8l9cubjbb0mvvb1%26f%3D8%26prs%3Dv5gr6v7sp76j&gsm=1&rpstart=0&rpnum=0&islist=&querylist=&force=undefined']

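    def parse(self, response):
        image_url = response.xpath('//div[@class="img-wrapper"]/img/@src').extract_first()
        yield {
            'image_urls': [image_url]
        }

        # Extract the pagination link.
        # NOTE: this XPath returns the <span> element's HTML, not a URL; it needs
        # to select whichever attribute actually holds the next-page link.
        next_url = response.xpath('//span[@class="img-switch-btn"]').extract_first()
        if next_url:
            # Pass the method itself; self.parse() would call it immediately.
            yield scrapy.Request(response.urljoin(next_url), callback=self.parse)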
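
One detail in the last line is worth isolating: a callback should be the method object, not the result of calling it. The sketch below shows the difference in plain Python; `DemoSpider` and the string `"page-2"` are hypothetical stand-ins for a spider and a response.

```python
class DemoSpider:
    def parse(self, response):
        return f"parsed {response}"


spider = DemoSpider()

# Correct: hand over the bound method so it can be called later with a response.
callback = spider.parse
result = callback("page-2")

# Incorrect: parse() runs immediately, and without a response argument it fails.
try:
    spider.parse()
    error = None
except TypeError as exc:
    error = exc

print(result)
print(type(error).__name__)
```

With `callback=self.parse` the framework decides when to call the method and supplies the response itself.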

Python Full Series / Stage 16: Python Crawler Development / Advanced Scrapy Framework, post 749

# -*- coding: utf-8 -*-
from fake_useragent import UserAgent
import requests
from lxml import etree
from time import sleep

def get_html(url):
    """
    :param url: the URL to crawl
    :return: the HTML text, or None if the status code is not 200
    """
    headers = {
        "User-Agent": UserAgent().chrome
    }
    resp = requests.get(url, headers=headers)
    sleep(3)  # pause between requests
    if resp.status_code == 200:
        resp.encoding = 'utf-8'
        return resp.text
    else:
        return None

def parse_list(html):
    """
    :param html: HTML of a page that contains a movie list
    :return: a list of movie detail-page URLs
    """
    e = etree.HTML(html)
    list_url = ['https://maoyan.com' + url for url in e.xpath('//div[@class="movie-item-hover"]/a/@href')]
    return list_url

def parse_index(html):
    """
    :param html: HTML of a movie detail page
    :return: the extracted movie information
    """
    e = etree.HTML(html)
    # xpath() returns an empty list when nothing matches, so indexing [0]
    # unconditionally would raise IndexError.
    names = e.xpath('//h1/text()')
    types = e.xpath('//li[@class="ellipsis"]/a/text()')
    actor = e.xpath('//ul[@class="celebrity-list clearfix"]/li[@class="celebrity actor"]/div/a/text()')
    actors = format_actor(actor)
    return {'name': names[0] if names else None,
            'type': types[0] if types else None,  # renamed: "type" shadowed the builtin
            'actor': actors}

def format_actor(actors):
    actor_set = set()  # a set removes duplicate names
    for actor in actors:
        actor_set.add(actor.strip())
    return actor_set

def main():
    num = int(input('How many pages of data to fetch? '))
    for y in range(num):
        url = 'https://maoyan.com/films?showType=3&offset={}'.format(y*30)
        list_html = get_html(url)
        if list_html is None:
            # get_html() returns None on a non-200 response
            print('Failed to fetch', url)
            continue
        for detail_url in parse_list(list_html):
            info_html = get_html(detail_url)
            if info_html is None:
                continue
            movie = parse_index(info_html)
            print(movie)


if __name__ == '__main__':
    main()

Teacher, why is there no data?
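
An empty result usually means the XPath expressions matched nothing in the HTML that was actually returned, which can happen when a site serves a different page to crawlers than to a browser. A small guard makes that case visible instead of failing on `[0]`. In this sketch `first_or_default` is a hypothetical helper and plain lists stand in for `xpath()` results.

```python
def first_or_default(results, default=None):
    """Return the first match with whitespace stripped, or `default` if none."""
    for value in results:
        return value.strip()
    return default


# Stand-ins for what e.xpath(...) might return.
matched = first_or_default(["  Demo Movie  "])
missing = first_or_default([], default="<no match>")

print(matched)
print(missing)
```

When every field comes back as the default, printing the first few hundred characters of the downloaded HTML is a quick way to see what the server really sent.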

Python Full Series / Stage 16: Python Crawler Development / Crawler Anti-Anti-Crawling, post 750

Paused in debugger

Hello,