Teacher, how can I check whether virtualization is enabled on a Windows 7 system?
Code:
import re

s = "<div><a class='title' href='http://www.baidu.com'>尚学堂bjsxt</a>></div>"
pattern = r"[\u4e00-\u9fa5]*"
v = re.findall(pattern, s)
print(v)
Output:
Question:
Teacher, why does matching Chinese characters with * here give this result, while matching with + succeeds? * matches zero or more and + matches one or more; by the greedy rule, shouldn't both match as many characters as possible? Why do the * results differ from the + results?
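A minimal sketch of what is going on, using the same string as above: * can match zero characters, so at every position where no Chinese character starts, findall still records an empty match. Greediness only decides how far a match extends once it starts, not whether a zero-width match is allowed.

import re

s = "<div><a class='title' href='http://www.baidu.com'>尚学堂bjsxt</a>></div>"
print(re.findall(r"[\u4e00-\u9fa5]*", s))  # many '' entries plus '尚学堂'
print(re.findall(r"[\u4e00-\u9fa5]+", s))  # ['尚学堂'] only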
from urllib.request import Request

# headers and opener are defined earlier in the lesson code
index_url = 'https://www.kuaidaili.com/usercenter/overview'
index_req = Request(index_url, headers=headers)
index_resp = opener.open(index_req)
print(index_resp.read().decode())
What does this part mean? After logging in to the account, why send another request?
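A minimal sketch of the idea, assuming the lesson code built a cookie-aware opener (the names here mirror the snippet above):

from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, Request, build_opener

cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))
# After the login request goes through this opener, cookie_jar holds the
# session cookie, so a follow-up GET to the user-center page is sent as
# the logged-in user; printing its body verifies the login succeeded.
index_req = Request('https://www.kuaidaili.com/usercenter/overview',
                    headers={'User-Agent': 'Mozilla/5.0'})
# index_resp = opener.open(index_req)  # uncomment to actually send it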
Teacher, I found where the problem was. Sorry for the trouble.
The key UserAgent in headers={"UserAgent": UserAgent().random} was written wrong; it should be User-Agent.
Teacher, what is the shortcut key for commenting out a block of lines in one go?
Logging in now requires a captcha. Does passing the login parameters still work?
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

def get_url(url):
    proxies = {"http": "http://61.135.155.82:443"}
    headers = {"User-Agent": UserAgent().random}
    resp = requests.get(url, headers=headers, proxies=proxies)
    resp.encoding = "utf-8"
    if resp.status_code == 200:
        return resp.text
    else:
        return None

def parse_list(html):
    soup = BeautifulSoup(html, 'lxml')
    movie_list = soup.select('div[class="movies-list"]>dl[class="movie-list"]>dd>div[class="movie-item film-channel"]>a')
    return movie_list

def parse_index(html):
    pass

def main():
    url = "https://maoyan.com/films?showType=3&offset=0"
    html = get_url(url)
    movie_list = parse_list(html)
    print(movie_list[0].get('href'))  # note: only index 0 is printed
    # for url in movie_list:
    #     movie_detail = parse_index
    #     print(movie_detail)

if __name__ == "__main__":
    main()
Hello teacher, this is my result from crawling Maoyan movies with BeautifulSoup. Why did I only get the URL of a single movie? Could the cause be that my crawler's requests were intercepted by an anti-crawler mechanism?
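Note that main() above prints only movie_list[0].get('href'), so a single URL is what the code asks for even when the selector matches more. Before suspecting anti-crawler blocking, a quick check is to replace the print line inside main() with something like:

for a in movie_list:              # every matched <a>, not just the first
    print(a.get('href'))
print(len(movie_list), "links matched")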
Teacher, what is going on here?
Hello teacher, the documentation is missing the materials for this chapter.
Teacher, let me save everyone a headache. Today I kept testing with the teacher's code and kept getting the error: No module named 'scrapy.contrib'. Uninstalling and reinstalling the module didn't help! In the end all I had to change was the pipeline path in settings:

ITEM_PIPELINES = { ... }

to the new module path. I searched for ages, thinking my Python interpreter was broken, when in fact scrapy-1.6.0 has removed scrapy.contrib; I wasted a lot of time here.
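For reference, a hedged example of the kind of rename involved (the poster's exact entry was not shown; the images pipeline is one commonly affected path):

# old path, removed by scrapy-1.6.0:
# ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 300}
# current path:
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 300}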
Teacher, I want to use BeautifulSoup to extract the first span tag of every item, but no matter how I write it, I either get all of the spans or just one single span; I can't manage to get each item's first span, which holds the movie name. I don't know how to write it.
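A minimal sketch of one way to do this (the HTML and class name here are made up for illustration): select each movie's container first, then take only that container's first span with find().

from bs4 import BeautifulSoup

html = """
<div class="movie-item"><span>霸王别姬</span><span>9.6</span></div>
<div class="movie-item"><span>活着</span><span>9.3</span></div>
"""
soup = BeautifulSoup(html, "lxml")
for item in soup.select('div[class="movie-item"]'):
    # find() returns only the first matching descendant of this item
    print(item.find("span").get_text())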
Teacher, isn't Mac going to be covered? Every time I install or learn a new module I have to look it up online myself.
I've configured the environment variables, but it keeps doing this. How do I fix it?
When crawling NetEase Cloud Music with the Scrapy framework, the returned response contains id=${x.id} rather than something like <a href="/song?id=1813864802">, yet with the requests module I can get the numeric id. My Scrapy settings file is as follows:

# Scrapy settings for wangyiyun project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'wangyiyun'

SPIDER_MODULES = ['wangyiyun.spiders']
NEWSPIDER_MODULE = 'wangyiyun.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'wangyiyun (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
LOG_LEVEL = 'ERROR'

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    # 'Accept-Language': 'en',
}
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
DOWNLOAD_DELAY = 2

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'wangyiyun.middlewares.WangyiyunSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
DOWNLOADER_MIDDLEWARES = {
    'wangyiyun.middlewares.WangyiyunDownloaderMiddleware': 543,
}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
#    'wangyiyun.pipelines.WangyiyunPipeline': 300,
#}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
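${x.id} is an unrendered client-side template placeholder, so one thing worth ruling out first (an assumption, not confirmed by the post) is that the two libraries are fetching different documents or sending different headers. A sketch of the comparison, with a hypothetical URL:

import requests

url = 'https://music.163.com/discover/toplist'  # hypothetical: use the exact URL the spider requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
body = requests.get(url, headers=headers).text
print('${x.id}' in body)  # True means the placeholder is in the raw HTML for requests too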
We are very sorry for the poor experience! So that we can better understand your learning situation and the problems you encountered, you can call the complaint hotline directly:
We will take care of your issue as soon as possible!