有效沟通问答-【官方】百战程序员_IT在线教育培训机构

会员可以在此提问，百战程序员老师有问必答

对大家有帮助的问答会被标记为“推荐”
看完课程过来浏览一下别人提的问题，会帮你学得更全面

截止目前，同学们一共提了 133689个问题

时间排序推荐排序

请问报错：无法在json类型的选择器上使用xpath，是什么意思？

Python 全系列/第十六阶段：Python 爬虫开发/scrapy框架使用 241楼

调试模式自己就停止了，而且爬取不到任何数据

Python 全系列/第十六阶段：Python 爬虫开发/scrapy框架使用 242楼

用xpath helper可以找到, 但是在scrapy中的xpath就找不到什么情况, 老师视频里也是这个情况——网页我改过。xpath 直接复制的，但是还是出不来数据

Python 全系列/第十六阶段：Python 爬虫开发/scrapy框架使用 243楼

老师就是网页没有爬取完全，程序就结束了，然后我加了隐式等待和调用js，但是网页还是没有爬取完全，是不是我的逻辑方面有错误，爬取时，还会有那个提示弹窗，必须手动点掉才开始爬取，怎么在代码里就给他跳过

from selenium.webdriver.chrome.service import Service

from selenium.webdriver import Chrome

from lxml import etree

from selenium.webdriver.common.by import By

from time import sleep

def spider_huya():

#创建驱动

service=Service('./chromedriver.exe')

#创建浏览器

driver=Chrome(service=service)

#访问网页

driver.get('https://www.huya.com/g/lol')

count=1

while True:

print('获取了第{%d}页' % count)

count +=1

#拖动滚动条到底部

js='document.documentElement.scrollTop=1000'

driver.execute_script(js)

#设置隐式等待

driver.implicitly_wait(20)

#创建etree对象

e=etree.HTML(driver.page_source)

#提取数据

names=e.xpath('//i[@class="nick"]/text()')

persons=e.xpath('//i[@class="js-num"]/text()')

sleep(3)

#打印数据

#for n,p in zip(names,persons):

#print(f'主播名:{n} 人气:{p}')

#获取选项卡，点击

if driver.page_source.find('laypage_next') ==-1:

break

next_btn=driver.find_element(By.XPATH,'//a[@class="laypage_next"]')

next_btn.click()

sleep(3)

driver.quit()

if __name__=='__main__':

spider_huya()

Python 全系列/第十六阶段：Python 爬虫开发/爬虫基础 244楼

这个怎么解决呀

Python 全系列/第十六阶段：Python 爬虫开发/scrapy框架使用 245楼

import requests

from fake_useragent import UserAgent

from lxml import etree

#找到url地址

url='https://www.zongheng.com/rank?nav=recommend&rankType=6'

#设置请求头

headers={'User-Agent':UserAgent().edge}

#发送请求

resp=requests.get(url,headers=headers)

#解析数据

e=etree.HTML(resp.text)

#提取数据

names=e.xpath('//div[@class="rank-modules-works--main-item-title ellipsis-two-lines global-hover"]/span/text()')

authors=e.xpath('//div[@class="rank-modules-works--main-item-author ellipsis"]/span[1]/text()')

#打印数据

for n,a in zip(names,authors):

print(n,a)

老师代码运行之后，没有出结果，2个表达式在浏览器上面试出了结果

Python 全系列/第十六阶段：Python 爬虫开发/爬虫基础 246楼

为什么我上面显示的origin地址，我自己去ping这个地址的时候ping不通呢

Python 全系列/第十六阶段：Python 爬虫开发/scrapy框架使用 247楼

老师为啥我按视频里的步骤做完了怎么还弹出来这个

Python 全系列/第十六阶段：Python 爬虫开发/移动端爬虫 248楼

问题已解决：

selenium 4.0.0

Appium-Python-Client 2.1.0

Python 全系列/第十六阶段：Python 爬虫开发/移动端爬虫 249楼

#pip install Appium-Python-Client==2.6.1

#selenium 4.11.2

from appium import webdriver

from selenium.webdriver.common.by import By

from time import sleep

#设置设备信息

desired = {

"platformName": "Android",

"platformVersion": "5.1.1",

"deviceName": "VOG-AL10",

"appPackage": "com.android.browser",

"appActivity": "com.android.browser.BrowserActivity"

}

#设置appium的服务

server = 'http://localhost:4723/wd/hub'

# 新建一个driver

driver = webdriver.Remote(server, options=desired)

#蒙版元素，需要点击一下

value = '/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout[2]/android.widget.LinearLayout/android.widget.FrameLayout/android.webkit.WebView'

driver.find_element(By.XPATH,value).click()

sleep(1)

#定位输入框

input_text = '/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout[2]/android.widget.LinearLayout/android.widget.FrameLayout/android.webkit.WebView/android.webkit.WebView/android.view.View[2]/android.view.View[3]/android.view.View/android.widget.EditText'

input = driver.find_element(By.XPATH,input_text)

#输入内容,清空数据

input.clear()

input.send_keys('音乐')

sleep(1)

#定位按钮

button_text = '/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout[2]/android.widget.LinearLayout/android.widget.FrameLayout/android.webkit.WebView/android.webkit.WebView/android.view.View[2]/android.view.View[3]/android.view.View/android.widget.Button'

button = driver.find_element(By.XPATH,button_text)

#点击按钮

button.click()

sleep(1)

#com.android.browser/.BrowserActivity

启动报错：

Traceback (most recent call last):

File "d:\python\spiderProject\移动端爬虫\appium的实战.py", line 21, in <module>

driver = webdriver.Remote(server, options=desired)

File "D:\virtualenv\spider_env\lib\site-packages\appium\webdriver\webdriver.py", line 230, in __init__

super().__init__(

TypeError: WebDriver.__init__() got an unexpected keyword argument 'desired_capabilities'

Python 全系列/第十六阶段：Python 爬虫开发/移动端爬虫 250楼

老师，这个实验用for语句控制了访问次数，那么如果在某些网站上面，我只访问一次去获取数据，那服务器是不是就不会崩溃啊

Python 全系列/第十六阶段：Python 爬虫开发/爬虫基础 251楼

老师，用爬虫爬取数据的时候，假如服务器允许10个人访问，为啥爬虫会占用一个以上的资源？

Python 全系列/第十六阶段：Python 爬虫开发/爬虫基础 252楼

证书问题解决不了，这部分内容很难进行下去

Python 全系列/第十六阶段：Python 爬虫开发/移动端爬虫 253楼

老师为什么要引入header

Python 全系列/第十六阶段：Python 爬虫开发/爬虫基础 254楼

老师，移动端爬虫，模拟器配置完代理后，报错

Python 全系列/第十六阶段：Python 爬虫开发/移动端爬虫 255楼

同学您好