Headless Selenium via Docker with Python / Scrapy

Chris

I am trying to use Scrapy with Selenium on a laptop on which I have installed Kubuntu, but I only use the command line (and do not start the X server).

My first question: would I still need Xvfb in that case?

What I am doing now:

sudo service docker start
sudo service docker status
sudo docker run -it --rm --name chrome --shm-size=1024m -p=9222:9222 --cap-add=SYS_ADMIN yukinying/chrome-headless-browser --enable-logging --v=10000

# Now docker is running; in a second SSH session I do:
Xvfb :99 &
export DISPLAY=:99

# In the second SSH session now:
scrapy crawl weibospider
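(Editor's note: the docker run above only starts headless Chrome and publishes its DevTools port 9222; a spider that later calls webdriver.Firefox() never talks to this container. A minimal sketch of how Selenium could attach to the already-running Chrome instead is below. It assumes chromedriver is installed on the host; the debuggerAddress option and the Selenium 3-era chrome_options keyword are the moving parts here.)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Attach to the DevTools endpoint the container publishes (-p 9222:9222).
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

# chromedriver must be on PATH; with debuggerAddress it drives the
# remote Chrome instead of starting its own.
browser = webdriver.Chrome(chrome_options=options)
browser.get("https://example.com")
print(browser.title)
browser.quit()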

Now I get a huge list of DEBUG messages, option parameters, and so on:

2017-07-09 18:37:23 [easyprocess] DEBUG: param: "['Xvfb', '-help']"
2017-07-09 18:37:23 [easyprocess] DEBUG: command: ['Xvfb', '-help']
2017-07-09 18:37:23 [easyprocess] DEBUG: joined command: Xvfb -help
2017-07-09 18:37:24 [easyprocess] DEBUG: process was started (pid=5235)
2017-07-09 18:37:26 [easyprocess] DEBUG: process has ended
2017-07-09 18:37:26 [easyprocess] DEBUG: return code=0
2017-07-09 18:37:26 [easyprocess] DEBUG: stdout=
2017-07-09 18:37:26 [easyprocess] DEBUG: stderr=use: X [:<display>] [option]
-a                     default pointer acceleration (factor)
-ac                    disable access control restrictions
...
2017-07-09 18:38:35 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
Unhandled error in Deferred:
2017-07-09 18:38:35 [twisted] CRITICAL: Unhandled error in Deferred:

2017-07-09 18:38:35 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/spidy/.local/lib/python3.5/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/crawler.py", line 76, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/crawler.py", line 99, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/spidy/.local/lib/python3.5/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/spidy/var/scrapy/weibo/weibo/spiders/weibobrandspider.py", line 26, in __init__
    self.browser = webdriver.Firefox()
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
    keep_alive=True)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "/home/spidy/.local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: connection refused

My environment:

  • Python 3.5.2
  • /usr/local/bin/geckodriver
  • Docker version 17.03.1-ce, build c6d412e
  • Mozilla Firefox 54.0
  • Ubuntu 16.04.2 LTS

And the script:

import scrapy
from pyvirtualdisplay import Display
from selenium import webdriver

class WeiboSpider(scrapy.Spider):
    name = "weibospider"

    def __init__(self):
        display = Display(visible=0, size=(1200, 1000))
        display.start()
        # This is the problematic line:
        self.browser = webdriver.Firefox()
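(Editor's note: for comparison, here is a sketch of the same spider with the display and browser kept on self and torn down when the spider finishes. This is not a fix for the "connection refused" error, but it avoids leaking Xvfb and Firefox processes between runs; the closed hook is Scrapy's standard shutdown callback.)

import scrapy
from pyvirtualdisplay import Display
from selenium import webdriver

class WeiboSpider(scrapy.Spider):
    name = "weibospider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Start the virtual display before the browser so Firefox
        # finds a DISPLAY to render into.
        self.display = Display(visible=0, size=(1200, 1000))
        self.display.start()
        self.browser = webdriver.Firefox()

    def closed(self, reason):
        # Called by Scrapy when the spider finishes; shut down in
        # reverse order of startup.
        self.browser.quit()
        self.display.stop()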

I have run out of ideas - what am I doing wrong, or what am I missing?


1 answer to the question

Kenuan Developer

It could be that:

the Docker image is for headless Chrome: yukinying/chrome-headless-browser

and you are using geckodriver (the Firefox driver): /usr/local/bin/geckodriver
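(Editor's note: if the goal is to stay with Firefox/geckodriver, the Chrome container adds nothing, and the sketch below shows a self-contained Firefox route. It assumes Firefox is upgraded to 55+ with a matching geckodriver: Firefox gained a native headless mode in version 55, which would also answer the Xvfb question, whereas on Firefox 54 a virtual display is still required. The firefox_options keyword is the Selenium 3-era spelling.)

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("-headless")  # native headless mode, Firefox 55+

# Assumes geckodriver (e.g. /usr/local/bin/geckodriver) is on PATH and
# new enough for the installed Firefox.
browser = webdriver.Firefox(firefox_options=options)
browser.get("https://example.com")
print(browser.title)
browser.quit()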