Wrong url in driver #22

Open
Tamplier opened this issue Dec 21, 2018 · 4 comments

Comments

@Tamplier

Sometimes the URL of the response differs from the URL the driver reports:

self.logger.warning("Request response url: %s", response.url)
self.logger.warning("Request driver url: %s", response.meta['driver'].current_url)

Moreover, the driver URL sometimes duplicates the URL of another response, even though the response URLs themselves are unique.

@Tamplier
Author

Oh, I understand now. Sharing a single WebDriver instance is a bad idea, because parse callbacks run concurrently with new requests.

response.meta['driver'].find_element_by_css_selector('button.btn.btn-primary').click()
response = response.replace(body=response.meta['driver'].page_source)

This is how I tried to use the driver. It leads to the response being replaced with the body of a different URL (whichever page the driver happens to be on at that moment).
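To illustrate the race described above, here is a toy asyncio model, not scrapy-selenium's actual code. `FakeDriver` is a hypothetical stand-in for a WebDriver that can hold only one page at a time, and the `asyncio.sleep(0.01)` stands in for the gap between a page being rendered and its parse callback running. Because every request shares the one driver, each callback reads whatever was loaded last.

```python
import asyncio


class FakeDriver:
    """Hypothetical WebDriver stand-in: it holds exactly one current page."""

    def __init__(self) -> None:
        self.current_url = None
        self.page_source = ""

    async def get(self, url: str) -> None:
        self.current_url = url
        self.page_source = f"<html>{url}</html>"
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would


async def parse(driver: FakeDriver, response_url: str) -> tuple[str, str]:
    # The page for response_url is rendered first...
    await driver.get(response_url)
    # ...but other requests get a turn before the callback reads the driver.
    await asyncio.sleep(0.01)
    return response_url, driver.current_url


async def main() -> list[tuple[str, str]]:
    shared = FakeDriver()  # one driver for all in-flight requests
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    return await asyncio.gather(*(parse(shared, u) for u in urls))


results = asyncio.run(main())
for response_url, driver_url in results:
    print(response_url, "->", driver_url)
```

All three callbacks report `https://example.com/c`, which mirrors the mismatched and duplicated driver URLs reported in this issue.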

@clemfromspace
Owner

Hi @Tamplier,

Thanks for opening this!
What you are experiencing looks similar to #21, and you are right about the single WebDriver instance.
Exposing the driver in the response meta does not work as expected, because many requests are processed at the same time. Unless I find a workaround (creating one webdriver per parallel request?), I think I will have to stop exposing the driver in the meta.
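One way to read the "one webdriver per parallel request" idea is a small pool: each request checks out a driver for its whole lifetime (including the callback) and returns it afterwards, so no two in-flight requests ever touch the same driver. This is a minimal asyncio sketch under that assumption; `FakeDriver` and `DriverPool` are hypothetical names, not part of scrapy-selenium.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager


class FakeDriver:
    """Hypothetical WebDriver stand-in; a real pool would hold Selenium drivers."""

    _ids = itertools.count()

    def __init__(self) -> None:
        self.id = next(self._ids)
        self.current_url = None

    async def get(self, url: str) -> None:
        self.current_url = url
        await asyncio.sleep(0)


class DriverPool:
    """Lends each driver to at most one request at a time."""

    def __init__(self, size: int) -> None:
        self._idle = asyncio.Queue()
        for _ in range(size):
            self._idle.put_nowait(FakeDriver())

    @asynccontextmanager
    async def checkout(self):
        driver = await self._idle.get()  # wait until some driver is free
        try:
            yield driver
        finally:
            self._idle.put_nowait(driver)  # hand it back for the next request


async def parse(pool: DriverPool, url: str) -> tuple[str, str]:
    async with pool.checkout() as driver:
        await driver.get(url)
        await asyncio.sleep(0.01)  # other requests run meanwhile
        return url, driver.current_url


async def main() -> list[tuple[str, str]]:
    pool = DriverPool(size=2)
    urls = [f"https://example.com/{c}" for c in "abcd"]
    return await asyncio.gather(*(parse(pool, u) for u in urls))


results = asyncio.run(main())
print(results)
```

With `size=2` and four URLs, the last two requests wait for a driver to be returned, and every request reads back its own URL. In a real crawler the pool would presumably be sized to Scrapy's `CONCURRENT_REQUESTS` setting and each driver quit when the spider closes; keeping a driver checked out across the middleware and the callback is the hard part this sketch glosses over.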

Another problem, raised in a different issue, is that because only one WebDriver instance is created, it can slow down the entire Scrapy request/response processing.
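On the performance side, Selenium calls block the calling thread, so funnelling every page load through one driver on the crawler's main thread serializes them. A common mitigation, sketched here with a hypothetical `BlockingFakeDriver` whose `get` just sleeps, is to run each blocking render in a worker thread and give every thread its own driver through `threading.local`, since a single WebDriver should not be shared across threads. In a Twisted-based Scrapy middleware, `twisted.internet.threads.deferToThread` would be the rough counterpart of the executor used here; that wiring is an assumption and is not shown.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingFakeDriver:
    """Hypothetical stand-in for a Selenium driver whose calls block."""

    def __init__(self) -> None:
        self.current_url = None

    def get(self, url: str) -> None:
        time.sleep(0.05)  # simulated page load; blocks the calling thread
        self.current_url = url


_local = threading.local()


def get_thread_driver() -> BlockingFakeDriver:
    # Each worker thread lazily creates, then reuses, its own driver,
    # so no driver is ever used by two threads.
    if not hasattr(_local, "driver"):
        _local.driver = BlockingFakeDriver()
    return _local.driver


def render(url: str) -> tuple[str, str, int]:
    driver = get_thread_driver()
    driver.get(url)
    return url, driver.current_url, id(driver)


urls = [f"https://example.com/{i}" for i in range(8)]
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(render, urls))
elapsed = time.perf_counter() - start
print(f"{len(results)} pages in {elapsed:.2f}s")
```

With four workers, the eight simulated 50 ms loads take roughly a quarter of the serial time, at most four drivers are created, and each render reads back its own URL.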

I don't have a solution for these problems right now, but I am working on another project that at least solves the second one: https://github.com/clemfromspace/scrapy-puppeteer (a fully asynchronous webdriver using Puppeteer instead of Selenium).

@xtan9

xtan9 commented Aug 18, 2021

I'm facing the same problem. Any updates?

@ospaarmann

I am facing the same issue. I opened an issue before I found this one, so I'm linking it here to keep the discussion in one place: #111


4 participants