From 2e29ce49aacb20be1e7ba9f167087c245101746d Mon Sep 17 00:00:00 2001
From: Yash <76577754+yash-fn@users.noreply.github.com>
Date: Tue, 21 Dec 2021 19:04:03 -0600
Subject: [PATCH] Synchronous Driver Actions Solution to README

Many users will likely need a way to run driver actions synchronously,
after the request but before response parsing, to stay compatible with
Scrapy's design, which keeps requests and response parsing asynchronous.
This uses wait_until to perform any driver actions before the response
is built. It could be useful to others, so maybe include it as a use
case in the README? A wrapper could make this more user-friendly, but
this may be good enough for now.
---
 README.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/README.md b/README.md
index b642de9..f9d5722 100644
--- a/README.md
+++ b/README.md
@@ -50,6 +50,29 @@ def parse_result(self, response):
 ```
 For more information about the available driver methods and attributes, refer to the [selenium python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver)
 
+Ideally, driver actions should not be used during response parsing: Scrapy handles requests and responses asynchronously, and this package uses only a single webdriver. Sometimes, however, user actions such as clicks or form submissions must happen after the request but before parsing, so that the response object contains all the data you intend to scrape. Here is a solution that may help in those circumstances:
+
+1. Create a custom Selenium wait condition (https://selenium-python.readthedocs.io/waits.html):
+   ```python
+   class wait_title(object):
+       def __init__(self):
+           self.params = "string"
+
+       def __call__(self, driver):
+           # driver actions go here ...
+           title = driver.title
+           if title == self.params:
+               return title  # any truthy value ends the wait; this package ignores the value itself, just do not return False on success
+           else:
+               return False  # a falsy value keeps the wait going; this function runs again after a short delay
+   ```
+2. Then pass this custom wait condition to the Selenium request:
+   ```python
+   SeleniumRequest(url=url, callback=self.parse, wait_until=wait_title(), wait_time=10)
+   ```
+
+This ensures that a request performs all required driver actions before the response object is built and handed off to be parsed asynchronously.
+
 The `selector` response attribute work as usual (but contains the html processed by the selenium driver).
 ```python
 def parse_result(self, response):