Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support use_incognito_pages for browser_launch_options in PlaywrightCrawler #941

Merged
merged 9 commits into from
Feb 5, 2025

Conversation

Mantisus
Copy link
Collaborator

@Mantisus Mantisus commented Jan 29, 2025

Description

  • Improve cookie handling for PlaywrightCrawler. Cookies are now stored in the Session and set in Playwright Context from the Session.
  • Add use_incognito_pages option for browser_launch_options allowing each new page to be launched in a separate context.

Issues

@Mantisus Mantisus requested review from vdusek and janbuchar January 30, 2025 13:45
@janbuchar
Copy link
Collaborator

Does the incognito pages option have any relationship with the cookie handling? Also, a test would be nice 🙂

@Mantisus
Copy link
Collaborator Author

Does the incognito pages option have any relationship with the cookie handling?

Yes. When we work in basic mode we are working with one common context. In this case, cookies will be strayed between sessions.

However, by using incognito pages one page per context, we get a more controlled situation. When a session works only with the cookies intended for it.

@Mantisus Mantisus self-assigned this Jan 30, 2025
@janbuchar
Copy link
Collaborator

Does the incognito pages option have any relationship with the cookie handling?

Yes. When we work in basic mode we are working with one common context. In this case, cookies will be strayed between sessions.

Can this be a desirable state for anyone?

However, by using incognito pages one page per context, we get a more controlled situation. When a session works only with the cookies intended for it.

I sort of think that this should be the default. Do we really need to make it configurable? What does the JS version do?

@B4nan
Copy link
Member

B4nan commented Jan 31, 2025

What does the JS version do?

It's not the default, because it means no browser cache, so a huge perf cost, when we were testing this, things took literally twice the time to finish because of that.

@janbuchar
Copy link
Collaborator

What does the JS version do?

It's not the default, because it means no browser cache, so a huge perf cost, when we were testing this, things took literally twice the time to finish because of that.

So does the JS PlaywrightCrawler also not store cookies in the Session?

@Mantisus
Copy link
Collaborator Author

Mantisus commented Jan 31, 2025

So does the JS PlaywrightCrawler also not store cookies in the Session?

Stores, but when using the basic setup, they are just as flowing between sessions from the Playwright context.

@B4nan
Copy link
Member

B4nan commented Jan 31, 2025

This is the thing we really need to redesign in next major. IDK how cookies behave, but the more important part there is that because of this, we keep the same proxy in one browser instance with the defaults (so persistent contexts). In other words, sessions rotate per request, but proxies only per browser (we have some limits on how many times a browser instance can be used).

Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit, otherwise LGTM, but let's wait for @janbuchar's approve as well.

tests/unit/crawlers/_playwright/test_playwright_crawler.py Outdated Show resolved Hide resolved
@Mantisus Mantisus requested a review from janbuchar February 4, 2025 17:41
@vdusek vdusek merged commit eae3a33 into apify:master Feb 5, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants