
Releases: apify/crawlee

v2.2.2

14 Feb 14:17

What's Changed

  • fix: ensure request.headers is set by @B4nan in #1281
  • fix: cookies setting in preNavigationHooks by @AndreyBykov in #1283
  • refactor: improve logging for fetching next request and timeouts by @B4nan in #1292

This release should help with the infamous zero-concurrency bug. The problem is probably still present, but it should be much less common. The main difference is that we now use shorter timeouts for API calls from RequestQueue.

Full Changelog: v2.2.1...v2.2.2

v2.2.1

03 Jan 15:01

What's Changed

  • fix: ignore requests that are no longer in progress by @B4nan in #1258
  • fix: do not use tryCancel() from inside sync callback by @B4nan in #1265
  • fix: revert to puppeteer 10.x by @B4nan in #1276
  • fix: wait when body is not available in infiniteScroll() from Puppeteer utils by @B4nan in #1277
  • fix: expose logger classes on the utils.log instance by @B4nan in #1278

Full Changelog: v2.2.0...v2.2.1

v2.2.0

17 Dec 13:26

Proxy per page

Up until now, browser crawlers used the same session (and therefore the same proxy) for
all requests from a single browser. Now each session gets its own proxy, which means
that with incognito pages, each page will get a new proxy, aligning the behaviour with
CheerioCrawler.

This feature is not enabled by default. To use it, enable the useIncognitoPages
flag in launchContext:

new Apify.PlaywrightCrawler({
    launchContext: {
        useIncognitoPages: true,
    },
    // ...
})

Note that there is currently a performance overhead when using useIncognitoPages.
Use this flag at your own discretion.

We are planning to enable this feature by default in SDK v3.0.

Abortable timeouts

Previously, when a page function timed out, the task kept running anyway. This could lead to requests being processed multiple times. v2.2 introduces abortable timeouts that cancel the task as early as possible.
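The idea behind an abortable timeout can be sketched in plain Node.js. This is an illustrative example, not the SDK's internal code: `addTimeout` is a hypothetical helper that races the task against a timer and signals the task to abort via an AbortSignal when the timer fires.

```javascript
// Sketch of an abortable timeout: the task receives an AbortSignal and
// should stop its work as soon as the timeout fires, instead of running on.
async function addTimeout(taskFn, timeoutMillis) {
    const controller = new AbortController();
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => {
            controller.abort(); // tell the task to stop as early as possible
            reject(new Error(`Task timed out after ${timeoutMillis}ms`));
        }, timeoutMillis);
    });
    try {
        return await Promise.race([taskFn(controller.signal), timeout]);
    } finally {
        clearTimeout(timer);
    }
}
```

A well-behaved task checks (or listens on) the signal and bails out when it is aborted, which is what prevents the duplicate processing described above.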

Mitigation of zero concurrency issue

Several new timeouts were added to the task function, which should help mitigate the zero concurrency bug. Specifically, fetching the next request and reclaiming failed requests back to the queue are now executed with a timeout, with 3 additional retries before the task fails. The timeout is always at least 300s (5 minutes), or handleRequestTimeoutSecs if that value is higher.
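The retry-with-timeout pattern described above can be sketched as follows. This is a minimal illustration, not the SDK's actual implementation; `withTimeoutAndRetries` and its parameters are hypothetical names.

```javascript
// Run an operation with a timeout, retrying up to maxRetries additional
// times before giving up (so maxRetries = 3 means up to 4 attempts total).
// In the scenario above, timeoutSecs would be Math.max(300, handleRequestTimeoutSecs).
async function withTimeoutAndRetries(operation, timeoutSecs, maxRetries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        let timer;
        try {
            return await Promise.race([
                operation(),
                new Promise((_, reject) => {
                    timer = setTimeout(
                        () => reject(new Error('Operation timed out')),
                        timeoutSecs * 1000,
                    );
                }),
            ]);
        } catch (err) {
            lastError = err;
        } finally {
            clearTimeout(timer);
        }
    }
    throw lastError;
}
```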

Full list of changes

  • fix RequestError: URI malformed in cheerio crawler (#1205)
  • only provide Cookie header if cookies are present (#1218)
  • handle extra cases for diffCookie (#1217)
  • implement proxy per page in browser crawlers (#1228)
  • add fingerprinting support (#1243)
  • implement abortable timeouts (#1245)
  • add timeouts with retries to runTaskFunction() (#1250)
  • automatically convert google spreadsheet URLs to CSV exports (#1255)
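The spreadsheet URL conversion from the last item can be illustrated with a small helper. This is a hypothetical sketch of the idea, not the SDK's code, and the `gviz/tq?tqx=out:csv` endpoint is one commonly used CSV export form; the exact URL the SDK produces may differ.

```javascript
// Rewrite a Google Sheets share URL to a CSV export URL; leave other URLs untouched.
function toCsvExportUrl(url) {
    const match = url.match(/docs\.google\.com\/spreadsheets\/d\/([^/]+)/);
    if (!match) return url; // not a spreadsheet URL
    return `https://docs.google.com/spreadsheets/d/${match[1]}/gviz/tq?tqx=out:csv`;
}
```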

v2.1.0

07 Oct 12:42

What's Changed

  • feat: warn if apify proxy is used in proxyUrls by @szmarczak in #1173
  • feat: use puppeteer emulating scrolls instead of window.scrollBy by @vladfrangu in #1170
  • feat: support channel and user links in YouTube regex by @vladfrangu in #1178
  • feat: add support for cgroups V2 to utils.getMemoryInfo by @mnmkng in #1177
  • feat: add purgeLocalStorage method by @vladfrangu in #1187
  • feat: allow passing forceCloud down to the KV store by @vladfrangu in #1186
  • fix: automatically convert gdoc share urls to csv download ones in request list by @B4nan in #1174
  • fix YOUTUBE_REGEX_STRING being too greedy by @B4nan in #1171
  • fix: incorrect offset in fixUrl function by @szmarczak in #1184
  • fix: catch errors inside request interceptors by @B4nan in #1192
  • fix: use encodeURIComponent instead of encodeURI by @szmarczak in #1198
  • fix: merge cookies provided by user with session cookies by @B4nan in #1201

Full Changelog: v2.0.7...v2.1.0

v2.0.7

08 Sep 07:57
  • Fix casting of int/bool environment variables (e.g. APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE), closes #956
  • Fix incognito pages and user data dir (#1145)
  • Add @ts-ignore comments to imports of optional peer dependencies (#1152)
  • Use config instance in sdk.openSessionPool() (#1154)
  • Add a breaking callback to infiniteScroll (#1140)

v2.0.6

27 Aug 11:50
  • Fix deprecation messages logged from ProxyConfiguration and CheerioCrawler.
  • Update got-scraping to receive multiple improvements.

v2.0.5

24 Aug 16:58


  • Fix error handling in puppeteer crawler

v2.0.4

23 Aug 14:41

This update introduces persistent browser headers when using got-scraping.

v2.0.3

20 Aug 13:48
  • chore: add aborting event to events docs [skip ci] c89f532
  • fix: refactor requestAsBrowser to Got 12 (#1111) ef9a4ad
  • fix: limit handleRequestTimeoutMillis to max valid value (#1116) 5948958
  • fix: disable SSL validation on MITM proxies (#1117) 853c5cd
  • fix: bump got-scraping to 3.0.1 (#1121) b9e99b7

This release improves the stability of the SDK.

v2.0.2

12 Aug 08:39
  • Fix serialization issues in CheerioCrawler caused by parser conflicts in recent versions of cheerio.