Releases: apify/crawlee
v2.2.2
What's Changed
- fix: ensure `request.headers` is set by @B4nan in #1281
- fix: cookies setting in preNavigationHooks by @AndreyBykov in #1283
- refactor: improve logging for fetching next request and timeouts by @B4nan in #1292
This release should help with the infamous zero concurrency bug. The problem is probably still there, but it should be much less common. The main difference is that we now use shorter timeouts for API calls from `RequestQueue`.
Full Changelog: v2.2.1...v2.2.2
v2.2.1
What's Changed
- fix: ignore requests that are no longer in progress by @B4nan in #1258
- fix: do not use `tryCancel()` from inside sync callback by @B4nan in #1265
- fix: revert to puppeteer 10.x by @B4nan in #1276
- fix: wait when `body` is not available in `infiniteScroll()` from Puppeteer utils by @B4nan in #1277
- fix: expose logger classes on the `utils.log` instance by @B4nan in #1278
Full Changelog: v2.2.0...v2.2.1
v2.2.0
Proxy per page
Up until now, browser crawlers used the same session (and therefore the same proxy) for all requests from a single browser. Now they get a new proxy for each session. This means that with incognito pages, each page will get a new proxy, aligning the behaviour with `CheerioCrawler`.
This feature is not enabled by default. To use it, enable the `useIncognitoPages` flag under `launchContext`:
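```javascript
const Apify = require('apify');

const crawler = new Apify.PlaywrightCrawler({
    launchContext: {
        // Each incognito page gets its own session, and therefore its own proxy
        useIncognitoPages: true,
    },
    // ...
});
```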
Note that `useIncognitoPages` currently comes with a performance overhead, so use this flag at your own discretion.
We are planning to enable this feature by default in SDK v3.0.
Abortable timeouts
Previously, when a page function timed out, the task kept running, which could lead to requests being processed multiple times. v2.2 introduces abortable timeouts that cancel the task as early as possible.
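To illustrate the difference between a plain timeout and an abortable one, here is a minimal sketch built on the standard `AbortController` (assumes Node.js 15+, where it is global). It is not the SDK's internal implementation; `runWithAbortableTimeout` and `processSteps` are hypothetical names. The key point is that the timeout both rejects the caller's promise and fires an abort signal the task itself can observe, so it stops instead of running on in the background.

```javascript
// Minimal sketch of an abortable timeout (hypothetical, not the SDK's code).
// On timeout, the AbortSignal is fired so the task itself can stop early,
// instead of only the caller giving up while the task keeps running.
async function runWithAbortableTimeout(task, timeoutMillis) {
    const controller = new AbortController();
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => {
            controller.abort(); // signal the task to stop
            reject(new Error(`Task timed out after ${timeoutMillis} ms`));
        }, timeoutMillis);
    });
    try {
        // Whichever settles first wins: the task result or the timeout error.
        return await Promise.race([task(controller.signal), timeout]);
    } finally {
        clearTimeout(timer); // do not leave the timer running after a fast task
    }
}

// Hypothetical task: works in small steps and checks the signal between them.
async function processSteps(signal, steps, stepMillis, log) {
    for (let i = 0; i < steps; i++) {
        if (signal.aborted) {
            log.push('aborted'); // stop as early as possible
            return;
        }
        await new Promise((resolve) => setTimeout(resolve, stepMillis));
        log.push(i);
    }
}
```

The task has to cooperate by checking `signal.aborted` (or listening for the `abort` event); a timeout alone cannot forcibly stop JavaScript code that is already running.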
Mitigation of zero concurrency issue
Several new timeouts were added to the task function, which should help mitigate the zero concurrency bug. Namely, fetching the next request and reclaiming failed requests back to the queue are now executed with a timeout and up to 3 additional retries before the task fails. The timeout is always at least 300 s (5 minutes), or `handleRequestTimeoutSecs` if that value is higher.
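The timeout rule and the retry policy described above can be sketched as follows. This is a hedged illustration, not the SDK's code: `internalTimeoutMillis` and `withTimeoutAndRetries` are hypothetical names, and retrying on any error (not only on timeouts) is an assumption.

```javascript
// Per the notes: at least 300 s, or handleRequestTimeoutSecs if that is higher.
function internalTimeoutMillis(handleRequestTimeoutSecs) {
    return Math.max(300, handleRequestTimeoutSecs) * 1000;
}

// Hypothetical helper: run `fn` with a timeout; on failure retry up to
// `maxRetries` more times (1 initial attempt + 3 retries by default).
async function withTimeoutAndRetries(fn, timeoutMillis, maxRetries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        let timer;
        const timeout = new Promise((_, reject) => {
            timer = setTimeout(
                () => reject(new Error(`Timed out after ${timeoutMillis} ms`)),
                timeoutMillis,
            );
        });
        try {
            return await Promise.race([fn(), timeout]);
        } catch (err) {
            lastError = err; // remember the failure and try again
        } finally {
            clearTimeout(timer);
        }
    }
    throw lastError; // all attempts failed, so the task fails
}
```

For example, with `handleRequestTimeoutSecs` set to 60 the internal timeout stays at 300 000 ms, while with 600 it grows to 600 000 ms. Note that this sketch only stops waiting on a timed-out call; it does not cancel it.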
Full list of changes
- fix `RequestError: URI malformed` in cheerio crawler (#1205)
- only provide Cookie header if cookies are present (#1218)
- handle extra cases for `diffCookie` (#1217)
- implement proxy per page in browser crawlers (#1228)
- add fingerprinting support (#1243)
- implement abortable timeouts (#1245)
- add timeouts with retries to `runTaskFunction()` (#1250)
- automatically convert google spreadsheet URLs to CSV exports (#1255)
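The spreadsheet conversion in #1255 can be pictured with a small helper. This is a sketch under assumptions: `toCsvExportUrl` is a hypothetical name, it assumes the common `https://docs.google.com/spreadsheets/d/<id>/...` URL form, and it uses the `export?format=csv` download form, which typically exports the first sheet. The URL shape the SDK actually produces may differ.

```javascript
// Hypothetical helper: turn a Google Sheets share/edit URL into a CSV export URL.
// Assumes the common form https://docs.google.com/spreadsheets/d/<id>/...
function toCsvExportUrl(url) {
    const match = url.match(/^https:\/\/docs\.google\.com\/spreadsheets\/d\/([\w-]+)/);
    if (!match) return url; // not a spreadsheet URL, leave it untouched
    return `https://docs.google.com/spreadsheets/d/${match[1]}/export?format=csv`;
}
```

Returning non-matching URLs unchanged keeps the helper safe to apply to every entry of a request list.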
v2.1.0
What's Changed
- feat: warn if apify proxy is used in proxyUrls by @szmarczak in #1173
- feat: use puppeteer emulating scrolls instead of window.scrollBy by @vladfrangu in #1170
- feat: support channel and user links in YouTube regex by @vladfrangu in #1178
- feat: add support for cgroups V2 to utils.getMemoryInfo by @mnmkng in #1177
- feat: add `purgeLocalStorage` method by @vladfrangu in #1187
- feat: allow passing `forceCloud` down to the KV store by @vladfrangu in #1186
- fix: automatically convert gdoc share urls to csv download ones in request list by @B4nan in #1174
- fix `YOUTUBE_REGEX_STRING` being too greedy by @B4nan in #1171
- fix: incorrect offset in `fixUrl` function by @szmarczak in #1184
- fix: catch errors inside request interceptors by @B4nan in #1192
- fix: use encodeURIComponent instead of encodeURI by @szmarczak in #1198
- fix: merge cookies provided by user with session cookies by @B4nan in #1201
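The notes do not say where #1198 switched from `encodeURI` to `encodeURIComponent`, so the snippet below only illustrates the general difference between the two built-ins. `encodeURI` is meant for whole URLs and leaves reserved characters such as `&` and `=` untouched, which corrupts a single query-string value. The example URL is arbitrary.

```javascript
// encodeURI() keeps reserved characters (&, =, ?, /, #) intact because it is
// meant for whole URLs; encodeURIComponent() escapes them, which is what a
// single query-string value needs.
const value = 'a b&c=d';

const withEncodeURI = `https://example.com/search?q=${encodeURI(value)}`;
const withEncodeURIComponent = `https://example.com/search?q=${encodeURIComponent(value)}`;

console.log(withEncodeURI); // https://example.com/search?q=a%20b&c=d
console.log(withEncodeURIComponent); // https://example.com/search?q=a%20b%26c%3Dd

// Parsing shows the effect: with encodeURI the value is split into two params.
console.log(new URL(withEncodeURI).searchParams.get('q')); // a b
console.log(new URL(withEncodeURIComponent).searchParams.get('q')); // a b&c=d
```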
Full Changelog: v2.0.7...v2.1.0
v2.0.7
- Fix casting of int/bool environment variables (e.g. `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE`), closes #956
- Fix incognito pages and user data dir (#1145)
- Add `@ts-ignore` comments to imports of optional peer dependencies (#1152)
- Use config instance in `sdk.openSessionPool()` (#1154)
- Add a breaking callback to `infiniteScroll` (#1140)
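Environment variables always arrive as strings, so a naive truthiness check treats `'false'` and `'0'` as `true`. The sketch below shows explicit casting; `parseBoolEnv` and `parseIntEnv` are hypothetical helpers, and the set of strings treated as false is an assumption that may not match what the SDK accepts.

```javascript
// Environment variables are strings: Boolean('false') and Boolean('0') are both true.
function parseBoolEnv(value, defaultValue = false) {
    if (value === undefined || value === '') return defaultValue;
    // Assumption: these spellings mean "false"; the SDK's accepted set may differ.
    return !['false', '0', 'no', 'off'].includes(value.trim().toLowerCase());
}

function parseIntEnv(value, defaultValue) {
    const parsed = Number.parseInt(value, 10); // NaN for undefined or non-numeric input
    return Number.isNaN(parsed) ? defaultValue : parsed;
}

// Reading the flag named in the notes (the default of false here is arbitrary).
const walMode = parseBoolEnv(process.env.APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE);
```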
v2.0.6
v2.0.5
v2.0.4
v2.0.3
- chore: add aborting event to events docs [skip ci] c89f532
- fix: refactor requestAsBrowser to Got 12 (#1111) ef9a4ad
- fix: limit handleRequestTimeoutMillis to max valid value (#1116) 5948958
- fix: disable SSL validation on MITM proxies (#1117) 853c5cd
- fix: bump got-scraping to 3.0.1 (#1121) b9e99b7
This release improves the stability of the SDK.
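The notes do not state the limit used in #1116, but one plausible "max valid value" is Node's timer limit: `setTimeout()` delays above 2147483647 ms (2^31 - 1, about 24.8 days) are replaced by 1 ms, so an oversized timeout would fire almost immediately. A minimal clamping sketch under that assumption, with hypothetical names:

```javascript
// Node.js replaces setTimeout() delays above 2^31 - 1 ms with 1 ms,
// so larger values must be clamped rather than passed through.
const MAX_TIMEOUT_MILLIS = 2 ** 31 - 1;

function clampTimeoutMillis(millis) {
    return Math.min(millis, MAX_TIMEOUT_MILLIS);
}
```

A caller would apply it wherever a user-supplied timeout is converted to milliseconds, e.g. `clampTimeoutMillis(handleRequestTimeoutSecs * 1000)`.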