Releases: apify/crawlee
v0.21.8
v0.21.7
v0.21.6
It appears that CheerioCrawler was correctly retiring sessions on timeouts and blocked status codes (401, 403, 429), whereas PuppeteerCrawler was not. Apologies for the omission; this release fixes the problem.
- Fix sessions not being retired on blocked status codes in PuppeteerCrawler.
- Fix sessions not being marked bad on navigation timeouts in PuppeteerCrawler.
- Update apify-shared to version 0.5.0.
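To illustrate the behavior this release restores, here is a minimal sketch of how a crawler might treat a session after a navigation attempt. The Session class and handleOutcome function are hypothetical stand-ins written for this example, not the SDK's SessionPool code. Only the blocked status codes (401, 403, 429) and the retire versus mark-bad distinction come from the notes above; the error score threshold is an assumption.

```javascript
// Status codes the release notes describe as "blocked".
const BLOCKED_STATUS_CODES = [401, 403, 429];

// Hypothetical, simplified session (not the SDK's Session class).
class Session {
  constructor(id, maxErrorScore = 3) {
    this.id = id;
    this.errorScore = 0;
    this.maxErrorScore = maxErrorScore; // assumed threshold
    this.retired = false;
  }

  // Soft failure: the session stays usable until its error score hits the threshold.
  markBad() {
    this.errorScore += 1;
    if (this.errorScore >= this.maxErrorScore) this.retired = true;
  }

  // Hard failure: the session must not be used again.
  retire() {
    this.retired = true;
  }

  isUsable() {
    return !this.retired;
  }
}

// Update a session after a navigation attempt and report whether it is still usable.
// `outcome` is either { statusCode } or { timedOut: true }.
function handleOutcome(session, outcome) {
  if (outcome.timedOut) {
    session.markBad(); // navigation timeout: count it against the session
  } else if (BLOCKED_STATUS_CODES.includes(outcome.statusCode)) {
    session.retire(); // blocked status code: discard the session immediately
  }
  return session.isUsable();
}
```

With this split, a single 403 discards the session right away, while timeouts only accumulate until the assumed threshold is reached.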
v0.21.5
This is a very minor release that fixes some issues that prevented the SDK from being used with Node 14.
- Update the request serialization process used in RequestList to work with Node 10+, not only Node 10 and 12.
- Update some TypeScript types that were preventing the build due to changes in typed dependencies.
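The notes do not describe the serialization format RequestList uses. As an assumption, here is a minimal sketch of round-tripping an array of plain request objects through gzip-compressed JSON using only Node's built-in zlib module, whose synchronous API exists in every Node version from 10 up. The serializeRequests and deserializeRequests helpers are hypothetical and are not the SDK's actual code.

```javascript
const zlib = require('zlib');

// Hypothetical helper: serialize an array of plain request objects into a gzip-compressed Buffer.
function serializeRequests(requests) {
  const json = JSON.stringify(requests);
  return zlib.gzipSync(Buffer.from(json, 'utf8'));
}

// Hypothetical helper: gunzip the Buffer and parse the JSON back into an array.
function deserializeRequests(buffer) {
  const json = zlib.gunzipSync(buffer).toString('utf8');
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed)) throw new TypeError('Expected an array of requests');
  return parsed;
}
```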
v0.21.4
The request statistics that you may remember from the logs are now persisted in the key-value store, so you won't lose count when your actor restarts. We've also added many new stats that can be useful after a run finishes. Besides that, we fixed some bugs and annoyances and improved the TypeScript experience a bit.
- Add persistence to the Statistics class and automatically persist it in BasicCrawler.
- Fix an issue where an inaccessible Apify Proxy would cause ProxyConfiguration to throw a timeout error.
- Update the default user agent to Chrome 85.
- Bump Puppeteer to 5.2.1, which uses Chromium 85.
- TypeScript: Fix RequestAsBrowserOptions missing some values, and add RequestQueueInfo as a return value from requestQueue.getInfo().
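The idea behind persisted statistics is that counters are written to a key-value store and read back on startup, so a restarted actor continues counting instead of starting from zero. Here is a minimal sketch of that pattern. MemoryKeyValueStore is an in-memory stand-in for a real store (it borrows the getValue/setValue method names), and the PersistentStats class, its field names and its key name are all hypothetical; this is not the SDK's Statistics implementation.

```javascript
// In-memory stand-in for a key-value store (a real store would be persistent).
class MemoryKeyValueStore {
  constructor() {
    this.records = new Map();
  }
  async getValue(key) {
    const raw = this.records.get(key);
    return raw === undefined ? null : JSON.parse(raw);
  }
  async setValue(key, value) {
    this.records.set(key, JSON.stringify(value));
  }
}

// Hypothetical statistics tracker that survives restarts by persisting its state.
class PersistentStats {
  constructor(store, persistKey = 'STATISTICS_STATE') {
    this.store = store;
    this.persistKey = persistKey; // assumed key name
    this.state = { requestsFinished: 0, requestsFailed: 0 };
  }

  // Restore counters saved by a previous run, if any.
  async init() {
    const saved = await this.store.getValue(this.persistKey);
    if (saved) this.state = { ...this.state, ...saved };
  }

  finishJob() {
    this.state.requestsFinished += 1;
  }

  failJob() {
    this.state.requestsFailed += 1;
  }

  // Write the current counters to the store.
  async persist() {
    await this.store.setValue(this.persistKey, this.state);
  }
}
```

A second PersistentStats created against the same store after a simulated restart picks up the counts the first one persisted.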
v0.21.3
v0.21.2
v0.21.1
We fixed some bugs, improved a few things and bumped Puppeteer to match the latest Chrome 84.
- Allow Apify.createProxyConfiguration to be used seamlessly with the proxy component of the Actor Input UI.
- Fix integration of plugins into CheerioCrawler with the crawler.use() function.
- Fix a race condition which caused RequestQueueLocal to fail when handling requests.
- Fix broken debug logging in SessionPool.
- Improve the ProxyConfiguration error message for a missing password / token.
- Update Puppeteer to 5.2.0.
- Improve docs, update packages and so on.
v0.21.0
This release comes with breaking changes that will affect most, if not all, of your projects. See the migration guide for more information and examples.
The first large change is a redesigned proxy configuration. Cheerio and Puppeteer crawlers now accept a proxyConfiguration parameter, which is an instance of ProxyConfiguration. This class now exclusively manages both Apify Proxy and custom proxies. Visit the new proxy management guide.
We also removed Apify.utils.getRandomUserAgent(), as it was no longer effective in avoiding bot detection, and changed the default values for empty properties in Request instances.
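The notes say ProxyConfiguration manages custom proxies and that proxyConfiguration.newUrl([sessionId]) returns a proxy URL. As a rough illustration of session-aware rotation over a list of custom proxies, here is a hypothetical ProxyRotator that cycles through URLs round-robin and pins each sessionId to the proxy it first received. The class is an assumption written for this example. Its newUrl method only mirrors the SDK's method name, and this is not the SDK's implementation.

```javascript
// Hypothetical rotator over a fixed list of custom proxy URLs.
class ProxyRotator {
  constructor(proxyUrls) {
    if (!proxyUrls || proxyUrls.length === 0) {
      throw new Error('At least one proxy URL is required');
    }
    this.proxyUrls = proxyUrls;
    this.nextIndex = 0;
    this.sessionToUrl = new Map(); // sessionId -> pinned proxy URL
  }

  // Return the next URL in round-robin order.
  rotate() {
    const url = this.proxyUrls[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    return url;
  }

  // The same sessionId always gets the same URL; without a sessionId, URLs rotate.
  newUrl(sessionId) {
    if (sessionId === undefined) return this.rotate();
    if (!this.sessionToUrl.has(sessionId)) {
      this.sessionToUrl.set(sessionId, this.rotate());
    }
    return this.sessionToUrl.get(sessionId);
  }
}
```

Pinning a session to one proxy keeps its cookies and IP address consistent, which is why a session identifier is worth passing when one is available.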
- BREAKING: Removed Apify.getApifyProxyUrl(). To get an Apify Proxy URL, use proxyConfiguration.newUrl([sessionId]).
- BREAKING: Removed the useApifyProxy, apifyProxyGroups and apifyProxySession parameters from all applications in the SDK. Use proxyConfiguration in crawlers and proxyUrl in requestAsBrowser and Apify.launchPuppeteer.
- BREAKING: Removed Apify.utils.getRandomUserAgent(), as it was no longer effective in avoiding bot detection.
- BREAKING: Request instances no longer initialize empty properties with null, which means that:
  - empty errorMessages are now represented by [], and
  - empty loadedUrl, payload and handledAt are undefined.
- Add the async Apify.createProxyConfiguration() function to create ProxyConfiguration instances. ProxyConfiguration itself is not exposed.
- Add proxyConfiguration to CheerioCrawlerOptions and PuppeteerCrawlerOptions.
- Add proxyInfo to CheerioHandlePageInputs and PuppeteerHandlePageInputs. You can use this object to retrieve information about the currently used proxy in Puppeteer and Cheerio crawlers.
- Add options for clicking buttons and scrolling up to Apify.utils.puppeteer.infiniteScroll().
- Fix a bug where intercepted requests would never continue.
- Fix a bug where Apify.utils.requestAsBrowser() would get into redirect loops.
- Fix Apify.utils.getMemoryInfo() crashing the process on AWS Lambda and on systems running in Docker without memory cgroups enabled.
- Update Puppeteer to 3.3.0.
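Because empty loadedUrl, payload and handledAt are now undefined instead of null, code that compared them strictly against null stops working. The sketch below uses a plain object as a hypothetical stand-in for a Request instance (it is not a real Request) to show the failure and a tolerant replacement. The wasHandledOld, wasHandled and hasErrors helpers are illustrative only.

```javascript
// Plain object mimicking the new defaults described above (not a real Request instance).
const request = {
  url: 'https://example.com',
  errorMessages: [], // previously null when empty
  loadedUrl: undefined, // previously null
  payload: undefined, // previously null
  handledAt: undefined, // previously null
};

// Old-style check: correct only while empty properties were null.
// With undefined it wrongly reports an unhandled request as handled.
function wasHandledOld(req) {
  return req.handledAt !== null;
}

// Loose `!= null` treats both null and undefined as "empty".
function wasHandled(req) {
  return req.handledAt != null;
}

// errorMessages is now always an array, so .length needs no null guard.
function hasErrors(req) {
  return req.errorMessages.length > 0;
}
```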
v0.20.4
- Add Apify.utils.waitForRunToFinish(), which simplifies waiting for an actor run to finish.
- Add standard prefixes to log messages to improve readability and orientation in logs.
- Add support for async handlers in Apify.utils.puppeteer.addInterceptRequestHandler().
- EXPERIMENTAL: Add the cheerioCrawler.use() function, which attaches a CrawlerExtension (a plugin that extends functionality) to the crawler to modify its behavior.
- Fix a bug with cookie expiry in SessionPool.
- Fix issues in documentation.
- Update @apify/http-request to fix an issue in the proxy-agent package.
- Update Puppeteer to 3.0.2.
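The notes do not say how Apify.utils.waitForRunToFinish() works internally. Waiting for a run generally comes down to polling its status until it reaches a final state, so here is a generic polling sketch under stated assumptions. getRunStatus is a hypothetical async function supplied by the caller (for example, one backed by the Apify API), waitForRun is not the SDK function, and the terminal status names, poll interval and timeout are assumptions.

```javascript
// Statuses treated as final in this sketch (assumption based on Apify run statuses).
const TERMINAL_STATUSES = new Set(['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `getRunStatus` until it reports a terminal status or `timeoutMs` elapses.
// `getRunStatus` is a hypothetical async function that returns a status string.
async function waitForRun(getRunStatus, { pollIntervalMs = 1000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getRunStatus();
    if (TERMINAL_STATUSES.has(status)) return status;
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Run did not finish within ${timeoutMs} ms (last status: ${status})`);
    }
    await sleep(pollIntervalMs);
  }
}
```

Throwing on timeout, rather than returning the last non-final status, makes it hard to mistake a still-running actor for a finished one.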