Releases: apify/crawlee

v3.3.3

31 May 11:26

3.3.3 (2023-05-31)

Bug Fixes

  • MemoryStorage: handle EXDEV errors when purging storages (#1932) (e656050)
  • set status message every 10 seconds and log it via debug level (#1918) (32aede6)

Features

  • add support for requestsFromUrl to RequestQueue (#1917) (7f2557c)
  • core: add Request.maxRetries to allow overriding the maxRequestRetries (#1925) (c5592db)
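
The per-request override in the last entry can be pictured roughly as follows. This is a minimal sketch, not Crawlee's source: the `RequestLike` and `CrawlerOptionsLike` types and the `effectiveMaxRetries` helper are hypothetical names made up for illustration; only `maxRetries` and `maxRequestRetries` come from the release note.

```typescript
// Hypothetical model of the override described above; not Crawlee's implementation.
interface RequestLike {
    url: string;
    maxRetries?: number; // per-request override, named as in the release note
}

interface CrawlerOptionsLike {
    maxRequestRetries: number; // crawler-wide default
}

// The per-request value wins whenever it is set. `??` is used instead of `||`
// so that an explicit 0 ("never retry") is respected.
function effectiveMaxRetries(request: RequestLike, options: CrawlerOptionsLike): number {
    return request.maxRetries ?? options.maxRequestRetries;
}

const crawlerOptions: CrawlerOptionsLike = { maxRequestRetries: 3 };
const flaky: RequestLike = { url: 'https://example.com/flaky', maxRetries: 10 };
const oneShot: RequestLike = { url: 'https://example.com/once', maxRetries: 0 };
const regular: RequestLike = { url: 'https://example.com/' };

console.log(effectiveMaxRetries(flaky, crawlerOptions));
console.log(effectiveMaxRetries(oneShot, crawlerOptions));
console.log(effectiveMaxRetries(regular, crawlerOptions));
```

In this model a request without `maxRetries` falls back to the crawler-wide limit, which is presumably the behaviour the feature keeps for existing code.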

v3.3.2

11 May 13:23

3.3.2 (2023-05-11)

Bug Fixes

  • MemoryStorage: cache requests in RequestQueue (#1899) (063dcd1)
  • respect config object when creating SessionPool (#1881) (db069df)

Features

  • allow running single crawler instance multiple times (#1844) (9e6eb1e), closes #765
  • HttpCrawler: add parseWithCheerio helper to HttpCrawler (#1906) (ff5f76f)
  • router: allow inline router definition (#1877) (2d241c9)
  • RQv2 memory storage support (#1874) (049486b)
  • support alternate storage clients when opening storages (#1901) (661e550)

v3.3.1

11 Apr 07:15

3.3.1 (2023-04-11)

Bug Fixes

  • infiniteScroll() not working in Firefox (#1826) (4286c5d), closes #1821
  • jsdom: add timeout to the window.load wait when runScripts are enabled (806de31)
  • jsdom: delay closing of the window and add some polyfills (2e81618)
  • jsdom: use no-op enqueueLinks in http crawlers when parsing fails (fd35270)
  • MemoryStorage: handling of readable streams for key-value stores when setting records (#1852) (a5ee37d), closes #1843
  • start status message logger after the crawl actually starts (5d1df7a)
  • status message - total requests (#1842) (710f734)
  • Storage: queue up opening storages to prevent issues in concurrent calls (#1865) (044c740)
  • templates: added missing '@types/node' peer dependency (#1860) (d37a7e2)
  • try to detect stuck request queue and fix its state (#1837) (95a9f94)

Features

  • add parseWithCheerio context helper to cheerio crawler (b336a73)
  • jsdom: add parseWithCheerio context helper (c8f0796)

v3.3.0

09 Mar 09:17

3.3.0 (2023-03-09)

Bug Fixes

  • add proxyUrl to DownloadListOfUrlsOptions (779be1e), closes #1780
  • CheerioCrawler: pass isXml down to response parser (#1807) (af7a5c4), closes #1794
  • ignore invalid URLs in enqueueLinks in browser crawlers (#1803) (5ac336c)
  • MemoryStorage: request queues race conditions causing crashes (#1806) (083a9db), closes #1792
  • MemoryStorage: RequestQueue should respect forefront (#1816) (b68e86a), closes #1787
  • MemoryStorage: RequestQueue#handledRequestCount should update (#1817) (a775e4a), closes #1764
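
For readers unfamiliar with the `forefront` option mentioned in the RequestQueue fix above, the intended ordering can be sketched with a toy in-memory queue. Everything here is an assumption made for illustration: `TinyQueue`, `add`, and `fetchNext` are hypothetical names, and the real MemoryStorage queue is considerably more involved.

```typescript
// Hypothetical in-memory model; not the MemoryStorage implementation.
class TinyQueue {
    private readonly pending: string[] = [];

    // With forefront: true the URL goes to the head of the queue,
    // so it is handed out before anything added earlier.
    add(url: string, options: { forefront?: boolean } = {}): void {
        if (options.forefront) {
            this.pending.unshift(url);
        } else {
            this.pending.push(url);
        }
    }

    // Returns the next URL to process, or undefined when the queue is empty.
    fetchNext(): string | undefined {
        return this.pending.shift();
    }
}

const queue = new TinyQueue();
queue.add('https://example.com/a');
queue.add('https://example.com/b');
queue.add('https://example.com/urgent', { forefront: true });

console.log(queue.fetchNext());
console.log(queue.fetchNext());
console.log(queue.fetchNext());
```

A queue that ignored the flag would hand out the urgent URL last, which is the kind of ordering problem the fix title suggests.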

Features

v3.2.2

08 Feb 18:53

3.2.2 (2023-02-08)

Bug Fixes

  • MemoryStorage: request queues saved in the wrong place (#1779) (19409db)

v3.2.1

07 Feb 11:49

3.2.1 (2023-02-07)

Bug Fixes

  • add QueueOperationInfo export to the core package (5ec6c24)

v3.2.0

07 Feb 08:28

3.2.0 (2023-02-07)

Bug Fixes

  • allow userData option in enqueueLinksByClickingElements (#1749) (736f85d), closes #1617
  • clone request.userData when creating new request object (#1728) (222ef59), closes #1725
  • Correctly compute pendingRequestCount in request queue (#1765) (946535f)
  • declare missing dependency on tslib (27e96c8), closes #1747
  • ensure CrawlingContext interface is inferred correctly in route handlers (aa84633)
  • KeyValueStore: big buffers should not crash (#1734) (2f682f7), closes #1732 #1710
  • memory-storage: don't fail when storage already purged (#1737) (8694027), closes #1736
  • update playwright to 1.29.2 and make peer dep. less strict (#1735) (c654fcd), closes #1723
  • utils: add missing dependency on ow (bf0e03c), closes #1716

Features

  • add forefront option to all enqueueLinks variants (#1760) (a01459d), closes #1483
  • enqueueLinks: add SameOrigin strategy and relax protocol matching for the other strategies (#1748) (4ba982a)
  • MemoryStorage: read from fs if persistStorage is enabled, ram only otherwise (#1761) (e903980)
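
As a rough illustration of what a same-origin link filter means, the check can be written with nothing but the standard `URL` class. This is a sketch under stated assumptions, not the code behind the SameOrigin strategy: `isSameOrigin` is a hypothetical helper, and treating unparsable URLs as non-matching is a choice made here for the example.

```typescript
// Illustrative only: a same-origin test built on the standard URL class.
// Two URLs share an origin when scheme, host, and port all match.
function isSameOrigin(baseUrl: string, candidateUrl: string): boolean {
    try {
        // Relative candidates are resolved against the base before comparing.
        return new URL(candidateUrl, baseUrl).origin === new URL(baseUrl).origin;
    } catch {
        return false; // assumption for this sketch: unparsable URLs never match
    }
}

const page = 'https://example.com/catalog';

console.log(isSameOrigin(page, '/catalog/item-1'));
console.log(isSameOrigin(page, 'https://example.com:443/about'));
console.log(isSameOrigin(page, 'http://example.com/about'));
console.log(isSameOrigin(page, 'https://shop.example.com/'));
```

Note that a strict origin comparison treats `http` and `https` as different, and also rejects subdomains, so it is narrower than a hostname-only match.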

v3.1.4

14 Dec 15:45

3.1.4 (2022-12-14)

Bug Fixes

v3.1.3

07 Dec 14:26

3.1.3 (2022-12-07)

Bug Fixes

Features

  • always show error origin if inside the userland (#1677) (bbe9045)
  • hideInternalConsole in JSDOMCrawler (#1707) (8975f90)

v3.1.2

15 Nov 08:57

3.1.2 (2022-11-15)

Bug Fixes

  • injectJQuery in context does not survive navs (#1661) (493a7cf)
  • make router error message more helpful for undefined routes (#1678) (ab359d8)
  • MemoryStorage: correctly respect the desc option (#1666) (b5f37f6)
  • requestHandlerTimeout timing (#1660) (493ea0c)
  • shallow clone browserPoolOptions before normalization (#1665) (22467ca)
  • support headful mode in playwright js project template (ea2e61b)
  • support headful mode in puppeteer js project template (e6aceb8)

Features