feat(client): add experimental http3 support
j-mendez committed Nov 27, 2023
1 parent 3a68cd6 commit cb2cb80
Showing 10 changed files with 146 additions and 30 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/rust.yml
@@ -24,6 +24,6 @@ jobs:
           target/
         key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
     - name: Build
-      run: cargo build --verbose
+      run: cargo build --verbose --release
     - name: Run tests
-      run: ./target/debug/spider_worker & cargo test --verbose --all-features
+      run: ./target/release/spider_worker & RUSTFLAGS='--cfg reqwest_unstable' cargo test --verbose --all-features
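The test step is the notable change: reqwest's HTTP/3 client is unstable, so it is double-gated behind a cargo feature and a `reqwest_unstable` cfg passed via `RUSTFLAGS`. A minimal sketch of reproducing this locally (the exact flag combination below is an assumption; CI runs with `--all-features`):

```sh
# Enabling the cargo feature alone is not enough: reqwest also
# requires the reqwest_unstable cfg to be set at compile time.
RUSTFLAGS='--cfg reqwest_unstable' cargo build --release --features http3
RUSTFLAGS='--cfg reqwest_unstable' cargo test --all-features
```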
136 changes: 124 additions & 12 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions examples/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "spider_examples"
-version = "1.50.8"
+version = "1.50.9"
 authors = ["madeindjs <[email protected]>", "j-mendez <[email protected]>"]
 description = "Multithreaded web crawler written in Rust."
 repository = "https://github.com/spider-rs/spider"
@@ -22,7 +22,7 @@ htr = "0.5.27"
 flexbuffers = "2.0.0"

 [dependencies.spider]
-version = "1.50.8"
+version = "1.50.9"
 path = "../spider"
 features = ["serde"]

5 changes: 3 additions & 2 deletions spider/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "spider"
-version = "1.50.8"
+version = "1.50.9"
 authors = ["madeindjs <[email protected]>", "j-mendez <[email protected]>"]
 description = "The fastest web crawler written in Rust."
 repository = "https://github.com/spider-rs/spider"
@@ -77,4 +77,5 @@ chrome_stealth = ["chrome"]
 cookies = ["reqwest/cookies"]
 cron = ["dep:chrono", "dep:cron", "dep:async-trait"]
 napi = ["dep:napi"]
-napi_rustls_tls = ["napi", "reqwest/rustls-tls"]
+napi_rustls_tls = ["napi", "reqwest/rustls-tls"]
+http3 = ["reqwest/http3"]
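The new `http3` feature simply forwards to `reqwest/http3`, so the experimental client surfaces through reqwest's own unstable API rather than anything spider-specific. A minimal sketch of what a direct reqwest caller could do, assuming the `http3` feature plus a `--cfg reqwest_unstable` build (tokio is assumed for the async runtime; none of this is spider's API):

```rust
use reqwest::{Client, Version};

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Speak HTTP/3 from the first request instead of waiting for
    // an Alt-Svc upgrade hint from the server.
    let client = Client::builder().http3_prior_knowledge().build()?;

    let res = client
        .get("https://example.com")
        .version(Version::HTTP_3) // pin this request to HTTP/3
        .send()
        .await?;

    println!("{} over {:?}", res.status(), res.version());
    Ok(())
}
```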
16 changes: 8 additions & 8 deletions spider/README.md
@@ -16,7 +16,7 @@ This is a basic async example crawling a web page, add spider to your `Cargo.tom

 ```toml
 [dependencies]
-spider = "1.50.8"
+spider = "1.50.9"
 ```

 And then the code:
@@ -91,7 +91,7 @@ We have a couple optional feature flags. Regex blacklisting, jemaloc backend, gl

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["regex", "ua_generator"] }
+spider = { version = "1.50.9", features = ["regex", "ua_generator"] }
 ```

 1. `ua_generator`: Enables auto generating a random real User-Agent.
@@ -122,7 +122,7 @@ Move processing to a worker, drastically increases performance even if worker is

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["decentralized"] }
+spider = { version = "1.50.9", features = ["decentralized"] }
 ```

 ```sh
@@ -142,7 +142,7 @@ Use the subscribe method to get a broadcast channel.

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["sync"] }
+spider = { version = "1.50.9", features = ["sync"] }
 ```

 ```rust,no_run
@@ -172,7 +172,7 @@ Allow regex for blacklisting routes

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["regex"] }
+spider = { version = "1.50.9", features = ["regex"] }
 ```

 ```rust,no_run
@@ -199,7 +199,7 @@ If you are performing large workloads you may need to control the crawler by ena

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["control"] }
+spider = { version = "1.50.9", features = ["control"] }
 ```

 ```rust
@@ -269,7 +269,7 @@ Use cron jobs to run crawls continuously at anytime.

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["sync", "cron"] }
+spider = { version = "1.50.9", features = ["sync", "cron"] }
 ```

 ```rust,no_run
@@ -305,7 +305,7 @@ async fn main() {

 ```toml
 [dependencies]
-spider = { version = "1.50.8", features = ["chrome"] }
+spider = { version = "1.50.9", features = ["chrome"] }
 ```

 You can use `website.crawl_concurrent_raw` to perform a crawl without chromium when needed. Use the feature flag `chrome_headed` to enable headful browser usage if needed to debug.
1 change: 1 addition & 0 deletions spider/src/features/cron.rs
@@ -30,6 +30,7 @@ lazy_static! {
 }

 #[async_trait]
+/// A cron job that runs for a website.
 pub trait Job: Send + Sync {
     /// Default implementation of is_active method will
     /// make this job always active
1 change: 1 addition & 0 deletions spider/src/features/glob.rs
@@ -1,6 +1,7 @@
 use crate::CaseInsensitiveString;

 #[cfg(feature = "glob")]
+/// expand a website url to a glob pattern set
 pub fn expand_url(url: &str) -> Vec<CaseInsensitiveString> {
     use itertools::Itertools;
     use regex::Regex;
1 change: 1 addition & 0 deletions spider/src/lib.rs
@@ -65,6 +65,7 @@
 //! - `chrome_stealth`: Enables stealth mode to make it harder to be detected as a bot.
 //! - `cookies`: Enables cookies storing and setting to use for request.
 //! - `cron`: Enables the ability to start cron jobs for the website.
+//! - `http3`: Enables experimental HTTP/3 client.
 pub extern crate bytes;
 pub extern crate compact_str;
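Mirroring the dependency snippets in the README diff above, a downstream crate would presumably opt into the experimental client like this, still building with the `RUSTFLAGS` cfg shown in the CI change:

```toml
[dependencies]
spider = { version = "1.50.9", features = ["http3"] }
```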