Update downloader.sh #18
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,58 +6,100 @@ | |
# From: https://gist.github.com/jeffmccune/e7d635116f25bc7e12b2a19efbafcdf8 | ||
# From: https://gist.github.com/n0531m/f3714f6ad6ef738a3b0a | ||
|
||
# Script to retrieve and organize Google and Google Cloud IP ranges. | ||
|
||
set -euo pipefail | ||
set -x | ||
|
||
# Check for required dependencies | ||
for cmd in curl dig jq mktemp; do | ||
if ! command -v "$cmd" &> /dev/null; then | ||
echo "Error: $cmd is not installed or not in PATH" >&2 | ||
exit 1 | ||
fi | ||
done | ||
|
||
# Create a temporary directory and ensure cleanup on exit | ||
I think it's not needed:
|
||
temp_dir=$(mktemp -d) | ||
trap 'rm -rf -- "$temp_dir"' EXIT | ||
|
||
# Function to download files with retries and error handling | ||
Retries are not needed either: this repo has existed for about 3 years and I've never had this problem |
||
download_file() { | ||
local url=$1 | ||
local output_file=$2 | ||
local retries=3 | ||
local count=0 | ||
until curl -s "$url" -o "$output_file"; do | ||
count=$((count + 1)) | ||
if [[ $count -ge $retries ]]; then | ||
echo "Error: Failed to download $url after $retries attempts" | ||
exit 1 | ||
fi | ||
sleep 2 # wait before retrying | ||
done | ||
} | ||
|
||
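For illustration only, the retry loop in `download_file` can be written as a small generic helper. This is a sketch, not part of the PR: the name `retry` and its argument order are assumptions. Note that `curl -s` without `--fail` exits with status 0 on an HTTP error response such as a 404, so a loop keyed on curl's exit status retries only on transport-level failures unless `--fail` is added.

```shell
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, making at most MAX attempts and sleeping
# DELAY seconds between attempts. (Hypothetical helper, not from the PR.)
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Error: '$*' failed after $max attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible use with the PR's variables (requires network access):
#   retry 3 2 curl --fail --silent --show-error -o "$output_file" "$url"
```

Returning a status instead of calling `exit` leaves the decision about aborting to the caller, which matters when the helper runs in a background subshell.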
# get from public ranges | ||
curl -s https://www.gstatic.com/ipranges/goog.txt > /tmp/goog.txt | ||
curl -s https://www.gstatic.com/ipranges/cloud.json > /tmp/cloud.json | ||
# Parallel downloads with retries | ||
How many processor cores are available in GitHub workers? Why would this be necessary when downloading files smaller than 1MB?
In the free plan, if I am not wrong, you may use up to 2 cores :) |
||
download_file "https://www.gstatic.com/ipranges/goog.txt" "$temp_dir/goog.txt" & | ||
download_file "https://www.gstatic.com/ipranges/cloud.json" "$temp_dir/cloud.json" & | ||
download_file "https://developers.google.com/search/apis/ipranges/googlebot.json" "$temp_dir/googlebot.json" & | ||
wait # Ensure all downloads finish | ||
|
||
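A side note on the backgrounded downloads: each `download_file ... &` runs in a subshell, so the `exit 1` inside it ends only that subshell, and a bare `wait` with no operands returns 0 regardless of how the jobs finished. One way to surface failures is to wait on each PID individually. The helper name `wait_all` and the placeholder variables in the usage comment are assumptions made for this sketch.

```shell
# wait_all PID...
# Wait for every listed background job; return non-zero if any failed.
wait_all() {
  local status=0 pid
  for pid in "$@"; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Possible use with the PR's function (requires network access;
# url_a and url_b are placeholders):
#   download_file "$url_a" "$temp_dir/a" & pid_a=$!
#   download_file "$url_b" "$temp_dir/b" & pid_b=$!
#   wait_all "$pid_a" "$pid_b" || exit 1
```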
# Public GoogleBot IP ranges | ||
# From: https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot | ||
curl -s https://developers.google.com/search/apis/ipranges/googlebot.json > /tmp/googlebot.json | ||
# Fetch Google netblocks using dig command | ||
This seems like it could be useful |
||
fetch_netblocks() { | ||
local idx=2 | ||
local txt | ||
txt="$(dig TXT _netblocks.google.com +short @8.8.8.8 || true)" | ||
Hmm, maybe the best approach would be to collect errors as artifacts for further investigation. But as you already said, it works, and I completely agree with you: this PR is not needed :) Keep it as my thank-you message for your immensely useful repo. I have used it every day for years :) |
||
while [[ -n "$txt" ]]; do | ||
echo "$txt" | tr '[:space:]+' "\n" | grep ':' | cut -d: -f2- >> "$temp_dir/netblocks.txt" | ||
txt="$(dig TXT _netblocks${idx}.google.com +short @8.8.8.8 || true)" | ||
((idx++)) | ||
done | ||
} | ||
|
||
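To make the text pipeline in `fetch_netblocks` easier to follow, here it is applied to a made-up TXT record instead of live `dig` output. The record uses documentation-only address ranges, and the function name `extract_prefixes` is an assumption.

```shell
# extract_prefixes: read SPF-style TXT records on stdin and print the
# part after the first ':' of every token that contains a colon
# (the same tr | grep | cut chain as in the script).
extract_prefixes() {
  tr '[:space:]+' "\n" | grep ':' | cut -d: -f2-
}

# Made-up sample record; a real one would come from `dig TXT ... +short`.
sample='"v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 ~all"'
echo "$sample" | extract_prefixes
# prints:
#   192.0.2.0/24
#   2001:db8::/32
```

Because `tr` works on character sets rather than regular expressions, the `+` in `'[:space:]+'` is a literal plus sign that is also translated to a newline, not a repetition operator.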
# get from netblocks | ||
txt="$(dig TXT _netblocks.google.com +short @8.8.8.8)" | ||
idx=2 | ||
while [[ -n "${txt}" ]]; do | ||
echo "${txt}" | tr '[:space:]+' "\n" | grep ':' | cut -d: -f2- >> /tmp/netblocks.txt | ||
txt="$(dig TXT _netblocks${idx}.google.com +short @8.8.8.8)" | ||
((idx++)) | ||
done | ||
fetch_netblocks | ||
|
||
# get from other netblocks | ||
# Function to resolve DNS SPF records recursively with validation | ||
get_dns_spf() { | ||
dig @8.8.8.8 +short txt "$1" | | ||
tr ' ' '\n' | | ||
while read entry; do | ||
while read -r entry; do | ||
What's that for? |
||
case "$entry" in | ||
ip4:*) echo "${entry#*:}" ;; | ||
ip6:*) echo "${entry#*:}" ;; | ||
ip4:*) echo "${entry#*:}" ;; | ||
ip6:*) echo "${entry#*:}" ;; | ||
include:*) get_dns_spf "${entry#*:}" ;; | ||
esac | ||
done | ||
done || { | ||
echo "Error: Failed to fetch DNS SPF records for $1" | ||
exit 1 | ||
} | ||
} | ||
|
||
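The recursion in `get_dns_spf` can be tried offline with a stub in place of `dig`. In this sketch, `lookup_txt`, `resolve_spf`, the `example.test` names, and the records are all made up for illustration. The `-r` flag keeps `read` from interpreting backslashes as escape characters. Unlike the script, the sketch also strips the double quotes that `dig +short` puts around TXT data.

```shell
# lookup_txt NAME: hypothetical stand-in for `dig +short txt NAME`.
# The names and records below are made up for this example.
lookup_txt() {
  case "$1" in
    _spf.example.test) echo '"v=spf1 include:_a.example.test ip4:192.0.2.0/24 ~all"' ;;
    _a.example.test)   echo '"v=spf1 ip6:2001:db8::/32 ip4:198.51.100.0/24 ~all"' ;;
  esac
}

# resolve_spf NAME: print every ip4/ip6 prefix reachable from NAME,
# following include: entries recursively.
resolve_spf() {
  lookup_txt "$1" | tr -d '"' | tr ' ' '\n' |
    while read -r entry; do
      case "$entry" in
        ip4:*|ip6:*) echo "${entry#*:}" ;;
        include:*)   resolve_spf "${entry#*:}" ;;
      esac
    done
}

resolve_spf _spf.example.test
# prints:
#   2001:db8::/32
#   198.51.100.0/24
#   192.0.2.0/24
```

The expansion `${entry#*:}` removes the shortest prefix ending in a colon, so `ip6:2001:db8::/32` becomes `2001:db8::/32` with its remaining colons intact.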
get_dns_spf "_cloud-netblocks.googleusercontent.com" >> /tmp/netblocks.txt | ||
get_dns_spf "_spf.google.com" >> /tmp/netblocks.txt | ||
# Fetch additional SPF-based netblocks with error handling | ||
get_dns_spf "_cloud-netblocks.googleusercontent.com" >> "$temp_dir/netblocks.txt" | ||
get_dns_spf "_spf.google.com" >> "$temp_dir/netblocks.txt" | ||
|
||
# Separate IPv4 and IPv6 ranges | ||
grep -v ':' "$temp_dir/goog.txt" > "$temp_dir/google-ipv4.txt" | ||
jq -r '.prefixes[] | select(.ipv4Prefix != null) | .ipv4Prefix' "$temp_dir/cloud.json" >> "$temp_dir/google-ipv4.txt" | ||
jq -r '.prefixes[] | select(.ipv4Prefix != null) | .ipv4Prefix' "$temp_dir/googlebot.json" >> "$temp_dir/google-ipv4.txt" | ||
grep -v ':' "$temp_dir/netblocks.txt" >> "$temp_dir/google-ipv4.txt" | ||
|
||
# save ipv4 | ||
grep -v ':' /tmp/goog.txt > /tmp/google-ipv4.txt | ||
jq '.prefixes[] | [.ipv4Prefix][] | select(. != null)' -r /tmp/cloud.json >> /tmp/google-ipv4.txt | ||
jq '.prefixes[] | [.ipv4Prefix][] | select(. != null)' -r /tmp/googlebot.json >> /tmp/google-ipv4.txt | ||
grep -v ':' /tmp/netblocks.txt >> /tmp/google-ipv4.txt | ||
grep ':' "$temp_dir/goog.txt" > "$temp_dir/google-ipv6.txt" | ||
jq -r '.prefixes[] | select(.ipv6Prefix != null) | .ipv6Prefix' "$temp_dir/cloud.json" >> "$temp_dir/google-ipv6.txt" | ||
jq -r '.prefixes[] | select(.ipv6Prefix != null) | .ipv6Prefix' "$temp_dir/googlebot.json" >> "$temp_dir/google-ipv6.txt" | ||
grep ':' "$temp_dir/netblocks.txt" >> "$temp_dir/google-ipv6.txt" | ||
|
||
# save ipv6 | ||
grep ':' /tmp/goog.txt > /tmp/google-ipv6.txt | ||
jq '.prefixes[] | [.ipv6Prefix][] | select(. != null)' -r /tmp/cloud.json >> /tmp/google-ipv6.txt | ||
jq '.prefixes[] | [.ipv6Prefix][] | select(. != null)' -r /tmp/googlebot.json >> /tmp/google-ipv6.txt | ||
grep ':' /tmp/netblocks.txt >> /tmp/google-ipv6.txt | ||
# Sort and deduplicate results, and ensure target directory exists | ||
output_dir="google" | ||
mkdir -p "$output_dir" | ||
The directory already exists; this file is in it |
||
sort -u "$temp_dir/google-ipv4.txt" > "$output_dir/ipv4.txt" | ||
Without this sorting, it's gonna look bad, see #5 - Q3 |
||
sort -u "$temp_dir/google-ipv6.txt" > "$output_dir/ipv6.txt" | ||
|
||
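The PR replaces `sort -V ... | uniq` with `sort -u`, which changes the ordering: `sort -u` compares lines as text, while `sort -V` (version sort, available in GNU coreutils) compares runs of digits numerically. The sketch below shows the difference on made-up prefixes. The function names are assumptions, and `LC_ALL=C` is set only to make the text ordering predictable.

```shell
# Made-up input with one duplicate line.
prefixes() {
  printf '%s\n' 10.0.0.0/8 9.0.0.0/8 100.0.0.0/8 9.0.0.0/8
}

# Plain text ordering with duplicates removed (as in the PR).
sort_text() { LC_ALL=C sort -u; }

# Version ordering with duplicates removed (as in the original script).
sort_version() { LC_ALL=C sort -V | uniq; }

prefixes | sort_text
# prints:
#   10.0.0.0/8
#   100.0.0.0/8
#   9.0.0.0/8

prefixes | sort_version
# prints:
#   9.0.0.0/8
#   10.0.0.0/8
#   100.0.0.0/8
```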
# Verify files are written correctly | ||
if [[ ! -s "$output_dir/ipv4.txt" || ! -s "$output_dir/ipv6.txt" ]]; then | ||
echo "Error: Output files are empty or failed to generate." | ||
exit 1 | ||
fi | ||
|
||
# sort & uniq | ||
sort -V /tmp/google-ipv4.txt | uniq > google/ipv4.txt | ||
sort -V /tmp/google-ipv6.txt | uniq > google/ipv6.txt | ||
echo "IP ranges saved in $output_dir/ipv4.txt and $output_dir/ipv6.txt" |
The dependency check is not needed for this script, because it runs only in GitHub workers.