[ZEPPELIN-6157] Download artifacts from CDN if available #4901

Merged: Reamer merged 2 commits into apache:master from adoroszlai:ZEPPELIN-6157 on Dec 19, 2024


@adoroszlai (Contributor) commented Dec 13, 2024

## What changes were proposed in this pull request?

Current artifacts on the CDN (`dlcdn.apache.org`) may be removed without notice when new releases appear. To avoid broken links, build scripts use permanent addresses on `archive.apache.org`. But downloads from `archive.apache.org` can be slow; in this CI log, almost six minutes pass between the start of the Spark download and the `Expanding:` step:

```
Thu, 05 Dec 2024 08:39:53 GMT [INFO] --- download:1.6.0:wget (download-sparkr-files) @ r ---
Thu, 05 Dec 2024 08:39:54 GMT Warning:  No signatures were supplied, skipping file validation
Thu, 05 Dec 2024 08:39:54 GMT [INFO] Read Timeout is set to 60000 milliseconds (apprx 1 minutes)
Thu, 05 Dec 2024 08:45:46 GMT [INFO] Expanding: /home/runner/work/zeppelin/zeppelin/rlang/target/spark-3.5.3-bin-without-hadoop.tgz into /home/runner/work/zeppelin/zeppelin/rlang/target
```
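To put a number on "slow", the gap between the last line logged before the transfer and the `Expanding:` line can be computed from the timestamps. A minimal Python sketch, assuming every line starts with a GMT timestamp in the format shown in the log; `log_timestamp` and `elapsed_seconds` are hypothetical helpers, not part of the Zeppelin build. The result only approximates the download time, since it also includes anything else that happened between those two lines.

```python
from datetime import datetime

# Timestamp prefix used by the CI log lines, e.g. "Thu, 05 Dec 2024 08:39:54 GMT".
# %a and %b expect English day/month names (the default C locale).
LOG_TS_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def log_timestamp(line: str) -> datetime:
    """Parse the timestamp prefixing a CI log line (everything up to and including 'GMT')."""
    cut = line.index(" GMT") + len(" GMT")
    return datetime.strptime(line[:cut], LOG_TS_FORMAT)


def elapsed_seconds(start_line: str, end_line: str) -> int:
    """Whole seconds between the timestamps of two log lines."""
    return int((log_timestamp(end_line) - log_timestamp(start_line)).total_seconds())


read_timeout_line = (
    "Thu, 05 Dec 2024 08:39:54 GMT [INFO] Read Timeout is set to 60000 milliseconds (apprx 1 minutes)"
)
expanding_line = (
    "Thu, 05 Dec 2024 08:45:46 GMT [INFO] Expanding: "
    "/home/runner/work/zeppelin/zeppelin/rlang/target/spark-3.5.3-bin-without-hadoop.tgz"
)
secs = elapsed_seconds(read_timeout_line, expanding_line)
print(f"{secs} s (~{secs / 60:.1f} min)")  # 352 s (~5.9 min)
```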

Apache Infra's [`closer.lua` script](https://infra.apache.org/release-download-pages.html#closer) can redirect to the CDN or to the archive, depending on where the artifact is available.

This change replaces `archive.apache.org` URLs, and one instance of `dist.apache.org`, with their `closer.lua` equivalents. Unfortunately, the output filename now has to be specified explicitly for `wget`.
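The URL rewrite can be illustrated with a short Python sketch. It is not the change itself, which edits the build configuration. The helper `to_closer_lua`, the `https://www.apache.org/dyn/closer.lua/<path>?action=download` URL shape, and the assumption that every rewritten URL lives under `https://archive.apache.org/dist/` are illustrative assumptions; the `dist.apache.org` case is not covered. Because the rewritten URL ends in a query string rather than in the artifact name, the sketch returns the filename separately, mirroring the explicit output filename the PR passes to `wget`.

```python
import posixpath
from urllib.parse import urlsplit

ARCHIVE_PREFIX = "https://archive.apache.org/dist/"
# Assumed closer.lua form: the artifact path follows the script name, and
# action=download asks for a redirect to the file instead of a mirror page.
CLOSER_PREFIX = "https://www.apache.org/dyn/closer.lua/"


def to_closer_lua(archive_url: str) -> tuple[str, str]:
    """Map an archive.apache.org/dist URL to (closer.lua URL, output filename)."""
    if not archive_url.startswith(ARCHIVE_PREFIX):
        raise ValueError(f"not an archive.apache.org/dist URL: {archive_url}")
    path = archive_url[len(ARCHIVE_PREFIX):]
    closer_url = f"{CLOSER_PREFIX}{path}?action=download"
    # Name the downloaded file after the artifact, taken from the original path.
    output_file_name = posixpath.basename(urlsplit(archive_url).path)
    return closer_url, output_file_name


url, name = to_closer_lua(
    "https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-without-hadoop.tgz"
)
print(url)   # https://www.apache.org/dyn/closer.lua/spark/spark-3.5.3/spark-3.5.3-bin-without-hadoop.tgz?action=download
print(name)  # spark-3.5.3-bin-without-hadoop.tgz
```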

https://issues.apache.org/jira/browse/ZEPPELIN-6157

## How was this patch tested?

Tried some of the URLs locally, both from the CLI (`curl -L --head`) and with a regular build (`mvn -DskipTests clean package`).
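A scripted version of the `curl -L --head` check could issue a HEAD request, let `urllib` follow the redirects, and report which host finally serves the file. A minimal Python sketch: `final_url` and `served_from` are hypothetical helpers, and treating `dlcdn.apache.org` and `archive.apache.org` as the two expected targets is an assumption based on the description above. `final_url` needs network access and is not called here.

```python
import urllib.request
from urllib.parse import urlsplit


def final_url(url: str, timeout: float = 30.0) -> str:
    """Follow redirects starting with a HEAD request (like `curl -L --head`); return the last URL."""
    req = urllib.request.Request(url, method="HEAD")
    # urlopen follows redirects via its default redirect handler; the response
    # body is never read, so the connection is closed right after the headers.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.url


def served_from(url: str) -> str:
    """Classify a resolved download URL by its host."""
    host = urlsplit(url).hostname or ""
    if host == "dlcdn.apache.org":
        return "cdn"
    if host == "archive.apache.org":
        return "archive"
    return "other"
```

For a `closer.lua` download URL, `served_from(final_url(url))` would be expected to give `cdn` while the release is still current and `archive` once it has been moved off the CDN, which is the behavior this change relies on.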

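Full CI:
- quick: https://github.com/adoroszlai/zeppelin/actions/runs/12319072153
- frontend: https://github.com/adoroszlai/zeppelin/actions/runs/12319072142
- core: https://github.com/adoroszlai/zeppelin/actions/runs/12319072156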

@adoroszlai (Contributor Author) commented:

```
Could not get content org.apache.http.conn.HttpHostConnectException: Connect to archive.apache.org:443 [archive.apache.org/65.108.204.189, archive.apache.org/2a01:4f9:1a:a084:0:0:0:2] failed: Network is unreachable (connect failed)
```

https://github.com/apache/zeppelin/actions/runs/12319979560/job/34388256366?pr=4901#step:6:1531

The error seems to be unrelated; it also happens in other runs:
- https://github.com/apache/zeppelin/actions/runs/12080785392/job/34254210876#step:6:572
- https://github.com/apache/zeppelin/actions/runs/11996006824/job/33440092834#step:8:565

@Reamer (Contributor) commented Dec 14, 2024

In my experience, the servers that serve the `closer.lua` script have sometimes been unavailable. What matters most to me is a stable download and a stable CI environment; speed is secondary.

@Reamer (Contributor) left a comment


LGTM

@Reamer merged commit b6e40d4 into apache:master Dec 19, 2024
17 checks passed
asfgit pushed a commit that referenced this pull request Dec 19, 2024

Full CI:
- quick: https://github.com/adoroszlai/zeppelin/actions/runs/12319072153
- frontend: https://github.com/adoroszlai/zeppelin/actions/runs/12319072142
- core: https://github.com/adoroszlai/zeppelin/actions/runs/12319072156

Closes #4901 from adoroszlai/ZEPPELIN-6157.

Signed-off-by: Philipp Dallig <[email protected]>
@Reamer (Contributor) commented Dec 19, 2024

Merged into master/branch-0.12

@adoroszlai deleted the ZEPPELIN-6157 branch December 19, 2024 09:44
@adoroszlai (Contributor Author) commented:

Thanks @Reamer for reviewing and merging this.
