[ZEPPELIN-5969] Remove Hadoop2 and move to Hadoop3 shaded client #4691
Conversation
  <version>${hadoop.version}</version>
  <scope>compile</scope>
</dependency>

<dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
  <scope>compile</scope>
</dependency>
<dependency>
Without the dependency on hadoop-client-api, r.ir and r.shiny throw an exception upon execution: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream.
Thanks for checking. I think hadoop-client-runtime will pull in hadoop-client-api, so there should be no problem.
This PR is not ready yet (I'm a little busy these days; I may be able to continue this work next week).
hadoop-client-api is a runtime dependency of hadoop-client-runtime, so maven-shade-plugin should include it. However, I have built the R interpreter package on my local machine, and without adding the dependency explicitly, hadoop-client-api is not present in the shaded jar. (But maybe I am doing something wrong.)
How do you know that the maven-shade plugin also adds runtime dependencies? After a first look, I think the hadoop-client-api dependency has to be added manually.
The description at https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html says:
Requires dependency resolution of artifacts in scope: runtime
But I am really not sure how all dependencies are resolved.
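One way to take the guesswork out of transitive resolution is to list the Hadoop artifacts explicitly in the shade configuration. This is only a hedged sketch of a maven-shade-plugin setup, not Zeppelin's actual configuration:

```xml
<!-- Illustrative maven-shade-plugin configuration: explicitly include
     hadoop-client-api so the content of the shaded jar does not depend
     on how the plugin resolves runtime-scoped transitive dependencies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>org.apache.hadoop:hadoop-client-api</include>
        <include>org.apache.hadoop:hadoop-client-runtime</include>
      </includes>
    </artifactSet>
  </configuration>
</plugin>
```

Running `mvn dependency:tree -Dincludes=org.apache.hadoop` on the module would also show whether hadoop-client-api is resolved at all before shading.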
Alright, I know where the issue comes from.
Do we want to respect hadoop.deps.scope if we switch to the Hadoop shaded client?
Note: Spark dropped such profiles and always ships the Hadoop shaded client jars in its binary artifacts.
<hadoop.deps.scope>provided</hadoop.deps.scope>
...
<profile>
<id>include-hadoop</id>
<properties>
<hadoop.deps.scope>compile</hadoop.deps.scope>
</properties>
</profile>
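For illustration, a module would then consume the parent-defined property like this (a sketch, assuming the property is inherited from the parent pom):

```xml
<!-- Sketch: the include-hadoop profile flips hadoop.deps.scope from
     provided to compile, so this dependency is only bundled into the
     distribution when the profile is active. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>${hadoop.version}</version>
  <scope>${hadoop.deps.scope}</scope>
</dependency>
```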
The main reason for the introduction, I believe, was the YARN interpreter launch mode. See #3786
I don't use this setup myself. If I understand correctly, YARN has already loaded the Hadoop dependencies, so they don't need to be shipped with Zeppelin as well. Perhaps @zjffdu can say more about this.
I am of the opinion that we should include the shaded Hadoop dependency, as it is not available in Kubernetes or Docker, for example. However, we should make sure that we do not deliver the library more than once (e.g. additionally in plugins, see #3817).
If I understand correctly, YARN has already loaded the Hadoop dependencies, so they don't need to be shipped with Zeppelin as well.
There is a switch in YARN to enable/disable Hadoop class population for containers.
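For context (my understanding, not something this PR changes), the jars YARN puts on a container's classpath are governed by yarn-site.xml, for example:

```xml
<!-- Illustrative yarn-site.xml snippet: yarn.application.classpath
     controls which Hadoop directories YARN adds to container classpaths. -->
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*</value>
</property>
```

If I remember correctly, Spark additionally exposes spark.yarn.populateHadoopClasspath to toggle this behavior per application.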
we should make sure that we do not deliver the library more than once
QQ: I understand we should not include Hadoop classes in plugins, because they are loaded into the same JVM as the Zeppelin server and can therefore share its Hadoop classes. What about the interpreters? I assume the interpreters always run in dedicated JVMs, so Hadoop classes seem always necessary (except for those runtimes that already provide Hadoop classes, e.g. Spark, Flink)?
There is a switch in YARN to enable/disable Hadoop class population for containers.
I don't know how this is used in Zeppelin.
QQ: I understand we should not include Hadoop classes in plugins, because they are loaded into the same JVM as the Zeppelin server and can therefore share its Hadoop classes. What about the interpreters? I assume the interpreters always run in dedicated JVMs, so Hadoop classes seem always necessary (except for those runtimes that already provide Hadoop classes, e.g. Spark, Flink)?
Correct, the Zeppelin server and zengine run in the same JVM as the Zeppelin plugins.
In my opinion, the interpreters usually run in separate JVM instances. We should set the scope of Hadoop to provided in the interpreters, because I think the Hadoop code in the interpreters is only used for YARN. See
zeppelin/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/YarnUtils.java, line 20 (commit 56da029):
import org.apache.hadoop.conf.Configuration;
Maybe there will be a way to remove the dependency at some point.
I revisited this question. The rlang interpreter can work both independently and combined with SparkR. In the independent case, Hadoop deps are not required; in the SparkR case, the Hadoop jars are handled by the Spark runtime, either shipped in SPARK_HOME/jars or picked up via a properly set HADOOP_CLASSPATH. So the rlang interpreter doesn't need to ship Hadoop deps.
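The two SparkR-case options mentioned above amount to environment setup along these lines (illustrative paths, not from the PR):

```shell
# Option 1: rely on the Hadoop jars already shipped in SPARK_HOME/jars
export SPARK_HOME=/opt/spark   # hypothetical install location

# Option 2: expose an external Hadoop installation to the runtime
export HADOOP_CLASSPATH="$(hadoop classpath)"   # requires the hadoop CLI on PATH
```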
I have basically fixed all compile and test issues; the next step is to split this into several small PRs to speed up the review process. I think we should start with the interpreter modules one by one, then zengine, server and other modules, eventually dropping Hadoop2. @Reamer could you give some advice?
I would prefer a larger PR, with the individual tasks contained in separate commits. It was clear that dropping Hadoop2 would be very large. Thank you for your work so far. I think it's great that you have deleted all the excludes in the parent pom. Btw, I do not insist on co-authorship.
Unfortunately, I found that the IT does not run properly now, see #4699; we may need to postpone this PR until the IT is recovered.
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <artifactId>hadoop-auth</artifactId>
      <groupId>org.apache.hadoop</groupId>
      <groupId>org.apache.hive</groupId>
Hive's transitive dependency management is messy; the exclusion list is borrowed from Apache Flink's pom.xml.
@Reamer it's ready for review, please take a look when you have time |
<hadoop3.2.version>3.2.4</hadoop3.2.version>
<hadoop3.3.version>3.3.6</hadoop3.3.version>
<hadoop.version>${hadoop2.7.version}</hadoop.version>
<hadoop.version>${hadoop3.3.version}</hadoop.version>
Hadoop 3.2 is EOL; from the community's perspective, we'd better choose a supported version.
https://lists.apache.org/thread/ybygrhvqok0f44s7yzm6c24n0b0s727s
Another reason is that we use hadoop-client-minicluster to launch a mini YARN cluster for testing in some modules, and since Spark 3.2.0 the pre-built Spark tgz ships Hadoop 3.3.x jars. Hadoop requires that the client version be <= the server version, so we must use Hadoop 3.3 to launch the mini YARN cluster for Spark to work properly.
I really like this PR. It removes a lot of unnecessary legacy from the Zeppelin code. I have left a few comments that I would like to have clarified before the merge.
The Python 3.8 test failure should be addressed in #4748 |
@Reamer all failed tests are known flaky tests, this patch should be good to go :) |
I will merge the pull request on Wednesday as long as no further comments are received. |
Could this change break the build? I tried to build and got an error.
@Armadik mind providing reproduction steps? e.g. build command, start command, OS platform, JDK version, etc.
I see an error when running the zeppelin.sh script. Ubuntu 22.04.4 LTS:

apt install -y curl git maven openjdk-11-jdk npm libfontconfig r-base-dev r-cran-evaluate
sudo tar -zxf apache-maven-3.6.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.6.3/bin/mvn /usr/local/bin/mvn
cd Documents/
git clone https://github.com/apache/zeppelin.git
cd zeppelin/
export MAVEN_OPTS="-Xms1024M -Xmx4096M -XX:MaxMetaspaceSize=1024m -XX:-UseGCOverheadLimit -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=war"
./mvnw -B package -DskipTests -Pbuild-distr -Pspark-3.3 -Pinclude-hadoop -Phadoop3 -Pspark-scala-2.12 -Pweb-angular -Pweb-dist -pl '!groovy,!submarine,!flink,!cassandra,!jdbc,!bigquery,!alluxio,!mongodb,!neo4j' -am --no-transfer-progress
@Armadik sorry, I cannot reproduce it; both the classic and the new UI work fine on my side.
I tried a clean build. It seems the problem was in my environment.
What is this PR for?
This PR is based on #4674.
This PR aligns the Hadoop dependency version (3.3.6) across all Zeppelin modules. Additionally, it switches to the Hadoop shaded client (introduced in HADOOP-11804, available since Hadoop 3.0.0), which is the approach recommended by Hadoop; Spark also switched to the shaded client in SPARK-33212 (3.2.0). This dramatically reduces the Hadoop dependency management effort: basically, only 3 Hadoop jars are required now:
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster (only for testing)

There is a JDK 10 compatibility issue fixed in HIVE-21508 (Hive 2.3.7), so this PR also upgrades Flink's Hive version from 2.3.4 to 2.3.7. Hive 2.3.7 is chosen because Flink supports Hive 2.3.8 and 2.3.9 only since FLINK-26739 (1.16.0), while Zeppelin supports Flink 1.15~1.17.
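As a sketch, a downstream module consuming the shaded client would declare just these three artifacts (the scopes shown here are illustrative, not Zeppelin's exact configuration):

```xml
<!-- Compile against the shaded API, ship the shaded runtime,
     and use the minicluster only in tests. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.3.6</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.3.6</version>
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>3.3.6</version>
  <scope>test</scope>
</dependency>
```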
What type of PR is it?
Improvement
What is the Jira issue?
ZEPPELIN-5969
How should this be tested?
The test cases are modified correspondingly, and all GHA tests pass.
Screenshots (if appropriate)
Questions: