-
I'll start with a short description, but I can add a lot more detail if this isn't a simple answer. I can also successfully run the SparkPi example using the YARN node locally. In the Resource Manager logs, YARN assigns the submission a single-digit application ID (…). Any assistance would be greatly appreciated.
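For reference, the submission being described is roughly of this shape (a sketch only; the jar paths and versions below are placeholders, not the exact command from this cluster):

```bash
# Sketch of a SparkPi submission to YARN with the Ozone filesystem jar on the
# classpath. Paths and versions are assumptions for illustration.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --jars "$OZONE_HOME/share/ozone/lib/ozone-filesystem-hadoop3-1.3.0.jar" \
  "$SPARK_HOME/examples/jars/spark-examples_2.12-3.4.0.jar" 100
```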
-
The log message itself suggests that something went wrong with sending the request to the Ozone Manager. Can you please enable debug-level logging for the Hadoop root logger in the job, and collect the exception stack trace that the RetryInvocationHandler prints with this same message at debug level? I guess that will tell us more about the exact failure with the request.
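For example, one way to turn that on (a sketch, assuming the job reads a log4j 1.x style log4j.properties; adapt to log4j2 syntax if your Spark build uses log4j2.properties instead):

```bash
# Append a DEBUG logger for Hadoop classes to the job's log4j configuration.
# org.apache.hadoop.io.retry.RetryInvocationHandler logs the stack trace of
# the failed Ozone Manager request at DEBUG level.
cat >> "$SPARK_CONF_DIR/log4j.properties" <<'EOF'
log4j.logger.org.apache.hadoop=DEBUG
EOF
```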
-
Sorry it took me a while to get back to this discussion. I've increased the logging levels for Spark to see why it wasn't getting to the Ozone Manager. On the machine where I run the spark command, I am able to run both …. I'm attaching the extended output of the spark-submit command, and it looks like it may be related to HDDS-6570? Any guidance is greatly appreciated.
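For reference, the extended output was collected with something along these lines (a sketch; the exact property name depends on whether the build uses log4j 1.x or log4j2):

```bash
# Capture verbose spark-submit output to a file for attaching to the discussion.
# -Dlog4j.configuration is the log4j 1.x form; log4j2 builds use
# -Dlog4j.configurationFile instead. Paths here are placeholders.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --driver-java-options "-Dlog4j.configuration=file:$SPARK_CONF_DIR/log4j-debug.properties" \
  "$SPARK_HOME/examples/jars/spark-examples_2.12-3.4.0.jar" 10 \
  2>&1 | tee spark-submit-debug.log
```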
-
Yup. https://issues.apache.org/jira/browse/HDDS-6926 addresses this issue.
-
Thanks so much for the reply. According to the discussion on the Jira ticket, the problem comes from mixing shaded and unshaded jars, and the ticket is marked as closed, which leads me to think that this issue was resolved in newer versions of Ozone. I'm using Spark 3.4.0 (built for use with Hadoop) and Ozone 1.3.0, along with the version of YARN included with Hadoop 3.3.5, and I'm still getting the error described above. I'm sorry if I'm being dense here, but is there a way to get Spark working with Ozone without recompiling jar files? Thanks again,
-
Thanks to @GeorgeJahad's comments in the last community call, I was able to get all this working. The key error in my issue was caused by trying to cast an unshaded variable to a shaded one. The fix was to use ozone-filesystem-hadoop3-client-1.3.0.jar instead of adding $OZONE_HOME/share/ozone/lib/ozone-filesystem-hadoop3-1.3.0.jar to the necessary places for Spark and HDFS. Following this, I did have to redo my configs to use o3fs instead of ofs, but now my Spark jobs are successfully submitted to YARN and run across the YARN cluster. I also had to add the log4j jar file to the Spark classpath, but those errors were rather explicit about what was missing. I'd like to thank everyone who helped for their assistance in getting around this issue.
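To summarize the working setup in one place (a sketch only; the OM host, volume, bucket, and paths below are placeholders, not the exact values from this cluster):

```bash
# 1. Use the shaded client filesystem jar rather than ozone-filesystem-hadoop3.
OZONE_FS_JAR="$OZONE_HOME/share/ozone/lib/ozone-filesystem-hadoop3-client-1.3.0.jar"

# 2. In core-site.xml, point the o3fs scheme at the Ozone bucket. The bucket
#    "bucket1", volume "vol1", host "om-host", and port are hypothetical:
#      <property>
#        <name>fs.o3fs.impl</name>
#        <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
#      </property>
#      <property>
#        <name>fs.defaultFS</name>
#        <value>o3fs://bucket1.vol1.om-host:9862</value>
#      </property>

# 3. Submit with the client jar shipped to the job. Depending on your setup,
#    you may also need spark.driver.extraClassPath / spark.executor.extraClassPath
#    pointing at this jar (and at a log4j jar, as noted above).
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --jars "$OZONE_FS_JAR" \
  "$SPARK_HOME/examples/jars/spark-examples_2.12-3.4.0.jar" 100
```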