Building Zeppelin 0.6.0 with R on AWS EMR Cluster failed #17
Comments
It looks to me like Hadoop is failing to start. Are you running Zeppelin with Spark external to the Zeppelin home, or with Spark installed under Zeppelin? The difference is whether the SPARK_HOME env variable is set. Can you try installing Zeppelin with simply: mvn clean package install -Pr -DskipTests, running with external Spark, and see if that fixes it?
|
Thanks Elbamos, I will try installing zeppelin with just -Pr. The spark is installed on the EMR cluster by default, so it is external to Zeppelin. How can I set the spark_home to use the external spark? On Fri, May 6, 2016 at 12:41 PM, elbamos [email protected] wrote:
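For what it's worth, on emr-4.x the preinstalled Spark typically lives under /usr/lib/spark (an assumption worth verifying on the master node), and pointing Zeppelin at it is a matter of a few lines in conf/zeppelin-env.sh. A minimal sketch:

```shell
# Sketch of conf/zeppelin-env.sh for using EMR's external Spark.
# All paths are assumptions for emr-4.x; verify them on your master node.
export SPARK_HOME=/usr/lib/spark         # where EMR installs Spark
export HADOOP_CONF_DIR=/etc/hadoop/conf  # so Spark can find the YARN/HDFS config
export MASTER=yarn-client                # run the interpreter against YARN
```

With SPARK_HOME set, Zeppelin uses the external installation instead of any Spark bundled under its own home.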
|
Hello Elbamos, I tried installing zeppelin with just -Pr: mvn clean package -Pr -DskipTests. And I set the spark_home in zeppelin-env.sh as below: export MASTER= But I still could not get R working in zeppelin; when I tried an R command it gave: org.apache.spark.SparkException: Yarn application has already ended! It looks like Yarn is not properly configured with Spark. Any idea what I should try? The EMR cluster is created with two applications, Spark and Ganglia. Thanks. On Fri, May 6, 2016 at 1:57 PM, Pengcheng Liu [email protected] wrote:
|
I'm not sure, but I do know that setting Zeppelin up to use YARN can be tricky. I would try to get it working with the regular Spark interpreter first and confirm that YARN is working.
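One way to confirm YARN itself is healthy before involving Zeppelin at all (a sketch to run on the EMR master node; it assumes configured Hadoop and Spark clients, and the example-jar path is an assumption for Spark 1.6 on emr-4.x):

```shell
# Is YARN reachable and accepting applications?
yarn application -list -appStates RUNNING

# Submit a trivial Spark job straight to YARN, bypassing Zeppelin entirely:
spark-submit --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  /usr/lib/spark/lib/spark-examples-*.jar 10
```

If SparkPi fails here too, the problem is in the Spark/YARN setup rather than in Zeppelin or the R interpreter.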
|
Hello Elbamos, I already had the regular spark interpreter working before, built with: mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn and the configuration for Zeppelin: export MASTER=yarn-client. But when I tried to add R in zeppelin, using the same maven compile command and the same configuration, nothing works anymore; everything gives this error: java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener. On Tue, May 10, 2016 at 1:23 PM, elbamos [email protected] wrote:
|
Hello Elbamos, I misunderstood you; I will try to get R working with spark first. Should I compile zeppelin using this command: mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr and set the zeppelin configuration export MASTER= to the spark master node? Thanks. On Tue, May 10, 2016 at 3:17 PM, Pengcheng Liu [email protected] wrote:
|
Before you try that, how about this: I think you should be able to remove all the profiles except R and yarn, but let's try this first.
|
Hello Elbamos, I tried your suggestion again with the following command and configuration. When I ran any command in zeppelin I got an error: org.apache.spark.SparkException: Yarn application has already ended! After that I tried removing Yarn from zeppelin and using the existing configuration. Then, when I ran any zeppelin command, I got this error: org.apache.spark.SparkException: Could not parse Spark Master URL: '' Thanks. On Tue, May 10, 2016 at 4:52 PM, elbamos [email protected] wrote:
|
I think your spark master should be set to, for example: export MASTER=spark://ip-111-11-11-11.us-west-2.compute.internal
Thanks Akshay, I will try that. On Mon, May 16, 2016 at 12:15 PM, Akshay Prakash [email protected] wrote:
|
Hello Akshay, I ran a list-instances command on my EMR cluster master node; here is the output: { I only have 1 master node and 1 slave node. Based on your previous reply, what should the spark master be set to for this cluster? Thanks. On Mon, May 16, 2016 at 1:09 PM, Pengcheng Liu [email protected] wrote:
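As a sanity check on the URL format: a standalone master URL is just spark://<master-private-dns>:7077. A throwaway sketch of composing it (the DNS name below is a placeholder; on a live cluster it would come from aws emr list-instances instead):

```shell
#!/bin/sh
# Sketch: compose a standalone-master URL from a host name.
# MASTER_DNS is a placeholder value, not a real lookup; on a live cluster
# you would obtain it with `aws emr list-instances`.
MASTER_DNS="ip-172-31-59-226.ec2.internal"
SPARK_MASTER="spark://${MASTER_DNS}:7077"
echo "$SPARK_MASTER"
```

Note that EMR's preinstalled Spark runs on YARN rather than as a standalone cluster, so a spark://...:7077 master may simply have nothing listening on that port.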
|
Yes... go to the Interpreter section of the Zeppelin notebook and set the master (spark) there. Or, if you prefer using a linux editor in CentOS: $ cd /etc/spark/conf If the above doesn't work, try it without the listening port.
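The /etc/spark/conf hint can be made concrete: on EMR the configured master is recorded in spark-defaults.conf. A self-contained sketch of reading it out (we build a sample file so the snippet runs anywhere; on a real cluster you would point CONF at /etc/spark/conf/spark-defaults.conf, and the "yarn" value is illustrative):

```shell
#!/bin/sh
# Sketch: extract the spark.master setting from a spark-defaults.conf-style file.
# A sample file keeps the snippet self-contained; on EMR use
# /etc/spark/conf/spark-defaults.conf instead.
CONF=/tmp/spark-defaults.sample.$$
cat > "$CONF" <<'EOF'
spark.master            yarn
spark.eventLog.enabled  true
EOF
MASTER_SETTING=$(awk '$1 == "spark.master" { print $2 }' "$CONF")
echo "$MASTER_SETTING"
rm -f "$CONF"
```

Whatever value appears there is what the working spark-shell on the cluster actually uses, so it is a good candidate for Zeppelin's master setting too.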
Hello guys, I tried Akshay's suggestion, but it still didn't work for me. I compiled just R in zeppelin, with this configuration: export MASTER=spark://ip-172-31-59-226.ec2.internal Now when running an R command in the notebook, it gave this error: org.apache.spark.SparkException: Invalid master URL: It looks like all I need is the correct spark master URL, but I couldn't find it. http://stackoverflow.com/questions/30760792/how-to-find-spark-master-url-on-amazon-emr From this link, my understanding is that the EMR spark cluster is created on YARN. Can anyone help me with this battle? I have been struggling with this issue for a while now. Thanks in advance. On Mon, May 16, 2016 at 1:27 PM, Pengcheng Liu [email protected] wrote:
|
Hello Akshay, after adding the port to the spark master URL and restarting zeppelin, I got: org.apache.thrift.transport.TTransportException This looks like one step closer to getting it working. Thanks. On Mon, May 16, 2016 at 1:55 PM, Akshay Prakash [email protected] wrote:
|
Hello guys, thanks for helping me on this issue, I really appreciate your time. Instead of pieces of information, I want to give all the information so you can see the whole picture. The EMR cluster I created has release label emr-4.4.0. I used the following script to install zeppelin 0.6.0 as a bootstrap action:
#!/bin/bash -ex
if [ "$(cat /mnt/var/lib/info/instance.json | jq -r .isMaster)" == "true" ]
then
export MAVEN_HOME=/opt/apache-maven/apache-maven-3.3.3
# Building Zeppelin with R
SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
# Getting cluster ID
CLUSTER_ID=$(aws emr list-clusters --active | grep -i id | awk -F '"' ...
# Getting Spark host URL from aws emr list-instances command
SPARK_MASTER_URL=$(aws emr list-instances --cluster-id $CLUSTER_ID ...
# Putting values in zeppelin-env.sh
export MASTER=spark://${SPARK_MASTER_URL}:7077
# Start the Zeppelin daemon
fi
Previously, when I built zeppelin with this command: mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn it worked fine. After I removed -Pspark, -Phadoop and -Pyarn, I noticed the SPARK_MASTER_URL variable was empty. When I ran the script after the cluster was created, the variable was populated. Does this mean spark is not installed yet when I try to install zeppelin as a bootstrap action? If that is the case, why did the previous command work fine? The original script comes from this post: https://gist.github.com/andershammar/224e1077021d0ea376dd Thanks. On Mon, May 16, 2016 at 2:17 PM, Pengcheng Liu [email protected] wrote:
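A likely explanation, stated as an assumption: EMR bootstrap actions run before the cluster's applications (Spark included) are installed, so a bootstrap-time lookup of Spark's files or of the running cluster can come up empty, while the same script works after provisioning. A tiny sketch of that symptom (SPARK_DIR is a stand-in path, not the real install location):

```shell
#!/bin/sh
# Sketch of the bootstrap-timing symptom. SPARK_DIR stands in for
# /usr/lib/spark; at bootstrap time the directory would not exist yet.
SPARK_DIR="/tmp/not-yet-installed-spark.$$"
if [ -d "$SPARK_DIR" ]; then
    RESULT="present"
else
    RESULT="not installed yet"   # what a bootstrap action would observe
fi
echo "spark: $RESULT"
```

If that is the cause, moving the Spark-dependent configuration into an EMR step (which runs after the applications are installed) rather than a bootstrap action would be one way around it.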
|
Hello everyone, we finally figured out the issue: instead of using %r, we should use %knitr to run R. Thanks. On Tue, May 17, 2016 at 10:38 AM, Pengcheng Liu [email protected] wrote:
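For reference, the fix means binding the notebook paragraph to the knitr interpreter rather than %r; a minimal paragraph would look like this (summary(cars) is just an illustrative R expression):

```
%knitr
summary(cars)
```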
|
Both of those should work without a problem. If you are using the latest Zeppelin from master, though, there are a lot of recently introduced bugs that could cause this. You may be happier using the version from my repo.
|
Hey guys,
I created an EMR cluster with Zeppelin on AWS using the instructions at the link below:
https://gist.github.com/andershammar/224e1077021d0ea376dd
After some modification of the installZeppelin.sh script, I was able to build zeppelin with the R interpreter successfully. I added some R packages before building zeppelin in maven and changed the mvn compile command to include the R option:
mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests
However, when start to write R command in zeppelin notebook, I got this error:
java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
at java.util.ServiceLoader$1.next(ServiceLoader.java:
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
Here is the configuration of the AWS EMR cluster:
Hadoop: Amazon Hadoop 2.7.2
Applications: Spark 1.6.0, Ganglia 3.7.2
Release label: emr-4.4.0
It seems like an AWS issue, but I don't know what I did wrong.