Building Zeppelin 0.6.0 with R on AWS EMR Cluster failed #17

Open
zenonlpc opened this issue May 6, 2016 · 18 comments

Comments

@zenonlpc

zenonlpc commented May 6, 2016

I created an EMR cluster with Zeppelin on AWS using the instructions at the link below:

https://gist.github.com/andershammar/224e1077021d0ea376dd

After some modifications to the installZeppelin.sh script, I was able to build Zeppelin with the R interpreter successfully: I added some R packages before building Zeppelin in Maven and changed the mvn compile command to include the R option:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

However, when I started to write R commands in a Zeppelin notebook, I got this error:

java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)

Here is the configuration of AWS EMR Cluster:

Hadoop: Amazon Hadoop 2.7.2
Applications: Spark 1.6.0, Ganglia 3.7.2
Release label: emr-4.4.0

It seems like an AWS issue, but I don't know what I did wrong.
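
For reference: com.amazonaws.event.ProgressListener lives in the AWS Java SDK (aws-java-sdk-core), so the trace above means that jar isn't on the interpreter's classpath. A minimal sketch of how to locate it and hand it to Spark, assuming the usual EMR layout under /usr/share/aws (the exact jar name and path are assumptions, so verify on the cluster first):

# Locate the AWS SDK jar that ships with EMR (path varies by release):
find /usr/share/aws /usr/lib -name 'aws-java-sdk*.jar' 2>/dev/null
# Then expose it in conf/zeppelin-env.sh (SPARK_SUBMIT_OPTIONS is honored when
# SPARK_HOME points at an external Spark):
export SPARK_SUBMIT_OPTIONS="--jars /usr/share/aws/aws-java-sdk/aws-java-sdk-core.jar"  # hypothetical path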

@elbamos
Owner

elbamos commented May 6, 2016

It looks to me like Hadoop is failing to start.

Are you running Zeppelin with Spark external to the Zeppelin home, or with Spark installed under Zeppelin? The difference is whether the SPARK_HOME env variable is set.

Can you try installing Zeppelin with simply mvn clean package install -Pr -DskipTests, running with external Spark, and see if that fixes it?

@zenonlpc
Author

zenonlpc commented May 6, 2016

Thanks Elbamos,

I will try installing Zeppelin with just -Pr.

Spark is installed on the EMR cluster by default and it is external to Zeppelin.

How can I set SPARK_HOME to use the external Spark? In zeppelin-env.sh?
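
A minimal sketch of that setup, using the EMR paths that appear elsewhere in this thread (zeppelin-env.sh is only read when the daemon starts, hence the restart):

# From the Zeppelin install directory:
cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
cat >> conf/zeppelin-env.sh <<'EOF'
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
EOF
bin/zeppelin-daemon.sh restart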


@zenonlpc
Author

zenonlpc commented May 10, 2016

Hello Elbamos,

I tried installing Zeppelin with just -Pr:

mvn clean package -Pr -DskipTests

And set SPARK_HOME in zeppelin-env.sh as below:

export MASTER=
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.instances=1 -Dspark.executor.cores=8 -Dspark.executor.memory=9193M -Dspark.default.parallelism=16"
export PYTHONPATH=:/usr/lib/spark/python

But I still couldn't get R working in Zeppelin; when I tried an R command it gave me this error:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:76)
at org.apache.zeppelin.rinterpreter.RInterpreter.getSparkInterpreter(RInterpreter.scala:70)
at org.apache.zeppelin.rinterpreter.RInterpreter.open(RInterpreter.scala:50)
at org.apache.zeppelin.rinterpreter.RRepl.open(RRepl.java:56)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Looks like YARN is not properly configured with Spark. Any idea what I did wrong?

The EMR cluster was created with two applications: Spark and Ganglia.
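
When YARN reports "application has already ended", the application master's own log usually says why. A hedged diagnostic sketch (yarn logs needs log aggregation enabled, which EMR normally has; the application id placeholder is whatever the failed attempt shows):

# List recently failed/killed YARN applications, then pull the AM logs:
yarn application -list -appStates FAILED,KILLED
yarn logs -applicationId <application_id_from_the_list_above>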

Thanks
Zenon


@elbamos
Owner

elbamos commented May 10, 2016

I'm not sure, but I do know that setting Zeppelin up to use YARN can be tricky. I would try to get it working with the regular Spark interpreter first and confirm that YARN is working.


@zenonlpc
Author

zenonlpc commented May 10, 2016

Hello Elbamos,

I already got the regular Spark interpreter working before; here is the Maven compile command I used:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

But when I tried to add R to Zeppelin, I used this Maven compile command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

And the configuration for Zeppelin:

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Now nothing is working; everything gives this error:

java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2563)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.spark.deploy.yarn.Client.cleanupStagingDir(Client.scala:166)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:152)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
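
Since the only differences from the working build are the -Pr profile and SPARK_HOME, one way to narrow this down is to confirm which jar on the cluster actually contains the missing class and whether the interpreter process can see it. A sketch, with the search paths as assumptions about the EMR layout:

# Print every jar that contains com.amazonaws.event.ProgressListener:
for j in /usr/share/aws/aws-java-sdk/*.jar /usr/lib/hadoop/lib/*.jar /usr/lib/spark/lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'com/amazonaws/event/ProgressListener' && echo "$j"
done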


@zenonlpc
Author

zenonlpc commented May 10, 2016

Hello Elbamos,

I misunderstood you; I will try to make R work with Spark first, not with YARN.

Should I compile Zeppelin using this command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pr -DskipTests

and set the Zeppelin configuration:

export MASTER=spark://<spark_master_node>:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
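
One caveat before trying a spark:// master: on EMR 4.x Spark is normally deployed on YARN only, so nothing listens on the standalone port 7077 unless a standalone master has been started by hand. A quick hedged check (the hostname is a placeholder):

nc -zv <spark_master_node> 7077   # "connection refused" here means no standalone master is running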

Thanks


@elbamos
Owner

elbamos commented May 10, 2016

Before you try that, how about:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn -Pr -DskipTests

I think you should be able to remove all the profiles except R and yarn, but let's try this first.
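
As a sanity check before rebuilding, Maven can list every profile the POM actually defines, which confirms that spark-1.6, hadoop-2.6, yarn and r are all recognized by this source tree (run from the Zeppelin source root):

mvn help:all-profiles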


@zenonlpc
Author

Hello Elbamos,

I tried your suggestion again with the following command and configuration:

mvn clean package -Pyarn -Pr -DskipTests

export MASTER=yarn-client
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Then, when I ran any command in Zeppelin, I got an error:

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

After that, I tried removing YARN from Zeppelin and using the existing Spark on the EMR cluster. The Maven command I used is:

mvn clean package -Pr -DskipTests

Configuration for Zeppelin:

export MASTER=spark://sparkmasternode:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

When I ran any Zeppelin command I got this error:

org.apache.spark.SparkException: Could not parse Spark Master URL: ''
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2735)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
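
The empty master URL suggests the interpreter process never picked up the MASTER value at all. One quick check (just a sketch; the path assumes the install location used above) is to confirm the export actually landed in zeppelin-env.sh:

grep -n '^export MASTER' /home/hadoop/zeppelin/conf/zeppelin-env.sh
# If this prints nothing, the config file was never written; if it prints the
# expected URL, check whether the spark interpreter's "master" property in the
# Zeppelin UI is overriding it with an empty value.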

Thanks

@akshayprakash

akshayprakash commented May 16, 2016

I think your Spark master should be set to something like:

export MASTER=spark://ip-111-11-11-11.us-west-2.compute.internal

Anyway, it worked well for me. I am running an 8-node cluster on EMR 4.5.0 with a Zeppelin-R notebook.
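
If you want to verify that a standalone master is actually listening before pointing Zeppelin at it, a quick check (a sketch; it assumes nc is available, and you should substitute your own master's DNS name):

nc -zv ip-111-11-11-11.us-west-2.compute.internal 7077   # default standalone submission port
# If nothing is listening here, Spark is probably running under YARN instead.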

@zenonlpc
Author

Thanks Akshay

I will try that.

@zenonlpc
Author

Hello Akshay

I ran the list-instances command on my EMR cluster master node; here is the
result:

{
    "Instances": [
        {
            "Status": {
                "Timeline": {
                    "ReadyDateTime": 1463419008.136,
                    "CreationDateTime": 1463418705.629
                },
                "State": "RUNNING",
                "StateChangeReason": {}
            },
            "Ec2InstanceId": "i-07cb76b585791dc13",
            "PublicDnsName": "ec2-52-87-213-254.compute-1.amazonaws.com",
            "PrivateDnsName": "ip-172-31-59-226.ec2.internal",
            "PublicIpAddress": "52.87.213.254",
            "Id": "ci-3P2QMFSOSKF2S",
            "PrivateIpAddress": "172.31.59.226"
        },
        {
            "Status": {
                "Timeline": {
                    "ReadyDateTime": 1463419008.136,
                    "CreationDateTime": 1463418719.445
                },
                "State": "RUNNING",
                "StateChangeReason": {}
            },
            "Ec2InstanceId": "i-0cd3184eb7788816a",
            "PublicDnsName": "ec2-52-90-79-148.compute-1.amazonaws.com",
            "PrivateDnsName": "ip-172-31-58-205.ec2.internal",
            "PublicIpAddress": "52.90.79.148",
            "Id": "ci-13EARLMDOU64L",
            "PrivateIpAddress": "172.31.58.205"
        }
    ]
}

I only have one master node and one slave node. Based on your previous reply,
I should use the private DNS name as the Spark master host name; is this
correct?

For this cluster, the Spark master should then be set to:
spark://ip-172-31-59-226.ec2.internal:7077
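
For reference, here is a sketch that pulls that value straight from the CLI on the master node (it assumes the AWS CLI is configured there):

CLUSTER_ID=$(aws emr list-clusters --active --query 'Clusters[0].Id' --output text)
MASTER_DNS=$(aws emr list-instances --cluster-id "$CLUSTER_ID" \
    --instance-group-types MASTER \
    --query 'Instances[0].PrivateDnsName' --output text)
echo "spark://${MASTER_DNS}:7077"   # candidate standalone master URL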

Thanks


@akshayprakash

Yes... go to the Interpreter section of the Zeppelin notebook and set the master (spark) to
spark://ip-172-31-59-226.ec2.internal:7077

OR, if you prefer using a Linux editor in CentOS:

$ cd /etc/spark/conf
$ vi spark-env.sh
export SPARK_HOME=/usr/lib/spark
export MASTER=spark://ip-172-31-59-226.ec2.internal:7077
(Esc)
:wq

If the above doesn't work, try it without the listening port.

@zenonlpc
Author

Hello guys

I tried Akshay's suggestion, but it still didn't work for me.

I compiled Zeppelin with just the R profile:
mvn clean package -Pr -DskipTests

Configuration for Zeppelin:

export MASTER=spark://ip-172-31-59-226.ec2.internal
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python

Now, when running an R command in the notebook, it gives this error:

org.apache.spark.SparkException: Invalid master URL:
spark://ip-172-31-59-226.ec2.internal
at org.apache.spark.util.Utils$.extractHostPortFromSparkUrl(Utils.scala:2121)
at org.apache.spark.rpc.RpcAddress$.fromSparkURL(RpcAddress.scala:47)
at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
at org.apache.spark.deploy.client.AppClient$$anonfun$1.apply(AppClient.scala:48)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.client.AppClient.<init>(AppClient.scala:48)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.start(SparkDeploySchedulerBackend.scala:93)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)

It looks like all I need is the correct Spark master URL, but I couldn't find
it easily, so I googled and found this link:

http://stackoverflow.com/questions/30760792/how-to-find-spark-master-url-on-amazon-emr

From this link, my understanding is that an EMR Spark cluster runs Spark on
YARN by default, so if I want to use the external Spark distribution
installed by EMR, I am stuck with YARN.
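
If that is right, the interpreter needs to target YARN rather than a standalone master. A minimal zeppelin-env.sh for that case (only a sketch, assuming EMR's usual paths) would be:

export MASTER=yarn-client                  # submit through YARN instead of spark://host:7077
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf    # so the client can find the YARN ResourceManager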

Can anyone help me with this battle? I have been struggling with this issue
for almost two weeks.

Thanks in Advance

@zenonlpc
Author

Hello Akshay

After adding the port to the Spark master URL and restarting the Zeppelin
server, I got this error when running an R command:

org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:232)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:216)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:259)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:262)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

This looks like one step closer to getting it working.

Thanks
Zenon


@zenonlpc
Author

Hello guys

Thanks for helping me on this issue; I really appreciate your time and
effort.

Instead of giving pieces of information, I want to give all the information
up front so you might be able to help resolve this quickly.

The EMR cluster I created has release label emr-4.4.0, with Spark (1.6.0)
and Ganglia (3.7.2) as applications.

I used the following script to install Zeppelin 0.6.0 as a bootstrap action
while the cluster started.


#! /bin/bash -ex

if [ "$(cat /mnt/var/lib/info/instance.json | jq -r .isMaster)" == "true"
]; then
# Install Git
sudo yum -y install git
# Install Maven
wget -P /tmp
http://apache.mirrors.spacedump.net/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo mkdir /opt/apache-maven
sudo tar -xvzf /tmp/apache-maven-3.3.3-bin.tar.gz -C /opt/apache-maven

cat <<EOF >> /home/hadoop/.bashrc

export MAVEN_HOME=/opt/apache-maven/apache-maven-3.3.3
export PATH=$MAVEN_HOME/bin:$PATH
EOF
source /home/hadoop/.bashrc
# Install Zeppelin
git clone https://github.com/apache/incubator-zeppelin.git
/home/hadoop/zeppelin
cd /home/hadoop/zeppelin
# install some R packages before build zeppelin
sudo mkdir /tmp/rjars/
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/stringi_1.0-1.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/magrittr_1.5.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/stringr_1.0.0.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/evaluate_0.9.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/mime_0.4.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/digest_0.6.9.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/formatR_1.4.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/highr_0.6.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/markdown_0.7.7.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/yaml_2.1.13.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/knitr_1.13.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/Rcpp_0.12.5.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/htmltools_0.3.5.tar.gz
sudo wget -P /tmp/rjars/
https://rweb.crmda.ku.edu/cran/src/contrib/base64enc_0.1-3.tar.gz
sudo R CMD INSTALL /tmp/rjars/stringi_1.0-1.tar.gz
sudo R CMD INSTALL /tmp/rjars/magrittr_1.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/stringr_1.0.0.tar.gz
sudo R CMD INSTALL /tmp/rjars/evaluate_0.9.tar.gz
sudo R CMD INSTALL /tmp/rjars/mime_0.4.tar.gz
sudo R CMD INSTALL /tmp/rjars/digest_0.6.9.tar.gz
sudo R CMD INSTALL /tmp/rjars/formatR_1.4.tar.gz
sudo R CMD INSTALL /tmp/rjars/highr_0.6.tar.gz
sudo R CMD INSTALL /tmp/rjars/markdown_0.7.7.tar.gz
sudo R CMD INSTALL /tmp/rjars/yaml_2.1.13.tar.gz
sudo R CMD INSTALL /tmp/rjars/knitr_1.13.tar.gz
sudo R CMD INSTALL /tmp/rjars/Rcpp_0.12.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/htmltools_0.3.5.tar.gz
sudo R CMD INSTALL /tmp/rjars/base64enc_0.1-3.tar.gz

# Build Zeppelin with R
mvn clean package -Pr -DskipTests

# Configure Zeppelin

SPARK_DEFAULTS=/usr/lib/spark/conf/spark-defaults.conf
echo ${SPARK_DEFAULTS}
declare -a ZEPPELIN_JAVA_OPTS
if [ -f $SPARK_DEFAULTS ]; then
ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}"
$(grep spark.executor.instances $SPARK_DEFAULTS | awk '{print
"-D" $1 "=" $2}'))
ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}"
$(grep spark.executor.cores $SPARK_DEFAULTS | awk '{print "-D"
$1 "=" $2}'))
ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}"
$(grep spark.executor.memory $SPARK_DEFAULTS | awk '{print "-D"
$1 "=" $2}'))
ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}"
$(grep spark.default.parallelism $SPARK_DEFAULTS | awk '{print
"-D" $1 "=" $2}'))
ZEPPELIN_JAVA_OPTS=("${ZEPPELIN_JAVA_OPTS[@]}"
$(grep spark.yarn.executor.memoryOverhead $SPARK_DEFAULTS | awk
'{print "-D" $1 "=" $2}'))
fi
echo ${SPARK_DEFAULTS}
echo "${ZEPPELIN_JAVA_OPTS[@]}"

# Get the cluster ID

CLUSTER_ID=$(aws emr list-clusters --active | grep -i id | awk -F '"'
'{print $4}')
echo $CLUSTER_ID

# Get the Spark master host from the aws emr list-instances command

SPARK_MASTER_URL=$(aws emr list-instances --cluster-id $CLUSTER_ID
--instance-group-types MASTER | grep -i PrivateDnsName | awk -F '"' '{print
$4}')
echo $SPARK_MASTER_URL

# Put values in zeppelin-env.sh

cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
cat <<EOF>> conf/zeppelin-env.sh

export MASTER=spark://${SPARK_MASTER_URL}:7077
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_SPARK_USEHIVECONTEXT=false
export ZEPPELIN_JAVA_OPTS="${ZEPPELIN_JAVA_OPTS[@]}"
export PYTHONPATH=$PYTHONPATH:/usr/lib/spark/python
EOF
# change zeppelin port to 7002
cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
sed -i -e 's/8080/7002/g' conf/zeppelin-site.xml

# Start the Zeppelin daemon

bin/zeppelin-daemon.sh start

fi


Previously, when I built Zeppelin with this command:

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0 -Phadoop-2.6 -Pyarn
-DskipTests

it worked fine. After I removed -Pspark-1.6, -Phadoop-2.6, and -Pyarn, I
noticed that ZEPPELIN_JAVA_OPTS is empty.

When I run the script after the cluster is created, the variable
ZEPPELIN_JAVA_OPTS is populated correctly. But when it runs as a bootstrap
action, ZEPPELIN_JAVA_OPTS is empty.

Does this mean Spark is not yet installed when I install Zeppelin as a
bootstrap action during cluster creation?

If that is the case, why did the previous build work fine?
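
If that is what is happening, one workaround (only a sketch, untested: bootstrap actions run before EMR provisions applications, and they block provisioning, so the waiter has to be backgrounded) would be:

# Defer the Spark-dependent configuration until EMR has installed Spark.
configure_when_spark_ready() {
    while [ ! -f /usr/lib/spark/conf/spark-defaults.conf ]; do
        sleep 30    # spark-defaults.conf appears once the Spark application is provisioned
    done
    # ... read spark-defaults.conf, write zeppelin-env.sh, and start the daemon here ...
}
configure_when_spark_ready &    # background it so the bootstrap action itself can finish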

The original script comes from this post:

https://gist.github.com/andershammar/224e1077021d0ea376dd

Thanks
Zenon


@zenonlpc
Author

Hello Everyone

We finally figured out the issue: instead of using %r, we should use %knitr
to run R code in Zeppelin.
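
For example, a paragraph like this now runs (a minimal sketch; it assumes the R interpreter built with -Pr is bound to the notebook):

%knitr
summary(cars)    # any plain R code; knitr renders the result as the paragraph output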

Thanks
Zenon


@elbamos
Owner

elbamos commented Jun 23, 2016

Both of those should work without a problem. If you are using the latest Zeppelin from master, though, there are a lot of recently introduced bugs that could cause this. You may be happier using the version from my repo.


@pramitchoudhary

Hey guys,
Bumped into a similar error while running the Zeppelin daemon provided by the EMR instance. I followed the steps mentioned here and was able to launch the sparkR shell, but I am getting an 'r' interpreter not found error. The version of Zeppelin running on EMR is 0.6.1. I tried following the conversation on the mailing list, and from my understanding the R interpreter should be part of the build, right?
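
A quick way to see which interpreters the build actually shipped (a sketch; the paths assume EMR's Zeppelin install under /usr/lib/zeppelin, so adjust if yours differs):

ls /usr/lib/zeppelin/interpreter    # look for an r/ directory next to spark/, md/, etc.
grep -o 'org\.apache\.zeppelin\.[a-zA-Z.]*' /usr/lib/zeppelin/conf/zeppelin-site.xml | sort -u
# the zeppelin.interpreters property should list the R interpreter classes if it was built in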
