Question

I'm working on a project using R-Hadoop, and I ran into the following problem.

I'm using the Ganymed SSH-2 library in Java to SSH to a remote Hadoop pseudo-cluster, and here is the part of the Java code that creates the connection and runs the command.

import java.io.IOException;

import ch.ethz.ssh2.ChannelCondition;
import ch.ethz.ssh2.Connection;
import ch.ethz.ssh2.Session;

/* Create a connection instance */
Connection conn = new Connection(hostname);
/* Now connect */
conn.connect();
/* Authenticate */
boolean isAuthenticated = conn.authenticateWithPassword(username, password);
if (isAuthenticated == false)
    throw new IOException("Authentication failed.");
/* Create a session and run the R script remotely */
Session sess = conn.openSession();
//sess.execCommand("uname -a && date && uptime && who");
sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");
//sess.execCommand("ls");
/* Wait at most 50 ms on the channel (this does not wait for the command to finish) */
sess.waitForCondition(ChannelCondition.TIMEOUT, 50);

I tried several simple R scripts and my code worked fine. But when it comes to R-Hadoop, the R script stops running. Yet if I run Rscript -e 'args1 <- "Dell"; args2 <- 1; source("/usr/local/R/mytest.R")' directly on the remote server, everything works fine.
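For reference, one reason the failure looks silent is that the snippet above never reads the remote command's output or waits for it to finish. The following is only a sketch of how that could be done with the same Ganymed SSH-2 API; it is not code from the original post:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import ch.ethz.ssh2.ChannelCondition;
import ch.ethz.ssh2.Session;
import ch.ethz.ssh2.StreamGobbler;

Session sess = conn.openSession();
sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");

/* Read stdout and stderr so errors printed by R are not lost */
InputStream stdout = new StreamGobbler(sess.getStdout());
InputStream stderr = new StreamGobbler(sess.getStderr());
try (BufferedReader out = new BufferedReader(new InputStreamReader(stdout));
     BufferedReader err = new BufferedReader(new InputStreamReader(stderr))) {
    String line;
    while ((line = out.readLine()) != null) System.out.println(line);
    while ((line = err.readLine()) != null) System.err.println(line);
}

/* Wait for the remote command to finish and report its exit code */
sess.waitForCondition(ChannelCondition.EXIT_STATUS, 0);
System.out.println("Exit code: " + sess.getExitStatus());
sess.close();

With something like this in place, the rhdfs error shown below would have appeared directly in the Java program's output.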

Here is what I got after taking Hong Ooi's suggestion. Instead of using Rscript, I used the following command:

sess.execCommand("R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

And in whathappened.txt, I got the following error:

> args=(commandArgs(TRUE))
> for(i in 1:length(args)){
+      eval(parse(text=args[[i]]))
+ }
> source("/usr/local/R/main.R")
> main(args1,args2)
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: package/namespace load failed for ‘rhdfs’
Execution halted

Well, now the problem is much clearer. Unfortunately, I'm pretty new to Linux and have no idea how to solve this.


Solution 2

Well, I just found another solution myself:

Instead of worrying about the environment from outside the Hadoop cluster, you can set the environment variables inside the R script, like this:

Sys.setenv(HADOOP_HOME="put your HADOOP_HOME path here")
Sys.setenv(HADOOP_CMD="put your HADOOP_CMD path here")

library(rmr2)
library(rhdfs)
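To make that concrete, a script preamble could look like the sketch below. The paths are placeholders (every installation differs), and the hdfs.init() call and the HADOOP_STREAMING note are the usual rhdfs/rmr2 follow-up steps rather than something from the original answer:

# Placeholder paths -- substitute the locations from your own installation
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")
Sys.setenv(HADOOP_CMD  = "/usr/local/hadoop/bin/hadoop")
# Depending on the rmr2 version, HADOOP_STREAMING may also need to point at the streaming jar
# Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/.../hadoop-streaming-<version>.jar")

library(rmr2)
library(rhdfs)

# rhdfs can only be initialised once HADOOP_CMD is visible to the R process
hdfs.init()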

Other tips

Well, I solved this problem like this:

sess.execCommand("source /etc/profile; R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

The problem was caused by the environment. SSHing into the remote Hadoop cluster starts a different (non-interactive) environment, so variables like $HADOOP_CMD will not be found. There are multiple ways to let the SSH session pick up the environment variables.

In my method, "source /etc/profile" tells the SSH'd environment where to find the environment variables. Another option is sketched below.
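An alternative that avoids relying on the login profile (a sketch only, with placeholder Hadoop paths that you would have to adjust) is to export the variables inline as part of the remote command, so the R process inherits them directly:

sess.execCommand(
    "export HADOOP_HOME=/usr/local/hadoop; "              // placeholder path
  + "export HADOOP_CMD=/usr/local/hadoop/bin/hadoop; "    // placeholder path
  + "R CMD BATCH --no-save --no-restore "
  + "'--args args1=\"Dell\" args2=1' "
  + "/usr/local/R/mytest.R /usr/local/R/whathappened.txt");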

Licensed under: CC-BY-SA with attribution