[R] need help in trying out sparklyr - spark_connect will not work on local copy of Spark

Taylor, Ronald C Ronald.Taylor at pnnl.gov
Thu Feb 2 00:23:17 CET 2017


Hello R-help list,

I am a new list member. My first question: I have been trying out sparklyr (in R version 3.3.2) on my Red Hat Linux workstation, following the instructions at spark.rstudio.com for downloading and using a local copy of Spark. The Spark download appears to work. However, when I issue spark_connect() to get started, I get the error messages shown below.

I cannot find any guidance on how to fix this, which is quite frustrating. Can somebody give me a bit of help? Does something need to be added to my PATH environment variable in my .mycshrc file, for example? Is there a closed-port problem? Has anybody run into this type of error message? Do I need to do something additional, not mentioned in the RStudio online documentation, to start up the local copy of Spark?

-          Ron
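A quick shell-level sanity check for the environment questions above (a diagnostic sketch only; the paths it prints depend on the machine, and none of the values below are taken from my actual setup):

```shell
# Diagnostic sketch: verify that java is reachable and that JAVA_HOME
# points at a JDK root (a directory *containing* bin/java), not at a
# bin directory itself.
command -v java                      # where the shell resolves java, if anywhere
echo "JAVA_HOME=${JAVA_HOME}"
if test -x "${JAVA_HOME}/bin/java"; then
    echo "JAVA_HOME looks like a valid JDK root"
else
    echo "JAVA_HOME does not contain bin/java"
fi
```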

%%%%%%%%%%%%%%%%%%%%

Here is the (apparently successful) spark_install() call, followed by the error message from spark_connect():

> spark_install(version = "1.6.2")
Installing Spark 1.6.2 for Hadoop 2.6 or later.
Downloading from:
- 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Installing to:
- '~/.cache/spark/spark-1.6.2-bin-hadoop2.6'
trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
==================================================
downloaded 265.2 MB
Installation complete.

> sc <- spark_connect(master = "local")
Error in force(code) :
  Failed while connecting to sparklyr to port (8880) for sessionid (3689): Gateway in port (8880) did not respond.
    Path: /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
    Parameters: --class, sparklyr.Backend, --jars, '/usr/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/usr/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/usr/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/usr/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 3689

---- Output Log ----
/home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-class: line 86: /usr/local/bin/bin/java: No such file or directory
---- Error Log ----
>

%%%%%%%%%%%%%%%%%%
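If I read the output log right, spark-class (line 86) builds the JVM path as ${JAVA_HOME}/bin/java, so the doubled "bin/bin" in the error suggests JAVA_HOME is set to /usr/local/bin (a bin directory) rather than to a JDK root. A minimal sketch of that path construction (the corrected JDK root below is only a placeholder, not my actual install path):

```shell
# Sketch of how spark-class appears to derive the JVM binary from JAVA_HOME.
JAVA_HOME=/usr/local/bin             # suspect value implied by the error log
RUNNER="${JAVA_HOME}/bin/java"
echo "$RUNNER"                       # prints /usr/local/bin/bin/java -- the failing path

JAVA_HOME=/usr/lib/jvm/java          # hypothetical JDK root; adjust to the real one
RUNNER="${JAVA_HOME}/bin/java"
echo "$RUNNER"                       # prints /usr/lib/jvm/java/bin/java
```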

And here is the entire screen output of my R session, from the R invocation onward:

sidney115% R

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

> library(sparklyr)
> ls(pos = "package:sparklyr")

  [1] "%>%"
  [2] "compile_package_jars"
  [3] "connection_config"
  [4] "connection_is_open"
  [5] "copy_to"
  [6] "ensure_scalar_boolean"
  [7] "ensure_scalar_character"
  [8] "ensure_scalar_double"
  [9] "ensure_scalar_integer"
 [10] "find_scalac"
 [11] "ft_binarizer"
 [12] "ft_bucketizer"
 [13] "ft_discrete_cosine_transform"
 [14] "ft_elementwise_product"
 [15] "ft_index_to_string"
 [16] "ft_one_hot_encoder"
 [17] "ft_quantile_discretizer"
 [18] "ft_regex_tokenizer"
 [19] "ft_sql_transformer"
 [20] "ft_string_indexer"
 [21] "ft_tokenizer"
 [22] "ft_vector_assembler"
 [23] "hive_context"
 [24] "invoke"
 [25] "invoke_method"
 [26] "invoke_new"
 [27] "invoke_static"
 [28] "java_context"
 [29] "livy_available_versions"
 [30] "livy_config"
 [31] "livy_home_dir"
 [32] "livy_install"
 [33] "livy_install_dir"
 [34] "livy_installed_versions"
 [35] "livy_service_start"
 [36] "livy_service_stop"
 [37] "ml_als_factorization"
 [38] "ml_binary_classification_eval"
 [39] "ml_classification_eval"
 [40] "ml_create_dummy_variables"
 [41] "ml_decision_tree"
 [42] "ml_generalized_linear_regression"
 [43] "ml_gradient_boosted_trees"
 [44] "ml_kmeans"
 [45] "ml_lda"
 [46] "ml_linear_regression"
 [47] "ml_load"
 [48] "ml_logistic_regression"
 [49] "ml_model"
 [50] "ml_multilayer_perceptron"
 [51] "ml_naive_bayes"
 [52] "ml_one_vs_rest"
 [53] "ml_options"
 [54] "ml_pca"
 [55] "ml_prepare_dataframe"
 [56] "ml_prepare_features"
 [57] "ml_prepare_response_features_intercept"
 [58] "ml_random_forest"
 [59] "ml_save"
 [60] "ml_survival_regression"
 [61] "ml_tree_feature_importance"
 [62] "na.replace"
 [63] "print_jobj"
 [64] "register_extension"
 [65] "registered_extensions"
 [66] "sdf_copy_to"
 [67] "sdf_import"
 [68] "sdf_load_parquet"
 [69] "sdf_load_table"
 [70] "sdf_mutate"
 [71] "sdf_mutate_"
 [72] "sdf_partition"
 [73] "sdf_persist"
 [74] "sdf_predict"
 [75] "sdf_quantile"
 [76] "sdf_read_column"
 [77] "sdf_register"
 [78] "sdf_sample"
 [79] "sdf_save_parquet"
 [80] "sdf_save_table"
 [81] "sdf_schema"
 [82] "sdf_sort"
 [83] "sdf_with_unique_id"
 [84] "spark_available_versions"
 [85] "spark_compilation_spec"
 [86] "spark_compile"
 [87] "spark_config"
 [88] "spark_connect"
 [89] "spark_connection"
 [90] "spark_connection_is_open"
 [91] "spark_context"
 [92] "spark_dataframe"
 [93] "spark_default_compilation_spec"
 [94] "spark_dependency"
 [95] "spark_disconnect"
 [96] "spark_disconnect_all"
 [97] "spark_home_dir"
 [98] "spark_install"
 [99] "spark_install_dir"
[100] "spark_install_tar"
[101] "spark_installed_versions"
[102] "spark_jobj"
[103] "spark_load_table"
[104] "spark_log"
[105] "spark_read_csv"
[106] "spark_read_json"
[107] "spark_read_parquet"
[108] "spark_save_table"
[109] "spark_session"
[110] "spark_uninstall"
[111] "spark_version"
[112] "spark_version_from_home"
[113] "spark_web"
[114] "spark_write_csv"
[115] "spark_write_json"
[116] "spark_write_parquet"
[117] "tbl_cache"
[118] "tbl_uncache"

>
> spark_install(version = "1.6.2")
Installing Spark 1.6.2 for Hadoop 2.6 or later.
Downloading from:
- 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Installing to:
- '~/.cache/spark/spark-1.6.2-bin-hadoop2.6'
trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
==================================================
downloaded 265.2 MB
Installation complete.

> sc <- spark_connect(master = "local")
Error in force(code) :
  Failed while connecting to sparklyr to port (8880) for sessionid (3689): Gateway in port (8880) did not respond.
    Path: /home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-submit
    Parameters: --class, sparklyr.Backend, --jars, '/usr/lib64/R/library/sparklyr/java/spark-csv_2.11-1.3.0.jar','/usr/lib64/R/library/sparklyr/java/commons-csv-1.1.jar','/usr/lib64/R/library/sparklyr/java/univocity-parsers-1.5.1.jar', '/usr/lib64/R/library/sparklyr/java/sparklyr-1.6-2.10.jar', 8880, 3689

---- Output Log ----
/home/rtaylor/.cache/spark/spark-1.6.2-bin-hadoop2.6/bin/spark-class: line 86: /usr/local/bin/bin/java: No such file or directory
---- Error Log ----
>

%%%%%%%%%%%%%%%%%%

Ronald C. Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568,  email: ronald.taylor at pnnl.gov
web page:  http://www.pnnl.gov/science/staff/staff_info.asp?staff_num=7048

