
Installing rJava and xlsx in OS X El Capitan

Installing rJava and xlsx is quite tricky because R is unable to load the JVM. Below are the errors you are likely to hit, and the fix for each.

Error # tar: Failed to set default locale

tar: Failed to set default locale

Fix this by forcing the LANG default:

$ defaults write org.R-project.R force.LANG en_US.UTF-8

Error # Unable to load libserver.dylib
To fix this, create a link for libserver.dylib in /usr/local/lib:

$ sudo ln -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib/libserver.dylib

Then update R's Java configuration:

$ sudo R CMD javareconf

Even after this, if rJava still fails to load, install it with the type="source" option:

> install.packages("rJava", type="source")
> library("rJava")
> install.packages("xlsx")
> library("xlsx")

Setup Apache Spark with Jupyter

Install Python and Jupyter

  • Download and install a Python 2.7.x release if Python is not pre-installed in /usr/local/bin (PySpark does not support Python 3 yet).
  • Install pip and virtualenv
$ curl -O https://bootstrap.pypa.io/get-pip.py  # download get-pip.py
$ python get-pip.py   # install pip
$ pip install virtualenv # install virtualenv
  • Create a separate virtualenv for your playground (not mandatory, but it keeps things isolated); the checks after this list confirm it is active.
$ virtualenv sparkenv   # create virtualenv named sparkenv
$ source sparkenv/bin/activate # activate the virtualenv
  • Install Jupyter
$ pip install jupyter
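
Before moving on, it is worth confirming that the virtualenv is actually active (generic shell checks, nothing specific to this setup):

$ which python    # should resolve to .../sparkenv/bin/python
$ pip --version   # should report pip from the sparkenv site-packages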

Download and Setup Spark in OSX

  • Download Scala version 2.10.x
  • Download sbt
  • Set SCALA_HOME and add it to the PATH:
$ export SCALA_HOME=/Users/kalyan/scala-2.10.4
$ export PATH=$PATH:$SCALA_HOME/bin
  • Go to the Spark root directory and build the assembly:
$ sbt/sbt clean assembly
  • Then start up Spark, also from the Spark root folder:
$ ./bin/spark-shell
Setup the PySpark Kernel

  • Make sure you have activated the virtualenv created earlier (sparkenv).
  • Create a new kernel (not mandatory, but a good practice):
$ python -m ipykernel install --user --name sparkkernel --display-name "sparkkernel"
  • The kernel spec lands in /Users/kalyan/Library/Jupyter/kernels/sparkkernel/kernel.json; its argv entry points at the virtualenv's python, and it can be customized if needed:
{
    "display_name": "sparkkernel",
    "argv": [
        "/Users/kalyan/sparkenv/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ],
    "language": "python"
}
  • Set the needed environment variables and start the notebook. (The variables can also be set in the spec file's "env" section, but that is not recommended for portability reasons.) SPARK_HOME should point to your Spark root directory:
$ export SPARK_HOME=/path/to/spark    # wherever you unpacked Spark
$ export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH"
$ export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.9-src.zip:$PYTHONPATH"
$ jupyter notebook
  • The notebook is now ready, with the pyspark modules available on its path.
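
To confirm the kernel was registered, list the installed kernel specs (a standard Jupyter command; sparkkernel should appear in the output):

$ jupyter kernelspec list

Then, in a new notebook running on the sparkkernel kernel, a minimal smoke test could look like the sketch below. This assumes Spark 1.x, where the SparkContext is created by hand; the master, app name, and numbers are arbitrary choices for illustration:

from pyspark import SparkConf, SparkContext   # importable thanks to PYTHONPATH

conf = SparkConf().setMaster("local[2]").setAppName("smoke-test")
sc = SparkContext(conf=conf)              # start a local Spark context
print(sc.parallelize(range(10)).sum())    # 0 + 1 + ... + 9 = 45
sc.stop()                                 # shut the context down when done

If the import fails, re-check the PYTHONPATH exports above and make sure the notebook was started from the same shell.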

 
