You can run PySpark code in a Jupyter notebook on CloudxLab. The following instructions cover the 2.2, 2.3 and 2.4 versions of Apache Spark.

The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media. For more details on the Jupyter Notebook, please see the Jupyter website.

Please follow the steps below to access the Jupyter notebook on CloudxLab. To start a Python notebook, click on the "Jupyter" button under My Lab and then click on "New -> Python 3". The initialization code shown below is also available in the GitHub repository here.

For accessing Spark, you have to set several environment variables and system paths. You can do that either manually, or you can use a package that does all this work for you. For the latter, findspark is a suitable choice. It wraps up all these tasks in just two lines of code:

```python
import findspark
findspark.init()  # pass the path of a Spark installation to pick a specific version
```

You can specify any other version too, whichever you want to use. You can check the available Spark versions using the following command:

```
!ls /usr/spark*
```

If you choose to do the setup manually instead of using the package, then you can access different versions of Spark by following the steps below.

If you want to access Spark 2.2, use the code below:

```python
import os

os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
# In the below two lines, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
```

If you plan to use the 2.3 version, use the same code to initialize, but point SPARK_HOME at the Spark 2.3 installation directory. If you plan to use the 2.4 version, likewise point SPARK_HOME at the Spark 2.4 installation directory. You can find these directories in the `ls /usr/spark*` listing above.

Now, initialize the entry points of Spark: SparkContext and SparkConf (Old Style).

```python
from pyspark import SparkContext, SparkConf
```
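As a minimal sketch of how the old-style entry points are typically created (the application name and the `local[*]` master below are illustrative choices, not fixed by these instructions; on the cluster you would normally let the configured default master, for example YARN, apply):

```python
from pyspark import SparkContext, SparkConf

# Build a configuration object; the app name is just an example.
conf = SparkConf().setAppName("jupyter-pyspark-demo").setMaster("local[*]")

# SparkContext is the old-style entry point to Spark.
sc = SparkContext(conf=conf)

print(sc.version)  # verify which Spark version the notebook picked up
```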
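As a quick smoke test that the setup works, you can run a tiny job; the numbers used here are arbitrary:

```python
# Distribute a small range of numbers and run a trivial action on it.
rdd = sc.parallelize(range(100))
print(rdd.filter(lambda x: x % 2 == 0).count())  # should print 50
```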