Interactive Spark shell

You can run Spark commands interactively in the Spark shell. The Spark shell is available in Scala, Python, and R.

Launch a long-running interactive bash session using dcos task exec.

From your interactive bash session, pull and run a Spark Docker image.

docker pull mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2
docker run -it --net=host mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2 /bin/bash

Run the Spark shell from within the Docker container.

For the Scala Spark shell:

./bin/spark-shell --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2 --conf spark.mesos.executor.home=/opt/spark/dist

For the Python Spark shell:

./bin/pyspark --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2 --conf spark.mesos.executor.home=/opt/spark/dist

For the R Spark shell:

./bin/sparkR --master mesos://<internal-leader-ip>:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2 --conf spark.mesos.executor.home=/opt/spark/dist

NOTE: To find your internal leader IP, go to <dcos-url>/mesos. The internal leader IP is listed in the upper left-hand corner of the page.
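The three launch commands above differ only in the launcher script, so they can be generated from a single template. The following is a minimal sketch of that idea, not part of the Spark or DC/OS tooling: `build_shell_command` and the constant names are hypothetical, and `10.0.0.5` is a placeholder for your internal leader IP.

```python
from __future__ import annotations

import shlex

# Values copied from the launch commands above.
SPARK_IMAGE = "mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2"
EXECUTOR_HOME = "/opt/spark/dist"
MESOS_PORT = 5050

# Shell name -> launcher script inside the Spark image.
LAUNCHERS = {
    "scala": "./bin/spark-shell",
    "python": "./bin/pyspark",
    "r": "./bin/sparkR",
}


def build_shell_command(shell: str, leader_ip: str) -> list[str]:
    """Hypothetical helper: return the argument list that launches a Spark shell."""
    if shell not in LAUNCHERS:
        raise ValueError(f"unknown shell {shell!r}; expected one of {sorted(LAUNCHERS)}")
    return [
        LAUNCHERS[shell],
        "--master", f"mesos://{leader_ip}:{MESOS_PORT}",
        "--conf", f"spark.mesos.executor.docker.image={SPARK_IMAGE}",
        "--conf", f"spark.mesos.executor.home={EXECUTOR_HOME}",
    ]


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder; substitute your internal leader IP.
    print(shlex.join(build_shell_command("python", "10.0.0.5")))
```

Printing the joined command lets you paste it into the container's bash session. Keeping the image tag in one constant helps avoid a mismatch between the driver image and the executor image.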

Run Spark commands interactively.

In the Scala shell:

val textFile = sc.textFile("/opt/spark/dist/README.md")
textFile.count()

In the Python shell:

textFile = sc.textFile("/opt/spark/dist/README.md")
textFile.count()
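A common next step in the Python shell is to filter the lines before counting them. The sketch below defines a plain Python predicate that could be passed to the RDD's filter() method. `mentions_spark` is a hypothetical name, and the sample lines are made up so the predicate can be checked locally without a cluster.

```python
def mentions_spark(line: str) -> bool:
    """Hypothetical predicate: True if the line mentions Spark (case-insensitive)."""
    return "spark" in line.lower()


# In the PySpark shell, where `sc` is already defined, the predicate can be
# passed to filter() before counting:
#   textFile = sc.textFile("/opt/spark/dist/README.md")
#   textFile.filter(mentions_spark).count()

# Made-up sample lines, used only to check the predicate locally.
sample_lines = [
    "# Apache Spark",
    "",
    "Spark is a unified analytics engine.",
    "See the online documentation for details.",
]
local_matches = [line for line in sample_lines if mentions_spark(line)]
print(len(local_matches))
```

Defining the predicate as a named function rather than an inline lambda makes it easy to test outside the shell before running it against the cluster.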

In the R shell:

df <- as.DataFrame(faithful)
head(df)
