This section contains examples for using DC/OS Data Science Engine.
Basic
Perform a default installation by following the instructions in the Quick Start Guide section.
To run a job, first open the lab and choose the notebook you want to run. The lab offers notebooks for several languages, for example Scala, Python, and R.
A Notebook consists of cells; each cell is either a `markdown` cell or a `code` cell. In a `markdown` cell, you can write text or HTML. In a `code` cell, you type your code, as shown in the examples below.
Python Kernel
Open a Python Notebook and put the following sections in different code cells.
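The guide's original cells are not reproduced here; as an illustrative sketch (not the exact cells from the guide), two cells could define and then run a Monte Carlo estimator for Pi:

```python
# Cell 1: define a Monte Carlo estimator for Pi (illustrative example).
import random

def estimate_pi(num_samples):
    """Estimate Pi by sampling points uniformly in the unit square."""
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

# Cell 2: run the estimator and print the result.
print(estimate_pi(100_000))
```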
Scala Kernel
Open a Scala Notebook and put the following sections in different code cells.
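As an illustrative sketch (not the guide's original cells), two cells could build a small collection and count word occurrences:

```scala
// Cell 1: build a small collection (illustrative example).
val words = Seq("spark", "notebook", "spark", "data")

// Cell 2: count occurrences of each word and print the result.
val counts = words.groupBy(identity).map { case (w, ws) => w -> ws.size }
println(counts)
```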
Java Kernel
Open a Java Notebook and put the following sections in different code cells.
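As an illustrative sketch (not the guide's original cells), one cell could define a helper class and a second cell could call it; a `main` method stands in for the second cell here:

```java
// Illustrative sketch for the Java kernel, split across two cells.
public class Stats {
    // Cell 1: define a small helper.
    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Cell 2: call the helper and print the result.
    public static void main(String[] args) {
        System.out.println(Stats.mean(new double[] {1.0, 2.0, 3.0, 4.0}));
    }
}
```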
R Kernel
Open an R Notebook and put the following in a code cell.
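As an illustrative sketch (not the guide's original cell), a cell could compute basic summary statistics over a small vector:

```r
# Illustrative example cell: basic summary statistics.
x <- c(4.1, 5.6, 3.8, 7.2, 6.4)
summary(x)
mean(x)
```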
Clojure Kernel
Open a Clojure Notebook and put the following in a code cell.
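As an illustrative sketch (not the guide's original cell), a cell could sum the squares of the first ten integers:

```clojure
;; Illustrative example cell: sum of squares of 1..10.
(reduce + (map #(* % %) (range 1 11)))
;; => 385
```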
Groovy Kernel
Open a Groovy Notebook and put the following in a code cell.
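As an illustrative sketch (not the guide's original cell), a cell could transform a range and sum the result:

```groovy
// Illustrative example cell: square a range and sum it.
def squares = (1..10).collect { it * it }
println squares.sum()  // 385
```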
Kotlin Kernel
Open a Kotlin Notebook and put the following sections in different code cells.
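As an illustrative sketch (not the guide's original cells), one cell could define a helper and a second could call it:

```kotlin
// Cell 1: define a helper (illustrative example).
fun squares(n: Int) = (1..n).map { it * it }

// Cell 2: call it and print the sum.
println(squares(10).sum())  // 385
```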
Advanced
Launching a Spark job
Using Terminal
Open a Terminal from the Notebook UI and run an example `spark-submit` job:
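A hedged example invocation; the jar path and the Spark/Scala versions in the file name are placeholders to adjust for your installation:

```bash
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar 100
```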
Using Python Notebook
Open a Python Notebook and put the following in a code cell:
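A hedged sketch of such a cell, assuming the `pyspark` package is available on the kernel's Python path; it estimates Pi by distributing random samples across the cluster:

```python
# Hedged sketch: estimate Pi with PySpark from a notebook cell.
import random

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("pi-estimate")
sc = SparkContext(conf=conf)

def is_inside(_):
    # Sample one point in the unit square and test whether it
    # falls inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0

n = 100000
count = sc.parallelize(range(n)).filter(is_inside).count()
print("Pi is roughly %f" % (4.0 * count / n))
sc.stop()
```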
Using BeakerX Kernel
The BeakerX kernel supports Spark magic, which lets you configure and start a Spark session from the UI.
To use BeakerX Spark magic, open a Scala Notebook and put the following in separate code cells:
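A hedged sketch of the two cells; the `%%spark` magic shown here is an assumption based on BeakerX's Spark magic, so verify the exact form against the BeakerX documentation. The first cell brings up the session configuration widget:

```scala
%%spark
```

After starting the session from the widget, a second cell can use the `spark` entry point:

```scala
spark.range(1000).count()
```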
If you want to use HDFS as storage or as an event log directory for a Spark History Server, the HDFS configuration files must be added to the classpath. To make the HDFS configuration available to the notebook, add the following line in the first cell of the notebook:
where `hadoop_conf` is a directory holding Hadoop configuration files, such as `core-site.xml` and `hdfs-site.xml`.
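One possible form of that first cell, assuming the BeakerX `%classpath` magic (the exact syntax may differ between BeakerX versions, so consult the BeakerX documentation), with `hadoop_conf` standing in for your configuration directory:

```
%classpath add dynamic hadoop_conf
```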