This section contains examples for using DC/OS Data Science Engine.
Basic
Perform a default installation by following the instructions in the Quick Start Guide section.
To run a job, first open the lab and choose the notebook you want to run. Notebooks are available for several kernels, including Scala, Python, and R.
A notebook consists of cells; each cell is of type `markdown` or `code`. In a `markdown` cell, you can write text or HTML. In a `code` cell, you can type code, as shown in the examples below.
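For example, a `markdown` cell might hold a heading and a short HTML-formatted note:

```markdown
# Matrix examples
Each kernel below computes the <b>same value</b>: the sum of 1 through 100.
```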
Python Kernel
Open a Python Notebook and put the following sections in separate code cells.
```python
def initMatrix(nrow, ncol):
    # fill an nrow x ncol matrix with the numbers 1, 2, ..., nrow * ncol
    mat = []
    counter = 1
    for i in range(0, nrow):
        row = []
        for j in range(0, ncol):
            row.append(counter)
            counter += 1
        mat.append(row)
    return mat
```

```python
def sumMatrix(mat):
    # add up every element of the matrix
    total = 0  # named total to avoid shadowing the built-in sum()
    for row in mat:
        for x in row:
            total += x
    return total
```

```python
mat = initMatrix(10, 10)
total = sumMatrix(mat)
print(total)
```
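Running the cells prints `5050`, the sum of the integers 1 through 100. Each of the kernel examples below builds the same 10×10 matrix (or computes the equivalent sum directly) and produces the same result.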
Scala Kernel
Open a Scala Notebook and put the following sections in separate code cells.
```scala
def initMatrix(nrow: Int, ncol: Int): Array[Array[Int]] = {
  // fill an nrow x ncol matrix with the numbers 1, 2, ..., nrow * ncol
  val mat = Array.ofDim[Int](nrow, ncol)
  var counter = 1
  for (i <- 0 until nrow) {
    for (j <- 0 until ncol) {
      mat(i)(j) = counter
      counter += 1
    }
  }
  mat
}
```

```scala
def sumMatrix(mat: Array[Array[Int]]): Int = {
  // add up every element of the matrix
  var sum = 0
  for (row <- mat; x <- row) {
    sum += x
  }
  sum
}
```

```scala
val mat = initMatrix(10, 10)
val sum = sumMatrix(mat)
```
Java Kernel
Open a Java Notebook and put the following sections in separate code cells.
```java
class Matrix {
    private int[][] mat;

    // constructor to initialize matrix of given number of rows and columns
    public Matrix(int row, int col) {
        mat = new int[row][col];
        int counter = 1;
        for (int i = 0; i < row; i++) {
            for (int j = 0; j < col; j++) {
                mat[i][j] = counter++;
            }
        }
    }

    // finding sum of all the numbers in the matrix
    public int sum() {
        int sum = 0;
        for (int i = 0; i < mat.length; i++) {
            for (int j = 0; j < mat[i].length; j++) {
                sum += mat[i][j];
            }
        }
        return sum;
    }
}
```

```java
Matrix mat = new Matrix(10, 10);
return mat.sum();
```
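The bare `return` in the second cell is intentional: the notebook's Java kernel evaluates each cell like a method body, and the returned value (5050) becomes the cell's output.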
R Kernel
Open an R Notebook and put the following in a code cell.
```r
mat <- matrix(data = seq(1, 100, by = 1), nrow = 10, ncol = 10)
sum <- 0
# calculating sum of all numbers
for (r in 1:nrow(mat)) {
  for (c in 1:ncol(mat)) {
    sum <- sum + mat[r, c]
  }
}
print(sum)
```
Clojure Kernel
Open a Clojure Notebook and put the following in a code cell.
```clojure
;; add the numbers from 1 to 100
(reduce + (range 1 101))
```
Groovy Kernel
Open a Groovy Notebook and put the following in a code cell.
```groovy
// add the numbers from 1 to 100
def sum = 0
1.upto(100) {
    sum += it
}
sum
```
Kotlin Kernel
Open a Kotlin Notebook and put the following sections in separate code cells.
```kotlin
fun initMatrix(nrow: Int, ncol: Int): Array<IntArray> {
    // fill an nrow x ncol matrix with the numbers 1, 2, ..., nrow * ncol
    val mat = Array(nrow) { IntArray(ncol) }
    var counter = 1
    for (i in 0 until nrow) {
        for (j in 0 until ncol) {
            mat[i][j] = counter
            counter += 1
        }
    }
    return mat
}
```

```kotlin
fun sumMatrix(mat: Array<IntArray>): Int {
    // add up every element of the matrix
    var sum = 0
    for (row in mat) {
        for (x in row) {
            sum += x
        }
    }
    return sum
}
```

```kotlin
val mat = initMatrix(10, 10)
val sum = sumMatrix(mat)
sum
```
Advanced
Launching a Spark job
Using Terminal
Open a Terminal from the Notebook UI and run an example `spark-submit` job:
```bash
spark-submit --class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.4.0.jar 100
```
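The `SparkPi` example estimates π by Monte Carlo sampling across 100 partitions; when the job finishes, the driver output contains a line similar to `Pi is roughly 3.14...`.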
Using Python Notebook
Open a Python Notebook and put the following in a code cell:
```python
from __future__ import print_function

from random import random
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("PythonPi") \
    .getOrCreate()

partitions = 2
n = 100000 * partitions

def f(_):
    # sample a random point in the 2 x 2 square centered at the origin
    x = random() * 2 - 1
    y = random() * 2 - 1
    # count the point if it falls inside the unit circle
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))

spark.stop()
```
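This works because the fraction of random points in the 2 × 2 square that land inside the unit circle approaches π/4, so multiplying that fraction by 4 yields an estimate of π.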
Using BeakerX Kernel
The BeakerX kernel supports Spark magic, which allows you to configure and start a Spark session via the UI. To use BeakerX Spark magic, open a Scala Notebook and put the following in separate code cells:
```
%%spark --start
```

```scala
val count = spark.sparkContext.parallelize(1 to 10000).sum()
count
```
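Once the session is running, the second cell returns the sum of the integers 1 through 10000, which is 50005000 (as a `Double`, since `sum()` on an RDD returns a `Double`).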
If you want to use HDFS as storage or as the event log directory for a Spark History Server, the HDFS configuration files must be added to the classpath. To make the HDFS configuration available to the notebook, add the following line to the first cell of the notebook:
```
%classpath add jar /mnt/mesos/sandbox/hadoop_conf
```
where `hadoop_conf` is a directory holding Hadoop configuration files, such as `core-site.xml` and `hdfs-site.xml`.
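With the configuration on the classpath, a Spark session can then be pointed at HDFS. As a minimal sketch, assuming an event log directory of `hdfs://hdfs/history` (an illustrative path; substitute the one used by your HDFS deployment):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: hdfs://hdfs/history is an assumed event-log location.
val spark = SparkSession
  .builder()
  .appName("EventLogToHdfs")
  .config("spark.eventLog.enabled", "true")            // record job events
  .config("spark.eventLog.dir", "hdfs://hdfs/history") // assumed HDFS directory
  .getOrCreate()
```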