TensorFlow is an end-to-end open source platform for machine learning. It is included in your DC/OS Data Science Engine installation.

Using TensorFlow with Python

Open a Python notebook and place each of the following sections in its own code cell.

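Prepare the training and test data:

```
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Add a trailing channel axis so each image has the shape Conv2D expects
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

# Cast to float so the in-place division below is allowed
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Scale pixel values from [0, 255] to [0, 1]
x_train /= 255
x_test /= 255
```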

Define a model:

```
# Uses `tf` and `input_shape` from the previous cell
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D

model = Sequential()
model.add(Conv2D(28, kernel_size=(3, 3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the feature maps for the dense layers
model.add(Dense(256, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10, activation=tf.nn.softmax))  # one output per digit class
```
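To see where the model's weights are concentrated, the layer output shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for the layers above (Conv2D with 'valid' padding and stride 1, MaxPooling2D with a stride equal to its pool size). The helper names are illustrative and not part of TensorFlow.

```python
# Shape and parameter arithmetic for the model above, assuming Keras defaults:
# Conv2D uses 'valid' padding with stride 1; MaxPooling2D strides by its pool size.

def conv2d_valid(height, width, channels, filters, kernel):
    """Return (output shape, parameter count) for a stride-1 'valid' convolution."""
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    params = (kernel * kernel * channels + 1) * filters  # +1 is the bias per filter
    return out_shape, params

def max_pool(height, width, channels, pool):
    """Return the output shape of a non-overlapping max-pooling layer."""
    return (height // pool, width // pool, channels)

def dense(inputs, units):
    """Return the parameter count of a fully connected layer (weights + biases)."""
    return inputs * units + units

conv_shape, conv_params = conv2d_valid(28, 28, 1, filters=28, kernel=3)
pool_shape = max_pool(*conv_shape, pool=2)
flat_size = pool_shape[0] * pool_shape[1] * pool_shape[2]
hidden_params = dense(flat_size, 256)   # Dropout adds no parameters
output_params = dense(256, 10)
total_params = conv_params + hidden_params + output_params

print(conv_shape, pool_shape, flat_size)  # (26, 26, 28) (13, 13, 28) 4732
print(total_params)                       # 1214498
```

If these assumptions hold, `model.summary()` should report the same total. Nearly all of the parameters sit in the first Dense layer.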

Train and evaluate the model:

```
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x=x_train, y=y_train, epochs=10)

# Returns the loss and accuracy on the held-out test set
model.evaluate(x_test, y_test)
```
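The `sparse_categorical_crossentropy` loss takes integer class labels (such as `y_train` here) rather than one-hot vectors. As a rough illustration, for a single example it is the negative log of the probability that the model assigns to the true class. The sketch below is plain Python with made-up probabilities; it ignores the numerical-stability handling and batch averaging that Keras applies.

```python
import math

def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[label])

# Hypothetical softmax output over the 10 digit classes for one image
probs = [0.01, 0.01, 0.02, 0.85, 0.01, 0.03, 0.01, 0.02, 0.03, 0.01]

confident_loss = sparse_categorical_crossentropy(probs, label=3)  # true digit is 3
wrong_loss = sparse_categorical_crossentropy(probs, label=5)      # true digit is 5

# The loss is small when the true class gets high probability, large otherwise
print(round(confident_loss, 3))
print(round(wrong_loss, 3))
```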

Use the model to predict a handwritten digit:

```
image_index = 5555  # should be '3'
# predict expects a batch, so reshape the single image to (1, 28, 28, 1)
pred = model.predict(x_test[image_index].reshape(1, 28, 28, 1))
print("predicted number: {}".format(pred.argmax()))
```
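`model.predict` returns one row of class probabilities per input image, so for a single image the result has shape (1, 10) and `pred.argmax()`, which operates on the flattened array, gives the digit directly. For a batch of several images the largest value is needed per row instead, that is `pred.argmax(axis=1)`. A plain-Python sketch of that per-row selection, using made-up probabilities:

```python
def argmax(values):
    """Return the index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical predictions for two images (one row of 10 probabilities each)
batch_pred = [
    [0.00, 0.01, 0.02, 0.90, 0.00, 0.03, 0.00, 0.01, 0.02, 0.01],
    [0.05, 0.80, 0.02, 0.01, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
]

# Pick the most probable digit for each image separately
predicted_digits = [argmax(row) for row in batch_pred]
print("predicted numbers: {}".format(predicted_digits))  # predicted numbers: [3, 1]
```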

TensorFlow on Spark

DC/OS Data Science Engine includes TensorFlow on Spark integration, which allows you to run TensorFlow in a distributed mode, using Apache Spark as an engine.

The following example runs TensorFlow on Spark with HDFS as the storage backend.

Launch a terminal from the Notebook UI.

Clone the TensorFlow on Spark repository and download the sample dataset:

```
rm -rf TensorFlowOnSpark && git clone https://github.com/yahoo/TensorFlowOnSpark
rm -rf mnist && mkdir mnist
curl -fsSL -O https://infinity-artifacts.s3-us-west-2.amazonaws.com/jupyter/mnist.zip
unzip -d mnist/ mnist.zip
```

List the files in the target HDFS directory and, if the directory exists, remove it:
```
hdfs dfs -ls -R mnist/ && hdfs dfs -rm -R mnist/
```

Generate the sample data and save it to HDFS:

```
spark-submit \
  --verbose \
  $(pwd)/TensorFlowOnSpark/examples/mnist/mnist_data_setup.py \
  --output mnist/csv \
  --format csv

hdfs dfs -ls -R mnist
```

Train the model and checkpoint it to the target directory in HDFS.

```
spark-submit \
  --verbose \
  --py-files $(pwd)/TensorFlowOnSpark/examples/mnist/spark/mnist_dist.py \
  $(pwd)/TensorFlowOnSpark/examples/mnist/spark/mnist_spark.py \
  --cluster_size 4 \
  --images mnist/csv/train/images \
  --labels mnist/csv/train/labels \
  --format csv \
  --mode train \
  --model mnist/mnist_csv_model
```
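When experimenting with different cluster sizes or model directories, it can be convenient to assemble the training command programmatically. The following is a minimal sketch that only rebuilds the command shown above; the function name and its parameters are illustrative, and it assumes the repository was cloned into the current directory as in the earlier step.

```python
import os
import shlex

def build_train_command(cluster_size=4, model_dir="mnist/mnist_csv_model", base_dir=None):
    """Assemble the spark-submit training command as an argument list."""
    base_dir = base_dir or os.getcwd()
    examples = os.path.join(base_dir, "TensorFlowOnSpark", "examples", "mnist", "spark")
    return [
        "spark-submit",
        "--verbose",
        "--py-files", os.path.join(examples, "mnist_dist.py"),
        os.path.join(examples, "mnist_spark.py"),
        "--cluster_size", str(cluster_size),
        "--images", "mnist/csv/train/images",
        "--labels", "mnist/csv/train/labels",
        "--format", "csv",
        "--mode", "train",
        "--model", model_dir,
    ]

command = build_train_command(cluster_size=4)
# Print a copy-pastable command line; pass `command` to subprocess.run to execute it
print(" ".join(shlex.quote(part) for part in command))
```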

Verify that the model has been saved:
```
hdfs dfs -ls -R mnist/mnist_csv_model
```

TensorBoard

DC/OS Data Science Engine comes with TensorBoard installed. It can be found at http://<dcos-url>/service/data-science-engine/tensorboard/.

Log directory

TensorBoard reads log data from a specific directory, which defaults to /mnt/mesos/sandbox. You can change it with the advanced.tensorboard_logdir option. HDFS paths are also supported.

Here is an example:

Install HDFS:
```
dcos package install hdfs
```

Install data-science-engine with the log directory option overridden:

```
dcos package install --options=options.json data-science-engine
```

where options.json has the following content:

```
{
  "advanced": {
    "tensorboard_logdir": "hdfs://tf_logs"
  }
}
```

Open TensorBoard at https://<dcos-url>/service/data-science-engine/tensorboard/ and confirm the change.

Disabling TensorBoard

DC/OS Data Science Engine can be installed with TensorBoard disabled by using the following configuration:

```
{
  "advanced": {
    "start_tensorboard": false
  }
}
```
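Both option files shown in this section can also be generated from a script, which may help when the log directory differs between environments. This is a small sketch that uses only the option keys shown above; the function names are illustrative.

```python
import json

def build_options(tensorboard_logdir=None, start_tensorboard=None):
    """Build the package options, including only the settings that were given."""
    advanced = {}
    if tensorboard_logdir is not None:
        advanced["tensorboard_logdir"] = tensorboard_logdir
    if start_tensorboard is not None:
        advanced["start_tensorboard"] = start_tensorboard
    return {"advanced": advanced}

def write_options(path, options):
    """Write the options as JSON for `dcos package install --options=<path>`."""
    with open(path, "w") as handle:
        json.dump(options, handle, indent=2)

logdir_options = build_options(tensorboard_logdir="hdfs://tf_logs")
disabled_options = build_options(start_tensorboard=False)

print(json.dumps(logdir_options, indent=2))
print(json.dumps(disabled_options, indent=2))
```

Calling `write_options("options.json", logdir_options)` would produce the file passed to the install command in the log directory example.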
