When planning system resources for DC/OS, pay particular attention to how disks are partitioned. To prevent DC/OS from running low on disk space, or disk contention from adversely affecting cluster operations, create separate partitions that isolate I/O-intensive services, such as the journald logging facility and the Mesos sandbox, from critical infrastructure services such as ZooKeeper and CockroachDB. Using separate partitions helps to ensure fault-tolerant cluster operations and limits the scope of disk space errors (ENOSPC, "no space left on device") to recoverable issues such as tasks failing to deploy.
Recommended partition layout
You can use the following guidelines to plan disk partitioning for replicated state stores and persistent configuration override locations under /var/lib/dcos.
Master nodes
For the master nodes, the recommended practice is to host the /var/lib/dcos directory on a separate partition backed by fast, locally-attached storage (SSD/NVMe). Placing this directory on its own partition on the master nodes isolates the following replicated state stores and persistent configuration override files, all of which are stored under /var/lib/dcos:
- Mesos Paxos replicated log: /var/lib/dcos/mesos/master/replicated_log
- Navstar Overlay replicated log: /var/lib/dcos/mesos/master/overlay_replicated_log
- CockroachDB distributed database: /var/lib/dcos/cockroach
- Navstar Mnesia distributed database: /var/lib/dcos/navstar/mnesia
- Navstar Lashup distributed database: /var/lib/dcos/navstar/lashup
- Secrets vault: /var/lib/dcos/secrets/vault
- ZooKeeper distributed database: /var/lib/dcos/exhibitor/zookeeper
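One way to confirm that /var/lib/dcos really sits on its own partition is to compare the device ID of the directory with that of its parent. The following is a minimal sketch, not part of DC/OS; the function name `is_separate_filesystem` is hypothetical, and the check assumes that a separate partition shows up as a different `st_dev` value (bind mounts of the same filesystem would not be detected).

```python
import os

# Path taken from the master-node guidance above.
MASTER_STATE_ROOT = "/var/lib/dcos"


def is_separate_filesystem(path):
    """Return True if `path` is on a different device than its parent directory.

    The comparison uses st_dev, so a bind mount of the same filesystem
    is reported as not separate.
    """
    real = os.path.realpath(path)
    parent = os.path.dirname(real)
    return os.stat(real).st_dev != os.stat(parent).st_dev


if __name__ == "__main__":
    if not os.path.isdir(MASTER_STATE_ROOT):
        print(f"{MASTER_STATE_ROOT} does not exist on this host")
    elif is_separate_filesystem(MASTER_STATE_ROOT):
        print(f"{MASTER_STATE_ROOT} is on its own filesystem")
    else:
        print(f"{MASTER_STATE_ROOT} shares a filesystem with its parent directory")
```

Running the script on a master node before installation gives a quick indication of whether the recommended layout is in place.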
Agent nodes
On the agent nodes, you should use separate partitions for the following directories under /var/lib/mesos:
- /var/lib/mesos: You should always host the /var/lib/mesos directory on a separate partition. Keep in mind that the disk space that Apache Mesos advertises in its UI is the sum of the space provided by the file system(s) underpinning the /var/lib/mesos directory, including any MOUNT volumes (/dcos/volume<n>).
- /var/lib/mesos/slave/slaves: This directory hosts the sandbox directories for tasks. You should use a separate partition for this directory, if possible.
- /var/lib/mesos/slave/volumes: This directory is used by frameworks that consume ROOT persistent volumes. You should use a separate partition for this directory, if possible.
- /var/lib/mesos/slave/store/docker: This directory stores Docker image layers that are used to provision UCR containers. You should use a separate partition for this directory, if possible.
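To see which of the agent directories above currently share a partition, you can group them by the device they live on and report the capacity of each group. The sketch below is illustrative only: the helper names `group_by_filesystem` and `report` are hypothetical, directories that do not exist are skipped, and the device ID (`st_dev`) is assumed to identify a partition.

```python
import os
import shutil
from collections import defaultdict

# Directories taken from the agent-node list above.
AGENT_DIRS = [
    "/var/lib/mesos",
    "/var/lib/mesos/slave/slaves",
    "/var/lib/mesos/slave/volumes",
    "/var/lib/mesos/slave/store/docker",
]


def group_by_filesystem(paths):
    """Map each device ID to the existing directories that live on it."""
    groups = defaultdict(list)
    for path in paths:
        if os.path.isdir(path):  # skip directories missing on this host
            groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def report(paths):
    """Return one summary line per filesystem that backs the given paths."""
    lines = []
    for dev, members in group_by_filesystem(paths).items():
        usage = shutil.disk_usage(members[0])
        lines.append(
            f"device {dev}: total={usage.total} B, free={usage.free} B, "
            f"dirs={', '.join(members)}"
        )
    return lines


if __name__ == "__main__":
    for line in report(AGENT_DIRS):
        print(line)
```

Directories that appear on the same line share a filesystem and therefore compete for the same free space; directories on separate lines are isolated from one another.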
In most cases, agents do not require a separate partition for the persistent configuration override files stored in the /var/lib/dcos directory. You should, however, be sure to allow enough disk space for the following configuration files on agent nodes:
- /var/lib/dcos/mesos-slave-common
- /var/lib/dcos/mesos-resources