A DC/OS cluster consists of two types of nodes: master nodes and agent nodes. The agent nodes can be either public agent nodes or private agent nodes. Public agent nodes provide north-south (external to internal) access to services in the cluster through load balancers. Private agents host the containers and services that are deployed on the cluster. In addition to the master and agent cluster nodes, each DC/OS installation includes a separate bootstrap node for DC/OS installation and upgrade files. Some of the hardware and software requirements apply to all nodes. Other requirements are specific to the type of node being deployed.
Hardware prerequisites
The hardware prerequisites are a single bootstrap node, Mesos master nodes, and Mesos agent nodes.
Bootstrap node
- DC/OS installation is run on a single bootstrap node with two cores, 16 GB RAM, and 60 GB HDD.
- The bootstrap node is only used during the installation and upgrade process, so there are no specific recommendations for high performance storage or separated mount points.
All master and agent nodes in the cluster
The DC/OS cluster nodes are designated Mesos masters and agents during installation. The supported operating systems and environments are listed on the version policy page.
When you install DC/OS on the cluster nodes, the required files are installed in the /opt/mesosphere directory. You can create the /opt/mesosphere directory prior to installing DC/OS, but it must be either an empty directory or a link to an empty directory. DC/OS can be installed on a separate volume mount by creating an empty directory on the mounted volume, creating a link at /opt/mesosphere that targets the empty directory, and then installing DC/OS.
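The separate-volume procedure described above could be sketched as a small shell function. The function name `link_install_dir` and the example volume path are hypothetical; only /opt/mesosphere comes from the text.

```shell
#!/bin/sh
# link_install_dir TARGET_DIR LINK_PATH
# Sketch: create an empty directory on a mounted volume and link the
# install path to it. TARGET_DIR is a hypothetical directory on the
# separate volume; LINK_PATH is the install path (/opt/mesosphere).
link_install_dir() {
    target=$1
    link=$2

    # Create the target directory if it does not exist yet.
    mkdir -p "$target" || return 1

    # The target must be empty before DC/OS is installed.
    if [ -n "$(ls -A "$target")" ]; then
        echo "error: $target is not empty" >&2
        return 1
    fi

    # Refuse to overwrite an existing file, directory, or link.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "error: $link already exists" >&2
        return 1
    fi

    ln -s "$target" "$link"
}

# Example (needs root; the volume path is an assumption):
#   link_install_dir /mnt/dcos-volume/mesosphere /opt/mesosphere
```

The function refuses to continue when the target is not empty or the link path already exists, so it cannot silently replace an existing installation.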
You should verify the following requirements for all master and agent nodes in the cluster:
- Every node must have network access to a public Docker repository or to an internal Docker registry.
- If the node operating system is RHEL 7 or CentOS 7, the firewalld daemon must be stopped and disabled. For more information, see Disabling the firewall daemon on Red Hat or CentOS.
- The DNSmasq process must be stopped and disabled so that DC/OS has access to port 53. For more information, see Stopping the DNSmasq process.
- You are not using noexec to mount the /tmp directory on any system where you intend to use the DC/OS CLI.
- You have sufficient disk space to store persistent information for the cluster in the /var/lib/mesos directory.
- You should not remotely mount the /var/lib/mesos directory or the Docker storage directory, /var/lib/docker.
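The noexec requirement for /tmp in the list above could be checked with a short script. This is a sketch that assumes the `findmnt` utility from util-linux is available; the function names are hypothetical.

```shell
#!/bin/sh
# has_noexec OPTIONS
# Succeeds if the comma-separated mount option string contains "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# check_tmp_exec
# Looks up the mount options of the file system backing /tmp and
# warns if noexec is set. Returns 2 if findmnt itself fails.
check_tmp_exec() {
    opts=$(findmnt -n -o OPTIONS -T /tmp) || return 2
    if has_noexec "$opts"; then
        echo "warning: /tmp is mounted with noexec" >&2
        return 1
    fi
}
```

Keeping the string test in `has_noexec` separate from the `findmnt` lookup makes it possible to reuse the same test for other paths, such as /var/lib/dcos/exec on master nodes.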
Disabling the firewall daemon on Red Hat or CentOS
There is a known issue in which the firewalld process interacts poorly with Docker. For more information about this issue, see the Docker CentOS firewalld documentation.
To stop and disable firewalld, run the following command:
sudo systemctl stop firewalld && sudo systemctl disable firewalld
Stopping the DNSmasq process
The DC/OS cluster requires access to port 53. To prevent port conflicts, you should stop and disable the dnsmasq process by running the following command:
sudo systemctl stop dnsmasq && sudo systemctl disable dnsmasq.service
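To confirm that nothing is still bound to port 53 after stopping dnsmasq, the output of `ss -lntu` can be filtered. The helper below is a sketch with a hypothetical name; it assumes the local address and port appear in the fifth column, as in current iproute2 output.

```shell
#!/bin/sh
# port_listeners PORT
# Reads `ss -lntu` style output on stdin and prints the lines whose
# local address (5th column) ends in :PORT.
port_listeners() {
    awk -v port="$1" '$5 ~ (":" port "$")'
}

# Example:
#   ss -lntu | port_listeners 53
# No output suggests that no listener is holding port 53.
```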
Master node requirements
The following table lists the master node hardware requirements:
| | Minimum | Recommended |
|---|---|---|
| Nodes | 1* | 3 or 5 |
| Processor | 4 cores | 4 cores |
| Memory | 32 GB RAM | 32 GB RAM |
| Hard disk | 120 GB | 120 GB |
* For business-critical deployments, three master nodes are required rather than one master node.
There are many mixed workloads on the masters. Workloads that are expected to be continuously available or considered business-critical should only be run on a DC/OS cluster with at least three masters. For more information about high availability requirements, see the High Availability documentation.
Examples of mixed workloads on the masters are Mesos replicated logs and ZooKeeper. In some cases, mixed workloads require synchronizing with fsync periodically, which can generate a lot of expensive random I/O. We recommend the following:
- Solid-state drive (SSD) or non-volatile memory express (NVMe) devices for fast, locally attached storage. To reduce the likelihood of I/O latency issues, solid-state drives should be locally attached to the physical machine, if possible. You should also be sure that SSD or NVMe devices are used for the file systems hosting master node replicated logs.

  In planning your storage requirements, keep in mind that you should avoid using a single storage area network (SAN) device and NFS to connect to the nodes in the cluster. This type of architecture introduces a higher possibility of latency than using local storage and introduces a single point of failure in what should otherwise be a distributed system. Network latency and bandwidth issues can cause client sessions to time out and adversely affect DC/OS cluster performance and reliability.
- RAID controllers with a battery backup unit (BBU).
- RAID controller cache configured in writeback mode.
- If separation of storage mount points is possible, the following storage mount points are recommended on the master node. These recommendations will optimize the performance of a busy DC/OS cluster by isolating the I/O of various services.

  | Directory path | Description |
  |---|---|
  | /var/lib/dcos | A majority of the I/O on the master nodes will occur within this directory structure. If you are planning a cluster with hundreds of nodes or intend to have a high rate of deploying and deleting workloads, isolating this directory to dedicated SSD storage on a separate device is recommended. |
- Further breaking down this directory structure into individual mount points for specific services is recommended for a cluster which will grow to thousands of nodes.

  | Directory path | Description |
  |---|---|
  | /var/lib/dcos/mesos/master | Logging directories |
  | /var/lib/dcos/cockroach | CockroachDB (Enterprise) |
  | /var/lib/dcos/navstar | Mnesia database |
  | /var/lib/dcos/secrets | Secrets vault (Enterprise) |
  | /var/lib/dcos/exec | Temporary files required by various DC/OS services. The /var/lib/dcos/exec directory must not be on a volume which is mounted with the noexec option. |
  | /var/lib/dcos/exhibitor | ZooKeeper database |
  | /var/lib/dcos/exhibitor/zookeeper/transactions | The ZooKeeper transaction logs are very sensitive to delays in disk writes. If you can only provide limited SSD space, this is the directory to place there. A minimum of 2 GB must be available for these logs. |
Agent node requirements
The table below shows the agent node hardware requirements.
| | Minimum | Recommended |
|---|---|---|
| Nodes | 1 | 6 or more |
| Processor | 2 cores | 2 cores |
| Memory | 16 GB RAM | 16 GB RAM |
| Hard disk | 60 GB | 60 GB |
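The minimum hardware figures for bootstrap, master, and agent nodes can be collected into one comparison helper. This is a sketch; the function name `meets_minimum` and its argument order are hypothetical, and the thresholds are the minimums stated in this document.

```shell
#!/bin/sh
# meets_minimum ROLE CORES MEM_GB DISK_GB
# Compares the given resources against the documented minimums for
# the bootstrap, master, and agent node types.
meets_minimum() {
    role=$1
    cores=$2
    mem_gb=$3
    disk_gb=$4

    case "$role" in
        bootstrap) min_cores=2 min_mem=16 min_disk=60 ;;
        master)    min_cores=4 min_mem=32 min_disk=120 ;;
        agent)     min_cores=2 min_mem=16 min_disk=60 ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac

    [ "$cores" -ge "$min_cores" ] &&
        [ "$mem_gb" -ge "$min_mem" ] &&
        [ "$disk_gb" -ge "$min_disk" ]
}

# Example: the core count comes from nproc; memory and disk sizes
# are supplied by the operator as whole gigabytes.
#   meets_minimum agent "$(nproc)" 16 60 || echo "below agent minimum"
```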
In planning memory requirements for agent nodes, you should ensure that agents are configured to minimize the use of swap space. To optimize cluster performance and reduce potential resource consumption issues, the recommended best practice is to disable memory swapping for all agents in the cluster, if possible.
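Whether swap is currently active on an agent can be read from /proc/swaps. The helper below is a sketch with a hypothetical name; it assumes the usual /proc/swaps layout of one header line followed by one line per swap area.

```shell
#!/bin/sh
# active_swap_count [FILE]
# Prints the number of active swap areas listed in FILE, which is
# expected to use the /proc/swaps layout. Defaults to /proc/swaps.
active_swap_count() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# Example:
#   [ "$(active_swap_count)" -eq 0 ] || echo "swap is active on this node"
```

If swap is active, `sudo swapoff -a` disables it until the next reboot; any swap entries in /etc/fstab would need to be removed separately to make the change persistent.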
In addition to the requirements described in All master and agent nodes in the cluster, the agent nodes must have:
- A /var directory with 20 GB or more of free space. This directory is used by the sandbox for both Docker and the DC/OS Universal Container Runtime (UCR).
- Do not use noexec to mount the /tmp directory on any system where you intend to use the DC/OS CLI unless a TMPDIR environment variable is set to something other than /tmp/. Mounting the /tmp directory using the noexec option could break CLI functionality.
- If you are planning a cluster with hundreds of agent nodes or intend to have a high rate of deploying and deleting services, isolating this directory to dedicated SSD storage is recommended.

  | Directory path | Description |
  |---|---|
  | /var/lib/mesos/ | Most of the I/O from the agent nodes will be directed at this directory. Also, the disk space that Apache Mesos advertises in its UI is the sum of the space advertised by the file systems underpinning /var/lib/mesos. |
- Further breaking down this directory structure into individual mount points for specific services is recommended for a cluster which will grow to thousands of nodes.

  | Directory path | Description |
  |---|---|
  | /var/lib/mesos/slave/slaves | Sandbox directories for tasks |
  | /var/lib/mesos/slave/volumes | Used by frameworks that consume ROOT persistent volumes |
  | /var/lib/mesos/docker/store | Stores Docker image layers that are used to provision UCR containers |
  | /var/lib/docker | Stores Docker image layers that are used to provision Docker containers |
Port and protocol configuration
- Secure shell (SSH) must be enabled on all nodes.
- Internet Control Message Protocol (ICMP) must be enabled on all nodes.
- All hostnames (FQDN and short hostnames) must be resolvable in DNS; both forward and reverse lookups must succeed. (Enterprise)
- All DC/OS node host names should resolve to locally bindable IP addresses. Most applications require host names to resolve by binding to a local IP address to function correctly. Applications that cannot resolve the host name of a node by binding to a local IP address might fail to function or behave in unexpected ways. (Enterprise)
- Each node is network accessible from the bootstrap node.
- Each node has unfettered IP-to-IP connectivity from itself to all nodes in the DC/OS cluster.
- All ports should be open for communication from the master nodes to the agent nodes and vice versa. (Enterprise)
- UDP must be open for ingress to port 53 on the masters. To attach to a cluster, the Mesos agent node service (dcos-mesos-slave) uses this port to find leader.mesos.
Requirements for intermediaries (e.g., reverse proxies performing SSL termination) between DC/OS users and the master nodes:
- Intermediaries must not buffer the entire response before sending any data to the client.
- Upon detecting that its client has gone away, the intermediary should also close the corresponding upstream TCP connection (i.e., the intermediary should not reuse upstream HTTP connections).
High-speed internet access
High-speed internet access is recommended for DC/OS installations. A minimum of 10 Mbit per second is required for DC/OS services. The installation of some DC/OS services will fail if the artifact download time exceeds the value of MESOS_EXECUTOR_REGISTRATION_TIMEOUT within the file /opt/mesosphere/etc/mesos-slave-common. The default value for MESOS_EXECUTOR_REGISTRATION_TIMEOUT is 10 minutes.
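The relationship between link speed, artifact size, and the timeout can be estimated with simple arithmetic. The sketch below uses hypothetical function names and assumes the full link rate is available to the download, which real networks rarely deliver.

```shell
#!/bin/sh
# download_seconds SIZE_MB RATE_MBIT
# Estimated transfer time in whole seconds (rounded up) for SIZE_MB
# megabytes at RATE_MBIT megabits per second (1 byte = 8 bits).
download_seconds() {
    echo $(( ($1 * 8 + $2 - 1) / $2 ))
}

# fits_timeout SIZE_MB RATE_MBIT TIMEOUT_SECONDS
# Succeeds if the estimated download time does not exceed the timeout.
fits_timeout() {
    [ "$(download_seconds "$1" "$2")" -le "$3" ]
}

# Example: a hypothetical 500 MB artifact over a 10 Mbit/s link,
# checked against the default 10 minute (600 second) timeout.
#   fits_timeout 500 10 600 && echo "download should finish in time"
```

At the 10 Mbit per second minimum, an artifact larger than about 750 MB would not finish within the default 10 minutes even under ideal conditions.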
Software prerequisites
- Refer to the install_prereqs.sh script for an example of how to install the software requirements for DC/OS masters and agents on a CentOS 7 host. (Enterprise)
- When using OverlayFS over XFS, the XFS volume should be created with the -n ftype=1 flag. Please see the Red Hat and Mesos documentation for more details.
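The ftype setting of an existing XFS volume can be read from `xfs_info` output. The helper below is a sketch with a hypothetical name; the device and mount point in the comments are placeholders.

```shell
#!/bin/sh
# xfs_has_ftype1
# Reads `xfs_info` output on stdin and succeeds if it reports ftype=1.
xfs_has_ftype1() {
    grep -Eq 'ftype=1([^0-9]|$)'
}

# Creating a volume with the flag (the device name is a placeholder):
#   sudo mkfs.xfs -n ftype=1 /dev/sdX1
#
# Checking a mounted volume (the mount point is a placeholder):
#   xfs_info /var/lib/docker | xfs_has_ftype1 || echo "ftype is not 1"
```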
Docker requirements
Docker must be installed on all bootstrap and cluster nodes. The supported Docker versions are listed on the version policy page.
Recommendations
- Do not use the Docker devicemapper storage driver in loop-lvm mode. For more information, see Docker and the Device Mapper storage driver.
- Prefer OverlayFS or devicemapper in direct-lvm mode when choosing a production storage driver. For more information, see Docker’s Select a Storage Driver.
- Manage Docker on CentOS with systemd. systemd starts Docker and helps restart it when it crashes.
- Run Docker commands as the root user (with sudo) or as a user in the docker user group.
Distribution-specific installation
Each Linux distribution requires Docker to be installed in a specific way:
- CentOS/RHEL/Oracle Linux - Install Docker from Docker’s yum repository.
- Ubuntu - Install Docker using the apt command.
For more information, see Docker’s distribution-specific installation instructions.
Disable sudo password prompts
To disable the sudo password prompt, you must add the following line to your /etc/sudoers file.
%wheel ALL=(ALL) NOPASSWD: ALL
Alternatively, you can SSH as the root user.
Synchronize time for all nodes in the cluster
You must enable Network Time Protocol (NTP) on all nodes in the cluster for clock synchronization. By default, during DC/OS startup you will receive an error if this is not enabled. You can check if NTP is enabled by running one of these commands, depending on your OS and configuration:
ntptime
adjtimex -p
timedatectl
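The `timedatectl` output can be checked in a script. The sketch below uses a hypothetical function name and assumes the synchronization line is labeled either "NTP synchronized" or "System clock synchronized", which varies with the systemd version.

```shell
#!/bin/sh
# clock_synchronized
# Reads `timedatectl` output on stdin and succeeds if it reports that
# the clock is synchronized. The two labels are assumptions about
# older and newer systemd releases respectively.
clock_synchronized() {
    grep -Eq '^[[:space:]]*(NTP|System clock) synchronized:[[:space:]]*yes'
}

# Example:
#   timedatectl | clock_synchronized || echo "clock is not synchronized"
```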
Bootstrap node
Before installing DC/OS, you must ensure that your bootstrap node has the following prerequisites.
- The bootstrap node must be separate from your cluster nodes.
DC/OS configuration file
- Download and save the dcos_generate_config file to your bootstrap node. This file is used to create your customized DC/OS build file. Contact your sales representative or sales@mesosphere.com for access to this file. (Enterprise)
- Download and save the dcos_generate_config file to your bootstrap node. This file is used to create your customized DC/OS build file. (Open Source)
Docker NGINX (production installation)
For production installations only, install the Docker NGINX image with this command:
sudo docker pull nginx
Cluster nodes
For production installations only, your cluster nodes must have the following prerequisites. The cluster nodes are designated as Mesos masters and agents during installation.
Data compression (production installation)
You must have the UnZip, GNU tar, and XZ Utils data compression utilities installed on your cluster nodes.
To install these utilities on CentOS 7 and RHEL 7:
sudo yum install -y tar xz unzip curl ipset
Cluster permissions (production installation)
On each of your cluster nodes, follow these instructions:
- Make sure that SELinux is in one of the supported modes.

  To review the current SELinux status and configuration, run the following command:

  sudo sestatus

  DC/OS supports the following SELinux configurations:
  - Current mode: disabled
  - Current mode: permissive
  - Current mode: enforcing, given that Loaded policy name is targeted

  To change the mode from enforcing to permissive, run the following command:

  sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config

  Or, if sestatus shows a “Current mode” which is enforcing with a Loaded policy name which is not targeted, run the following command to change the Loaded policy name to targeted:

  sudo sed -i 's/SELINUXTYPE=.*/SELINUXTYPE=targeted/g' /etc/selinux/config
- Add nogroup and docker groups:

  sudo groupadd nogroup && sudo groupadd docker
- Reboot your cluster for the changes to take effect.

  sudo reboot
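The supported SELinux configurations listed above can be checked against `sestatus` output. The function below is a sketch with a hypothetical name; it assumes the "SELinux status", "Current mode", and "Loaded policy name" labels that `sestatus` prints.

```shell
#!/bin/sh
# selinux_supported
# Reads `sestatus` output on stdin and succeeds if the configuration
# is one of the supported ones: disabled, permissive, or enforcing
# with the targeted policy.
selinux_supported() {
    awk -F':[ \t]*' '
        $1 == "SELinux status"     { status = $2 }
        $1 == "Current mode"       { mode = $2 }
        $1 == "Loaded policy name" { policy = $2 }
        END {
            if (status == "disabled" || mode == "disabled") exit 0
            if (mode == "permissive") exit 0
            if (mode == "enforcing" && policy == "targeted") exit 0
            exit 1
        }'
}

# Example:
#   sudo sestatus | selinux_supported || echo "unsupported SELinux setup"
```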
Locale requirements
You must set the LC_ALL and LANG environment variables to en_US.utf-8.
- For information on how to set these variables in Red Hat, see How to change system locale on RHEL.
- On Linux:

  localectl set-locale LANG=en_US.utf8
- For information on how to set these variables in CentOS 7, see How to set up system locale on CentOS 7.
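A quick way to confirm the locale requirement is to compare both variables against the expected value. The helper below is a sketch with a hypothetical name; it treats en_US.utf-8, en_US.UTF-8, and en_US.utf8 as equivalent spellings.

```shell
#!/bin/sh
# locale_ok VALUE
# Succeeds if VALUE names the en_US UTF-8 locale, ignoring case and
# the optional hyphen in the encoding name.
locale_ok() {
    normalized=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    [ "$normalized" = "en_us.utf8" ]
}

# Example:
#   locale_ok "${LC_ALL:-}" && locale_ok "${LANG:-}" \
#       || echo "LC_ALL and LANG must both be en_US.utf-8"
```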
Next steps
Installing Docker on CentOS/RHEL
Requirements, recommendations and procedures for installing Docker CE on CentOS/RHEL.
ZooKeeper resources
Requirements and recommendations for ZooKeeper in a DC/OS cluster.
DC/OS Ports
Understanding configured ports for DC/OS deployment.
Disk Partitions
Planning disk partitioning for a DC/OS cluster.
Installing using a Custom AMI
Using AWS machine images to launch DC/OS.
Azure Recommendations
Recommendations for Azure.