DC/OS 1.13.0 was released on May 8, 2019.
Registered DC/OS Enterprise customers can access the DC/OS Enterprise configuration file from the support website. For new customers, contact your sales representative or sales@mesosphere.io before attempting to download and install DC/OS Enterprise.
Release summary
DC/OS is a distributed operating system that enables you to manage resources, application deployment, data services, networking, and security in an on-premise, cloud, or hybrid cluster environment.
This release provides new features and enhancements to improve the user experience, fix reported issues, integrate changes from previous releases, and maintain compatibility and support for other packages, such as Marathon and Metronome, used in DC/OS.
If you have DC/OS deployed in a production environment, see Known issues and limitations to see if any potential operational changes for specific scenarios apply to your environment.
New features and capabilities
DC/OS 1.13 includes new features and capabilities to enhance the installation and deployment experience, simplify cluster administration, increase operational productivity and efficiency, and provide additional monitoring, alerting, logging, and reporting for better visibility into cluster activity.
Highlights of what’s new
Some highlights for this release include:
- Unified service accounts and authentication architecture
- Monitoring and metrics for cluster operations
- Extended support for workloads that take advantage of accelerated processing provided by graphic processing units (GPU)
- Improvements to the Universal installer and the upgrade process
- New features and options for command-line programs
- New dashboard options for monitoring cluster performance
- Tighter integration between the Mesosphere Kubernetes Engine (MKE) and Edge-LB load balancing
Features and capabilities that are introduced in DC/OS 1.13 are grouped by functional area or component and include links to view additional documentation, if applicable.
Unified service accounts and authentication architecture
The core of the DC/OS Enterprise identity and access management service (IAM) has been open-sourced and added to DC/OS, replacing DC/OS OpenAuth (`dcos-oauth`). This architectural change includes adding CockroachDB as the cluster high-availability database for identity and access management.
With this change, DC/OS also now supports unified service accounts. Service accounts allow individual programs and applications to interact with a DC/OS cluster using their own identity. A successful service account login results in authentication proof: the DC/OS authentication token. A valid DC/OS authentication token is required to access DC/OS services and components through the master node Admin Router.
This change also aligns the authentication architectures between DC/OS Enterprise and DC/OS Open Source. The HTTP API for service account management and service authentication is now the same for both DC/OS Enterprise and DC/OS Open Source. For both DC/OS Enterprise and DC/OS Open Source clusters, the DC/OS authentication token is an RS256-signed JSON Web Token (JWT). This authentication token can be validated by any component in the system after consulting the IAM service’s JSON Web Key Set (JWKS) endpoint.
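As a rough sketch of that flow (the cluster URL, account name, and signed login token are placeholders), a service account exchanges a self-signed service login JWT for a DC/OS authentication token through the IAM login endpoint:

```bash
# Exchange a service login JWT (signed with the service account's private
# key) for a DC/OS authentication token; placeholders must be replaced.
curl -sk -X POST "https://<cluster-url>/acs/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"uid": "my-service", "token": "<service-login-JWT>"}'
# Expected response shape: {"token": "<dcos-authentication-token>"}
```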
Monitoring and metrics for cluster operations
This release extends DC/OS cluster monitoring capabilities and the metrics you can collect and report for DC/OS components. The enhancements to monitoring and metrics provide you with better visibility into cluster operations, activity, and performance through DC/OS itself and as input to Prometheus, Grafana, and other services.
Monitoring service
- The DC/OS monitoring service (`dcos-monitoring`) can be configured to use DC/OS Storage Service (DSS) volumes to store time-series data.

  With this release, you can store the information collected by the `dcos-monitoring` service in the profile-based storage provided by the DC/OS Storage Service. By using the DC/OS Storage Service to store the monitoring data used in Prometheus queries and Grafana dashboards, you can improve the performance and reliability of the Prometheus and Grafana monitoring components. When you install the DC/OS monitoring service, you can select the volume size and a volume profile for the file system where you want to store the Prometheus time-series database (`tsdb`). By specifying a volume managed by the DC/OS Storage Service, you can take advantage of the durability, performance, and flexibility DSS provides for your collected data.

  For more information about working with the DC/OS monitoring service, see DC/OS Monitoring Service. For more information about using the DC/OS storage service, see DC/OS Storage Service.
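A minimal install sketch follows; the option keys are hypothetical illustrations rather than the package's actual configuration schema, so check the service documentation for the real names:

```bash
# Install dcos-monitoring with a DSS-backed volume for the Prometheus tsdb.
# The option keys below are illustrative, not the package's actual schema.
cat > monitoring-options.json <<'EOF'
{
  "prometheus": {
    "storage": {
      "volume_size_mb": 10240,
      "volume_profile": "fast-ssd"
    }
  }
}
EOF
dcos package install dcos-monitoring --options=monitoring-options.json
```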
- The DC/OS monitoring service enables you to import curated alerting rules.

  With this release, deploying the DC/OS monitoring service enables you to import Mesosphere-provided Prometheus alert rules from a GitHub repository. These predefined alert rules enable you to create meaningful alerts concerning the condition of the DC/OS cluster, including successful or failed operations and node activity.

  Prometheus alert rules are automatically included as part of the DC/OS monitoring service. Each DC/OS component or framework available for monitoring should have a single rule file that contains all of its alert rules. These alert rules are passed to Prometheus using the `rule_files` configuration parameter and are configured to specify one of the following severity levels (see the sketch after this list):

  - Warning alerts identify issues that require notification, but do not require immediate action. For example, a warning alert might send an email notification to an administrator without requiring an immediate response.
  - Critical alerts identify issues that require immediate attention. For example, a critical alert might trigger a pager notification to signal that immediate action is required.
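For orientation, a minimal Prometheus rule file in the shape described above might look like the following; the alert name and expression are illustrative, not one of the curated Mesosphere rules:

```bash
# Write a minimal Prometheus rule file that tags an alert with a severity.
cat > dcos-node-alerts.yml <<'EOF'
groups:
  - name: dcos-node
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning   # use "critical" for pager-level alerts
        annotations:
          summary: "Node {{ $labels.instance }} is unreachable"
EOF
```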
- Automatically create a curated collection of Prometheus-driven Grafana dashboards for DC/OS.

  If you deploy DC/OS monitoring, you can leverage Mesosphere-provided Grafana-based dashboards. By installing and configuring the `dcos-monitoring` service, you can automatically create dashboards that enable you to quickly visualize the metrics that the `dcos-monitoring` package is collecting from the DC/OS cluster and DC/OS-hosted applications. For more information about using Grafana dashboards, see the dashboard repository.
Metrics
DC/OS metrics are collected and managed through the Telegraf service. Telegraf provides an agent-based service that runs on each master and agent node in a DC/OS cluster. By default, Telegraf gathers metrics from all of the processes running on the same node, processes them, then sends the collected information to a central metrics database.
With this release, you can use Telegraf to collect and forward information for the following additional DC/OS cluster components:
- CockroachDB
- ZooKeeper
- Exhibitor
- Marathon
- Metronome
- New volume and network metrics collected by the Mesos input plugin are enabled by default.

  The metrics collection service, `dcos-telegraf`, can now collect additional metrics for Mesos volumes and network information. For a complete list of the Mesos metrics you can collect and report, see the latest list of metrics.

  In DC/OS 1.13, `dcos-telegraf` collects Mesos metrics automatically by default. Previously, you were required to enable the metrics plugin manually by updating the agent configuration or by setting the `enable_mesos_input_plugin` parameter in the `config.yaml` file to `true`. With this release, manually enabling this feature is no longer required; the default value for the parameter is now `true`. You can set the `enable_mesos_input_plugin` parameter in the `config.yaml` file to `false` if you want to disable the automatic collection of Mesos metrics, as shown in the sketch below.
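A sketch of opting out in the installer's `config.yaml`, assuming the standard `genconf` layout:

```bash
# Disable automatic collection of Mesos metrics at install or upgrade time.
cat >> genconf/config.yaml <<'EOF'
enable_mesos_input_plugin: false
EOF
```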
- Collect and report metrics that track the health and performance of the DC/OS Telegraf plugin.

  With this release, the `dcos-telegraf` program collects and forwards information about the operation and performance of the Telegraf process itself. This information is stored along with other metrics and is available for reporting using the DC/OS monitoring service or third-party monitoring services. For information about the Telegraf plugin and the metrics that Telegraf collects about its own performance, see the documentation for the Internal input plugin.
- Expose task-related metrics using the Prometheus format.

  You can expose metrics from tasks that run on Mesos in Prometheus format. When a port configuration belonging to a task is labeled appropriately, the metrics endpoint on that port is polled regularly over the lifetime of the task, and the metrics collected are added to the Telegraf pipeline.

  For a detailed description of how to configure a task so that its metrics are collected in Prometheus format, see the Prometheus input plugin documentation and the sketch below.
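The following sketch labels a Marathon app's metrics port for Prometheus collection. Treat the `DCOS_METRICS_FORMAT` label name as an assumption to verify against the Prometheus input plugin documentation:

```bash
# Deploy an app that serves Prometheus metrics on its first allocated port.
# The DCOS_METRICS_FORMAT label name is an assumption; confirm it against
# the Prometheus input plugin documentation.
cat > prom-app.json <<'EOF'
{
  "id": "/prometheus-example",
  "cmd": "./serve-metrics --port $PORT0",
  "cpus": 0.1,
  "mem": 128,
  "instances": 1,
  "portDefinitions": [
    {
      "port": 0,
      "protocol": "tcp",
      "name": "metrics",
      "labels": { "DCOS_METRICS_FORMAT": "prometheus" }
    }
  ]
}
EOF
dcos marathon app add prom-app.json
```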
- Add internal metrics for UDP activity to the Telegraf `statsd` input plugin.

  You can collect and report metrics for the number of incoming messages that have been dropped because of a full queue. This information is provided by the Telegraf `statsd` input plugin with the `internal_statsd_dropped_messages` metric.
- Add process-level metrics for DC/OS agents and masters.

  You can collect and report process-level metrics for agent and master node processes. This information is provided by the Telegraf `procstat` input plugin. This plugin returns information about CPU and memory usage using the `procstat_cpu_usage` and `procstat_memory_rss` metrics.
- Add metrics for Admin Router instances running on DC/OS master nodes.

  You can collect and report metrics for DC/OS Admin Router using NGINX Virtual Hosts metrics. This information is provided by the Telegraf NGINX input plugin and is enabled by default. You can view the NGINX instance metrics using the `/nginx/status` endpoint on each DC/OS master node.
- Add the fault domain region and zone information to metrics.

For more information about collecting metrics and configuring metrics plugins, see the metrics documentation.
Command-line interface
- Identify the public-facing IP address for public agent nodes through the DC/OS CLI.

  With this release, you can retrieve the public-facing IP addresses for the nodes in a cluster by running the `dcos node list` command. For more information about using the new command for retrieving public IP addresses, see the dcos node command reference.

  You can look up the public agent IP address using the DC/OS web-based console, command-line interface, or API calls for DC/OS cluster nodes if DC/OS is deployed on a public cloud provider such as AWS, Google Cloud, or Azure. If DC/OS is installed on an internal network (on-premise) or a private cloud, nodes do not typically have separate public and private IP addresses. For nodes on an internal network or private cloud, the public IP address is most often the same as the IP address defined for the server in the DNS namespace.
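For example:

```bash
# List all cluster nodes; the output includes a public IP column where a
# public address is known for the node (exact columns may vary by version).
dcos node list
```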
- Automatically install the DC/OS Enterprise command-line interface (CLI).

  If you have deployed a DC/OS Enterprise cluster, you can now automatically install the DC/OS Enterprise CLI when you install the base CLI package. Previously, the DC/OS Enterprise CLI could only be installed manually after the successful installation of the base DC/OS CLI.

  For more information about installing the command-line interface (CLI) and CLI plugins, see Installing the CLI and Installing the DC/OS Enterprise CLI.
- Basic auto-completion using the TAB key.

  You can now use the TAB key to provide automatic completion when typing DC/OS commands. Auto-completion enables you to execute commands in a shell terminal more quickly by attempting to predict the rest of a command or subcommand you are typing. If the suggested text matches the command you intended, you can press the TAB key to accept the suggestion and execute the command.

  For more information about using auto-completion when you are working with the command-line interface (CLI) and CLI plugins, see Enabling autocompletion for the CLI.
- Dynamic auto-completion of cluster names for the `dcos cluster attach` and `dcos cluster remove` commands.

  You can now use the TAB key to provide automatic completion for potential cluster names when you run the `dcos cluster attach` or `dcos cluster remove` commands.

  For more information about using auto-completion when you are working with the command-line interface (CLI) and CLI plugins, see Enabling autocompletion for the CLI.
- CLI support for macOS using Homebrew.

  Homebrew is a software package management program you can use to install and configure packages for computers running macOS or Linux operating systems. With this release, you can install the DC/OS command-line interface (CLI) packages using the macOS `homebrew` utility. Previously, you were required to download all DC/OS CLI plugins directly from the DC/OS cluster. By adding support for the Homebrew package manager, operators and developers can keep their CLI packages up to date using the `brew` command. For example, you can install the core CLI package by running the following command: `brew install dcos-cli`

  For more information about installing and using Homebrew, see the Homebrew website or the GitHub repository.
Data services
- Add a unique version number to Edge-LB pool packages.

  You can run a command to return the version number for the Edge-LB pool package you have installed. Using the version number returned by the `edgelb version` command, you can verify whether the Edge-LB pool and the Edge-LB API server versions match. The Edge-LB API server and the Edge-LB pool version numbers should always match. For example, if you have Edge-LB pool package version v1.3.0 installed, the API server version should also be v1.3.0.
- Enable Edge-LB pool instances to be scaled up or down.

  You can scale down the Edge-LB pool instances from a higher count to a lower one if you do not require all of the pool instances that are configured. To scale down, update the `count` variable in the Edge-LB pool configuration file to reflect the number of Edge-LB pool instances you need, as sketched below.
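A sketch of the scaling workflow, assuming an existing pool definition file (the file name, pool contents, and counts are placeholders):

```bash
# Reduce a pool from 3 to 2 load-balancer instances by editing "count"
# in the pool configuration file, then pushing the updated configuration.
sed -i 's/"count": 3/"count": 2/' my-pool.json
dcos edgelb update my-pool.json
# Confirm the pool and API server versions match, per the note above.
dcos edgelb version
```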
UI
- Support for the independent upgrade of the DC/OS UI.

  You can now install and update the DC/OS UI without having to upgrade the DC/OS cluster. This feature enables new DC/OS UI updates to be published to the DC/OS catalog and also made available as `.dcos` files for on-premise customers. The ability to install and update the DC/OS UI without upgrading the DC/OS cluster enables you to easily get the latest fixes and capabilities available in the DC/OS UI without affecting cluster operations. You can also roll back an update, enabling you to return to the DC/OS UI version that was originally shipped with your version of DC/OS if you need to.
- Accurate status information for services.

  The DC/OS 1.13 UI now includes a new tab in the Details section of every SDK-based data service. This new tab provides a clear indication of the status and progress of SDK-based services during the service life cycle, including installation and upgrade activity. From the Details tab, you can see information about the specific operational plans that are currently running or have just completed. You can also view the execution of each task so that you can easily track the progress of the plans you have deployed.

  For more information about viewing up-to-date status information for services and operational plans, see the Services documentation.
- Identify the public-facing IP address for public agent nodes in the DC/OS UI.

  With this release, you can view the public-facing IP addresses for agent nodes in the DC/OS UI. Previously, retrieving the public IP address for a node required writing a custom query. For more information about viewing public IP addresses in the DC/OS UI, see Finding the public IP address.

  You can look up the public agent IP address using the DC/OS web-based console, command-line interface, or API calls for DC/OS cluster nodes if DC/OS is deployed on a public cloud provider such as AWS, Google Cloud, or Azure. If DC/OS is installed on an internal network (on-premise) or a private cloud, nodes do not typically have separate public and private IP addresses. For nodes on an internal network or private cloud, the public IP address is most often the same as the IP address defined for the server in the DNS namespace.
- Add support for internationalization and localization (i18n and l10n) for Chinese.

  The Mesosphere DC/OS 1.13 UI has been translated into Mandarin Chinese. Mandarin-speaking customers and users can now easily switch the language displayed in the UI and interact with DC/OS operations and functions in English or Chinese. The DC/OS documentation has also been translated into Chinese to support those customers. Support for additional languages can be provided if there is sufficient customer demand.

  For information about changing the language displayed, see the UI documentation.
Installation
- Multi-region support using the Universal Installer.

  Multi-region deployments enable higher availability for DC/OS clusters. Support for multiple regions is crucial for customers who want to maintain uptime without being susceptible to regional outages. For more information, see the documentation for multi-region deployment.
- Dynamic masters on the Universal Installer.

  Dynamic masters enable you to create, destroy, and recover master nodes. With this feature, you can use the Universal Installer to downscale or upscale your DC/OS clusters not only by changing the agent nodes (which was already supported), but also by replacing master nodes if you deem it necessary to do so. For more information, see replaceable masters.
- Enable Universal Installer and on-premise DC/OS life cycle management with Ansible.

  The DC/OS Ansible (`dcos-ansible`) component is a Mesosphere-provided version of the Ansible open-source provisioning, configuration management, and deployment tool. It enables you to use supported Ansible roles for installing and upgrading DC/OS Open Source and DC/OS Enterprise clusters on the infrastructure you choose. For more information, see the documentation for Ansible.
Job management and scheduling
- Enhance DC/OS job handling capabilities by adding support for the following:

  - Graphics processing units (GPU) when creating new jobs in the DC/OS UI or with the new DC/OS configuration option `metronome_gpu_scheduling_behavior`.
  - Jobs running in Universal Container Runtime (UCR) containers.
  - File-based secrets.
  - Hybrid cloud deployments.
  - The `IS` constraint operator and the `@region` and `@zone` attributes.
- Provide an option to enable or disable offer suppression when agents are idle.
- Collect metrics for the “root” Metronome process on DC/OS for better observability.
- Add HTTP and uptime metrics for job management.
- Set the default value for the `--gpu_scheduling_behavior` configuration option to `restricted` to prevent jobs from being started on GPU-enabled agents if the job definition does not explicitly request GPU support. A sketch of overriding this default follows this list.
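As referenced above, a sketch of relaxing the default in the installer's `config.yaml`, assuming `unrestricted` remains a supported value:

```bash
# Allow jobs that do not request GPUs to be scheduled on GPU-enabled agents.
cat >> genconf/config.yaml <<'EOF'
metronome_gpu_scheduling_behavior: unrestricted
EOF
```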
Marathon
- Enable secure computing (seccomp) and a default seccomp profile for UCR containers to prevent security exploits.
- Replace Marathon-based health and readiness checks with generic DC/OS (Mesos-based) checks.
- Collect metrics for the “root” Marathon framework on DC/OS for better observability.
- Automatically replace instances when a DC/OS agent is decommissioned.
- Set the default value for the `--gpu_scheduling_behavior` configuration option to `restricted` to prevent tasks from being started on GPU-enabled agents if the app or pod definition does not explicitly request GPU support.
- Implement global throttling of Marathon-initiated health checks for better scalability.
- Suppress offers by default when agents are idle for better scalability.
- Close connections on slow event consumers to prevent excessive buffering and reduce the load on Marathon.
- Align ephemeral task handling with stateful task handling.

  Marathon 1.8 handles ephemeral instances in the same way it has handled stateful instances since version 1.0. Until now, Marathon expunged ephemeral instances once all of their tasks ended up in a terminal state, and eventually launched replacements as a result of those instances being expunged. Instances are now only expunged from the state once their goal is set to `Decommissioned` and all of their tasks are in a terminal state. If their goal is still `Running`, they are considered for scheduling and used to launch replacement tasks. This change not only merges two previously different code paths; it also simplifies debugging, because users can follow the task incarnations for a given instance throughout Marathon’s logs.

  This means that instance IDs are now stable for as long as an instance is meant to be kept running. New instances are created only when replacing unreachable instances or when replacing instances with new versions. Similar to the way task IDs are handled for stateful services, tasks of stateless services now also carry an incarnation count appended to the task ID. The first task created for an instance has incarnation `.1`, and subsequent replacements increment that counter. For example, `service-name.instance-c0caec0a-863a-11e9-915b-c610fee06dff._app.42` denotes the 42nd incarnation of instance `c0caec0a-863a-11e9-915b-c610fee06dff`.

  When you kill an instance using the `wipe=true` flag, its goal is set to `Decommissioned` and it is eventually expunged when all of its tasks are terminal. Note that as long as any of its tasks are, for example, unreachable, the instance is not expunged until those tasks are reported terminal (`GONE`, `GONE_BY_OPERATOR`, or `UNKNOWN` if they stay unreachable). When you kill instances without the `wipe=true` flag, Marathon only issues kill requests to Mesos, keeps the current goal, and will therefore launch replacements that are still associated with the existing instance. A sketch of a `wipe=true` request follows.
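A sketch of that `wipe=true` request against Marathon's task-kill endpoint, using the CLI's stored cluster URL and token (the app and task IDs are placeholders):

```bash
# Kill and decommission one instance; with wipe=true its goal becomes
# Decommissioned and it is expunged once all of its tasks are terminal.
curl -X DELETE \
  -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
  "$(dcos config show core.dcos_url)/service/marathon/v2/apps/my-app/tasks/<task-id>?wipe=true"
```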
Mesos platform and containerization
- Update the Universal Container Runtime (UCR) to support Docker registry manifest specification v2_schema2 images.

  DC/OS Universal Container Runtime (UCR) now fully supports Docker images that are formatted using the Docker v2_schema2 specification. UCR also continues to support Docker images that use the v2_schema1 format.

  For more information, see Universal Container Runtime.
- Add a communication heartbeat to improve resiliency.

  DC/OS clusters now include executor and agent communication channel heartbeats to ensure platform resiliency, even if `IPFilter` is enabled with `conntrack`, which usually times out a connection every five days.
- Support zero-downtime deployments for tasks through layer-4 load balancing.

  DC/OS cluster health checks now provide task-readiness information. This information enables zero-downtime load balancing when services are scaled out. With this feature, load-balanced traffic is not redirected to containers until the container health check returns a ‘ready’ status.
- Add support for CUDA 10 image processing for applications that use graphics processing unit (GPU) resources and are based on the NVIDIA Container Runtime.

  CUDA provides a parallel computing platform that enables you to use GPU resources for general-purpose processing. The CUDA platform provides direct access to the GPU virtual instruction set using common programming languages such as C and C++. The NVIDIA Container Runtime is a container runtime that supports CUDA image processing and is compatible with the Open Containers Initiative (OCI) specification.

  With this release, DC/OS adds support for CUDA, NVIDIA Container Runtime containers, and applications that use GPU resources, enabling you to build and deploy containers for GPU-accelerated workloads.
Networking
- Add a new networking API endpoint to retrieve the public-facing IP address for public agent nodes.

  This release introduces a new API endpoint for accessing public-facing IP addresses for the nodes in a cluster, as sketched below. For more information about retrieving and viewing public IP addresses, see Finding the public IP address.

  You can look up the public agent IP address using the DC/OS web-based console, command-line interface, or API calls for DC/OS cluster nodes if DC/OS is deployed on a public cloud provider such as AWS, Google Cloud, or Azure. If DC/OS is installed on an internal network (on-premise) or a private cloud, nodes do not typically have separate public and private IP addresses. For nodes on an internal network or private cloud, the public IP address is most often the same as the IP address defined for the server in the DNS namespace.
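A sketch of querying the endpoint through the master Admin Router with the CLI's stored credentials:

```bash
# Return node records, including public IP addresses, from the networking API.
curl -sk \
  -H "Authorization: token=$(dcos config show core.dcos_acs_token)" \
  "$(dcos config show core.dcos_url)/net/v1/nodes"
```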
Security
- Extend the DC/OS authentication architecture to apply to both DC/OS Open Source (OSS) and DC/OS Enterprise clusters.

  You can now create unified service accounts that can be used across DC/OS OSS and DC/OS Enterprise clusters. By extending support for service accounts to all DC/OS clusters, you have the option to install, configure, and manage additional packages, including packages that require a service account when you are running DC/OS Enterprise in `strict` mode.

  For more information about authentication and managing accounts, see Security and User account management.
- Support secure computing mode (seccomp) profiles.

  Secure computing mode (`seccomp`) is a feature provided by the Linux kernel. You can use secure computing mode to restrict the actions allowed within an app or pod container. You can enable secure computing mode using a default profile for Universal Container Runtime (UCR) containers if the operating system you are using supports it.

  With DC/OS, you can use a `seccomp` profile to deny access to specific system calls by default. The profile defines a default action and the rules for overriding that default action for specific system calls. Using a secure computing mode profile is an important option if you need to secure access to containers and operations using the principle of least privilege.

  For more information about secure computing mode and the default secure computing profile, see Secure computing profiles and the sketch below.
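For orientation, here is a minimal deny-by-default profile in the common Docker/OCI-style JSON layout; the field values are illustrative, so see Secure computing profiles for the profile DC/OS actually ships:

```bash
# A least-privilege seccomp profile: error on every syscall except an
# explicit allow list.
cat > minimal-seccomp.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "archMap": [
    { "architecture": "SCMP_ARCH_X86_64", "subArchitectures": [] }
  ],
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "exit_group"],
      "action": "SCMP_ACT_ALLOW",
      "args": []
    }
  ]
}
EOF
```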
Storage
- Update Beta REX-Ray to support NVMe EBS volumes.

  REX-Ray is a container storage orchestration engine that enables persistence for cloud-native workloads. With REX-Ray, you can manage native Docker Volume Driver operations through a command-line interface (CLI).

  Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for Amazon Elastic Compute Cloud (EC2) instances. Amazon EBS volumes can be attached to any running EC2 instance hosted in the same Amazon availability zone to provide persistent storage that is independent of the deployed instance. EBS storage volumes can be exposed using NVMe (non-volatile memory express) as a host controller interface and storage protocol. NVMe devices enable you to accelerate the transfer of data between nodes and solid-state drives (SSDs) over a computer’s connection gateway.

  With this release, DC/OS updates REX-Ray to support NVMe storage when the DC/OS cluster runs on an Amazon instance. To work with NVMe devices, however, you must provide your own `udev` rules and `nvme-cli` package. For more information about using REX-Ray, see the REX-Ray website and GitHub repository.
- Provide a driver that enables AWS Elastic Block Store (EBS) volumes for the Mesosphere Kubernetes Engine (MKE).

  You can use the AWS EBS Container Storage Interface (CSI) driver to manage storage volumes for the Mesosphere Kubernetes Engine (MKE). This driver enables MKE users to deploy stateful applications running in a DC/OS cluster on an AWS cloud instance.
- Update support for the Container Storage Interface (CSI) specification.

  With this release, DC/OS supports the Container Storage Interface (CSI) API version 1 (v1) specification. You can deploy plugins that are compatible with either the CSI API v0 or v1 specification to create persistent volumes through local storage resource providers. DC/OS automatically detects the CSI versions that are supported by the plugins you deploy.
Issues fixed in this release
The issues that have been fixed in DC/OS 1.13 are grouped by feature, functional area, or component. Most change descriptions include one or more issue tracking identifiers enclosed in parentheses for reference.
Admin Router
- Enable Admin Router to handle long server names (COPS-4286, DCOS-46277).

  This release fixes an issue in Admin Router that prevented it from starting properly for some virtual machine configurations. For example, if you previously used a server name that exceeded the maximum size allowed, the `dcos-adminrouter` component might be unable to start the server. With this release, the `packages/adminrouter/extra/src/nginx.master.conf` file has been updated to support a server name hash bucket size of 64 characters.
- Change the master Admin Router service endpoint `/service/<service-name>` so that it does not remove the `Accept-Encoding` header from requests, allowing services to serve compressed responses to user agents (DCOS_OSS-4906).
- Enable the master Admin Router to expose the DC/OS networking API through the `/net` endpoint path (DCOS_OSS-1837).

  This API can be used, for example, to return the public IP addresses of cluster nodes through the `/net/v1/nodes` endpoint.
- Enable Admin Router to return relative redirects to avoid relying on the `Host` header (DCOS-47845).
Command-line interface (CLI)
- Fix the CLI task metrics summary command which was occasionally failing to find metrics (DCOS_OSS-4679).
Diagnostics and logging
- Enable DC/OS to create consolidated diagnostics bundles by applying a timeout when reading `systemd` journal entries (DCOS_OSS-5097).
- Add SELinux details to the DC/OS diagnostics bundle to provide additional information for troubleshooting and analysis (DCOS_OSS-4123).
- Add external Mesos master and agent logs to the diagnostics bundle to provide additional information for troubleshooting and analysis (DCOS_OSS-4283).
- Add logging for Docker-GC to the `journald` system logging facility (COPS-4044).
- Modify Admin Router to log information to a non-blocking domain socket (DCOS-43956).

  Previously, if the `journald` logging facility failed to read the socket quickly enough, Admin Router would stop processing requests, causing log messages to be lost and blocking other processing activity.
- Allow the DC/OS Storage Service (DSS) endpoint for collecting diagnostics to be marked as optional (DCOS_OSS-5031).

  The DC/OS Storage Service (DSS) provides an HTTP endpoint for collecting diagnostics. If you want the DC/OS diagnostics request to succeed even when the storage service diagnostics endpoint is not available, you can configure the endpoint as optional. By specifying that the diagnostics endpoint is optional, you can ensure that failures to query the endpoint do not cause DC/OS diagnostics reporting to fail.

  If the storage service diagnostics endpoint is optional when you generate a diagnostics report, DC/OS records a log message indicating that the endpoint is unavailable and was ignored because it was marked as optional.
- Prevent cloud provider access or account keys from being included in diagnostic reports (DCOS-51751).

  With this release, the configuration parameters `aws_secret_access_key` and `exhibitor_azure_account_key` are marked as secret and are not visible in the `user.config.yaml` file on cluster nodes. This information is only visible in the `user.config.full.yaml` file, which has stricter read permissions and is not included in DC/OS diagnostics bundles.
UI
- Change the default value for the DC/OS UI `X-Frame-Options` header from `SAMEORIGIN` to `DENY`. This setting is also now configurable using the `adminrouter_x_frame_options` configuration parameter (DCOS-49594).
Installation
- Allow the DC/OS installer to be used when there is a space in its path (DCOS_OSS-4429).
- Add a warning to the installer to let the user know if kernel modules required by the DC/OS Storage Service (DSS) are not loaded (DCOS-49088).
- Improve the error messages returned if Docker is not running at the start of a DC/OS installation (DCOS-15890).
- Stop requiring the `ssh_user` attribute to be set in the `config.yaml` file when using parts of the deprecated CLI installer (DCOS_OSS-4613).
Job management and scheduling
- Job scheduling (Metronome) has been improved to handle the restart policy when a job fails. If a job fails to run, restarting the task now depends on the setting you have defined for the `ON_FAILURE` result (DCOS_OSS-4636).
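For example, a minimal job that opts into retries on failure might look like this (the job ID, command, and deadline are placeholders):

```bash
# Create a job whose failed runs are retried for up to 120 seconds.
cat > retry-job.json <<'EOF'
{
  "id": "nightly-cleanup",
  "run": {
    "cmd": "/opt/scripts/cleanup.sh",
    "cpus": 0.1,
    "mem": 64,
    "disk": 0,
    "restart": {
      "policy": "ON_FAILURE",
      "activeDeadlineSeconds": 120
    }
  }
}
EOF
dcos job add retry-job.json
```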
Metrics
- Prefix illegal Prometheus metric names with an underscore (DCOS_OSS-4899).
Networking
- Fix an issue that previously caused the `dcos-net-setup.py` script to fail if the `systemd` network directory did not exist (DCOS-49711).
- Add path-based routing to Admin Router to support routing of requests to the DC/OS networking (`dcos-net`) component (DCOS_OSS-1837).
- Mark the `dcos6` overlay network as disabled if the `enable_ipv6` parameter is set to `false` (DCOS-40539).
- Enable IPv6 support for layer-4 load balancing (l4lb) by default (DCOS_OSS-1993).
- Fix a race condition in the layer-4 load balancing (l4lb) network component (DCOS_OSS-4939).
- Fix IPv6 virtual IP support in the layer-4 load balancing (l4lb) network component (DCOS-50427).
- Update `iptables` rules to allow the same port to be used for port mapping and virtual IP addresses (DCOS_OSS-4970).

  DC/OS now allows you to use the same port for traffic routed to virtual IP addresses and to containers that use port mapping (for example, network traffic routed to a container using bridge networking). Previously, if you configured a virtual IP address listening on the same port as the host port specified for port mapping, the `iptables` rules identified the port conflict and prevented the virtual IP traffic from being routed to its intended destination.
- Update `lashup` to check that all master nodes are reachable (DCOS_OSS-4328).

  Lashup is an internal DC/OS building block for distributed control operations. It is not an independent module, but is used in conjunction with other components. This fix helps to ensure Lashup convergence, preventing connectivity issues and nodes creating multiple “sub-clusters” within a single DC/OS cluster.
- Allow agents to store network information in a persistent location (COPS-4124, DCOS-46132, DCOS_OSS-4667).

  A new agent option, `--network_cni_root_dir_persist`, allows the container network interface (CNI) root directory to store network information in a persistent location. This option enables you to specify a container `work_dir` root directory that persists network-related information. By persisting this information, the CNI isolator code can perform proper cleanup operations after a reboot.

  If rebooting a node does not delete old containers and IP/MAC addresses from `etcd` (which over time can cause pool exhaustion), you should set the `--network_cni_root_dir_persist` agent option in the `config.yaml` file to `true`, as sketched below. Note that changing this flag requires rebooting the agent node or shutting down all container processes running on the node. Because a reboot or shutdown of containers is required, the default value for the `--network_cni_root_dir_persist` agent option is `false`.

  Before changing this option, you should plan for agent maintenance to minimize any service interruption. If you set this option and reboot a node, you should also unset the `CNI_NETNS` environment variable after rebooting using the CNI plugin `DEL` command so that the plugin cleans up as many resources as possible (for example, by releasing IPAM allocations) and returns a successful response.
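A sketch of the opt-in, taking the text above at face value that the flag is set through the `config.yaml` file; the key spelling mirrors the agent option name and is an assumption:

```bash
# Persist CNI network state across agent reboots. Takes effect only after
# rebooting the agent node or stopping all containers on it.
cat >> genconf/config.yaml <<'EOF'
network_cni_root_dir_persist: true
EOF
```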
- Applications that use Docker containers with a virtual IP address now resolve access to the application by using the `host_IP:port_number` instead of the `container_ip:port_number` for backend port mapping (COPS-4087).
- The distributed layer-4 load balancer (`dcos-l4lb`) network component waits to route traffic until an application scale-up operation is complete or the application health check has passed (COPS-3924, DCOS_OSS-1954).

  The `dcos-l4lb` process does not prevent traffic from being routed if you are scaling down the number of application instances. Network traffic is only suspended if the status of the application is determined to be unhealthy or unknown.
Third-party updates and compatibility
- Update support for REX-Ray to the most recent stable version (DCOS_OSS-4316, COPS-3961).
- Upgrade the version of the Telegraf metrics plugin supported to leverage recent bug fixes and feature improvements (DCOS_OSS-4675).
- Update the supported version of Java to 8u192 to address known critical and high security vulnerabilities (DCOS-43938, DCOS_OSS-4380).
- Upgrade support for the Erlang/OTP framework to Erlang/OTP version 21.3 (DCOS_OSS-4902).
Known issues and limitations
This section covers known issues or limitations that do not necessarily affect all customers, but might require changes to your environment to address specific scenarios. The issues are grouped by feature, functional area, or component. Where applicable, issue descriptions include one or more tracking identifiers enclosed in parentheses for reference.
Using separate JSON files for job scheduling
In this release, jobs and job schedules are created in two separate steps. Because of this change, you must structure the job definition in the JSON editor in distinct sections similar to this:
- `job`: JSON definition that specifies the job identifier and job configuration details.
- `schedule`: JSON definition that specifies the schedule details for the job.
This two-step approach to creating JSON for jobs is different from previous releases in which jobs and schedules could be created in one step. In previous releases, the job could have its schedule embedded in its JSON configuration.
If you have an existing JSON configuration that has an embedded schedule and you want to view or modify that file using the job form JSON editor, you must:
- Add the JSON object as the value for the `job` property in the editor.

  The job must be formatted according to the latest Jobs API specification. This API specification (v1) replaces the previous Jobs API specification (v0).

- Copy the `schedules: [ scheduleJSON ]` from the existing job JSON configuration and add it at the same level after the `job` property as `schedule: scheduleJSON`.

  The schedule must be formatted according to the Jobs API schedule specification. This API specification (v1) replaces the previous Jobs API specification (v0).

- Verify that the `schedule` section is not an array.

- Remove the `schedules` property from the job’s JSON configuration settings.
The following example illustrates the changes required when you have a job definition that includes an embedded schedule.
{
"id": "test-schedule",
"labels": {
},
"run": {
"cpus": 1,
"mem": 128,
"disk": 0,
"gpus": 0,
"cmd": "sleep 100",
"env": {
},
"placement": {
"constraints": [
]
},
"artifacts": [
],
"maxLaunchDelaySeconds": 300,
"volumes": [
],
"restart": {
"policy": "NEVER"
},
"secrets": {
}
},
"schedules": [
{
"id": "test",
"cron": "* * * * *",
"timezone": "UTC",
"startingDeadlineSeconds": 900,
"concurrencyPolicy": "ALLOW",
"enabled": true,
"nextRunAt": "2019-04-26T16:28:00.000+0000"
}
],
"activeRuns": [
],
"history": {
"successCount": 0,
"failureCount": 0,
"lastSuccessAt": null,
"lastFailureAt": null,
"successfulFinishedRuns": [
],
"failedFinishedRuns": [
]
}
}
To add this job definition to the JSON editor, you would modify the existing JSON as follows:
{
"job": {
"id": "test-schedule",
"labels": {
},
"run": {
"cpus": 1,
"mem": 128,
"disk": 0,
"gpus": 0,
"cmd": "sleep 100",
"env": {
},
"placement": {
"constraints": [
]
},
"artifacts": [ ],
"maxLaunchDelaySeconds": 300,
"volumes": [ ],
"restart": { "policy": "NEVER" },
"secrets": { }
}
},
"schedule": {
"id": "test",
"cron": "* * * * *",
"timezone": "UTC",
"startingDeadlineSeconds": 900,
"concurrencyPolicy": "ALLOW",
"enabled": true,
"nextRunAt": "2019-04-26T16:28:00.000+0000"
}
}
Authentication tokens after an upgrade
- Authentication tokens that are generated by DC/OS Open Authentication (`dcos-oauth`) before upgrading from DC/OS version 1.12.x to DC/OS version 1.13.x become invalid during the upgrade. To generate a new authentication token for access to DC/OS 1.13.x, log in using valid credentials after completing the upgrade.
Upgrading Marathon orchestration
- You can only upgrade to Marathon 1.8 from 1.6.x or 1.7.x. To upgrade from an earlier version of Marathon, you must first upgrade to Marathon 1.6.x or 1.7.x.
Restrictions for Marathon application names
- You should not use restricted keywords in application names.

  You should not add applications with names (identifiers) that end with `restart`, `tasks`, or `versions`. For example, the application names `/restart` and `/foo/restart` are invalid and generate errors when you attempt to issue a `GET /v2/apps` request. If you have any existing apps with restricted names, attempting any operation, except delete, will result in an error. You should ensure that application names comply with the validation rules before upgrading Marathon.
Deprecated or decommissioned features
- In DC/OS 1.13, the DC/OS history service has transitioned into the retired state. The history service is scheduled to be decommissioned in DC/OS 1.14. You can find the definitions for each of the feature maturity states documented in the Mesosphere DC/OS Feature Maturity Lifecycle.
- Some of the configuration parameters previously used to install DC/OS cluster components are no longer valid. The following `dcos_generate_config.sh` command-line options have been deprecated and decommissioned:

  - `--set-superuser-password`
  - `--offline`
  - `--cli-telemetry-disabled`
  - `--validate`
  - `--preflight`
  - `--install-prereqs`
  - `--deploy`
  - `--postflight`

  If you attempt to use an option that is no longer valid, the installation script displays a warning message. You can also identify deprecated options by running the `dcos_generate_config.sh` script with the `--help` option, as shown below. The output for the `--help` option displays [DEPRECATED] for the options that are no longer used.

  These options will be removed in DC/OS 1.14. If you have scripts or programs that use any of the deprecated options, you should update them.
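For example:

```bash
# Deprecated flags are tagged [DEPRECATED] in the installer's help output.
bash dcos_generate_config.sh --help
```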
- The CLI command `dcos node` has been replaced by the new command `dcos node list`.

  Running the `dcos node` command after installing this release automatically redirects to the output of the `dcos node list` command. The `dcos node list` command provides information similar to the output from the `dcos node` command, but also includes an additional column that indicates the public IP address of each node.

  If you have scripts or programs that use output from the `dcos node` command, you should test the output provided by the `dcos node list` command, then update your scripts or programs as needed.
- Marathon-based HTTP, HTTPS, TCP, and readiness checks.

  Marathon-based HTTP, HTTPS, and TCP health checks have been deprecated since DC/OS 1.9. With this release, Marathon-based readiness checks have also been deprecated.

  If you have not already done so, you should migrate services to use the Mesos health and generic checks in place of the Marathon-based checks. As part of this migration, you should keep in mind that you can only specify one Mesos-based health check and one Mesos-based generic check.
- Marathon support for App Container (`appc`) images is decommissioned in 1.13.

  There has been no active development for AppC images since 2016. Support for AppC images will be removed in DC/OS 1.14.
- Setting the `gpu_scheduling_behavior` configuration option to `undefined` is no longer supported.

  With this release, the default value for the `gpu_scheduling_behavior` configuration option is `restricted`. The value `undefined` is decommissioned and will be removed in DC/OS 1.14.

  If you have scripts or programs that set the `gpu_scheduling_behavior` configuration option to `undefined`, you should update them as needed.
Marathon no longer supports the
api_heavy_events
setting.With this release, the only response format allowed for
/v2/events
islight
(in accordance with the previously-published deprecation plan). If you attempt to start Marathon with the--deprecated_features=api_heavy_events
setting specified, the startup operation will fail with an error. -
- Marathon no longer supports Kamon-based metrics and related command-line arguments.

  The following command-line arguments related to outdated reporting tools have been removed:

  - `--reporter_graphite`
  - `--reporter_datadog`
  - `--metrics_averaging_window`

  If you specify any of these flags, Marathon will fail to start.
- Proxying server-sent events (SSE) from standby Marathon instances is no longer supported.

  DC/OS no longer allows a standby Marathon instance to proxy `/v2/events` from the Marathon leader. Previously, it was possible to use the `proxy_events` flag to force Marathon to proxy the response from `/v2/events`. This standby redirect functionality and the related flag are no longer valid in 1.13.
- Marathon no longer supports the `save_tasks_to_launch_timeout` setting.

  The `save_tasks_to_launch_timeout` option was deprecated in Marathon 1.5, and using it has had no effect on Marathon operations since that time. If you specify the `save_tasks_to_launch_timeout` setting, Marathon will fail to start.
Updated components change lists
For access to the logs that track specific changes to components that are included in the DC/OS distribution, see the following links:
- Apache Mesos 1.8.0 change log.
- Marathon 1.8.x change log.
- Metronome 0.6.18 change log.
- DC/OS 1.13 change log.
Previous releases
To review changes from a recent previous release, see the following links:
- Release version 1.10.11 - 12 February 2019.
- Release version 1.11.10 - 12 February 2019.
- Release version 1.12.3 - 14 March 2019.