This section will explain how to configure Prometheus for use with DC/OS.
Prerequisites
- A running DC/OS 1.11 cluster
Install
Prometheus can be installed via either the DC/OS Catalog web interface or by using the CLI. The following command will launch the install via the DC/OS CLI:
dcos package install prometheus --yes
Figure 1. Installing Prometheus
Install Grafana from the service catalog as well. It can be used as a graphing tool.
dcos package install grafana --yes
The framework provides options for entering the Prometheus, AlertManager, and rules configuration. The default Prometheus configuration scrapes the DC/OS masters and agents in the cluster. Users can append additional configuration to the end of the default for full extensibility.
Accessing the Prometheus UI with EdgeLB (Enterprise Feature)
Once the framework is up and running:
- Install EdgeLB.
- Create a file named prometheus-edgelb.json containing the following edge-lb configuration:
{
  "apiVersion": "V2",
  "name": "prometheus",
  "count": 1,
  "haproxy": {
    "frontends": [
      {
        "bindPort": 9092,
        "protocol": "HTTP",
        "linkBackend": {
          "defaultBackend": "prometheus"
        }
      },
      {
        "bindPort": 9093,
        "protocol": "HTTP",
        "linkBackend": {
          "defaultBackend": "alertmanager"
        }
      },
      {
        "bindPort": 9094,
        "protocol": "HTTP",
        "linkBackend": {
          "defaultBackend": "grafana"
        }
      },
      {
        "bindPort": 9091,
        "protocol": "HTTP",
        "linkBackend": {
          "defaultBackend": "pushgateway"
        }
      }
    ],
    "backends": [
      {
        "name": "prometheus",
        "protocol": "HTTP",
        "services": [{
          "endpoint": {
            "type": "ADDRESS",
            "address": "prometheus.prometheus.l4lb.thisdcos.directory",
            "port": 9090
          }
        }]
      },
      {
        "name": "alertmanager",
        "protocol": "HTTP",
        "services": [{
          "endpoint": {
            "type": "ADDRESS",
            "address": "alertmanager.prometheus.l4lb.thisdcos.directory",
            "port": 9093
          }
        }]
      },
      {
        "name": "grafana",
        "protocol": "HTTP",
        "services": [{
          "endpoint": {
            "type": "ADDRESS",
            "address": "grafana.grafana.l4lb.thisdcos.directory",
            "port": 3000
          }
        }]
      },
      {
        "name": "pushgateway",
        "protocol": "HTTP",
        "services": [{
          "endpoint": {
            "type": "ADDRESS",
            "address": "pushgateway.prometheus.l4lb.thisdcos.directory",
            "port": 9091
          }
        }]
      }
    ]
  }
}
Description of the EdgeLB configuration above:
- Exposes Prometheus Pushgateway UI at http://<public-agent-ip>:9091
- Exposes Prometheus UI at http://<public-agent-ip>:9092
- Exposes Prometheus Alertmanager UI at http://<public-agent-ip>:9093
- Exposes Grafana UI at http://<public-agent-ip>:9094
- Deploy the edge-lb service configuration above using:
dcos edgelb create prometheus-edgelb.json
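Before deploying, it can help to confirm that every frontend in the file links to a backend that is actually defined. The following is a minimal sketch of such a check, not part of the framework: the function name frontend_routes is hypothetical, and the embedded config is a trimmed copy (two of the four frontends) of the file above.

```python
import json

# Trimmed copy of prometheus-edgelb.json (two of the four frontends/backends).
EDGELB_CONFIG = """
{
  "apiVersion": "V2",
  "name": "prometheus",
  "count": 1,
  "haproxy": {
    "frontends": [
      {"bindPort": 9092, "protocol": "HTTP",
       "linkBackend": {"defaultBackend": "prometheus"}},
      {"bindPort": 9094, "protocol": "HTTP",
       "linkBackend": {"defaultBackend": "grafana"}}
    ],
    "backends": [
      {"name": "prometheus", "protocol": "HTTP",
       "services": [{"endpoint": {"type": "ADDRESS",
         "address": "prometheus.prometheus.l4lb.thisdcos.directory",
         "port": 9090}}]},
      {"name": "grafana", "protocol": "HTTP",
       "services": [{"endpoint": {"type": "ADDRESS",
         "address": "grafana.grafana.l4lb.thisdcos.directory",
         "port": 3000}}]}
    ]
  }
}
"""


def frontend_routes(config):
    """Map each frontend bindPort to the address:port of its default backend."""
    haproxy = config["haproxy"]
    backends = {backend["name"]: backend for backend in haproxy["backends"]}
    routes = {}
    for frontend in haproxy["frontends"]:
        name = frontend["linkBackend"]["defaultBackend"]
        if name not in backends:
            raise ValueError(
                "frontend on port %d links to unknown backend %r"
                % (frontend["bindPort"], name)
            )
        # Only the first service of each backend is inspected in this sketch.
        endpoint = backends[name]["services"][0]["endpoint"]
        routes[frontend["bindPort"]] = "%s:%d" % (
            endpoint["address"],
            endpoint["port"],
        )
    return routes


if __name__ == "__main__":
    routes = frontend_routes(json.loads(EDGELB_CONFIG))
    for port, target in sorted(routes.items()):
        print(port, "->", target)
```

To check the real file, load it with json.load(open("prometheus-edgelb.json")) instead of the embedded string.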
Accessing the Prometheus UI using MarathonLB (OSS)
- Install Marathon-LB:
dcos package install marathon-lb --yes
- Create a file named prometheus-marathonlb.json containing the following marathon-lb proxy configuration:
{
  "id": "/prometheus-proxy",
  "instances": 1,
  "cpus": 0.001,
  "mem": 16,
  "cmd": "tail -F /dev/null",
  "container": {
    "type": "MESOS"
  },
  "portDefinitions": [
    {
      "protocol": "tcp",
      "port": 0
    },
    {
      "protocol": "tcp",
      "port": 0
    },
    {
      "protocol": "tcp",
      "port": 0
    },
    {
      "protocol": "tcp",
      "port": 0
    }
  ],
  "labels": {
    "HAPROXY_GROUP": "external",
    "HAPROXY_0_MODE": "http",
    "HAPROXY_0_PORT": "9092",
    "HAPROXY_0_BACKEND_SERVER_OPTIONS": "server prometheus prometheus.prometheus.l4lb.thisdcos.directory:9090",
    "HAPROXY_1_MODE": "http",
    "HAPROXY_1_PORT": "9093",
    "HAPROXY_1_BACKEND_SERVER_OPTIONS": "server alertmanager alertmanager.prometheus.l4lb.thisdcos.directory:9093",
    "HAPROXY_2_MODE": "http",
    "HAPROXY_2_PORT": "9091",
    "HAPROXY_2_BACKEND_SERVER_OPTIONS": "server pushgateway pushgateway.prometheus.l4lb.thisdcos.directory:9091",
    "HAPROXY_3_MODE": "http",
    "HAPROXY_3_PORT": "9094",
    "HAPROXY_3_BACKEND_SERVER_OPTIONS": "server grafana grafana.grafana.l4lb.thisdcos.directory:3000"
  }
}
Description of the Marathon-LB configuration above:
- Exposes Prometheus Pushgateway UI at http://<public-agent-ip>:9091
- Exposes Prometheus UI at http://<public-agent-ip>:9092
- Exposes Prometheus Alertmanager UI at http://<public-agent-ip>:9093
- Exposes Grafana UI at http://<public-agent-ip>:9094
- Deploy the marathon-lb proxy configuration above as a Marathon app using:
dcos marathon app add prometheus-marathonlb.json
Navigate to the Service UI
Enter the following address in your browser:
http://<public-agent-ip>:9092
You should see the Prometheus UI:
Figure 2. Prometheus dashboard
You can also verify that Prometheus is serving metrics about itself by navigating to its metrics endpoint:
http://<public-agent-ip>:9092/metrics
Using the Expression browser
Go back to the console view, and enter this into the expression console:
prometheus_target_interval_length_seconds
This should return a number of different time series (along with the latest value recorded for each), all with the metric name prometheus_target_interval_length_seconds.
As another example, enter the following expression to graph the per-second rate of chunks being created in the self-scraped Prometheus:
rate(prometheus_tsdb_head_chunks_created_total[1m])
Figure 3. Prometheus graphing
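The same expressions can also be evaluated outside the browser through the Prometheus HTTP API, which serves instant queries at /api/v1/query. The sketch below shows one way to build and run such a query with the Python standard library. The function names are hypothetical, and it assumes the Prometheus UI is reachable at the base URL you pass in (for example the port 9092 frontend configured earlier).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, expression):
    """Build an instant-query URL for the Prometheus HTTP API."""
    # urlencode percent-encodes characters such as ( ) [ ] in the expression.
    return "%s/api/v1/query?%s" % (
        base_url.rstrip("/"),
        urlencode({"query": expression}),
    )


def run_query(base_url, expression, timeout=10):
    """Run the query and return the decoded JSON response as a dict."""
    with urlopen(query_url(base_url, expression), timeout=timeout) as response:
        return json.load(response)
```

For example, run_query("http://<public-agent-ip>:9092", "rate(prometheus_tsdb_head_chunks_created_total[1m])") should return a JSON document whose "data" field holds the matching series, provided the endpoint is reachable from where the script runs.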
Using Grafana with Prometheus
Navigate to the following URL and log in with the credentials admin/admin:
http://<public-agent-ip>:9094
This takes you to the Grafana console.
Figure 4. Grafana console.
You can add Prometheus as a data source. The default installation URL is:
http://prometheus-0-server.prometheus.autoip.dcos.thisdcos.directory:1025
Note: the data source will not register unless the URL begins with http://
Figure 5. Grafana data source
Click Save and Test. You are now ready to use Prometheus as a data source in Grafana.
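The data source can also be registered without the UI. The following is a rough sketch that prepares a request for Grafana's HTTP API. It assumes the Grafana version installed from the catalog exposes the standard /api/datasources endpoint and accepts basic authentication with the default admin/admin credentials. The function name and the data source name "Prometheus" are illustrative choices, not framework defaults.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Prepare a POST request that registers Prometheus as a Grafana data source."""
    # Mirrors the note above: the URL must carry an explicit scheme.
    if not prometheus_url.startswith(("http://", "https://")):
        raise ValueError("data source URL must start with http:// or https://")
    payload = {
        "name": "Prometheus",   # illustrative display name
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",      # Grafana's backend performs the queries
    }
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token.decode("ascii"),
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. Keeping construction separate from sending makes it easy to inspect the payload first.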
Create a Dashboard
To create a graph, select your Prometheus data source and enter any Prometheus expression into the "Query" field. Use the "Metric" field to look up metrics via autocompletion.
The following shows an example Prometheus graph configuration using the expression rate(prometheus_http_request_duration_seconds_count[5m])
Figure 6. Grafana Prometheus graph configuration
AlertManager
The AlertManager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.
AlertManager UI:
http://<public-agent-ip>:9093
Figure 7. AlertManager dashboard
AlertManager with Webhook
By default, the framework configures AlertManager with a webhook receiver (this configuration can be changed):
route:
  group_by: [cluster]
  receiver: webh
  group_interval: 1m
receivers:
- name: webh
  webhook_configs:
  - url: http://webhook.marathon.l4lb.thisdcos.directory:1234
Default rule defined in the framework:
groups:
- name: cpurule
  rules:
  - alert: highcpu
    expr: cpu_total > 2
    annotations:
      DESCRIPTION: 'it happened yeah'
      SUMMARY: 'it happened'
Next, run the following config as a Marathon app:
{
  "container": {
    "docker": {
      "image": "python:latest"
    },
    "type": "MESOS"
  },
  "mem": 1024,
  "portDefinitions": [
    {
      "labels": {
        "VIP_0": "webhook:1234"
      },
      "protocol": "tcp",
      "name": "web",
      "port": 1234
    }
  ],
  "cmd": "env | sort\n\ncat > function.py << EOF\n\nimport json\nfrom http.server import BaseHTTPRequestHandler, HTTPServer\n\n\nclass MyHandler(BaseHTTPRequestHandler):\n    def do_POST(self):\n        self.send_response(200)\n        self.end_headers()\n        # data = json.loads(self.rfile.read(int(self.headers['Content-Length'])))\n        data = self.rfile.read(int(self.headers['Content-Length']))\n        self.log_message('%s', data.decode('utf-8'))\n\n\nhttpd = HTTPServer(('0.0.0.0', $PORT_WEB), MyHandler)\nhttpd.serve_forever()\nEOF\n\npython3 function.py\n",
  "networks": [
    {
      "mode": "host"
    }
  ],
  "cpus": 0.1,
  "id": "webhook"
}
Check the logs for this app. The AlertManager will send HTTP POST requests in the following JSON format:
{
  "receiver": "webh",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "highcpu",
        "cluster_id": "4c7ab85b-ce28-4bdd-8a2d-87c71d02759e",
        "hostname": "10.0.1.16",
        "instance": "10.0.1.16:61091",
        "job": "dcos-metrics",
        "mesos_id": "29bac9b2-cbdb-4093-a907-6c4904a1360a-S5"
      },
      "annotations": {
        "DESCRIPTION": "it happened yeah",
        "SUMMARY": "it happened"
      },
      "startsAt": "2018-07-12T17:32:56.030479955Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://ip-10-0-1-16.us-west-2.compute.internal:1025/graph?g0.expr=cpu_total+%3E+2&g0.tab=1"
    },
    ...
    ...
    ...
}
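A webhook receiver usually wants more than the raw body in its logs. As a minimal sketch, the function below turns a payload of the shape shown above into one summary line per alert. The name summarize_alerts and the output format are hypothetical, and SAMPLE_BODY is a trimmed copy of the example payload with a single alert.

```python
import json

# Trimmed copy of the example payload above, reduced to one alert.
SAMPLE_BODY = """
{
  "receiver": "webh",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "highcpu",
        "hostname": "10.0.1.16",
        "instance": "10.0.1.16:61091",
        "job": "dcos-metrics"
      },
      "annotations": {
        "DESCRIPTION": "it happened yeah",
        "SUMMARY": "it happened"
      },
      "startsAt": "2018-07-12T17:32:56.030479955Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
"""


def summarize_alerts(body):
    """Return one human-readable line per alert in a webhook payload."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[%s] %s on %s: %s"
            % (
                alert.get("status", "unknown"),
                labels.get("alertname", "?"),
                labels.get("instance", "?"),
                annotations.get("SUMMARY", ""),
            )
        )
    return lines


if __name__ == "__main__":
    for line in summarize_alerts(SAMPLE_BODY):
        print(line)
```

In the handler from the Marathon app, the decoded request body could be passed to this function in place of logging it verbatim.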
AlertManager with Slack
Slack notifications are sent via Slack webhooks. Update the AlertManager config to:
route:
  group_by: [cluster]
  # If an alert isn't caught by a route, send it to Slack.
  receiver: slack_general
  routes:
  # Send severity=slack alerts to Slack.
  - match:
      severity: slack
    receiver: slack_general
receivers:
- name: slack_general
  slack_configs:
  - api_url: <Slack webhook URL>
    channel: '#alerts_test'
Figure 8. Slack alerts
PushGateway
The Prometheus PushGateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus.
Pushing some metrics: push a single sample into the group identified by {job="some_job"}:
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job
Figure 9. PushGateway
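Batch jobs written in Python can push the same sample without shelling out to curl. The sketch below builds the equivalent request with the standard library only. The function name push_request is hypothetical, and it reuses the placeholder host from the curl example. The official Prometheus client libraries offer a fuller interface if adding a dependency is acceptable.

```python
from urllib.parse import quote
from urllib.request import Request


def push_request(gateway_url, job, metric, value):
    """Prepare a POST that pushes one sample into the group for `job`."""
    # The text exposition format requires each sample line to end with a newline.
    body = "%s %s\n" % (metric, value)
    url = "%s/metrics/job/%s" % (gateway_url.rstrip("/"), quote(job, safe=""))
    return Request(url, data=body.encode("utf-8"), method="POST")


if __name__ == "__main__":
    request = push_request(
        "http://pushgateway.example.org:9091", "some_job", "some_metric", 3.14
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request, timeout=10)
```

With the EdgeLB or Marathon-LB configuration from earlier, the gateway URL would be http://<public-agent-ip>:9091.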