This section describes operations tasks you may need to perform. DC/OS NiFi allows you to:
- Update your configuration after launch
- Update your placement constraints
- Add, replace, restart or resize a node
- Back up your application
- Use the DC/OS NiFi Administration Toolkit
- Use metrics to troubleshoot your nodes
Updating Configuration
You can make changes to the service after it has been launched. Configuration management is handled by the Scheduler process, which in turn handles DC/OS NiFi deployment itself.
After you make a change, the Scheduler will be restarted, and it will automatically deploy any detected changes to the service, one node at a time. For example, a given change will first be applied to nifi-0, then nifi-1, and so on.
Nodes are configured with a “Readiness check” to ensure that the underlying service appears to be in a healthy state before continuing with applying a given change to the next node in the sequence.
Some changes, such as decreasing the number of nodes or changing volume requirements, are not supported after initial deployment. See Limitations.
The instructions below describe how to update the configuration for a running DC/OS service.
DC/OS Enterprise 1.10 and later
DC/OS Enterprise 1.10 introduces a convenient command line option that allows for easier updates to a service’s configuration, as well as allowing users to inspect the status of an update, to pause and resume updates, and to restart or complete steps if necessary.
Prerequisites
- DC/OS Enterprise 1.10 or later.
- Service version 1.5.0 or later.
- The DC/OS CLI installed and available.
- The service’s subcommand available and installed on your local machine.
- You can install just the subcommand CLI by running
dcos package install --cli --yes nifi
- If you are running an older version of the subcommand CLI that doesn’t have the update command, uninstall and reinstall your CLI:
dcos package uninstall --cli nifi
dcos package install --cli nifi
Preparing configuration
If you installed this service with DC/OS Enterprise 1.10 or later, you can fetch the full configuration of a service (including any default values that were applied during installation). For example:
dcos nifi describe > options.json
Make any configuration changes to the options.json file.
If you installed this service with a prior version of DC/OS, this configuration will not have been persisted by the DC/OS package manager. You can instead use the options.json file that was used when installing the service.
Recreating options.json (optional)
If the options.json from the last service installation or update is not available, you will need to manually recreate it using the following steps.
First, we’ll fetch the default application’s environment, the current application’s environment, and the actual template that maps config values to the environment:
- Ensure you have jq installed.
- Set the service name that you’re using, for example:
SERVICE_NAME=nifi
- Get the version of the package that is currently installed:
PACKAGE_VERSION=$(dcos package list | grep $SERVICE_NAME | awk '{print $2}')
- Then fetch and save the environment variables that have been set for the service:
dcos marathon app show $SERVICE_NAME | jq .env > current_env.json
- To identify customized values, we’ll get the default environment variables for this version of the service:
dcos package describe --package-version=$PACKAGE_VERSION --render --app $SERVICE_NAME | jq .env > default_env.json
- We’ll also get the entire application template:
dcos package describe $SERVICE_NAME --app > marathon.json.mustache
Now that you have these files, we’ll attempt to recreate the options.json.
- Use jq and diff to compare the two:
diff <(jq -S . default_env.json) <(jq -S . current_env.json)
- Now compare these values to the values contained in the env section of the application template:
less marathon.json.mustache
- Use the variable names (e.g. {{service.name}}) to create a new options.json file as described in Initial service configuration.
Starting the update
When you are ready to begin, initiate an update using the DC/OS CLI, passing in the updated options.json file:
dcos nifi update start --options=options.json
You will receive an acknowledgement message and the DC/OS package manager will restart the Scheduler in Marathon.
See Advanced update actions for commands you can use to inspect and manipulate an update after it has started.
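The fetch, edit, and update flow described above can be wrapped in a small shell helper. The following is a minimal sketch, not part of the DC/OS NiFi CLI: the function name start_nifi_update is hypothetical, and the only thing it adds is a check that the options file exists and is not empty before the documented dcos nifi update start command is called.

```shell
# Hypothetical helper (not part of the dcos CLI): validate the options
# file, then start the update with the documented command.
start_nifi_update() (
  options_file=${1:-options.json}

  # -s is true only if the file exists and is not empty.
  if [ ! -s "$options_file" ]; then
    echo "options file '$options_file' is missing or empty" >&2
    exit 1
  fi

  dcos nifi update start --options="$options_file"
)

# Usage:
#   dcos nifi describe > options.json    # then edit options.json
#   start_nifi_update options.json
```

Because the function body runs in a subshell, the early exit only ends the helper and not the shell you call it from.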
To see a full listing of available options, run dcos package describe --config nifi in the CLI, or browse the DC/OS NiFi Service install dialog in the DC/OS Dashboard.
Updating Placement Constraints
Placement constraints may be updated after initial deployment using the following procedure. See Service Settings above for more information on placement constraints.
Let’s say we have the following deployment of our nodes:
- Placement constraint of:
hostname:LIKE:10.0.10.3|10.0.10.8|10.0.10.26|10.0.10.28|10.0.10.84
- Tasks:
10.0.10.3: nifi-0
10.0.10.8: nifi-1
10.0.10.26: nifi-2
10.0.10.28: empty
10.0.10.84: empty
10.0.10.8 is being decommissioned and we should move away from it.
- Remove the decommissioned IP and add a new IP to the placement rule whitelist by editing placement_constraint:
hostname:LIKE:10.0.10.3|10.0.10.26|10.0.10.28|10.0.10.84|10.0.10.123
- Redeploy nifi-1 from the decommissioned node to somewhere within the new whitelist:
dcos nifi pod replace nifi-1
- Wait for nifi-1 to be up and healthy before continuing with any other replacement operations.
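Editing a long hostname list by hand is error-prone, so the whitelist change in the procedure above can be sketched as a small shell function. This is an illustration only: swap_constraint_host is a hypothetical name, and the function assumes the constraint has the hostname:LIKE:a|b|c form used in this example. It compares whole entries, so removing 10.0.10.8 does not touch 10.0.10.84.

```shell
# Hypothetical helper: replace one host in a "hostname:LIKE:a|b|c" constraint.
# Usage: swap_constraint_host <constraint> <old_ip> <new_ip>
swap_constraint_host() (
  constraint=$1
  old_ip=$2
  new_ip=$3
  prefix='hostname:LIKE:'
  hosts=${constraint#"$prefix"}   # strip the prefix, keep "a|b|c"
  kept=''

  # Split the list on "|" and keep every host except the old one.
  IFS='|'
  set -f   # disable globbing while the unquoted list is split
  for host in $hosts; do
    [ "$host" = "$old_ip" ] && continue
    kept="${kept:+$kept|}$host"
  done

  # Append the replacement host and print the new constraint.
  kept="${kept:+$kept|}$new_ip"
  printf '%s%s\n' "$prefix" "$kept"
)

# Usage:
#   swap_constraint_host \
#     'hostname:LIKE:10.0.10.3|10.0.10.8|10.0.10.26|10.0.10.28|10.0.10.84' \
#     10.0.10.8 10.0.10.123
```

The printed value can then be pasted into the placement_constraint setting before the pod is replaced.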
The placement constraints can be modified by configuring the “placement constraint” section of the config.json file:
"placement_constraint": {
  "type": "string",
  "title": "Placement Constraint",
  "description": "Marathon-style placement constraint for nodes. Example: [[\"hostname\", \"UNIQUE\"]]",
  "default": "[[\"hostname\", \"UNIQUE\"]]",
  "media": {
    "type": "application/x-zone-constraints+json"
  }
}
Managing nodes
Adding a Node
The service deploys two nodes by default. You can customize this value at initial deployment or after the cluster is already running. Shrinking the cluster is not supported.
To update the node count, modify the count value in the node section of the service configuration, for example:
"node":{"count":3}
If you decrease this value, the Scheduler will prevent the configuration change until it is reverted to its original value or larger.
Resizing a Node
The CPU and memory requirements of each node can be increased or decreased as follows:
- CPU:
"node":{
"cpus":<CPU Value>
}
- Memory (in MB):
"node":{
"mem":4096
}
Restarting a Node
This operation will restart a node, while keeping it at its current location and with its current persistent volume data. This may be thought of as similar to restarting a system process, but it also deletes any data that is not on a persistent volume.
Run dcos nifi pod restart nifi-<NUM>, e.g. nifi-2.
Replacing a Node
This operation will move a node to a new agent and will discard the persistent volumes at the prior system to be rebuilt at the new system. Perform this operation if a given system is about to be offlined or has already been offlined.
- Run dcos nifi pod replace nifi-<NUM>, e.g. nifi-2, to halt the current instance with id <NUM> (if still running) and launch a new instance on a different agent. For example, let’s say nifi-2's host system has died and nifi-2 needs to be moved.
- Now that the node has been decommissioned (if needed by your service), start nifi-2 at a new location in the cluster:
dcos nifi pod replace nifi-2
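Since restart keeps persistent volume data and replace discards it, a mistyped pod name or action can be costly. The sketch below shows one way to guard the two documented commands. The function name nifi_pod_action is hypothetical, and the nifi-<NUM> naming is taken from the examples above.

```shell
# Hypothetical helper: run "dcos nifi pod restart|replace nifi-<NUM>"
# only after checking the action and the pod index.
# Usage: nifi_pod_action <restart|replace> <NUM>
nifi_pod_action() (
  action=$1
  index=$2

  case "$action" in
    restart|replace) ;;
    *) echo "action must be 'restart' or 'replace'" >&2; exit 1 ;;
  esac

  # Reject an empty index or anything containing a non-digit.
  case "$index" in
    ''|*[!0-9]*) echo "pod index must be a non-negative integer" >&2; exit 1 ;;
  esac

  dcos nifi pod "$action" "nifi-$index"
)

# Usage:
#   nifi_pod_action restart 2   # keeps persistent volume data
#   nifi_pod_action replace 2   # discards persistent volumes on the old agent
```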
Advanced update actions
The following sections describe advanced commands that can be used to interact with an update in progress.
Monitoring the update
Once the Scheduler has been restarted, it will begin a new deployment plan as individual pods are restarted with the new configuration.
You can query the status of the update as follows:
dcos nifi update status
If the Scheduler is still restarting, DC/OS will not be able to route to it and this command will return an error message. Wait a short while and try again. You can also go to the Services tab of the DC/OS UI to check the status of the restart.
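The "wait a short while and try again" advice can be automated with a simple retry loop. The following is a minimal sketch under one assumption that the text above does not confirm: that dcos nifi update status exits with a non-zero status while the Scheduler is unreachable. The function name, the default of 10 attempts, and the 15 second delay are all hypothetical choices.

```shell
# Hypothetical helper: retry "dcos nifi update status" until it succeeds.
# Assumes the command exits non-zero while the Scheduler is unreachable.
# Usage: wait_for_update_status [max_attempts] [delay_seconds]
wait_for_update_status() (
  max_attempts=${1:-10}
  delay_seconds=${2:-15}
  attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # On success the status output is printed and we stop retrying.
    if dcos nifi update status; then
      exit 0
    fi
    echo "update status not available (attempt $attempt of $max_attempts)" >&2
    attempt=$((attempt + 1))
    # Only sleep if another attempt is coming.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
  done

  echo "Scheduler still unreachable after $max_attempts attempts" >&2
  exit 1
)

# Usage:
#   wait_for_update_status        # 10 attempts, 15 seconds apart
#   wait_for_update_status 20 5   # 20 attempts, 5 seconds apart
```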
Pause
To pause an ongoing update, issue a pause command:
dcos nifi update pause
You will receive an error message if the plan has already completed or has already been paused. Once the pause takes effect, the plan will enter the WAITING state.
Resume
If a plan is in a WAITING state, as a result of being paused or reaching a breakpoint that requires manual operator verification, you can use the resume command to continue the plan:
dcos nifi update resume
You will receive an error message if you attempt to resume a plan that is already in progress or has already completed.
Force-Complete
To manually “complete” a step (so that the Scheduler stops attempting to launch a task), you can issue a force-complete command. This will instruct the Scheduler to mark a specific step within a phase as complete. You need to specify both the phase and the step, for example:
dcos nifi update force-complete service-phase service-0:[node]
Force-Restart
Similarly to force-complete, you can also force a restart. This can be done for an entire plan, a phase, or just a specific step.
To restart the entire plan:
dcos nifi update force-restart
Or for all steps in a single phase:
dcos nifi update force-restart service-phase
Or for a specific step within a specific phase:
dcos nifi update force-restart service-phase service-0:[node]
Disaster recovery
Backing up
The DC/OS NiFi framework allows you to back up your DC/OS NiFi application to Amazon S3. The following information/values are required for backup.
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
- S3_BUCKET_NAME
To enable backup, trigger the backup-s3 plan with the following plan parameters:
{
'AWS_ACCESS_KEY_ID': key_id,
'AWS_SECRET_ACCESS_KEY': aws_secret_access_key,
'AWS_REGION': 'us-east-1',
'S3_BUCKET_NAME': bucket_name
}
This plan can be executed with the following command:
dcos nifi --name=<service_name> plan start <plan_name> -p <plan_parameters>
or with a command that includes the plan parameters directly:
dcos nifi --name=<SERVICE_NAME> plan start backup-s3 \
-p AWS_ACCESS_KEY_ID=<ACCESS_KEY> \
-p AWS_SECRET_ACCESS_KEY=<SECRET_ACCESS_KEY> \
-p AWS_REGION=<REGION> \
-p S3_BUCKET_NAME=<BUCKET_NAME>
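If the four required values are already set as shell variables, the plan command above can be assembled from them. The following is a sketch only: start_nifi_backup is a hypothetical name, and the function simply refuses to run if any of the required values is missing before calling the documented plan start command.

```shell
# Hypothetical helper: start the backup-s3 plan using values taken from
# the shell environment, failing early if any of them is missing.
# Usage: start_nifi_backup [service_name]
start_nifi_backup() (
  service_name=${1:-nifi}

  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION S3_BUCKET_NAME; do
    # Indirect lookup of the variable named in $var (names are fixed above).
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $var" >&2
      exit 1
    fi
  done

  dcos nifi --name="$service_name" plan start backup-s3 \
    -p AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
    -p AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
    -p AWS_REGION="$AWS_REGION" \
    -p S3_BUCKET_NAME="$S3_BUCKET_NAME"
)

# Usage:
#   AWS_REGION=us-east-1 S3_BUCKET_NAME=<BUCKET_NAME> ... start_nifi_backup nifi
```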
Once this plan execution is completed, the backup will be uploaded to S3. The DC/OS NiFi backup is taken using the DC/OS NiFi toolkit. The DC/OS NiFi backup will be done using three sidecar tasks:
- Backup - Back up to local node (ROOT/MOUNT). The Backup task is responsible for backing up the local application to the local node, which may be on the ROOT or Mount Volume.
Figure 1. Backing up to local node
- Upload_to_S3 - Upload the backup from the local node to S3. This sidecar task takes the backup created in Step 1, from the ROOT/Mount volume, and uploads it to Amazon S3 in the Bucket Name specified.
Figure 2. S3 upload
- Cleanup - Remove the backup from local node. When Step 2 is complete and the backup has been uploaded to S3, a sidecar task known as Cleanup is triggered. This task cleans up/removes the backup folder from the local Root/Mount volumes.
Figure 3. Cleanup service
DC/OS NiFi Toolkit Commands
The Admin Toolkit contains command line utilities for administrators to support DC/OS NiFi maintenance in standalone and clustered environments. These utilities include:
- Notify: The notification tool allows administrators to send bulletins to the DC/OS NiFi UI using the command line.
- Node Manager: The node manager tool allows administrators to perform a status check on a node as well as to connect, disconnect, or remove nodes that are part of a cluster.
- File Manager: The file manager tool allows administrators to back up, install, or restore a DC/OS NiFi installation from backup.
The Administration Toolkit is bundled with the nifi-toolkit and can be executed with scripts found in the bin folder. Further documentation is available at DC/OS NiFi Administration Toolkit.
To execute the DC/OS NiFi Administration Toolkit commands, run a dcos task exec command to a DC/OS NiFi node.
- Set the JAVA_HOME using the command:
export JAVA_HOME=$(ls -d $MESOS_SANDBOX/jdk*/jre*/) && export JAVA_HOME=${JAVA_HOME%/} && export PATH=$(ls -d $JAVA_HOME/bin):$PATH
- Run the node manager commands from the $MESOS_SANDBOX/nifi-toolkit-1.5.0/bin directory.
To connect, disconnect, or remove a node from a cluster:
node-manager.sh -d <NIFI_HOME> -b <nifi bootstrap file path> -o {remove|disconnect|connect|status} [-u {url list}] [-p {proxy name}] [-v]
To show help:
node-manager.sh -h
The following are available options:
-b,--bootstrapConf <arg>    Existing Bootstrap Configuration file (required)
-d,--nifiInstallDir <arg>   nifi Root Folder (required)
-h,--help                   Help Text (optional)
-o,--operation <arg>        Operations supported: status, connect (cluster), disconnect (cluster), remove (cluster)
-p,--proxyDN <arg>          Proxy or User DN (required for secured nodes doing connect, disconnect and remove operations)
-u,--clusterUrls <arg>      Comma delimited list of active urls for cluster (optional). Not required for disconnecting a node yet will be needed when connecting or removing from a cluster
-v,--verbose                Verbose messaging (optional)
Example
To check for dcos tasks:
dcos task
NAME HOST USER STATE ID MESOS ID
nifi 10.0.0.196 root R nifi.9b11498f-415f-11e8-81a4-e25c6192ea05 1d166af3-8666-4f3e-8add-dcaad139c900-S3
nifi-0-metrics 10.0.0.199 nobody R nifi-0-metrics__958e2af9-c7d0-4cb9-b1fc-08c810b05254 1d166af3-8666-4f3e-8add-dcaad139c900-S1
nifi-0-node 10.0.0.199 nobody R nifi-0-node__68c0d8a0-4c36-4a86-a287-5dc67ce19fde 1d166af3-8666-4f3e-8add-dcaad139c900-S1
nifi-1-metrics 10.0.0.58 nobody R nifi-1-metrics__e58b8f2d-e19f-48f7-b154-6d11e65c54a9 1d166af3-8666-4f3e-8add-dcaad139c900-S5
nifi-1-node 10.0.0.58 nobody R nifi-1-node__1a3d71c6-3c23-4a96-bba3-859de2c0615d 1d166af3-8666-4f3e-8add-dcaad139c900-S5
To enter a DC/OS NiFi node:
dcos task exec -ti nifi-0-node__68c0d8a0-4c36-4a86-a287-5dc67ce19fde bash
To set the Java path:
export JAVA_HOME=$(ls -d $MESOS_SANDBOX/jdk*/jre*/) && export JAVA_HOME=${JAVA_HOME%/} && export PATH=$(ls -d $JAVA_HOME/bin):$PATH
To check for Java Home, run the following command:
echo $JAVA_HOME
This returns the Java home path:
/var/lib/mesos/slave/slaves/1d166af3-8666-4f3e-8add-dcaad139c900-S1/frameworks/1d166af3-8666-4f3e-8add-dcaad139c900-0003/executors/nifi__78b829b7-3963-4083-b33b-4147fcab559f/runs/fb826e37-17e6-4349-b7c4-63060b51ff0a/containers/8bd354e5-a2a6-4185-9454-647b98b9b327/jdk1.8.0_162/jre
Example of Backup Command through Toolkit
sh $MESOS_SANDBOX/nifi-toolkit-${NIFI_VERSION}/bin/file-manager.sh -o backup -b nifi-backup -c $MESOS_SANDBOX/../../tasks/nifi-$POD_INSTANCE_INDEX-node*/nifi- -v;
Metrics
To check the metrics for the DC/OS NiFi instances on individual agent nodes, do the following:
- First, obtain the DC/OS authentication token by issuing the following command:
dcos config show core.dcos_acs_token
Keep a copy of this token for later use.
- Next, SSH into the private agent on which the tasks are running:
dcos node ssh --master-proxy --mesos-id=<agent-mesos-id>
- Finally, make the following curl requests according to the security settings:
TLS and KDC Mode:
curl -k -H "Authorization: token=<acs_token>" https://localhost:61002/system/v1/metrics/v0/containers | jq
Non TLS and KDC Mode:
curl -k -H "Authorization: token=<acs_token>" http://localhost:61001/system/v1/metrics/v0/containers | jq
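The two curl requests differ only in scheme and port, so the choice between them can be sketched as a pair of small shell functions. The function names and the tls/plain mode labels are hypothetical. The URLs, the -k flag, and the Authorization header format are copied from the commands above.

```shell
# Hypothetical helper: print the metrics endpoint for a security mode.
# Usage: nifi_metrics_url <tls|plain>
nifi_metrics_url() {
  case "$1" in
    tls)   echo "https://localhost:61002/system/v1/metrics/v0/containers" ;;
    plain) echo "http://localhost:61001/system/v1/metrics/v0/containers" ;;
    *)     echo "usage: nifi_metrics_url <tls|plain>" >&2; return 1 ;;
  esac
}

# Hypothetical helper: fetch container metrics with the token from step 1.
# Usage: fetch_nifi_metrics <tls|plain> <acs_token>
fetch_nifi_metrics() (
  url=$(nifi_metrics_url "$1") || exit 1
  curl -k -H "Authorization: token=$2" "$url"
)

# Usage (on the private agent):
#   fetch_nifi_metrics tls <acs_token> | jq
```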
More details about Metrics can be found here.