DC/OS 1.10 or later, and package 2.0.0-x or later
If you are using DC/OS 1.10 or later and the installed service has a version later than 2.0.0-x, then uninstalling the service is simple.
From the DC/OS CLI, enter
dcos package uninstall --app-id=<service-name> hdfs
Uninstall process
Uninstalling the service consists of the following steps:
- The service scheduler is relaunched with the environment variable
SDK_UNINSTALL
set to “true”. This puts the service into an uninstall mode. - The service scheduler performs the uninstall with the following actions:
- All running tasks for the service are terminated.
- All reserved resources are unreserved by the scheduler.
- Once all known resources have been unreserved, the scheduler’s persistent state is deleted.
- The cluster automatically removes the scheduler task once it advertises the completion of the uninstall process.
Debugging an uninstall
There are certain situations under which the automated uninstall process cannot complete without manual intervention.
Zombie service scheduler
If the service scheduler reports in its stdout
that the uninstall is complete, but the scheduler is not removed from the system, you can safely delete it via Marathon.
Missing DC/OS agent
The service will not complete the uninstall if it is not able to unreserve all the resources it has previously reserved. A common cause of this is a DC/OS agent for which the service had reserved resources, which is no longer part of the DC/OS cluster.
-
If the agent is only temporarily gone due to maintenance, the service will unreserve the resources when the agent returns to the cluster.
-
If the agent has been permanently removed, then the uninstall will hang indefinitely. The easiest way to prevent this, is to perform
pod replace
on any pod deployed to the lost agent before initiating the uninstall. -
If the missing agent is only discovered after the uninstall is initiated, then the following process can be used to allow the uninstall to complete.
First determine what resources the service is waiting on.
- CLI:
dcos hdfs --name=hdfs plan show deploy
(after runningdcos package install --cli hdfs
if needed) - HTTP:
https://yourcluster.com/service/hdfs/v1/plans/deploy
dcos hdfs --name=hdfs plan show deploy
deploy (IN_PROGRESS)
├─ kill-tasks (COMPLETE)
│ ├─ kill-task-node-0-server__1a4114bc-48bb-47f6-be99-1b5ca6d55c4e (COMPLETE)
│ ├─ kill-task-node-1-server__0c42118e-04fd-40e1-b49d-0d3f71d2d243 (COMPLETE)
│ └─ kill-task-node-2-server__e00cad38-f27f-4332-b1df-5118ca480d50 (COMPLETE)
├─ unreserve-resources (IN_PROGRESS)
│ ├─ unreserve-f41351a2-b478-4e13-a94c-705f530989ef (COMPLETE)
│ ├─ unreserve-48f64612-8427-4cde-86f4-4edeb9efff37 (COMPLETE)
│ ├─ unreserve-402d51f5-6014-4ca3-bd13-324dae62b888 (PENDING)
│ ├─ unreserve-cb95e869-277f-48b9-954f-08c0d7a26bcf (PENDING)
│ ├─ unreserve-cbd748d0-df7b-4d01-b0b7-6acf915d8f98 (COMPLETE)
│ ├─ unreserve-00ed63d6-427c-4492-9713-772390cc5241 (COMPLETE)
│ ├─ unreserve-5dd56b1d-4522-4bbd-88b5-de9fa0f181f2 (PENDING)
│ └─ unreserve-c9915f07-f446-4e14-a6b4-12c8dd2f914b (COMPLETE)
└─ deregister-service (PENDING)
└─ deregister (PENDING)
As we can see above, some of the resources to unreserve are stuck in a PENDING
state. We can force them into a COMPLETE
state, and thereby allow the scheduler to finish the uninstall operation. This may be done using either of the following methods:
- CLI:
dcos hdfs --name=hdfs plan show deploy
- HTTP:
https://yourcluster.com/service/hdfs/v1/plans/deploy/forceComplete?phase=unreserve-resources&step=unreserve-<UUID>
dcos hdfs --name=hdfs plan force-complete deploy unreserve-resources unreserve-402d51f5-6014-4ca3-bd13-324dae62b888
dcos hdfs --name=hdfs plan force-complete deploy unreserve-resources unreserve-cb95e869-277f-48b9-954f-08c0d7a26bcf
dcos hdfs --name=hdfs plan force-complete deploy unreserve-resources unreserve-5dd56b1d-4522-4bbd-88b5-de9fa0f181f2
At this point the scheduler should show a COMPLETE
state for these steps in the plan, allowing it to proceed normally with the uninstall operation:
dcos hdfs --name=hdfs plan show deploy
deploy (IN_PROGRESS)
├─ kill-tasks (COMPLETE)
│ ├─ kill-task-node-0-server__1a4114bc-48bb-47f6-be99-1b5ca6d55c4e (COMPLETE)
│ ├─ kill-task-node-1-server__0c42118e-04fd-40e1-b49d-0d3f71d2d243 (COMPLETE)
│ └─ kill-task-node-2-server__e00cad38-f27f-4332-b1df-5118ca480d50 (COMPLETE)
├─ unreserve-resources (COMPLETE)
│ ├─ unreserve-f41351a2-b478-4e13-a94c-705f530989ef (COMPLETE)
│ ├─ unreserve-48f64612-8427-4cde-86f4-4edeb9efff37 (COMPLETE)
│ ├─ unreserve-402d51f5-6014-4ca3-bd13-324dae62b888 (COMPLETE)
│ ├─ unreserve-cb95e869-277f-48b9-954f-08c0d7a26bcf (COMPLETE)
│ ├─ unreserve-cbd748d0-df7b-4d01-b0b7-6acf915d8f98 (COMPLETE)
│ ├─ unreserve-00ed63d6-427c-4492-9713-772390cc5241 (COMPLETE)
│ ├─ unreserve-5dd56b1d-4522-4bbd-88b5-de9fa0f181f2 (COMPLETE)
│ └─ unreserve-c9915f07-f446-4e14-a6b4-12c8dd2f914b (COMPLETE)
└─ deregister-service (PENDING)
└─ deregister (PENDING)
Manual uninstall
If it is not possible to recover in other ways, then the service can be manually uninstalled.
- Delete the uninstalling scheduler from Marathon.
- Unregister the service from Mesos using its UUID as follows:
dcos service --inactive | grep hdfs hdfs False 3 3.3 6240.0 15768.0 <UUID> dcos service shutdown <UUID>
- Run
janitor.py
to clean up the remaining reserved resources, as described in the DC/OS 1.9 instructions below.
DC/OS 1.9 and earlier, or packages earlier than 2.0.0-x
If you are running DC/OS 1.9 or earlier, or a version of the Apache HDFS service that is earlier than 2.0.0-x, follow these steps:
- Stop the service. From the DC/OS CLI, enter
dcos package uninstall --app-id=<service-name> hdfs
. - Clean up remaining reserved resources with the framework cleaner script,
janitor.py
.
MY_SERVICE_NAME=<service-name>
dcos package uninstall --app-id=$MY_SERVICE_NAME hdfs`.
dcos node ssh --master-proxy --leader "docker run mesosphere/janitor /janitor.py \
-r $MY_SERVICE_NAME-role \
-p $MY_SERVICE_NAME-principal \
-z dcos-service-$MY_SERVICE_NAME"
If you use a strict-mode clusters should also include either of the following credentials as janitor.py
arguments:
-a <token>
: Token to be used for authentication, such as the one printed by runningdcos config show core.dcos_acs_token
in the DC/OS CLI-u <username> -p <password>
: Username/password for a user which may be used to clean up the resources in question.
See DC/OS documentation for more information about the framework cleaner script.