Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

DC/OS is an open-source distributed operating system for the data center, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public agent nodes, where each node also runs a host operating system that manages the underlying machine.

It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

  • Distributed System : DC/OS is a distributed system made up of private and public nodes that are coordinated by master nodes.
  • Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing them with the resources they require. DC/OS uses Apache Mesos to provide cluster management functionality.
  • Container Platform : All DC/OS tasks are containerized. DC/OS supports two container runtimes, Docker and Mesos, so containers can be started from Docker images or from native executables (binaries or scripts) that Mesos containerizes at runtime.
  • Operating System : As the name suggests, DC/OS is an operating system that abstracts cluster hardware and software resources and provides common services to applications.

Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

The high-level architecture of DC/OS is shown below:

DC/OS Architecture Layers

For the detailed architecture and components of DC/OS, please click here.

Adoption and usage of Mesosphere DC/OS:

Mesosphere customers include :

  • 30% of the Fortune 50 U.S. Companies
  • 5 of the top 10 North American Banks
  • 7 of the top 12 Worldwide Telcos
  • 5 of the top 10 Highest Valued Startups

Some companies using DC/OS are :

  • Cisco
  • Yelp
  • Tommy Hilfiger
  • Uber
  • Netflix
  • Verizon
  • Cerner
  • NIO

Installing and using DC/OS

A guide to installing DC/OS can be found here. After installing DC/OS on any platform, install the DC/OS CLI by following the documentation found here.

Using the DC/OS CLI, we can manage cluster nodes, manage Marathon tasks and services, and install or remove packages from the Universe. It also lends itself well to automation, since CLI commands can output JSON.

NOTE: The commands below were run and tested with the following tools:

  • DC/OS 1.11 Open Source
  • DC/OS cli 0.6.0
  • jq:1.5-1-a5b5cbe

DC/OS commands and scripts

Setup DC/OS cli with DC/OS cluster

dcos cluster setup <CLUSTER URL>

Example :

dcos cluster setup http://dcos-cluster.com

The above command prints a link for OAuth authentication and prompts for an auth token. You can authenticate with a Google, GitHub, or Microsoft account, then paste the token generated after authentication into the CLI prompt (provided OAuth is enabled).

DC/OS authentication token

dcos config show core.dcos_acs_token

DC/OS cluster url

dcos config show core.dcos_url

DC/OS cluster name

dcos config show cluster.name

Access Mesos UI

<DC/OS_CLUSTER_URL>/mesos

Example:

http://dcos-cluster.com/mesos

Access Marathon UI

<DC/OS_CLUSTER_URL>/service/marathon

Example:

http://dcos-cluster.com/service/marathon

Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

<DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

Example:

http://dcos-cluster.com/service/marathon
http://dcos-cluster.com/service/kafka

Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/slaves" | jq .

Access DC/OS slaves info in json using DC/OS cli

dcos node --json

Note : DC/OS cli ‘dcos node --json’ is equivalent to running mesos slaves endpoint (/mesos/slaves)

Access DC/OS private slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

Access DC/OS public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

Access DC/OS private and public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort
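
The same public/private split can be done in a script once the output of `dcos node --json` has been parsed. The sketch below is illustrative only: it mirrors the jq filters above (an agent counts as public when `attributes.public_ip` is set), the function name `classify_agents` is hypothetical, and the sample data is invented to resemble the CLI output.

```python
import json


def classify_agents(nodes):
    """Return sorted (label, hostname) pairs for every agent node."""
    result = []
    for node in nodes:
        if "agent" not in node.get("type", ""):
            continue  # skip masters, as the jq select() does
        attributes = node.get("attributes") or {}
        is_public = attributes.get("public_ip") is not None
        label = "Public Agent" if is_public else "Private Agent"
        result.append((label, node.get("hostname")))
    return sorted(result)


# Hypothetical sample shaped like `dcos node --json` output (values invented).
sample = json.loads("""[
  {"type": "master (leader)", "ip": "10.0.0.1"},
  {"type": "agent", "hostname": "10.0.0.5", "attributes": {"public_ip": "true"}},
  {"type": "agent", "hostname": "10.0.0.4", "attributes": {}}
]""")

for label, hostname in classify_agents(sample):
    print(f"{label} : {hostname}")
```

In practice the list would come from something like `json.loads(subprocess.check_output(["dcos", "node", "--json"]))`, assuming the CLI is already attached to a cluster.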

Get public IP of all public agents

#!/bin/bash
for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id');
do
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
done 2>/dev/null

Note: ‘dcos node ssh’ requires your private key to be available to SSH. Make sure you add your private key as an SSH identity using:

ssh-add </path/to/private/key/file/.pem>

Get public IP of master leader

dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

Get all master nodes and their private ip

dcos node --json | jq '.[] | select(.type | contains("master"))
| .ip + " = " + .type' -r

Get list of all users who have access to DC/OS cluster

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/acs/api/v1/users" | jq '.array[].uid' -r

Add users to cluster using Mesosphere script (Run this on master)

Users to add are given in list.txt, each user on new line

for i in `cat list.txt`; do echo $i;
sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

Add users to cluster using DC/OS API

#!/bin/bash
# Usage: dcosAddUsers.sh (users to add are read from users.list, one user per line)
for i in `cat users.list`;
do
echo $i
curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Delete users from DC/OS cluster organization

#!/bin/bash
# Usage: dcosDeleteUsers.sh (users to delete are read from users.list, one user per line)
for i in `cat users.list`;
do
echo $i
curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done
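
For automation outside the shell, the add and delete calls above can be expressed as plain HTTP requests. This is a minimal sketch, not an official client: the helper name `user_request` is hypothetical, the token and user id are placeholders, and only the `/acs/api/v1/users/<uid>` path and the `Authorization: Bearer` header are taken from the curl commands above.

```python
import urllib.request


def user_request(cluster_url, token, uid, method="PUT"):
    """Build (but do not send) a request that adds (PUT) or removes (DELETE) a user."""
    url = f"{cluster_url.rstrip('/')}/acs/api/v1/users/{uid}"
    return urllib.request.Request(
        url,
        data=b"{}",  # same empty JSON body that the curl commands send
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values; a real token comes from `dcos config show core.dcos_acs_token`.
req = user_request("http://dcos-cluster.com", "example-token", "user@example.com")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the loop over `users.list` easy to dry-run before touching the cluster.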

Offers/resources from individual DC/OS agent

In recent versions of many DC/OS services, a scheduler endpoint at

http://yourcluster.com/service/<service-name>/v1/debug/offers

will display an HTML table containing a summary of recently-evaluated offers. This table’s contents are currently very similar to what can be found in logs, but in a slightly more accessible format. Alternately, we can look at the scheduler’s logs in stdout. An offer is a set of resources all from one individual DC/OS agent.

<DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

Example:

http://dcos-cluster.com/service/kafka/v1/debug/offers
http://dcos-cluster.com/service/elastic/v1/debug/offers

Save JSON configs of all running Marathon apps

#!/bin/bash
# Save marathon configs in json format for all marathon apps
# Usage : saveMarathonConfig.sh
for service in `dcos marathon app list --quiet | tr -d "/" | sort`; do
dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' > $service.json
done

Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

#!/bin/bash
TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
#dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION else "[ CMD ]" end end' -r > $TMP_CSV_FILE
sed -i "s|^/||g" $TMP_CSV_FILE
sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
cnt=1
printf '%.0s=' {1..150}
printf "\n %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
printf '%.0s=' {1..150}
while IFS=, read -r app typ image;
do
tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
image=`echo $image | awk -F':' '{print $1}'`
printf "\n %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
cnt=$((cnt + 1))
sleep 0.3
done < $TMP_CSV_FILE_SORT
printf "\n"
printf '%.0s=' {1..150}
printf "\n"
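
The image/tag split performed by the two `awk` calls in the report script can be hard to read at a glance. Here is a small sketch of the same rule; the function name `split_image` is hypothetical and the behaviour simply mirrors the script above.

```python
def split_image(image):
    """Split 'name:tag' the way the report script does.

    '[ CMD ]' (an app with no image or package) gets the tag 'NA';
    a missing tag defaults to 'latest'.
    """
    if image == "[ CMD ]":
        return image, "NA"
    parts = image.split(":")
    tag = parts[1] if len(parts) > 1 and parts[1] else "latest"
    return parts[0], tag


for value in ("nginx:1.13", "nginx", "[ CMD ]"):
    print(value, "->", split_image(value))
```

Like the `awk -F':'` version, this splits on the first colon, so an image reference that includes a registry port (for example `registry:5000/app:1.0`) would be split in the wrong place. Handling that case is left out to stay faithful to the original script.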

Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

#!/bin/bash
printf "\n %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
printf '%.0s=' {1..90}
printf "\n"
TAB=`echo -e "\t"`
dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" + (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" + ((.resources.cpus - .used_resources.cpus)| tostring) else "\t\tNA\tNA\t\tNA" end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
printf '%.0s=' {1..90}
printf "\n"
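
The "Mem Free" and "CPU Free" columns in the script above are the difference between an agent's total and used resources. A minimal sketch of that arithmetic follows; the function name `free_resources` and the sample numbers are invented, while the field names `resources` and `used_resources` are the ones the jq filter reads.

```python
def free_resources(node):
    """Return free memory (MB) and CPUs for one agent entry of `dcos node --json`."""
    total = node.get("resources", {})
    used = node.get("used_resources", {})
    return {
        "mem": total.get("mem", 0) - used.get("mem", 0),
        "cpus": total.get("cpus", 0) - used.get("cpus", 0),
    }


# Hypothetical agent entry (values invented).
agent = {
    "type": "agent",
    "hostname": "10.0.0.4",
    "resources": {"mem": 16000, "cpus": 4},
    "used_resources": {"mem": 6000, "cpus": 1.5},
}
print(free_resources(agent))
```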

Framework Cleaner

Uninstall a framework and clean up any resources it left reserved after it is deleted/uninstalled. (The janitor step applies to DC/OS 1.9 or older; on DC/OS 1.10 and later, uninstalling with the CLI is sufficient.)

# Set SERVICE_NAME to the name of the service to uninstall
SERVICE_NAME=
dcos package uninstall $SERVICE_NAME
dcos node ssh --option StrictHostKeyChecking=no --master-proxy --leader \
"docker run mesosphere/janitor /janitor.py -r ${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

Get DC/OS apps and their placement constraints

dcos marathon app list --json | jq '.[] |
if (.constraints != null) then .id, .constraints else empty end'

Run shell command on all slaves

#!/bin/bash
# Run any shell command on all slave nodes (private and public)
# Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do
echo -e "\n###> Running command [ $CMD ] on $i"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
echo -e "======================================\n"
done

Run shell command on master leader

CMD="<shell command, Ex: ulimit -a>"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

Run shell command on all master nodes

#!/bin/bash
# Run any shell command on all master nodes
# Usage : dcosRunOnAllMasters.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
do
echo -e "\n###> Running command [ $CMD ] on $i"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
echo -e "======================================\n"
done

Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

#!/bin/bash
#1. SSH on node
#2. Create or edit file /var/lib/dcos/mesos-slave-common
#3. Add contents as :
# MESOS_ATTRIBUTES=<key>:<value>
# Example:
# MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
#4. Stop dcos-mesos-slave service
# systemctl stop dcos-mesos-slave
#5. Remove link for latest slave metadata
# rm -f /var/lib/mesos/slave/meta/slaves/latest
#6. Start dcos-mesos-slave service
# systemctl start dcos-mesos-slave
#7. Wait for some time, node will be in HEALTHY state again.
#8. Add app placement constraint with field = key and value = value
#9. Verify attributes, run on any node
# curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
# OR Check DCOS cluster UI
# Nodes => Select any Node => Details Tab
tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
# key:value paired attributes, separated by ;
ATTRIBUTES=NODE_TYPE:GPU_NODE
cat <<EOF > ${tmpScript}
echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
sudo systemctl stop dcos-mesos-slave
sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
sudo systemctl start dcos-mesos-slave
EOF
# Add the private IPs of the nodes on which you want to add attributes to nodes.txt, one IP per line.
for i in `cat nodes.txt`; do
echo $i
dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
sleep 10
done
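
To connect steps 3 and 8 of the script above, the sketch below parses a `MESOS_ATTRIBUTES` value into key/value pairs and turns one pair into a Marathon placement constraint. Both function names are hypothetical, and the choice of the `CLUSTER` operator is an assumption made for illustration; other Marathon operators such as `LIKE` may suit a given app better.

```python
def parse_attributes(value):
    """Parse 'KEY:VALUE;KEY2:VALUE2;' into a dict, ignoring empty segments."""
    attrs = {}
    for pair in value.split(";"):
        if not pair.strip():
            continue
        key, _, val = pair.partition(":")
        attrs[key.strip()] = val.strip()
    return attrs


def constraint_for(key, value, operator="CLUSTER"):
    """Build one Marathon constraint entry: [field, operator, value]."""
    return [key, operator, value]


attrs = parse_attributes("TYPE:DB;DB_TYPE:MONGO;")
constraints = [constraint_for(k, v) for k, v in attrs.items()]
print(constraints)
```

The resulting list is what would go under the `constraints` key of a Marathon app definition.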

Install DC/OS Datadog metrics plugin on all DC/OS nodes

#!/bin/bash
# Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>
DDAPI=$1
if [[ -z $DDAPI ]]; then
echo "[Datadog Plugin] Need datadog API key as parameter."
echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
exit 1
fi
tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
declare agent=$tmpScriptAgent
declare master=$tmpScriptMaster
for role in "agent" "master"
do
cat <<EOF > ${!role}
curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
export DD_API_KEY=$DDAPI
export AGENT_ROLE=$role
sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
sudo systemctl daemon-reload
sudo systemctl start dcos-metrics-datadog.service
echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
#sudo systemctl status dcos-metrics-datadog.service | head -3
#sudo journalctl -u dcos-metrics-datadog
EOF
done
echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`
do
echo -e "\n###> Node - $i"
dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
echo -e "======================================================="
done
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
do
echo -e "\n###> Master Node - $i"
dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
echo -e "======================================================="
done
# Check status of dcos-metrics-datadog.service on all nodes.
#for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

Get app / node metrics fetched by dcos-metrics component using metrics API

  • Get DC/OS node id [dcos node]
  • Get Node metrics (CPU, memory, local filesystems, networks, etc) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
  • Get id of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
  • Get Resource allocation and usage for the given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
  • Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app
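
The four metrics endpoints listed above share a common prefix, so a script can assemble them from a cluster URL, an agent id, and optionally a container id. A rough sketch follows; the function name `metrics_urls` and the ids used are hypothetical, and only the URL paths come from the list above.

```python
def metrics_urls(cluster_url, agent_id, container_id=None):
    """Return the dcos-metrics endpoints for an agent (and one container, if given)."""
    base = f"{cluster_url.rstrip('/')}/system/v1/agent/{agent_id}/metrics/v0"
    urls = {
        "node": f"{base}/node",
        "containers": f"{base}/containers",
    }
    if container_id is not None:
        urls["container"] = f"{base}/containers/{container_id}"
        urls["app"] = f"{base}/containers/{container_id}/app"
    return urls


# Placeholder ids; real ones come from `dcos node` and the containers endpoint.
for name, url in metrics_urls("http://dcos-cluster.com", "AGENT-ID", "CONTAINER-ID").items():
    print(name, url)
```

Each URL still needs the `Authorization: Bearer <token>` header used in the curl examples elsewhere in this post.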

Get app / node metrics fetched by dcos-metrics component using dcos cli

  • Summary of container metrics for a specific task

dcos task metrics summary <task-id>

  • All metrics in details for a specific task

dcos task metrics details <task-id>

  • Summary of Node metrics for a specific node

dcos node metrics summary <mesos-node-id>

  • All Node metrics in details for a specific node

dcos node metrics details <mesos-node-id>

NOTE - All above commands have ‘--json’ flag to use them programmatically.  

Launch / run command inside container for a task

The DC/OS ‘task exec’ CLI command only supports Mesos containers. The script below supports both Mesos and Docker containers.

#!/bin/bash
echo "DCOS Task Exec 2.0"
if [ "$#" -eq 0 ]; then
echo "Need task name or id as input. Exiting."
exit 1
fi
taskName=$1
taskCmd=${2:-bash}
TMP_TASKLIST_JSON=/tmp/dcostasklist.json
dcos task --json > $TMP_TASKLIST_JSON
taskExist=`cat /tmp/dcostasklist.json | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
if [[ $taskExist -eq 0 ]]; then
echo "No task with name $taskName exists."
echo "Do you mean ?"
dcos task | grep $taskName | awk '{print $1}'
exit 1
fi
taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
if [[ $taskExist -ne 1 ]]; then
echo -e "More than one instances. Please select task ID for executing command.\n"
#allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
echo ""
read TaskId
fi
if [[ $taskType != "DOCKER" ]]; then
echo "Task [ $taskName ] is of type MESOS Container."
execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
echo "Running [$execCmd]"
$execCmd
else
echo "Task [ $taskName ] is of type DOCKER Container."
taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter "label=MESOS_TASK_ID=$TaskId"" 2> /dev/null`
taskContID=`echo $taskContID | tr -d '\r'`
echo "Task Docker Container ID : [ $taskContID ]"
echo "Running [ docker exec -it $taskContID $taskCmd ]"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
fi

Get DC/OS tasks by node

#!/bin/bash
function tasksByNodeAPI
{
echo "DC/OS Tasks By Node"
if [ "$#" -eq 0 ]; then
echo "Need node ip as input. Exiting."
exit 1
fi
nodeIp=$1
mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
if [ -z "$mesosId" ]; then
echo "No node found with ip $nodeIp. Exiting."
exit 1
fi
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id' -r
}
function tasksByNodeCLI
{
echo "DC/OS Tasks By Node"
if [ "$#" -eq 0 ]; then
echo "Need node ip as input. Exiting."
exit 1
fi
nodeIp=$1
dcos task | egrep "HOST|$nodeIp"
}
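
The jq filter in `tasksByNodeAPI` keeps only tasks that are running on the given agent. The same selection, applied to an already-parsed `/mesos/tasks` response, might look like the sketch below. The function name `running_tasks_on` and the sample payload are invented; the `slave_id`, `state`, `name`, and `id` fields are the ones the script reads.

```python
def running_tasks_on(payload, mesos_id):
    """Return (name, id) pairs for tasks in TASK_RUNNING state on one agent."""
    return [
        (task["name"], task["id"])
        for task in payload.get("tasks", [])
        if task.get("slave_id") == mesos_id and task.get("state") == "TASK_RUNNING"
    ]


# Hypothetical payload shaped like the /mesos/tasks response (values invented).
payload = {
    "tasks": [
        {"name": "web", "id": "web.1", "slave_id": "S1", "state": "TASK_RUNNING"},
        {"name": "db", "id": "db.1", "slave_id": "S2", "state": "TASK_RUNNING"},
        {"name": "job", "id": "job.1", "slave_id": "S1", "state": "TASK_FINISHED"},
    ]
}

for name, task_id in running_tasks_on(payload, "S1"):
    print(f"{name}\t{task_id}")
```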

Get cluster metadata - cluster Public IP and cluster ID

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/metadata"

Sample Output:

{
"PUBLIC_IPV4": "123.456.789.012",
"CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
}

Get DC/OS metadata - DC/OS version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json"

Sample Output:

{
"version": "1.11.0",
"dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
"bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
}

Get Mesos version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/version"

Sample Output:

{
"build_date": "2018-02-27 21:31:27",
"build_time": 1519767087.0,
"build_user": "",
"git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
"version": "1.5.0"
}

Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

<CLUSTER_URL>/exhibitor

Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client - Run inside any node / container

from kazoo.client import KazooClient

zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
zk.start()
clusterId = ""
# Here we can give znode path to retrieve its decoded data,
# for ex to get cluster-id, use
# data, stat = zk.get("/cluster-id")
# clusterId = data.decode("utf-8")
# Get cluster Id
if zk.exists("/cluster-id"):
    data, stat = zk.get("/cluster-id")
    clusterId = data.decode("utf-8")
zk.stop()
print(clusterId)

Access dcos cluster data from cluster zookeeper using exhibitor rest API

# Get znode data using endpoint :
# /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
# Example : Get znode data for path = /cluster-id
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

Sample Output:

{
"bytes": "3333-XXXXXX",
"str": "abcde-abcde-abcde-abcde-abcde-",
"stat": "XXXXXX"
}

Get cluster name using Mesos API

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/state-summary" | jq .cluster -r

Mark Mesos node as decommissioned

Sometimes instances running as DC/OS nodes are terminated and cannot come back online. For example, an AWS EC2 instance that has been terminated for any reason cannot be started again. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because Mesos does not know whether the node is temporarily stopped and will come back online or is permanently stopped. In such cases, if we know a node will not come back, we can explicitly tell Mesos to put it in the GONE state.

dcos node decommission <mesos-agent-id>

Conclusion

We learned about Mesosphere DC/OS, its functionality, and its roles. We also learned how to set up and use the DC/OS CLI, how to use HTTP authentication to access the DC/OS APIs, and how to use the DC/OS CLI to automate tasks.

We went through different API endpoints such as Mesos, Marathon, DC/OS metrics, Exhibitor, and DC/OS cluster organization. Finally, we looked at different tricks and scripts to automate DC/OS, such as DC/OS node details, task exec, the Docker report, and DC/OS API HTTP authentication.

Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

DC/OS is an open-source distributed operating system for the data center, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public agent nodes, where each node also runs a host operating system that manages the underlying machine.

It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

  • Distributed System : DC/OS is a distributed system made up of private and public nodes that are coordinated by master nodes.
  • Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing them with the resources they require. DC/OS uses Apache Mesos to provide cluster management functionality.
  • Container Platform : All DC/OS tasks are containerized. DC/OS uses two container runtimes, Docker and Mesos, so containers can be started from Docker images or can be native executables (binaries or scripts) that Mesos containerizes at runtime.
  • Operating System : As the name suggests, DC/OS is an operating system that abstracts the cluster's hardware and software resources and provides common services to applications.

Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

The high-level architecture of DC/OS is shown in the figure below:

[Figure: DC/OS Architecture Layers]

For the detailed architecture and components of DC/OS, see the official DC/OS documentation.

Adoption and usage of Mesosphere DC/OS:

Mesosphere customers include :

  • 30% of the Fortune 50 U.S. Companies
  • 5 of the top 10 North American Banks
  • 7 of the top 12 Worldwide Telcos
  • 5 of the top 10 Highest Valued Startups

Some companies using DC/OS are :

  • Cisco
  • Yelp
  • Tommy Hilfiger
  • Uber
  • Netflix
  • Verizon
  • Cerner
  • NIO

Installing and using DC/OS

A guide to installing DC/OS can be found in the official DC/OS documentation. After installing DC/OS on any platform, install the DC/OS CLI by following the CLI installation documentation.

Using the DC/OS CLI, we can manage cluster nodes, manage Marathon tasks and services, and install or remove packages from Universe. It also lends itself well to automation, since CLI commands can produce JSON output.

NOTE: The commands and scripts below were run and tested with the following tools:

  • DC/OS 1.11 Open Source
  • DC/OS cli 0.6.0
  • jq:1.5-1-a5b5cbe

DC/OS commands and scripts

Setup DC/OS cli with DC/OS cluster

dcos cluster setup <CLUSTER URL>

Example :

dcos cluster setup http://dcos-cluster.com

The above command prints a link for OAuth authentication and prompts for an auth token. You can authenticate with a Google, GitHub, or Microsoft account. Paste the token generated after authentication into the CLI prompt (this applies only if OAuth is enabled).

DC/OS authentication token

dcos config show core.dcos_acs_token

DC/OS cluster url

dcos config show core.dcos_url

DC/OS cluster name

dcos config show cluster.name

Access Mesos UI

<DC/OS_CLUSTER_URL>/mesos

Example:

http://dcos-cluster.com/mesos

Access Marathon UI

<DC/OS_CLUSTER_URL>/service/marathon

Example:

http://dcos-cluster.com/service/marathon

Access any DC/OS service, such as Marathon, Kafka, Elastic, or Spark [DC/OS Services]

<DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

Example:

http://dcos-cluster.com/service/marathon
http://dcos-cluster.com/service/kafka

Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

curl -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/slaves" | jq .
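The same authenticated call can be made from a script without curl. The following is a minimal Python sketch, assuming the `dcos` CLI is installed and already attached to a cluster; the helper names `dcos_config`, `build_request` and `fetch_json` are hypothetical and not part of any DC/OS tooling. It reuses the `Bearer` header shown in the curl command above.

```python
import json
import subprocess
import urllib.request


def dcos_config(key):
    """Return a DC/OS CLI config value such as core.dcos_url (needs the dcos CLI)."""
    result = subprocess.run(
        ["dcos", "config", "show", key],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(cluster_url, token, path):
    """Build a GET request carrying the same Authorization header as the curl call."""
    url = cluster_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def fetch_json(cluster_url, token, path):
    """Call the endpoint and decode the JSON body (needs a reachable cluster)."""
    with urllib.request.urlopen(build_request(cluster_url, token, path)) as resp:
        return json.load(resp)
```

With a configured CLI, `fetch_json(dcos_config("core.dcos_url"), dcos_config("core.dcos_acs_token"), "mesos/slaves")` would return the parsed response of the slaves endpoint.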

Access DC/OS slaves info in json using DC/OS cli

dcos node --json

Note : The DC/OS CLI command ‘dcos node --json’ is equivalent to querying the Mesos slaves endpoint (/mesos/slaves).

Access DC/OS private slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

Access DC/OS public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

Access DC/OS private and public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + .hostname ' -r | sort
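When the node list is consumed by another program rather than printed, the same classification can be done on the parsed JSON. This is a small Python sketch, assuming the records carry the `type`, `hostname` and `attributes.public_ip` fields that the jq filters above rely on; the sample records are made up for illustration.

```python
def classify_nodes(nodes):
    """Split `dcos node --json` style records into public and private agent hostnames."""
    public, private = [], []
    for node in nodes:
        if "agent" not in node.get("type", ""):
            continue  # skip masters, as the jq filter does
        attrs = node.get("attributes") or {}
        if attrs.get("public_ip") is not None:
            public.append(node["hostname"])
        else:
            private.append(node["hostname"])
    return sorted(public), sorted(private)


# Hypothetical sample records, shaped like the fields used in the jq filters.
sample = [
    {"type": "agent", "hostname": "10.0.4.12", "attributes": {"public_ip": "true"}},
    {"type": "agent", "hostname": "10.0.1.7", "attributes": {}},
    {"type": "master (leader)", "hostname": "10.0.5.3"},
]
public_agents, private_agents = classify_nodes(sample)
print("Public:", public_agents, "Private:", private_agents)
```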

Get public IP of all public agents

#!/bin/bash
for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id');
do
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
done 2>/dev/null

Note: ‘dcos node ssh’ requires your private key to be loaded into the SSH agent. Add it as an SSH identity using:

ssh-add </path/to/private/key/file.pem>

Get public IP of master leader

dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

Get all master nodes and their private ip

dcos node --json | jq '.[] | select(.type | contains("master"))
| .ip + " = " + .type' -r

Get list of all users who have access to DC/OS cluster

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/acs/api/v1/users" | jq -r '.array[].uid'

Add users to cluster using Mesosphere script (Run this on master)

Users to add are given in list.txt, each user on new line

for i in `cat list.txt`; do echo $i;
sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

Add users to cluster using DC/OS API

#!/bin/bash
# Usage : dcosAddUsers.sh <users to add are listed in users.list, one user per line>
for i in `cat users.list`;
do
echo $i
curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Delete users from DC/OS cluster organization

#!/bin/bash
# Usage : dcosDeleteUsers.sh <users to delete are listed in users.list, one user per line>
for i in `cat users.list`;
do
echo $i
curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Offers/resources from individual DC/OS agent

In recent versions of many DC/OS services, a scheduler endpoint at

http://yourcluster.com/service/<service-name>/v1/debug/offers

displays an HTML table summarizing recently evaluated offers. The table's contents are currently very similar to what can be found in the logs, but in a slightly more accessible format. Alternatively, we can look at the scheduler's logs in stdout. An offer is a set of resources, all from one individual DC/OS agent.

<DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

Example:

http://dcos-cluster.com/service/kafka/v1/debug/offers
http://dcos-cluster.com/service/elastic/v1/debug/offers

Save JSON configs of all running Marathon apps

#!/bin/bash
# Save marathon configs in json format for all marathon apps
# Usage : saveMarathonConfig.sh
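for service in `dcos marathon app list --quiet | sort`; do
  # Keep the full app id for the lookup; flatten "/" to "_" only for the file name,
  # so that apps inside groups (e.g. /group/app) are still found.
  file=`echo "${service#/}" | tr "/" "_"`.json
  # Write only stdout to the file so error messages do not end up in the JSON
  dcos marathon app show "$service" | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' > "$file"
done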
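The field stripping done by `del(...)` in the script above can also be applied to app definitions that are already loaded in a program. Below is a rough Python sketch under that assumption; the function names and the flat file-naming scheme (`/group/app` becomes `group_app.json`) are illustrative choices, not something Marathon prescribes.

```python
# Fields the script above removes with jq's del(...).
RUNTIME_FIELDS = (
    "tasks", "version", "versionInfo", "tasksHealthy", "tasksRunning",
    "tasksStaged", "tasksUnhealthy", "deployments", "executor",
    "lastTaskFailure", "args", "ports", "residency", "secrets",
    "storeUrls", "uris", "user",
)


def strip_runtime_fields(app):
    """Return a copy of a Marathon app definition without the listed fields."""
    return {k: v for k, v in app.items() if k not in RUNTIME_FIELDS}


def config_filename(app_id):
    """Map an app id such as /group/app to a flat file name such as group_app.json."""
    return app_id.strip("/").replace("/", "_") + ".json"
```

Keeping the app id untouched and deriving only the file name from it avoids lookups failing for apps that live inside Marathon groups.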

Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

#!/bin/bash
TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
#dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION else "[ CMD ]" end end' -r > $TMP_CSV_FILE
sed -i "s|^/||g" $TMP_CSV_FILE
sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
cnt=1
printf '%.0s=' {1..150}
printf "\n %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
printf '%.0s=' {1..150}
while IFS=, read -r app typ image;
do
tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
image=`echo $image | awk -F':' '{print $1}'`
printf "\n %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
cnt=$((cnt + 1))
sleep 0.3
done < $TMP_CSV_FILE_SORT
printf "\n"
printf '%.0s=' {1..150}
printf "\n"
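The report derives the tag by splitting the image string on every ":", which can misread images pulled from a registry that includes a port. The Python sketch below shows one possible way to split more defensively; it is an illustration rather than the script's actual logic, and it does not handle digest references (`image@sha256:...`). The registry names in the usage line are hypothetical.

```python
def split_image(image):
    """Split a Docker image reference into (name, tag).

    A colon only marks a tag when it appears after the last slash, so a
    registry port (registry:5000/app) is not mistaken for a tag.
    """
    if image == "[ CMD ]":  # marker the report uses for command-only apps
        return image, "NA"
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = image.rsplit(":", 1)
        return name, tag
    return image, "latest"
```

For example, `split_image("registry:5000/team/app")` keeps the whole string as the image name and falls back to the `latest` tag, as the report does for untagged images.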

Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

#!/bin/bash
printf "\n %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
printf '%.0s=' {1..90}
printf "\n"
TAB=`echo -e "\t"`
dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" + (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" + ((.resources.cpus - .used_resources.cpus)| tostring) else "\t\tNA\tNA\t\tNA" end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
printf '%.0s=' {1..90}
printf "\n"
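The free-memory and free-CPU columns are simply total resources minus used resources per agent. The following Python sketch spells out that arithmetic, assuming records with the `resources`, `used_resources` and `TASK_RUNNING` fields that the jq expression above reads; the helper names are hypothetical.

```python
def free_resources(node):
    """Return (free_mem_mb, free_cpus) for one agent record."""
    total, used = node["resources"], node["used_resources"]
    return total["mem"] - used["mem"], total["cpus"] - used["cpus"]


def busiest_first(nodes):
    """Return agent records sorted by running task count, highest first."""
    agents = [n for n in nodes if "agent" in n.get("type", "")]
    return sorted(agents, key=lambda n: n.get("TASK_RUNNING", 0), reverse=True)
```

Sorting by task count is one way to spot overloaded agents quickly; sorting by free memory would work the same way with a different key.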

Framework Cleaner

Uninstall a framework and clean up any resources it reserved after it is deleted/uninstalled. (The cleanup step applies to DC/OS 1.9 or older; on DC/OS 1.10 or later, uninstalling through the CLI is sufficient.)

SERVICE_NAME=   # set to the name of the service to remove
dcos package uninstall $SERVICE_NAME
dcos node ssh --option StrictHostKeyChecking=no --master-proxy \
--leader "docker run mesosphere/janitor /janitor.py -r \
${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"

Get DC/OS apps and their placement constraints

dcos marathon app list --json | jq '.[] |
if (.constraints != null) then .id, .constraints else empty end'
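If the app list is already parsed in a program, the same selection is a short filter. This Python sketch assumes each app record has `id` and an optional `constraints` list, as in the jq filter above; unlike that filter, it also skips apps whose constraint list is empty. The sample data is invented.

```python
def apps_with_constraints(apps):
    """Map app id to its constraints, for apps that define at least one constraint."""
    return {a["id"]: a["constraints"] for a in apps if a.get("constraints")}


# Hypothetical sample, shaped like `dcos marathon app list --json` entries.
sample_apps = [
    {"id": "/db/mongo", "constraints": [["TYPE", "CLUSTER", "DB"]]},
    {"id": "/web/nginx", "constraints": []},
    {"id": "/batch/job"},
]
print(apps_with_constraints(sample_apps))
```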

Run shell command on all slaves

#!/bin/bash
# Run any shell command on all slave nodes (private and public)
# Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do
echo -e "\n###> Running command [ $CMD ] on $i"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
echo -e "======================================\n"
done

Run shell command on master leader

CMD="<shell command, e.g. ulimit -a>"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

Run shell command on all master nodes

#!/bin/bash
# Run any shell command on all master nodes
# Usage : <script-name>.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
do
echo -e "\n###> Running command [ $CMD ] on $i"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
echo -e "======================================\n"
done

Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

#!/bin/bash
#1. SSH on node
#2. Create or edit file /var/lib/dcos/mesos-slave-common
#3. Add contents as :
# MESOS_ATTRIBUTES=<key>:<value>
# Example:
# MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
#4. Stop dcos-mesos-slave service
# systemctl stop dcos-mesos-slave
#5. Remove link for latest slave metadata
# rm -f /var/lib/mesos/slave/meta/slaves/latest
#6. Start dcos-mesos-slave service
# systemctl start dcos-mesos-slave
#7. Wait for some time, node will be in HEALTHY state again.
#8. Add app placement constraint with field = key and value = value
#9. Verify attributes, run on any node
# curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
# OR Check DCOS cluster UI
# Nodes => Select any Node => Details Tab
tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
# key:value paired attributes, separated by ;
ATTRIBUTES=NODE_TYPE:GPU_NODE
cat <<EOF > ${tmpScript}
echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
sudo systemctl stop dcos-mesos-slave
sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
sudo systemctl start dcos-mesos-slave
EOF
# nodes.txt lists the private IPs of the nodes to which attributes should be added, one IP per line.
for i in `cat nodes.txt`; do
echo $i
dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
sleep 10
done
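Step 8 in the script's comments, adding a placement constraint that matches the attribute, is not shown in code. The Python sketch below illustrates one way to render the attribute string and attach a matching Marathon constraint to an app definition. It assumes Marathon's `[field, operator, value]` constraint form with the `CLUSTER` operator; the helper names and the `/trainer` app id are hypothetical.

```python
def mesos_attributes(attrs):
    """Render a dict as the key:value;key:value string used in MESOS_ATTRIBUTES."""
    return ";".join("{}:{}".format(k, v) for k, v in attrs.items())


def with_constraint(app, key, value):
    """Return a copy of a Marathon app definition pinned to agents where key == value."""
    pinned = dict(app)
    pinned["constraints"] = list(app.get("constraints", [])) + [[key, "CLUSTER", value]]
    return pinned


print(mesos_attributes({"NODE_TYPE": "GPU_NODE"}))
print(with_constraint({"id": "/trainer"}, "NODE_TYPE", "GPU_NODE"))
```

The resulting definition could then be submitted to Marathon in the usual way, for example with `dcos marathon app add`.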

Install DC/OS Datadog metrics plugin on all DC/OS nodes

#!/bin/bash
# Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>
DDAPI=$1
if [[ -z $DDAPI ]]; then
  echo "[Datadog Plugin] Need datadog API key as parameter."
  echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
  # Stop here; the generated scripts are useless without an API key
  exit 1
fi
tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
declare agent=$tmpScriptAgent
declare master=$tmpScriptMaster
for role in "agent" "master"
do
cat <<EOF > ${!role}
curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
export DD_API_KEY=$DDAPI
export AGENT_ROLE=$role
sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
sudo systemctl daemon-reload
sudo systemctl start dcos-metrics-datadog.service
echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
#sudo systemctl status dcos-metrics-datadog.service | head -3
#sudo journalctl -u dcos-metrics-datadog
EOF
done
echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`
do
echo -e "\n###> Node - $i"
dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
echo -e "======================================================="
done
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
do
echo -e "\n###> Master Node - $i"
dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
echo -e "======================================================="
done
# Check status of dcos-metrics-datadog.service on all nodes.
#for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

Get app / node metrics fetched by dcos-metrics component using metrics API

  • Get DC/OS node id [dcos node]
  • Get Node metrics (CPU, memory, local filesystems, networks, etc) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
  • Get id of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
  • Get Resource allocation and usage for the given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
  • Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app
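The four endpoints above share one prefix and differ only in their suffix, so a script can build them from the agent and container ids. This is a minimal Python sketch of that URL construction; the function name `metrics_url`, the `kind` parameter, and the ids in the usage line are illustrative assumptions.

```python
def metrics_url(cluster_url, agent_id, kind="node", container_id=None):
    """Build a dcos-metrics v0 URL.

    kind is one of: node, containers, container, app.
    """
    base = "{}/system/v1/agent/{}/metrics/v0".format(cluster_url.rstrip("/"), agent_id)
    if kind == "node":
        return base + "/node"
    if kind == "containers":
        return base + "/containers"
    if kind not in ("container", "app"):
        raise ValueError("unknown kind: {!r}".format(kind))
    if container_id is None:
        raise ValueError("container_id is required for kind={!r}".format(kind))
    url = base + "/containers/" + container_id
    return url + "/app" if kind == "app" else url


print(metrics_url("http://dcos-cluster.com", "agent-S1", "app", "container-1"))
```

Requests to these URLs need the same Authorization header as the other DC/OS API calls in this post.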

Get app / node metrics fetched by dcos-metrics component using dcos cli

  • Summary of container metrics for a specific task

dcos task metrics summary <task-id>

  • All metrics in details for a specific task

dcos task metrics details <task-id>

  • Summary of Node metrics for a specific node

dcos node metrics summary <mesos-node-id>

  • All Node metrics in details for a specific node

dcos node metrics details <mesos-node-id>

NOTE - All of the above commands accept the ‘--json’ flag, so they can be used programmatically.

Launch / run command inside container for a task

The DC/OS ‘dcos task exec’ CLI command only supports Mesos containers; this script supports both Mesos and Docker containers.

#!/bin/bash
echo "DCOS Task Exec 2.0"
if [ "$#" -eq 0 ]; then
echo "Need task name or id as input. Exiting."
exit 1
fi
taskName=$1
taskCmd=${2:-bash}
TMP_TASKLIST_JSON=/tmp/dcostasklist.json
dcos task --json > $TMP_TASKLIST_JSON
taskExist=`cat /tmp/dcostasklist.json | jq --arg tname $taskName '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
if [[ $taskExist -eq 0 ]]; then
echo "No task with name $taskName exists."
echo "Do you mean ?"
dcos task | grep $taskName | awk '{print $1}'
exit 1
fi
taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .container.type' -r`
TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname $taskName '[.[] | select(.name == $tname)][0] | .id' -r`
if [[ $taskExist -ne 1 ]]; then
echo -e "More than one instance found. Please enter the task ID to run the command in.\n"
# List the IDs of all matching tasks from the cached task list
jq --arg tname "$taskName" '.[] | select(.name == $tname) | .id' -r $TMP_TASKLIST_JSON
echo ""
read TaskId
fi
if [[ $taskType != "DOCKER" ]]; then
echo "Task [ $taskName ] is of type MESOS Container."
execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
echo "Running [$execCmd]"
$execCmd
else
echo "Task [ $taskName ] is of type DOCKER Container."
taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter label=MESOS_TASK_ID=$TaskId" 2> /dev/null`
taskContID=`echo $taskContID | tr -d '\r'`
echo "Task Docker Container ID : [ $taskContID ]"
echo "Running [ docker exec -it $taskContID $taskCmd ]"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
fi

Get DC/OS tasks by node

#!/bin/bash
function tasksByNodeAPI
{
echo "DC/OS Tasks By Node"
if [ "$#" -eq 0 ]; then
echo "Need node ip as input. Exiting."
exit 1
fi
nodeIp=$1
mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
if [ -z "$mesosId" ]; then
echo "No node found with ip $nodeIp. Exiting."
exit 1
fi
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId $mesosId '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id' -r
}
function tasksByNodeCLI
{
echo "DC/OS Tasks By Node"
if [ "$#" -eq 0 ]; then
echo "Need node ip as input. Exiting."
exit 1
fi
nodeIp=$1
dcos task | egrep "HOST|$nodeIp"
}
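The jq expression in tasksByNodeAPI keeps only running tasks whose slave_id matches the node's Mesos id. The same filter in Python, as a sketch that assumes the response's `tasks` entries carry `name`, `id`, `slave_id` and `state` as used above; the sample values are made up.

```python
def running_tasks_on(tasks, mesos_id):
    """Return (name, id) pairs for tasks running on the agent with the given Mesos id."""
    return [
        (t["name"], t["id"])
        for t in tasks
        if t.get("slave_id") == mesos_id and t.get("state") == "TASK_RUNNING"
    ]


# Hypothetical sample, shaped like the "tasks" array of the /mesos/tasks response.
sample_tasks = [
    {"name": "nginx", "id": "nginx.1", "slave_id": "S1", "state": "TASK_RUNNING"},
    {"name": "kafka", "id": "kafka.1", "slave_id": "S2", "state": "TASK_RUNNING"},
    {"name": "job", "id": "job.7", "slave_id": "S1", "state": "TASK_FINISHED"},
]
for name, task_id in running_tasks_on(sample_tasks, "S1"):
    print(name, task_id)
```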

Get cluster metadata - cluster Public IP and cluster ID

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/metadata"

Sample Output:

{
"PUBLIC_IPV4": "123.456.789.012",
"CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
}

Get DC/OS metadata - DC/OS version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json"

Sample Output:

{
"version": "1.11.0",
"dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
"bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
}

Get Mesos version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/version"

Sample Output:

{
"build_date": "2018-02-27 21:31:27",
"build_time": 1519767087.0,
"build_user": "",
"git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
"version": "1.5.0"
}

Access DC/OS cluster exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

<CLUSTER_URL>/exhibitor

Access DC/OS cluster data from cluster zookeeper using Zookeeper Python client - Run inside any node / container

from kazoo.client import KazooClient

zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
zk.start()
clusterId = ""
# Pass any znode path to zk.get() to retrieve its data, then decode it.
# For example, to get the cluster id:
# data, stat = zk.get("/cluster-id")
# clusterId = data.decode("utf-8")
# Get cluster Id
if zk.exists("/cluster-id"):
    data, stat = zk.get("/cluster-id")
    clusterId = data.decode("utf-8")
zk.stop()
print(clusterId)

Access dcos cluster data from cluster zookeeper using exhibitor rest API

# Get znode data using endpoint :
# /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
# Example : Get znode data for path = /cluster-id
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

Sample Output:

{
"bytes": "3333-XXXXXX",
"str": "abcde-abcde-abcde-abcde-abcde-",
"stat": "XXXXXX"
}

Get cluster name using Mesos API

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/state-summary" | jq -r .cluster

Mark Mesos node as decommissioned

Sometimes an instance running as a DC/OS node is terminated and cannot come back online; a terminated AWS EC2 instance, for example, cannot be started again. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because it cannot tell whether the node is temporarily down and will come back online or is permanently gone. If we know the node will not come back, we can explicitly tell Mesos to put it in the GONE state.

dcos node decommission <mesos-agent-id>

Conclusion

We learned about Mesosphere DC/OS, its functionality, and its roles. We also learned how to set up the DC/OS CLI, authenticate over HTTP to access the DC/OS APIs, and use the CLI to automate tasks.

We went through API endpoints for Mesos, Marathon, DC/OS metrics, Exhibitor, and DC/OS cluster organization. Finally, we looked at tricks and scripts for automating DC/OS, such as node details, task exec, the Docker report, and HTTP authentication for the DC/OS API.