Demystifying High Availability in Kubernetes Using Kubeadm

Introduction

The rise of containers has reshaped the way we develop, deploy, and maintain software. Containers allow us to package the different services that constitute an application into separate containers and to deploy those containers across a set of virtual and physical machines. This gives rise to the need for container orchestration tools that automate the deployment, management, scaling, and availability of container-based applications. Kubernetes allows deployment and management of container-based applications at scale. Learn more about backup and disaster recovery for your Kubernetes clusters.

One of the main advantages of Kubernetes is that it brings greater reliability and stability to container-based distributed applications through dynamic scheduling of containers. But how do you make sure Kubernetes itself stays up when a component or its master node goes down?
 

High Availability in Kubernetes

 

Why Do We Need Kubernetes High Availability?

Kubernetes high availability is about setting up Kubernetes, along with its supporting components, in a way that leaves no single point of failure. A single-master cluster can easily fail, whereas a multi-master cluster uses multiple master nodes, each of which has access to the same worker nodes. In a single-master cluster, the important components like the API server and controller manager run only on the single master node, and if it fails you cannot create more services, pods, etc. In a Kubernetes HA environment, however, these important components are replicated on multiple masters (usually three), and if any of the masters fail, the other masters keep the cluster up and running.

Advantages of multi-master

In a Kubernetes cluster, the master node manages the etcd database, API server, controller manager, and scheduler, along with all the worker nodes. If we have only a single master node and that node fails, all the worker nodes become unschedulable and the cluster is lost.

A multi-master setup, by contrast, provides high availability for a single cluster by running multiple instances of the API server, etcd, controller manager, and scheduler. This not only provides redundancy but also improves network performance, because the masters divide the load among themselves.

A multi-master setup protects against a wide range of failure modes, from the loss of a single worker node to the failure of a master node’s etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end users.

Steps to Achieve Kubernetes HA

Before moving to the steps to achieve high availability, let us understand what we are trying to achieve through a diagram:

Kubernetes High Availability Steps

(Image Source: Kubernetes Official Documentation)

Master Node: Each master node in a multi-master environment runs its own copy of the Kube API server, which can be used for load balancing among the master nodes. Each master node also runs its own copy of the etcd database, which stores all of the cluster’s data. In addition to the API server and the etcd database, each master node runs the Kubernetes controller manager, which handles replication, and the scheduler, which schedules pods to nodes.

Worker Node: As in a single-master setup, the worker nodes in a multi-master cluster run their own components, mainly orchestrating pods. We need three machines that satisfy the Kubernetes master requirements and three machines that satisfy the Kubernetes worker requirements.

For each master that has been provisioned, follow the installation guide to install kubeadm and its dependencies. In this blog, we will use Kubernetes 1.10.4 to implement HA.

Note: In some Kubernetes versions, the cgroup driver used by Docker and the kubelet differ. Make sure you set the cgroup driver to cgroupfs for both Docker and the kubelet; if the kubelet’s and Docker’s cgroup drivers differ, the master does not come back up after a reboot. A quick way to check is shown below.
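As a rough sketch (file paths and flag wiring vary across distributions and Kubernetes versions, so treat this as an assumption to verify on your own hosts), you can check Docker’s cgroup driver and pin the kubelet to cgroupfs like this:

$ docker info 2>/dev/null | grep -i "cgroup driver"
# Expect "Cgroup Driver: cgroupfs". If the kubelet is configured differently, add
# --cgroup-driver=cgroupfs to its arguments (for kubeadm installs this is typically
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf) and restart it:
$ systemctl daemon-reload && systemctl restart kubelet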

Set up the etcd cluster

1. Install cfssl and cfssljson

$ curl -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
$ curl -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
$ chmod +x /usr/local/bin/cfssl*
$ export PATH=$PATH:/usr/local/bin
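To confirm the binaries are installed and on your PATH, a quick check (the exact version output will vary):

$ cfssl version
$ command -v cfssljson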

2. Generate certificates on master-0:

$ mkdir -p /etc/kubernetes/pki/etcd
$ cd /etc/kubernetes/pki/etcd

3. Create a ca-config.json file in the /etc/kubernetes/pki/etcd folder with the following content (the cfssl commands below reference it by this name):

{
  "signing": {
    "default": {
      "expiry": "43800h"
    },
    "profiles": {
      "server": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      },
      "client": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}

4. Create a ca-csr.json file in the /etc/kubernetes/pki/etcd folder with the following content:

{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}

5. Create a client.json file in the /etc/kubernetes/pki/etcd folder with the following content:

{
  "CN": "client",
  "key": {
    "algo": "ecdsa",
    "size": 256
  }
}
Then generate the CA certificate and the client certificate:

$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

6. Create the directory /etc/kubernetes/pki/etcd on master-1 and master-2 and copy all the generated certificates (along with ca-config.json) into it, as sketched below.
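One possible way to do the copy, assuming SSH access as the centos user seen in the outputs later in this post (adjust user and hostnames to your environment):

$ ssh centos@master-1 "sudo mkdir -p /etc/kubernetes/pki/etcd"
$ scp /etc/kubernetes/pki/etcd/ca.pem /etc/kubernetes/pki/etcd/ca-key.pem /etc/kubernetes/pki/etcd/ca-config.json centos@master-1:~
$ ssh centos@master-1 "sudo mv ~/ca.pem ~/ca-key.pem ~/ca-config.json /etc/kubernetes/pki/etcd/"

Repeat the same commands for master-2. The CA key and ca-config.json are needed on every master because the next step signs the server and peer certificates locally on each node.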

7. On all masters, now generate the server and peer certificates in /etc/kubernetes/pki/etcd. To generate them, the CA certificate, CA key, and ca-config.json from the previous steps must be present on all masters.

$ export PEER_NAME=$(hostname)
$ export PRIVATE_IP=$(ip addr show eth0 | grep -Po 'inet \K[\d.]+')
$ cfssl print-defaults csr > config.json
$ sed -i 's/www\.example\.net/'"$PRIVATE_IP"'/' config.json
$ sed -i 's/example\.net/'"$PEER_NAME"'/' config.json
$ sed -i '0,/CN/{s/example\.net/'"$PEER_NAME"'/}' config.json
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server config.json | cfssljson -bare server
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer config.json | cfssljson -bare peer

These sed commands replace the defaults in config.json with your machine’s hostname and private IP address. If you run into any problems, check that the hostname and IP address were substituted correctly and rerun the cfssl commands.

8. On all masters, install etcd and set up its environment file.

$ yum install etcd -y
$ touch /etc/etcd.env
$ echo "PEER_NAME=$PEER_NAME" >> /etc/etcd.env
$ echo "PRIVATE_IP=$PRIVATE_IP" >> /etc/etcd.env

9. Now we will create a three-node etcd cluster across the three master nodes, running etcd as a systemd service on each. Create the file /etc/systemd/system/etcd.service on all masters with the following content:

[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd
Conflicts=etcd.service
Conflicts=etcd2.service

[Service]
EnvironmentFile=/etc/etcd.env
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=0
ExecStart=/bin/etcd --name <host_name> --data-dir /var/lib/etcd \
  --listen-client-urls http://<host_private_ip>:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://<host_private_ip>:2379 \
  --listen-peer-urls http://<host_private_ip>:2380 \
  --initial-advertise-peer-urls http://<host_private_ip>:2380 \
  --cert-file=/etc/kubernetes/pki/etcd/server.pem \
  --key-file=/etc/kubernetes/pki/etcd/server-key.pem \
  --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --peer-cert-file=/etc/kubernetes/pki/etcd/peer.pem \
  --peer-key-file=/etc/kubernetes/pki/etcd/peer-key.pem \
  --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem \
  --initial-cluster master-0=http://<master0_private_ip>:2380,master-1=http://<master1_private_ip>:2380,master-2=http://<master2_private_ip>:2380 \
  --initial-cluster-token my-etcd-token --initial-cluster-state new \
  --client-cert-auth=false --peer-client-cert-auth=false

[Install]
WantedBy=multi-user.target

10. Make sure you replace the following placeholders (an illustrative example follows the list):

  • <host_name>: the current master’s hostname
  • <host_private_ip>: the current host’s private IP
  • <master0_private_ip>: master-0’s private IP
  • <master1_private_ip>: master-1’s private IP
  • <master2_private_ip>: master-2’s private IP
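As an illustration only, with hypothetical private IPs 10.0.1.10, 10.0.1.11, and 10.0.1.12 for master-0, master-1, and master-2, the replaced portions of ExecStart on master-0 would look roughly like:

--name master-0 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--initial-cluster master-0=http://10.0.1.10:2380,master-1=http://10.0.1.11:2380,master-2=http://10.0.1.12:2380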

11. Start the etcd service on all three master nodes and check the etcd cluster health:

$ systemctl daemon-reload
$ systemctl enable etcd
$ systemctl start etcd
$ etcdctl cluster-health

This should report the etcd cluster as healthy, with all three members connected.
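With the hypothetical IPs used above, the output looks roughly like the following (the member IDs shown here are made up and will differ on your cluster):

member 8211f1d0f64f3269 is healthy: got healthy result from http://10.0.1.10:2379
member 91bc3c398fb3c146 is healthy: got healthy result from http://10.0.1.11:2379
member fd422379fda50e48 is healthy: got healthy result from http://10.0.1.12:2379
cluster is healthy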

Set up the load balancer

There are multiple cloud-provider solutions for load balancing, such as AWS Elastic Load Balancer and GCE load balancing. When a physical or cloud load balancer is not available, we can set up a virtual IP that always points to a healthy master node. We are using keepalived for this; install keepalived on all master nodes:

$ yum install keepalived -y

Create the following configuration file /etc/keepalived/keepalived.conf on all master nodes:

! Configuration File for keepalived
global_defs {
  router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}
vrrp_instance VI_1 {
  state <state>
  interface <interface>
  virtual_router_id 51
  priority <priority>
  authentication {
    auth_type PASS
    auth_pass velotiotechnologies
  }
  virtual_ipaddress {
    <virtual ip>
  }
  track_script {
    check_apiserver
  }
}

  • state is either MASTER (on the first master node) or BACKUP (on the other master nodes).
  • interface is generally the primary network interface; in my case it is eth0.
  • priority should be higher for the first master node (e.g., 101) and lower for the others (e.g., 100).
  • virtual_ipaddress should contain the virtual IP shared by the master nodes.

Install the following health check script to /etc/keepalived/check_apiserver.sh on all master nodes:

#!/bin/sh
errorExit() {
  echo "*** $*" 1>&2
  exit 1
}
curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q <VIRTUAL-IP>; then
  curl --silent --max-time 2 --insecure https://<VIRTUAL-IP>:6443/ -o /dev/null || errorExit "Error GET https://<VIRTUAL-IP>:6443/"
fi

$ systemctl restart keepalived
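A few follow-ups the steps above assume but do not spell out: substitute your actual virtual IP for <VIRTUAL-IP> in the script, make the script executable, and enable keepalived so it starts on boot. For example, with a hypothetical VIP of 10.0.1.100:

$ sed -i 's/<VIRTUAL-IP>/10.0.1.100/g' /etc/keepalived/check_apiserver.sh
$ chmod +x /etc/keepalived/check_apiserver.sh
$ systemctl enable keepalived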

Set up the three-master cluster

1. Run kubeadm init on master-0:

Create a config.yaml file with the following content:

apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  advertiseAddress: <master-private-ip>
etcd:
  endpoints:
  - http://<master0-ip-address>:2379
  - http://<master1-ip-address>:2379
  - http://<master2-ip-address>:2379
  caFile: /etc/kubernetes/pki/etcd/ca.pem
  certFile: /etc/kubernetes/pki/etcd/client.pem
  keyFile: /etc/kubernetes/pki/etcd/client-key.pem
networking:
  podSubnet: <podCIDR>
apiServerCertSANs:
- <load-balancer-ip>
apiServerExtraArgs:
  endpoint-reconciler-type: lease

Please ensure that the following placeholders are replaced:

  • <master-private-ip> with the private IPv4 address of the master on which this config file resides.
  • <master0-ip-address>, <master1-ip-address>, and <master2-ip-address> with the IP addresses of your three master nodes.
  • <podCIDR> with your Pod CIDR. Please read the CNI network section of the docs for more information; some CNI providers do not require a value to be set. I am using weave-net as the pod network, hence the podCIDR will be 10.32.0.0/12.
  • <load-balancer-ip> with the virtual IP set up in the load balancer in the previous section.

$ kubeadm init --config=config.yaml
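kubeadm prints follow-up instructions at the end of init; the usual step so that kubectl works on this master is to copy the admin kubeconfig (verify against the exact instructions kubeadm prints for your version):

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config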

2. Run kubeadm init on master-1 and master-2:

First, copy /etc/kubernetes/pki/ca.crt, /etc/kubernetes/pki/ca.key, /etc/kubernetes/pki/sa.key, and /etc/kubernetes/pki/sa.pub from master-0 into the /etc/kubernetes/pki folder on master-1 and master-2.

Note: Copying these files is crucial; without them, the other two master nodes won’t reach the Ready state. One way to copy them is sketched below.
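Again assuming SSH access as the centos user (hypothetical; adjust user and hostnames to your environment):

$ scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub centos@master-1:~
$ ssh centos@master-1 "sudo mkdir -p /etc/kubernetes/pki && sudo mv ~/ca.crt ~/ca.key ~/sa.key ~/sa.pub /etc/kubernetes/pki/"

Repeat for master-2.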

Copy the config file config.yaml from master-0 to master-1 and master-2, change <master-private-ip> to the current master host’s private IP, and then run kubeadm init on each:

$ kubeadm init --config=config.yaml

3. Now install a pod network on all three masters to bring them into the Ready state. I am using the weave-net pod network; to apply it, run:

$ export kubever=$(kubectl version | base64 | tr -d '\n')
$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"


4. By default, Kubernetes doesn’t schedule any workloads on the master nodes. If you want to schedule workloads on the masters as well, remove the master taint from all of them with:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

5. Now that we have functional master nodes, we can join some worker nodes.

On each worker, use the join command printed at the end of the kubeadm init output:

$ kubeadm join 10.0.1.234:6443 --token llb1kx.azsbunpbg13tgc8k --discovery-token-ca-cert-hash sha256:1ad2a436ce0c277d0c5bd3826091e72badbd8417ffdbbd4f6584a2de588bf522

High Availability in action

The Kubernetes HA cluster is now up and running. To see high availability in action, let’s fail one master node (master-0 in this example) and verify that the cluster keeps working.

After failing one master node, the Kubernetes cluster is still accessible:

[root@master-0 centos]# kubectl get nodes
NAME       STATUS     ROLES     AGE       VERSION
master-0   NotReady   master    4h        v1.10.4
master-1   Ready      master    4h        v1.10.4
master-2   Ready      master    4h        v1.10.4

[root@master-0 centos]# kubectl get pods -n kube-system
NAME                               READY     STATUS     RESTARTS   AGE
kube-apiserver-master-0            1/1       Unknown    0          4h
kube-apiserver-master-1            1/1       Running    0          4h
kube-apiserver-master-2            1/1       Running    0          4h
kube-controller-manager-master-0   1/1       Unknown    0          4h
kube-controller-manager-master-1   1/1       Running    0          4h
kube-controller-manager-master-2   1/1       Running    0          4h
kube-dns-86f4d74b45-wh795          3/3       Running    0          4h
kube-proxy-9ts6r                   1/1       Running    0          4h
kube-proxy-hkbn7                   1/1       NodeLost   0          4h
kube-proxy-sq6l6                   1/1       Running    0          4h
kube-scheduler-master-0            1/1       Unknown    0          4h
kube-scheduler-master-1            1/1       Running    0          4h
kube-scheduler-master-2            1/1       Running    0          4h
weave-net-6nzbq                    2/2       NodeLost   0          4h
weave-net-ndx2q                    2/2       Running    0          4h
weave-net-w2mfz                    2/2       Running    0          4h

Even with one node down, all the important components are up and running on the remaining masters. The cluster is still accessible, and you can create more pods, deployments, services, etc.
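The nginx.yaml used below is not included in the post; a minimal Deployment manifest along these lines (two replicas, so the pods land on the two healthy masters) is assumed:

$ cat <<EOF > nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOF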

[root@master-1 centos]# kubectl create -f nginx.yaml
deployment.apps "nginx-deployment" created
[root@master-1 centos]# kubectl get pods -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
nginx-deployment-75675f5897-884kc   1/1       Running   0          10s       10.117.113.98   master-2
nginx-deployment-75675f5897-crgxt   1/1       Running   0          10s       10.117.113.2    master-1

Conclusion

High availability is an important part of reliability engineering, focused on making a system reliable and avoiding any single point of failure. At first glance, its implementation might seem quite complex, but high availability brings tremendous advantages to systems that require increased stability and reliability. Using a highly available cluster is one of the most important aspects of building solid infrastructure.
