• United States+1
  • United Kingdom+44
  • Afghanistan (‫افغانستان‬‎)+93
  • Albania (Shqipëri)+355
  • Algeria (‫الجزائر‬‎)+213
  • American Samoa+1684
  • Andorra+376
  • Angola+244
  • Anguilla+1264
  • Antigua and Barbuda+1268
  • Argentina+54
  • Armenia (Հայաստան)+374
  • Aruba+297
  • Australia+61
  • Austria (Österreich)+43
  • Azerbaijan (Azərbaycan)+994
  • Bahamas+1242
  • Bahrain (‫البحرين‬‎)+973
  • Bangladesh (বাংলাদেশ)+880
  • Barbados+1246
  • Belarus (Беларусь)+375
  • Belgium (België)+32
  • Belize+501
  • Benin (Bénin)+229
  • Bermuda+1441
  • Bhutan (འབྲུག)+975
  • Bolivia+591
  • Bosnia and Herzegovina (Босна и Херцеговина)+387
  • Botswana+267
  • Brazil (Brasil)+55
  • British Indian Ocean Territory+246
  • British Virgin Islands+1284
  • Brunei+673
  • Bulgaria (България)+359
  • Burkina Faso+226
  • Burundi (Uburundi)+257
  • Cambodia (កម្ពុជា)+855
  • Cameroon (Cameroun)+237
  • Canada+1
  • Cape Verde (Kabu Verdi)+238
  • Caribbean Netherlands+599
  • Cayman Islands+1345
  • Central African Republic (République centrafricaine)+236
  • Chad (Tchad)+235
  • Chile+56
  • China (中国)+86
  • Christmas Island+61
  • Cocos (Keeling) Islands+61
  • Colombia+57
  • Comoros (‫جزر القمر‬‎)+269
  • Congo (DRC) (Jamhuri ya Kidemokrasia ya Kongo)+243
  • Congo (Republic) (Congo-Brazzaville)+242
  • Cook Islands+682
  • Costa Rica+506
  • Côte d’Ivoire+225
  • Croatia (Hrvatska)+385
  • Cuba+53
  • Curaçao+599
  • Cyprus (Κύπρος)+357
  • Czech Republic (Česká republika)+420
  • Denmark (Danmark)+45
  • Djibouti+253
  • Dominica+1767
  • Dominican Republic (República Dominicana)+1
  • Ecuador+593
  • Egypt (‫مصر‬‎)+20
  • El Salvador+503
  • Equatorial Guinea (Guinea Ecuatorial)+240
  • Eritrea+291
  • Estonia (Eesti)+372
  • Ethiopia+251
  • Falkland Islands (Islas Malvinas)+500
  • Faroe Islands (Føroyar)+298
  • Fiji+679
  • Finland (Suomi)+358
  • France+33
  • French Guiana (Guyane française)+594
  • French Polynesia (Polynésie française)+689
  • Gabon+241
  • Gambia+220
  • Georgia (საქართველო)+995
  • Germany (Deutschland)+49
  • Ghana (Gaana)+233
  • Gibraltar+350
  • Greece (Ελλάδα)+30
  • Greenland (Kalaallit Nunaat)+299
  • Grenada+1473
  • Guadeloupe+590
  • Guam+1671
  • Guatemala+502
  • Guernsey+44
  • Guinea (Guinée)+224
  • Guinea-Bissau (Guiné Bissau)+245
  • Guyana+592
  • Haiti+509
  • Honduras+504
  • Hong Kong (香港)+852
  • Hungary (Magyarország)+36
  • Iceland (Ísland)+354
  • India (भारत)+91
  • Indonesia+62
  • Iran (‫ایران‬‎)+98
  • Iraq (‫العراق‬‎)+964
  • Ireland+353
  • Isle of Man+44
  • Israel (‫ישראל‬‎)+972
  • Italy (Italia)+39
  • Jamaica+1876
  • Japan (日本)+81
  • Jersey+44
  • Jordan (‫الأردن‬‎)+962
  • Kazakhstan (Казахстан)+7
  • Kenya+254
  • Kiribati+686
  • Kosovo+383
  • Kuwait (‫الكويت‬‎)+965
  • Kyrgyzstan (Кыргызстан)+996
  • Laos (ລາວ)+856
  • Latvia (Latvija)+371
  • Lebanon (‫لبنان‬‎)+961
  • Lesotho+266
  • Liberia+231
  • Libya (‫ليبيا‬‎)+218
  • Liechtenstein+423
  • Lithuania (Lietuva)+370
  • Luxembourg+352
  • Macau (澳門)+853
  • Macedonia (FYROM) (Македонија)+389
  • Madagascar (Madagasikara)+261
  • Malawi+265
  • Malaysia+60
  • Maldives+960
  • Mali+223
  • Malta+356
  • Marshall Islands+692
  • Martinique+596
  • Mauritania (‫موريتانيا‬‎)+222
  • Mauritius (Moris)+230
  • Mayotte+262
  • Mexico (México)+52
  • Micronesia+691
  • Moldova (Republica Moldova)+373
  • Monaco+377
  • Mongolia (Монгол)+976
  • Montenegro (Crna Gora)+382
  • Montserrat+1664
  • Morocco (‫المغرب‬‎)+212
  • Mozambique (Moçambique)+258
  • Myanmar (Burma) (မြန်မာ)+95
  • Namibia (Namibië)+264
  • Nauru+674
  • Nepal (नेपाल)+977
  • Netherlands (Nederland)+31
  • New Caledonia (Nouvelle-Calédonie)+687
  • New Zealand+64
  • Nicaragua+505
  • Niger (Nijar)+227
  • Nigeria+234
  • Niue+683
  • Norfolk Island+672
  • North Korea (조선 민주주의 인민 공화국)+850
  • Northern Mariana Islands+1670
  • Norway (Norge)+47
  • Oman (‫عُمان‬‎)+968
  • Pakistan (‫پاکستان‬‎)+92
  • Palau+680
  • Palestine (‫فلسطين‬‎)+970
  • Panama (Panamá)+507
  • Papua New Guinea+675
  • Paraguay+595
  • Peru (Perú)+51
  • Philippines+63
  • Poland (Polska)+48
  • Portugal+351
  • Puerto Rico+1
  • Qatar (‫قطر‬‎)+974
  • Réunion (La Réunion)+262
  • Romania (România)+40
  • Russia (Россия)+7
  • Rwanda+250
  • Saint Barthélemy (Saint-Barthélemy)+590
  • Saint Helena+290
  • Saint Kitts and Nevis+1869
  • Saint Lucia+1758
  • Saint Martin (Saint-Martin (partie française))+590
  • Saint Pierre and Miquelon (Saint-Pierre-et-Miquelon)+508
  • Saint Vincent and the Grenadines+1784
  • Samoa+685
  • San Marino+378
  • São Tomé and Príncipe (São Tomé e Príncipe)+239
  • Saudi Arabia (‫المملكة العربية السعودية‬‎)+966
  • Senegal (Sénégal)+221
  • Serbia (Србија)+381
  • Seychelles+248
  • Sierra Leone+232
  • Singapore+65
  • Sint Maarten+1721
  • Slovakia (Slovensko)+421
  • Slovenia (Slovenija)+386
  • Solomon Islands+677
  • Somalia (Soomaaliya)+252
  • South Africa+27
  • South Korea (대한민국)+82
  • South Sudan (‫جنوب السودان‬‎)+211
  • Spain (España)+34
  • Sri Lanka (ශ්‍රී ලංකාව)+94
  • Sudan (‫السودان‬‎)+249
  • Suriname+597
  • Svalbard and Jan Mayen+47
  • Swaziland+268
  • Sweden (Sverige)+46
  • Switzerland (Schweiz)+41
  • Syria (‫سوريا‬‎)+963
  • Taiwan (台灣)+886
  • Tajikistan+992
  • Tanzania+255
  • Thailand (ไทย)+66
  • Timor-Leste+670
  • Togo+228
  • Tokelau+690
  • Tonga+676
  • Trinidad and Tobago+1868
  • Tunisia (‫تونس‬‎)+216
  • Turkey (Türkiye)+90
  • Turkmenistan+993
  • Turks and Caicos Islands+1649
  • Tuvalu+688
  • U.S. Virgin Islands+1340
  • Uganda+256
  • Ukraine (Україна)+380
  • United Arab Emirates (‫الإمارات العربية المتحدة‬‎)+971
  • United Kingdom+44
  • United States+1
  • Uruguay+598
  • Uzbekistan (Oʻzbekiston)+998
  • Vanuatu+678
  • Vatican City (Città del Vaticano)+39
  • Venezuela+58
  • Vietnam (Việt Nam)+84
  • Wallis and Futuna+681
  • Western Sahara (‫الصحراء الغربية‬‎)+212
  • Yemen (‫اليمن‬‎)+967
  • Zambia+260
  • Zimbabwe+263
  • Åland Islands+358
Thanks! We'll be in touch in the next 12 hours
Oops! Something went wrong while submitting the form.

How To Implement Chaos Engineering For Microservices Using Istio

“Embrace Failures. Chaos and failures are your friends, not enemies.” A microservice ecosystem is going to fail at some point. The issue is not if you fail, but when you fail, will you notice or not. It’s between whether it will affect your users because all of your services are down, or it will affect only a few users and you can fix it at your own time.

Chaos Engineering is a practice to intentionally introduce faults and failures into your microservice architecture to test the resilience and stability of your system. Istio can be a great tool to do so. Let's have a look at how Istio made it easy.

For more information on how to setup Istio and what are virtual service and Gateways, please have a look at the following blog, how to setup Istio on GKE.

Fault Injection With Istio

Fault injection is a testing method to introduce errors into your microservice architecture to ensure it can withstand the error conditions. Istio lets you injects errors at HTTP layer instead of delaying the packets or killing the pods at network layer. This way, you can generate various types of HTTP error codes and test the reaction of your services under those conditions. 

Generating HTTP 503 Error

Here we see that two pods are running two different versions of recommendation service using the recommended tutorial while installing the sample application.

Currently, the traffic on the recommendation service is automatically load balanced between those two pods.

kubectl get pods -l app=recommendation
NAME READY STATUS RESTARTS AGE
recommendation-v1-798bf87d96-d9d95 2/2 Running 0 1h
recommendation-v2-7bc4f7f696-d9j2m 2/2 Running 0 1h
view raw pod-gist hosted with ❤ by GitHub

Now let's apply a fault injection using virtual service which will send 503 HTTP error codes in 30% of the traffic serving the above pods.

CODE: https://gist.github.com/velotiotech/165e3bbf3030eb267e7d2769ccd30328.

To test whether it is working, check the output from the curl of customer service microservice endpoint. 

You will find the 503 error on approximately 30% of the request coming to recommendation service.

To restore normal operation, please delete the above virtual service using:

kubectl delete -f recommendation-fault.yaml
view raw delete-gist hosted with ❤ by GitHub

Delay

The most common failure we see in production is not the down service, rather a delay service. To inject network latency as a chaos experiment, you can create another virtual service. Sometimes, it happens that your application doesn’t respond on time and creates chaos in the complete ecosystem. How to simulate that behavior, let's have a look.

CODE: https://gist.github.com/velotiotech/deb94b26a9cfe34194ee0bc8d8d8a58c.

Now, if you hit the URL of endpoints of the above service in a loop, you will see the delays in some of the requests. 

Retry

In some of the production services, we expect that instead of failing instantly, it should retry N number of times to get the desired output. If not succeeded, then only a request should be considered as failed.

For that mechanism, you can insert retries on those services as follows:

CODE: https://gist.github.com/velotiotech/9608a8fd1c711937c23a7703f3184b58.

Now any request coming to recommendation will do 3 attempts before considering it as failed.

Timeout

In the real world, an application faces most failures due to timeouts. It can be because of more load on the application or any other latency in serving the request. Your application should have proper timeouts defined, before declaring any request as "Failed". You can use Istio to simulate the timeout mechanism and give our application a limited amount of time to respond before giving up.

Wait only for N seconds before failing and giving up.

kind: VirtualService
metadata:
name: recommendation
spec:
hosts:
- recommendation
http:
- route:
- destination:
host: recommendation
timeout: 1.000s
view raw timeout-gist hosted with ❤ by GitHub

Conclusion

Istio lets you inject faults at the HTTP layer for your application and improves its resilience and stability. But, the application must handle the failures and take appropriate course of action. Chaos Engineering is only effective when you know your application can take failures, otherwise, there is no point in testing for chaos if you know your application is definitely broken.

Get the latest engineering blogs delivered straight to your inbox.
No spam. Only expert insights.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings

How To Implement Chaos Engineering For Microservices Using Istio

“Embrace Failures. Chaos and failures are your friends, not enemies.” A microservice ecosystem is going to fail at some point. The issue is not if you fail, but when you fail, will you notice or not. It’s between whether it will affect your users because all of your services are down, or it will affect only a few users and you can fix it at your own time.

Chaos Engineering is a practice to intentionally introduce faults and failures into your microservice architecture to test the resilience and stability of your system. Istio can be a great tool to do so. Let's have a look at how Istio made it easy.

For more information on how to setup Istio and what are virtual service and Gateways, please have a look at the following blog, how to setup Istio on GKE.

Fault Injection With Istio

Fault injection is a testing method to introduce errors into your microservice architecture to ensure it can withstand the error conditions. Istio lets you injects errors at HTTP layer instead of delaying the packets or killing the pods at network layer. This way, you can generate various types of HTTP error codes and test the reaction of your services under those conditions. 

Generating HTTP 503 Error

Here we see that two pods are running two different versions of recommendation service using the recommended tutorial while installing the sample application.

Currently, the traffic on the recommendation service is automatically load balanced between those two pods.

kubectl get pods -l app=recommendation
NAME READY STATUS RESTARTS AGE
recommendation-v1-798bf87d96-d9d95 2/2 Running 0 1h
recommendation-v2-7bc4f7f696-d9j2m 2/2 Running 0 1h
view raw pod-gist hosted with ❤ by GitHub

Now let's apply a fault injection using virtual service which will send 503 HTTP error codes in 30% of the traffic serving the above pods.

CODE: https://gist.github.com/velotiotech/165e3bbf3030eb267e7d2769ccd30328.

To test whether it is working, check the output from the curl of customer service microservice endpoint. 

You will find the 503 error on approximately 30% of the request coming to recommendation service.

To restore normal operation, please delete the above virtual service using:

kubectl delete -f recommendation-fault.yaml
view raw delete-gist hosted with ❤ by GitHub

Delay

The most common failure we see in production is not the down service, rather a delay service. To inject network latency as a chaos experiment, you can create another virtual service. Sometimes, it happens that your application doesn’t respond on time and creates chaos in the complete ecosystem. How to simulate that behavior, let's have a look.

CODE: https://gist.github.com/velotiotech/deb94b26a9cfe34194ee0bc8d8d8a58c.

Now, if you hit the URL of endpoints of the above service in a loop, you will see the delays in some of the requests. 

Retry

In some of the production services, we expect that instead of failing instantly, it should retry N number of times to get the desired output. If not succeeded, then only a request should be considered as failed.

For that mechanism, you can insert retries on those services as follows:

CODE: https://gist.github.com/velotiotech/9608a8fd1c711937c23a7703f3184b58.

Now any request coming to recommendation will do 3 attempts before considering it as failed.

Timeout

In the real world, an application faces most failures due to timeouts. It can be because of more load on the application or any other latency in serving the request. Your application should have proper timeouts defined, before declaring any request as "Failed". You can use Istio to simulate the timeout mechanism and give our application a limited amount of time to respond before giving up.

Wait only for N seconds before failing and giving up.

kind: VirtualService
metadata:
name: recommendation
spec:
hosts:
- recommendation
http:
- route:
- destination:
host: recommendation
timeout: 1.000s
view raw timeout-gist hosted with ❤ by GitHub

Conclusion

Istio lets you inject faults at the HTTP layer for your application and improves its resilience and stability. But, the application must handle the failures and take appropriate course of action. Chaos Engineering is only effective when you know your application can take failures, otherwise, there is no point in testing for chaos if you know your application is definitely broken.

Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings