In distributed systems like Kubernetes, logging is critical for monitoring and for providing insight into an application's operations. With the ever-increasing complexity of distributed systems and the proliferation of cloud-native solutions, monitoring and observability have become essential to understanding how these systems are functioning.
Logs don’t lie! They have been one of our greatest companions when investigating a production incident.
How is logging in Kubernetes different?
Log aggregation in Kubernetes differs greatly from logging on traditional servers or virtual machines, owing to the way it manages its applications (pods).
When an app crashes on a virtual machine, its logs remain accessible until they are deleted. When pods are evicted, crashed, deleted, or scheduled on a different node in Kubernetes, the container logs are lost. The system is self-cleaning. As a result, you are left with no knowledge of why the anomaly occurred. Because default logging in Kubernetes is transient, a centralized log management solution is essential.
Kubernetes is highly distributed and dynamic in nature; hence, in production, you’ll most certainly be working with multiple machines that have multiple containers each, which can crash at any time. Kubernetes clusters add to the complexity by introducing new layers that must be monitored, each of which generates its own type of log.
We’ve curated some of the best tools to help you centralize and manage your Kubernetes logs, alongside a simple guide on how to get started with each of them, as well as a comparison of these tools to match your use case.
PLG Stack
Introduction:
Promtail is an agent that ships the logs from the local system to the Loki cluster.
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It indexes only metadata and doesn’t index the content of the log. This design decision makes it very cost-effective and easy to operate.
Grafana is the visualisation tool, which consumes data from the Loki data source.
Loki is like Prometheus, but for logs: it takes a multidimensional, label-based approach to indexing and ships as a single binary that is easy to operate, with no external dependencies. Loki differs from Prometheus by focusing on logs instead of metrics, and by delivering logs via push instead of pull.
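The stack can be installed with Helm. Here is a minimal sketch using the community loki-stack chart (repo URL, chart name, and flags follow the public Grafana Helm repository; adjust values to your environment):

```bash
# Add the Grafana chart repository and refresh the index
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Loki, Promtail, and Grafana together in a "logging" namespace
helm upgrade --install loki grafana/loki-stack \
  --namespace logging --create-namespace \
  --set grafana.enabled=true \
  --set promtail.enabled=true
```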
Click the Explore tab on the left side and select Loki from the data source dropdown.
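The same LogQL you run in Explore can also be issued against Loki's HTTP API directly. A minimal sketch (the in-cluster service address and the namespace label are assumptions):

```bash
# Ask Loki for the last 100 log lines in the "app" namespace containing "error"
curl -G -s "http://loki.logging.svc:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="app"} |= "error"' \
  --data-urlencode 'limit=100'
```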
EFK Stack
Introduction :
The Elastic Stack contains most of the tools required for log management:
Elasticsearch is an open-source, distributed, RESTful, and scalable search engine. It is a NoSQL database, used primarily to store logs received from Fluentd and retrieve them.
Log shippers such as Logstash, Fluentd, and Fluent Bit. Each is an open-source log collection agent that supports multiple data sources and output formats, and can forward logs to solutions like Stackdriver, CloudWatch, Splunk, BigQuery, etc.
Kibana is the UI tool for querying, data visualisation, and dashboards; virtually any type of dashboard can be built with it. Kibana Query Language (KQL) is used for querying Elasticsearch data.
Fluentd ➖ Deployed as a DaemonSet, as it needs to collect the container logs from all the nodes. It connects to the Elasticsearch service endpoint to forward the logs.
Elasticsearch ➖ Deployed as a StatefulSet, as it holds the log data. A service endpoint is also exposed for Fluentd and Kibana to connect to.
Kibana ➖ Deployed as a Deployment; it connects to the Elasticsearch service endpoint.
Configuration Options :
The stack can be installed through a Helm chart, either as a whole or as individual components.
More information related to deploying these Helm Charts can be found here
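As an illustrative sketch, the individual components can be installed from the public Elastic and Fluent Helm repositories (repo URLs and chart names assumed; pin chart versions in practice):

```bash
# Add the Elastic and Fluent chart repositories
helm repo add elastic https://helm.elastic.co
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

# Elasticsearch (StatefulSet), Kibana (Deployment), Fluentd (DaemonSet)
helm install elasticsearch elastic/elasticsearch --namespace logging --create-namespace
helm install kibana elastic/kibana --namespace logging
helm install fluentd fluent/fluentd --namespace logging
```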
After installation is complete and the Kibana dashboard is accessible, we need to define index patterns to be able to see logs in Kibana.
From the homepage, type "Index Patterns" into the search bar. Go to the Index Patterns page and click Create index pattern in the top-right corner. You will see the list of existing index patterns there.
Add the required patterns for your indices, then, from the left-side menu, click Discover and check your logs :)
Query Methods :
Elasticsearch can also be queried directly against any index.
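For example, a direct search over the REST API might look like the sketch below (the service address, index pattern, and field name are assumptions that depend on how your log shipper names things):

```bash
# Fetch the 10 most recent documents matching a namespace from "logstash-*" indices
curl -s "http://elasticsearch.logging.svc:9200/logstash-*/_search" \
  -H 'Content-Type: application/json' \
  -d '{
        "query": { "match": { "kubernetes.namespace_name": "app" } },
        "size": 10,
        "sort": [ { "@timestamp": "desc" } ]
      }'
```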
Graylog Stack
Introduction :
Graylog is a leading centralised log management solution, built to open standards for capturing, storing, and enabling real-time analysis of terabytes of machine data. It supports a master-slave architecture. The Graylog stack consists of Graylog v3 and Elasticsearch v6, along with MongoDB v3.
Graylog is an open-source log management tool, using Elasticsearch as its storage. Unlike the ELK stack, which is built from individual components (Elasticsearch, Logstash, Kibana), Graylog is built as a complete package that can do everything.
Pros :
One package with all the essentials of log processing: collect, parse, buffer, index, search, and analyze
Additional features that you don’t get with the open-source ELK stack, such as role-based access control and alerts
Fits the needs of most centralized log management use-cases in one package
Easily scale both the storage (Elasticsearch) and the ingestion pipeline
Graylog's extractors make it possible to pull fields out of log messages using a variety of methods, such as Grok expressions, regex, and JSON.
Cons :
Visualization capabilities are limited, at least compared to ELK’s Kibana
You can’t use the whole ELK ecosystem, because tools would not directly access the Elasticsearch API; Graylog exposes its own API instead.
It is not implemented for Kubernetes directly; rather, it supports logging via Fluent Bit/Logstash/Fluentd.
Configuration Options :
Graylog is very flexible in that it supports multiple inputs (data sources), among which we can mention:
GELF TCP.
GELF Kafka.
AWS Logs.
as well as outputs (how Graylog nodes forward messages), among which we can mention:
GELF Output.
STDOUT.
Connecting an External Graylog Stack:
Configure a GELF TCP input with the Graylog host and IP (port 12201) to push logs to the Graylog stack directly.
Logs can then be queried via the HTTP / REST API, using parameters such as:
range=3600 - replace 3600 with the time range (in seconds)
limit=100 - replace 100 with the number of returned results
sort=timestamp:desc - replace timestamp:desc with the field you want to sort by
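Two quick sketches tie this together: pushing a message into a GELF TCP input, and querying the REST API with the parameters above (hostnames, credentials, and the query string are assumptions):

```bash
# Push a single GELF message to a TCP input on port 12201
# (GELF TCP frames are null-terminated JSON)
echo -n -e '{"version":"1.1","host":"demo","short_message":"hello graylog"}\0' \
  | nc -w 1 graylog.example.com 12201

# Search the last hour of logs for "error" via the REST API
curl -u admin:password -H 'Accept: application/json' \
  "http://graylog.example.com:9000/api/search/universal/relative?query=error&range=3600&limit=100&sort=timestamp%3Adesc"
```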
Using the Dashboard:
One can easily navigate the filter section and perform searches with the help of labels generated by the log collectors.
Splunk Stack
Introduction
Splunk is used for monitoring and searching through big data. It indexes and correlates information in a searchable container, and makes it possible to generate alerts, reports, and visualisations.
Configuration Options
1. Helm-based installation as well as Operator-based installation is supported.
2. Splunk Connect for Kubernetes provides a way to import and search your Kubernetes logging, object, and metrics data in your Splunk platform deployment. It supports importing and searching your container logs on ECS, EKS, AKS, GKE, and OpenShift.
3. Splunk Connect for Kubernetes supports installation using Helm (a sketch follows this list).
4. Splunk Connect for Kubernetes deploys a DaemonSet on each node; in the DaemonSet, a Fluentd container runs and does the collecting job. It collects three types of data: logs, objects, and metrics.
5. We need a minimum of two Splunk platform indexes:
One events index, which will handle logs and objects (you may also create two separate indexes for logs and objects).
One metrics index. If you do not configure these indexes, Splunk Connect for Kubernetes uses the defaults created in your HTTP Event Collector (HEC) token.
6. An HEC token will be required before moving on to the installation.
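A minimal sketch of the Helm install (repo URL, chart name, and value keys follow the public splunk-connect-for-kubernetes chart; the HEC host and token are placeholders):

```bash
# Add the Splunk Connect for Kubernetes chart repository
helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes/
helm repo update

# Install, pointing the connector at your Splunk HEC endpoint
helm install splunk-connect splunk/splunk-connect-for-kubernetes \
  --set global.splunk.hec.host=splunk.example.com \
  --set global.splunk.hec.port=8088 \
  --set global.splunk.hec.token=REPLACE_WITH_HEC_TOKEN
```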
Some of the other tools worth covering are not open source, but they are too good not to talk about, and they offer end-to-end functionality for all your logging needs:
Sumo Logic :
This log management tool can store logs as well as metrics. It has a powerful search syntax, where you can define operations similarly to UNIX pipes.
Pros :
Powerful query language
Capability to detect common log patterns and trends
Centralized management of agents
Supports Log Archival & Retention
Ability to perform Audit Trails and Compliance Tracking
Configuration Options :
A subscription to Sumo Logic will be required
Helm installation
Provides options to install side by side with an existing Prometheus Operator (see the sketch below)
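A minimal sketch of that Helm install (repo URL and value keys follow the public sumologic-kubernetes-collection chart; the access ID/key and cluster name are placeholders):

```bash
# Add the Sumo Logic collection chart repository
helm repo add sumologic https://sumologic.github.io/sumologic-kubernetes-collection
helm repo update

# Install the collection, authenticated with your Sumo Logic credentials
helm install collection sumologic/sumologic \
  --namespace sumologic --create-namespace \
  --set sumologic.accessId=SUMO_ACCESS_ID \
  --set sumologic.accessKey=SUMO_ACCESS_KEY \
  --set sumologic.clusterName=my-cluster
```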
Cons :
Performance can be bad for searches over large data sets or long timeframes.
Deployment is only available as cloud/SaaS (web-based).
Expensive - Pricing is per ingested byte, so it forces you to pick and choose what you log, rather than ingesting everything and figuring it out later
Datadog:
Datadog is a SaaS that started up as a monitoring (APM) tool and later added log management capabilities as well.
You can send logs via HTTP(S) or syslog, either through existing log shippers (rsyslog, syslog-ng, Logstash, etc.) or through Datadog’s own agent. With Live Tail, you can observe your logs in real time without indexing them. You can also ingest all of the logs from your applications and infrastructure, decide dynamically what to index with filters, and then store them in an archive.
It features Logging without Limits™, which is a double-edged sword: costs are harder to predict and manage, but you get pay-as-you-use pricing combined with the ability to archive logs and restore them from the archive later.
Pros :
Log processing pipelines can process millions of logs per minute, or petabytes per month, seamlessly.
Automatically detects common log patterns
Can archive logs to AWS/Azure/Google Cloud storage and rehydrate them later
Easy search with good autocomplete (based on facets)
Integration with Datadog metrics and traces
Affordable, especially for short retention and/or if you rely on the archive for a few searches going back
Configuration options :
Datadog Agent installation is done using Helm (see the sketch below). A Datadog account will be required to get an API key and an app key. Installation can get a bit complex; you can find more information at https://docs.datadoghq.com/getting_started/agent/
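A minimal sketch of the Agent install with log collection turned on (chart and value keys follow the public datadog/datadog chart; the API key is a placeholder):

```bash
# Add the Datadog chart repository
helm repo add datadog https://helm.datadoghq.com
helm repo update

# Install the Agent as a DaemonSet and enable log collection for all containers
helm install datadog-agent datadog/datadog \
  --set datadog.apiKey=REPLACE_WITH_API_KEY \
  --set datadog.logs.enabled=true \
  --set datadog.logs.containerCollectAll=true
```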
Cons :
It is a bit complicated to set up the first time, and it is not easy at first to discover all the features Datadog offers. The interface is tricky and can be a hindrance at times. Additionally, if application fields are not mapped correctly, filters are not that useful.
Datadog’s per-host pricing can be very expensive.
Conclusion :
As one can see, each tool has its own benefits and downsides. Grafana’s Loki is more lightweight than the Elastic Stack in overall performance, while still supporting persistent storage options.
That being said, the right solution platform really depends on each administrator's needs.
That’s all! Thank you.
If you enjoyed this article, please like it.
Feel free to drop a comment too.