Monitoring / Logging Standards

Best practices for operating containers

This section provides tips and tricks to make operating containers more convenient, covering topics ranging from security to monitoring. The goal is to make applications easier to run on Kubernetes and other container platforms. Some of the practices discussed here are inspired by the twelve-factor methodology, which is an excellent guide for building cloud-native applications, although these recommendations are not identical to it. How important each practice is depends on your workloads; the security-related ones in particular should be weighed against your own requirements.

Metrics and logs

Metrics and logs allow you to monitor your system and detect problems early, before they get out of control. Analyzing them, however, requires expertise, and development teams do not always have the tools needed to analyze metrics and logs in real time. Data collected during the day is often analyzed after business hours for security reasons, and the analysis must be easy to perform for non-experts who may be using a different tool at each end.

To solve this problem, you need a platform that stores all the data collected during the day and provides an API for analyzing it. An open-source solution that has been gaining traction recently is Prometheus. This article describes how to add container monitoring support on top of Prometheus 2.1.

Logging in containers

The Docker daemon forwards whatever a container writes to stdout and stderr through a logging driver, and the format is fixed per container when it starts. You cannot mix formats within a single container's stream; for example, you might want unstructured syslog messages for general information and JSON-formatted messages for operational data. Log rotation is not handled automatically either and has to be configured explicitly through log options.
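
For example, rotation for the default json-file driver has to be requested explicitly with standard docker run log options:

    # Rotate container logs: keep at most 3 files of 10 MB each
    docker run -d \
      --log-driver json-file \
      --log-opt max-size=10m \
      --log-opt max-file=3 \
      nginx:1.25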

Prometheus can scrape metrics from a wide range of sources, including Docker, through exporters. This article describes how to write a Node.js exporter that Prometheus can scrape for container monitoring, and it provides some configuration examples for docker-py, the Python library used to interact with the Docker API.
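
On the Node.js side, a comparable starting point (a hedged sketch, not part of docker-py) is the dockerode package, which talks to the same Docker Engine API:

    // list-containers.js - minimal sketch using the dockerode package (npm install dockerode)
    const Docker = require('dockerode');

    // Connect through the local UNIX socket; adjust socketPath (or host/port) as needed.
    const docker = new Docker({ socketPath: '/var/run/docker.sock' });

    async function main() {
      // One entry per running container, with Id, Names, Image, State, Status, ...
      const containers = await docker.listContainers();
      for (const c of containers) {
        console.log(`${c.Names.join(',')} (${c.Image}) -> ${c.State}`);
      }
    }

    main().catch((err) => {
      console.error('Failed to query the Docker API:', err);
      process.exit(1);
    });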

Service discovery in containers

Container-to-container communication often requires services to register themselves as available. Registration is done through a service registry, which Prometheus can use to discover and monitor these microservices. With this technique you can also recreate container instances that become unavailable without compromising availability. This article describes how to use etcd3 as a service registry for container applications using its Go client.
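
The same registration pattern can be sketched from Node.js with the etcd3 npm package; the key names and addresses below are made up for illustration:

    // register-service.js - sketch of service registration with the etcd3 npm package
    const { Etcd3 } = require('etcd3');

    const client = new Etcd3({ hosts: 'http://127.0.0.1:2379' });

    async function main() {
      // Attach the registration to a 15-second lease so the key disappears
      // automatically if this instance stops renewing it.
      const lease = client.lease(15);
      await lease.put('services/payments/instance-1').value('10.0.0.12:8080');

      // A monitoring component can now enumerate everything under the services/ prefix.
      const registered = await client.getAll().prefix('services/').strings();
      console.log(registered);
    }

    main().catch(console.error);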

Monitoring Microservices Inside Containers

The first solution discussed here is a Node.js application that exposes its own metrics and internally captures the logs sent by klogd. The second solution runs rsyslog inside your containers so that Kubernetes nodes can forward log messages coming from containers.

Adding support for monitoring and logging in containers

The first solution to consider is a web application written in Node.js that listens on port 9125. This endpoint exposes the metrics and the logs collected by klogd inside your container, as shown in the sketch below:
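
A minimal sketch of that endpoint, assuming the prom-client npm package and the built-in http module (port 9125 comes from the text; everything else is illustrative):

    // exporter.js - minimal metrics endpoint on port 9125 using prom-client
    const http = require('http');
    const client = require('prom-client');

    // Collect standard process/runtime metrics (CPU, memory, event loop lag, ...).
    client.collectDefaultMetrics();

    const server = http.createServer(async (req, res) => {
      if (req.url === '/metrics') {
        // register.metrics() is async in recent prom-client versions.
        res.setHeader('Content-Type', client.register.contentType);
        res.end(await client.register.metrics());
      } else {
        res.statusCode = 404;
        res.end('Not found');
      }
    });

    server.listen(9125, '0.0.0.0', () => {
      console.log('Metrics exposed on http://0.0.0.0:9125/metrics');
    });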

This allows you to scrape this information via Prometheus using the same configuration parameters as those used to monitor nodes. The service can also be provided by Docker with command-line options such as --metrics-addr=0.0.0.0:9125 or environment variables such as KUBE_METRICS_ADDR.

Structured logging with rsyslog

The second solution uses an rsyslog container image. It is configured to forward container logs to a remote rsyslogd instance running on the host, for example by setting RSYSLOG_DESTINATION=192.168.0.1:514 in the container environment so that all log messages with severity INFO or higher are sent over TCP/UDP port 514 to 192.168.0.1. This article describes how to configure docker-py so that logs are sent over this channel instead of being written locally inside each container via standard UNIX domain sockets, which lets your application nodes and Kubernetes workers obtain these messages without duplicating them across multiple nodes.
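
The RSYSLOG_DESTINATION variable is specific to the rsyslog image described here; if you would rather rely on Docker's built-in mechanisms, the stock syslog logging driver achieves comparable forwarding (the image name is a placeholder):

    # Send container logs straight to a remote syslog endpoint
    docker run -d \
      --log-driver syslog \
      --log-opt syslog-address=tcp://192.168.0.1:514 \
      my-app:latest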

This solution is better than the first because it does not require running a containerized server application for Prometheus to scrape, and it allows log forwarding from Kubernetes nodes without changing your infrastructure code. However, it requires an additional daemon on each node to forward logs from containers to the host, which may not be suitable for large deployments with tens or hundreds of machines.

What's next? The Docker community has been hard at work developing a complete platform for monitoring containers called cAdvisor. It already provides detailed information about CPU and memory usage inside containers as well as replica set/deployment information. Prometheus supports scraping this data so you only have to deploy one instance inside your cluster to manage the entire stack.

With our Node.js applications exposing metrics and logs through HTTP endpoints, Prometheus can scrape information about these processes using service discovery backed by etcd3. This lets you deploy all of your Prometheus instances inside Kubernetes, including on worker nodes, while still monitoring application containers in a self-contained way, without deploying additional services alongside them. Using this approach, you can build sophisticated monitoring systems for containerized microservices that expose HTTP APIs for metrics and log collection, or that implement more complex interactions such as calling an RPC interface exposed by another container.

This article describes how to build this kind of system on Kubernetes and Docker using Node.js applications that expose their metrics and logs through HTTP endpoints. To accomplish this you have to add the prom-client library to your project, configure a Docker client pointing at 127.0.0.1:2375, expose /metrics via a REST endpoint on port 9125, and implement a class that mixes in the LogHandler interface from Prometheus/client_golang.

Above all else, this library lets you focus on building your application rather than worrying about exposing its internal state over HTTP or writing code that interacts with a Go-based Prometheus client library. This is particularly true for systems built from microservices where each container only exposes an RPC interface that other containers call. The mechanics of that kind of communication are out of scope for this library, but it would be easy to create another one that provides this functionality without losing the benefits mentioned above.

Carefully choose the image version

As you probably know, it is a terrible idea to hard code the version of any library or package used by your code because there's a very good chance that this version will be incompatible with the one being used inside your production Docker image. The best way to avoid this kind of issue is to link libraries dynamically so that their versions can be updated as needed. To achieve this goal, you need to create a minimal go-based client that exposes only those functions required by Prometheus and use it from your Node.js application.

In our sample project, we implemented two versions of each library: one called stretch for the node_modules/Prometheus/client_golang/Prometheus directory and another called master for node_modules/Prometheus/client_golang. Once your project is configured to use the client library as discussed in the next section, you can choose which of these two directories to link against. If you want to use a newer version of Prometheus, such as 1.x, simply create a new directory and change the build script accordingly.
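
On the image side, the same principle means referencing an exact image tag, or better a digest, rather than a floating tag such as latest. A hedged fragment of a Pod template (registry and tag are illustrative):

    containers:
      - name: metrics-exporter
        # Pin an exact tag (or use image@sha256:... to pin a digest)
        # so rebuilds of "latest" cannot silently change what runs.
        image: registry.example.com/metrics-exporter:1.4.2
        imagePullPolicy: IfNotPresent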

Avoid running as root whenever possible

As you probably know, containers are supposed to run as a non-privileged user. The prom client's InstallHandler function will therefore try to switch to UID 1000 at launch time. Unfortunately, this is not always possible with Docker images that were built without a non-root user configured. In those cases it may be necessary to run docker run with the --user option, or to use your own custom Linux distribution image for your services. The client library won't attempt this kind of operation if the Prometheus server being scraped belongs to a user/group other than 1000.
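
For example, forcing a non-root UID/GID at start time with the standard --user option (the image name is a placeholder):

    docker run -d --user 1000:1000 my-app:latest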

We have already discussed what an HTTP endpoint exposing metrics and logs should look like, but here I'd like to show some code examples of how to implement it using the prom client from Node.js.

First, you should create a PrometheusHandler class that implements both the MetricsEndpoint and LogsEndpoint interfaces provided by the prom client:
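
As far as I know, prom-client itself does not ship interfaces named MetricsEndpoint or LogsEndpoint, so treat the following as a rough JavaScript equivalent of the handler described here, built on prom-client's real Registry and Counter types; the class and method names are invented for illustration:

    // prometheus-handler.js - illustrative handler combining metrics and log counting
    const client = require('prom-client');

    class PrometheusHandler {
      constructor() {
        this.registry = new client.Registry();

        // One counter for log lines, labelled by severity.
        this.logLines = new client.Counter({
          name: 'app_log_lines_total',
          help: 'Number of log lines handled, by severity',
          labelNames: ['severity'],
          registers: [this.registry],
        });
      }

      // "LogsEndpoint" side: count a log message and forward it to stdout.
      handleLog(severity, message) {
        this.logLines.inc({ severity });
        console.log(JSON.stringify({ severity, message, time: new Date().toISOString() }));
      }

      // "MetricsEndpoint" side: render the Prometheus exposition format.
      async metrics() {
        return this.registry.metrics();
      }
    }

    module.exports = PrometheusHandler;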

The LogHandler interface exports a LogMetric struct type, which has a Name string field representing a metric name and an optional Value float64 field representing its value. There is also a ResolvedValue bool field that is true or false depending on whether the log message contains enough data for the library to compute the correct value for this particular metric. If not, you might want to set it to false and provide more context in request headers such as X-Resolved-Value. I'll show how to do this later in this article.

The LogHandler interface also exports a LogMessage struct type containing Name and Message string fields, plus a ResolvedMessage bool field, which is true if the library was able to gather enough metadata around the log message to resolve its value. If it cannot do that, set this field to false.

Logging

Logging is a very common requirement in a microservices architecture, so you might want to create a Logger interface that provides an easy way to add all the handlers you need for this purpose:
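
A small sketch of such a Logger that fans each message out to pluggable handlers; all names here are invented for illustration:

    // logger.js - minimal fan-out logger with pluggable handlers
    class Logger {
      constructor() {
        this.handlers = [];
      }

      // A handler is any object with a handle(level, message, fields) method.
      addHandler(handler) {
        this.handlers.push(handler);
      }

      log(level, message, fields = {}) {
        for (const h of this.handlers) {
          h.handle(level, message, fields);
        }
      }
    }

    // Example handler: structured JSON on stdout, the container-friendly default.
    const stdoutHandler = {
      handle(level, message, fields) {
        console.log(JSON.stringify({ level, message, ...fields, time: Date.now() }));
      },
    };

    const logger = new Logger();
    logger.addHandler(stdoutHandler);
    logger.log('info', 'service started', { port: 9125 });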

The MetricsEndpoint interface exposes Metric structs containing Name and Help string fields, along with a Method string field holding the name of the Prometheus method being used. There is also a Labels map that can be used to attach additional metadata as key/value pairs. Now let's look at how Node.js applications can implement these interfaces using the prom client:

First, we need to import both prom client packages and initialize them by providing our HTTP server's URL address, as well as some other options such as a version string or a tracing function that will be executed on every request.
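
As a hedged sketch of that initialization using real prom-client calls (the application name and version shown are made up), default labels and a labelled counter look like this:

    // metrics-init.js - initializing prom-client with default labels and a labelled metric
    const client = require('prom-client');

    // Metadata attached to every metric this process exposes (values are illustrative).
    client.register.setDefaultLabels({ app: 'billing-api', version: '1.4.2' });
    client.collectDefaultMetrics();

    // A request counter using the label map idea described above.
    const httpRequests = new client.Counter({
      name: 'http_requests_total',
      help: 'HTTP requests handled, by method and status code',
      labelNames: ['method', 'code'],
    });

    // Somewhere in the request handling path:
    httpRequests.inc({ method: 'GET', code: '200' });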

If you want to attach a time value such as a log timestamp, simply call MetricsEndpoint.SetTime with the appropriate value, for example the output of time.Now().Format() from the Go time package.

Note that you can also use SetLabelValues to attach additional metadata here, but keep in mind that this results in HTTP GET requests that are not atomic, meaning two or more metrics might be returned on the same line if they contain different values for the same label key/value pairs. The Prometheus web UI might not handle this case gracefully, so you might want to provide more context by using an X-Resolved-By header containing a string with your application name or version number instead.

TSDB as Prometheus Long Term Storage (LTS)

Since Prometheus does not yet support writing samples to the v2 TSDB (time series database) format, you might want to consider using Hypertable instead. It can provide higher throughput at the small cost of reduced data resolution and retention time.

Time Series Databases (TSDB) to the rescue

We have already discussed a lot of challenges introduced by microservices architecture on monitoring and logging so you might think that using multiple TSDBs from different services is the way to go. We've been using MongoDB as storage for some time now but I think it's about time we replaced it with something more suitable for this task.

It is worth noting that Prometheus can also be used as a data storage backend via its native client HTTP API, but we are going to use Hypertable instead, since it offers much higher performance and provides additional features such as hot/warm restarts and backup support.

To use Prometheus instead, you would need to implement a custom storage backend using Client HTTP API or simply use TSDB Proxy which I'll discuss in the next part of this article series.

TSDB Proxy

If your services are unable to expose either HTTP client or HTTP server endpoints for security policy reasons, then you might want to consider implementing a tiny proxy that can be used by any Go application.

The basic idea is that it will intercept all incoming /metrics requests and parse them into a Hypertable row key and value. This means the proxy will need to know about all Prometheus servers and their HTTP endpoints to make this process possible.
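
Here is a hedged Node.js sketch of that idea: it fetches one target's /metrics, parses the exposition lines into name/value samples, and leaves the actual write to Hypertable (or any other store) as a stub. The target URL is made up:

    // metrics-proxy.js - fetch a /metrics endpoint and parse it into name/value samples
    const TARGET = process.env.METRICS_TARGET || 'http://127.0.0.1:9125/metrics';

    function parseExposition(text) {
      const samples = [];
      for (const line of text.split('\n')) {
        // Skip comments (# HELP / # TYPE) and blank lines.
        if (!line || line.startsWith('#')) continue;
        // Assumes the common case where samples carry no trailing timestamp.
        const lastSpace = line.lastIndexOf(' ');
        const name = line.slice(0, lastSpace);   // metric name plus labels
        const value = Number(line.slice(lastSpace + 1));
        if (!Number.isNaN(value)) samples.push({ name, value });
      }
      return samples;
    }

    async function scrapeOnce() {
      const res = await fetch(TARGET);           // global fetch, Node.js 18+
      const samples = parseExposition(await res.text());
      // A real proxy would translate each sample into a storage row key and value here.
      console.log(`${samples.length} samples scraped from ${TARGET}`);
    }

    scrapeOnce().catch(console.error);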

HTTP Proxy Configuration

Since we've already imported both Prometheus and cli packages, we can now initialize them by using the default ClientHttpSettings struct provided by the Prometheus package:

The second parameter is used to provide a custom URL address, which could be fetched from Consul or another service discovery tool if necessary. If you don't do this, the Prometheus server location is derived from the remote_write field of the ClientHttpSettings struct, which defaults to http://prometheus:9090.
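
For reference, the plain prometheus.yml way of expressing a remote_write target looks roughly like this (the URL is an example):

    remote_write:
      - url: "http://tsdb-proxy.example.internal:9201/write"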

Use native logging mechanisms of containers and microservices

Logging is another crucial part of any application, and it becomes even more interesting when we're working with containers or polyglot microservices. I'm not going to dive into the details here, as dozens of articles online already discuss this topic in great depth; instead, a minimal stdout-based example follows the tool list below.

Instead, let's simply list some available tools and their pros/cons:

NATS Streaming Server - This can be useful if your cloud platform already provides NATS messaging service via its API. If you're using AWS, then consider using NATS Kestrel which is an enhanced version of NATS that also supports HTTP endpoints via an HTTP client library.

Go-kit / Akka-logger - These are probably the most popular choices for microservices written in Go (Go kit) and on the JVM (Akka). I'm not going to discuss them in depth here since the information is already online and fairly easy to find.

Logrus - a structured logging library for Go that is widely used in the Docker ecosystem and can be integrated with Prometheus using the Prometheus Logs Exporter project.

Tibco Log Service - This is an enterprise-grade product by Tibco for collecting, enriching, and managing logs from multiple sources. It supports dozens of platforms including cloud platforms such as AWS and it also has several supported integrations for some monitoring tools out of the box so you might consider giving it a try if your company already uses their Enterprise Suite.
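
Whichever tool you pick, the container-native baseline stays the same: write one structured JSON object per line to stdout and let the runtime and your log shipper handle collection. A minimal, dependency-free sketch:

    // log.js - structured, stdout-only logging suitable for containers
    function log(level, message, fields = {}) {
      // One JSON object per line; the container runtime captures stdout,
      // so there are no files, sockets, or log rotation inside the container.
      process.stdout.write(
        JSON.stringify({ time: new Date().toISOString(), level, message, ...fields }) + '\n'
      );
    }

    log('info', 'payment processed', { orderId: 'abc-123', durationMs: 42 });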

One thing that many people don't know is that you can use node labels to control where the Pods of a Deployment are scheduled. For example, if you want instances of the same container to run only on nodes carrying the label "role=frontend", your deployment definition would look like the example below:

Deployment Specification Definition
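
A hedged example of such a definition; the names, image, and replica count are illustrative, and the nodeSelector stanza is the relevant part:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          # Schedule these Pods only on nodes labelled role=frontend
          nodeSelector:
            role: frontend
          containers:
            - name: frontend
              image: registry.example.com/frontend:1.0.0
              ports:
                - containerPort: 8080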

This ensures that the frontend Pods are scheduled only on nodes carrying the "role=frontend" label. If no Kubernetes node matches the label, the Pods will remain unscheduled.

Please note that when you specify more than one label in a nodeSelector, all of them must match (they are combined with AND), so for more flexible placement rules you might also want to use node affinity or taints and tolerations.

This leads us to the next question: how can we take advantage of this feature? If you're using Helm Charts then it's very easy because they provide a way to define node labels and selectors for newly created Deployments, Services, etc.

If you don't use Helm or prefer to manage things manually (e.g., via YAML files), then your alternative is to add a nodeSelector stanza to your Pod spec and specify tolerations as necessary.

Avoid privileged containers

Privileged containers allow you to run a container with all capabilities enabled and in many cases, this is what we need when we're building our artifacts. However, in some cases, privileged containers can pose serious security risks which is why it's highly recommended that you avoid them.

Ignoring the best practices and running your application in a privileged container within production might lead to:

Leaked internal cluster URLs, such as AWS EC2 VPC endpoints, since there are no firewall restrictions inside the Kubernetes cluster when using Docker for Mac or Docker for Windows (as of version 17.12). This would allow attackers to access Pods directly, so even if everything else fails they could achieve persistence by creating a secret object and mounting it into another namespace where a Kubernetes service is running.

Elevated privileges to the host file-system via a vulnerability in containerized processes so they can install rootkits, keyloggers, etc.

The privilege-escalation example above is of course exaggerated, since you would first need to be authenticated on the nodes and then escalate from within a pod using ptrace, but it demonstrates how easily things can go wrong when you're not careful.

In most cases, there are only two scenarios where running privileged containers makes sense:

Running a privileged container for performing operations that require special privileges such as setting up iptables rules or mounting disk images with large partitions. This happens automatically on your CI pipeline if you enable true multi-tenancy since all nodes are created with the privileged option set to true.

Running a built-in Docker container (e.g., gcr.io/google_containers/kube-proxy ) since they assume that the node has privileged capabilities by default. This is not an issue though because most hosted CI platforms run these for you when creating new build machines so it's far from being a risk in itself - whether or not your team decides to use it is up to you and often depends on the architecture of your cluster.
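
Outside those two scenarios, it is worth stating explicitly in the Pod spec that a workload does not need elevated privileges; a hedged securityContext fragment (the image name is illustrative):

    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0.0
          securityContext:
            privileged: false
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop: ["ALL"]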

Avoid using Elasticsearch, Logstash, Fluentd, and Kibana for logging purposes

There's nothing wrong with ELK per se but I find there are better options out there that are more Kubernetes-native and provide built-in support for monitoring containers.

When you host your applications with IaaS vendors like AWS, GCE, or Azure, they usually provide their own logging options, which you can use instead of ELK. Please note that I'm not saying ELK does not work, because it most certainly does, but there are some concerns about availability under load, so you might want to consider alternatives before making a decision based on this article alone.

The problem with the default Elasticsearch/Logstash/Fluentd/Kibana combo is that at some point you will have to worry about sharding: once your log volume reaches multiple terabytes, it does not scale very well.

If you are not hosted with any IaaS provider, the other option is to run Elasticsearch, Logstash, and Fluentd in containers or host them yourself, which adds operational complexity to your stack. In my mind, one of the easiest ways around this is Sysdig Falco, a behavior-monitoring agent for Linux containers with native support for monitoring Kubernetes pods. It provides an easy way of automatically monitoring microservice deployments and can be used with structured logging systems like Elasticsearch/Fluentd in addition to standard syslog, so you don't have to choose between them; they both work fine at the same time.

Solutions such as Fluentd, Logstash, and Elasticsearch all require at least three containers to run as a cluster. Take Fluentd, for example: when running it in high-availability mode with three nodes, you still need at least five containers to handle the load, which is not very efficient if you ask me.

The second option is interesting because these tools are integrated with the kubelet, so they don't require any additional configuration on your part beyond something like the fluent-elasticsearch Docker image, which already contains everything needed to get started. However, keep in mind that you need to initialize Elasticsearch/Fluentd yourself after starting the kubelet, since they are not initialized automatically by default (at the time of writing).
