Monitoring services with Stackdriver – Stack Doctor

YURI GRINSHTEYN: It's more important than ever for teams to take ownership of the reliability of their services. This is especially true in distributed environments, like an e-commerce website that depends on services like inventory, shopping cart, and notifications. Today, we'll look at how to use data from a service mesh to understand traffic, error rates, and latencies in microservice applications with Stackdriver. This is "The Stack Doctor."

[MUSIC PLAYING]

Developers use microservices to architect for portability, scale, and decoupling. This presents some challenges with operations and management: you have to manage all of the various services and understand the interactions between them. This is where a service mesh comes in. Let's take a look at how a service mesh and Stackdriver can work together to help us understand what's going on in a distributed application.

To start, we need services. We'll use this application. It's a web store that sells really cool products. It also happens to be composed of a number of microservices written in a variety of languages. You can take a look at it for yourself in this GitHub repository.

In a complex environment like this, how can we go about measuring reliability? This is where a service mesh is going to help us. But first, what is a service mesh? According to the Istio open source project, the term "service mesh" is used to describe the network of microservices that make up a distributed application and the interactions between them. The real value of a service mesh like Istio is that it enhances the security, reliability, and observability of services, which are not necessarily easy to achieve in Kubernetes without it.

What does that mean for us? Well, let's take a look at what happens when our microservices application is managed by a service mesh. First, let's create a GKE cluster with Istio already installed on it. Here's the command we'll use to create the cluster and enable Istio.
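The exact command isn't spelled out in the transcript; a minimal sketch, assuming the Istio on GKE add-on and placeholder cluster name and zone, looks like this (flags can differ by gcloud release):

    # Create a GKE cluster with the Istio add-on and Stackdriver Kubernetes
    # monitoring enabled (cluster name and zone are placeholders).
    gcloud beta container clusters create stack-doctor-demo \
        --zone us-central1-a \
        --num-nodes 4 \
        --enable-stackdriver-kubernetes \
        --addons=Istio \
        --istio-config=auth=MTLS_PERMISSIVE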
This will automatically set up the Stackdriver adapter to send metrics, logs, and traces to Stackdriver. Now that the cluster's up and running, let's deploy our microservices application.
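Roughly, deployment is a couple of kubectl applies; the manifest paths below follow the public microservices-demo repository layout and are an assumption, not the exact commands from the video:

    # Deploy the demo application and its Istio routing rules
    # (paths assume the microservices-demo repo layout and may differ by release).
    kubectl apply -f release/kubernetes-manifests.yaml
    kubectl apply -f release/istio-manifests.yaml

    # Each pod should report 2/2 containers ready once the Envoy sidecar is injected.
    kubectl get pods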
Here are all the services that it creates. These services are instrumented with the proxy sidecar. The proxies send telemetry information to a component called Mixer, which in turn sends it to Stackdriver. Let's go there and see what new information is available.

We'll start in the Metrics Explorer. If we search for Istio, we'll see all of the new metrics that are created by Istio for all of the services in the mesh.
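The mesh metrics look roughly like the following; the exact names are an assumption here, so check Metrics Explorer in your own project for the definitive list:

    # Examples of mesh metrics exported to Stackdriver (assumed names):
    istio.io/service/server/request_count        # request volume per service
    istio.io/service/server/response_latencies   # server-side latency distribution
    istio.io/service/server/request_bytes        # request payload sizes
    istio.io/service/server/response_bytes       # response payload sizes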
The first thing we want to do is configure alerting using these metrics so that we can find out if any of our services have issues with performance or availability. First, let's create an alerting policy for front-end latency. We'll use the server response latency metric, filtered by destination service name for just our front end, and further filtered by response code so that we only measure latency for successful requests.
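As a sketch, the condition filter behind that policy could look something like this in Cloud Monitoring filter syntax; the metric type and label names are assumptions based on the description above:

    # Front-end latency condition: server response latencies for the frontend
    # service, successful (200) responses only. Label names are assumptions.
    metric.type = "istio.io/service/server/response_latencies" AND
    resource.type = "k8s_container" AND
    metric.label.destination_service_name = "frontend" AND
    metric.label.response_code = "200"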
Now let's create another alerting policy for our front-end availability. This time, we'll use the server request count metric. We'll also filter by destination service name for the front end, and then further filter it to only count responses that do not return a success.
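With the same caveats about assumed metric and label names, the availability condition could filter on non-success responses like this:

    # Front-end availability condition: count of responses that are not
    # successful. Label names are assumptions.
    metric.type = "istio.io/service/server/request_count" AND
    resource.type = "k8s_container" AND
    metric.label.destination_service_name = "frontend" AND
    metric.label.response_code != "200"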
And there we go. Now we're using information from the service mesh to let us know when our front-end service experiences a degradation in either latency or availability.

But what happens when we do get an alert that there's an issue and our service is degraded? If the problem has to do with latency, we'd want to start by looking at tracing. Well, the good news is that the Stackdriver adapter in the service mesh is not just sending metric data; it's also exporting traces. We don't need to do anything to instrument our code. The service mesh traces the service calls for us and sends the data to Stackdriver.

On the other hand, if the problem has to do with availability or errors, we would want easy access to debugging information, like logs. The service mesh is also exporting logs to Stackdriver. We can look at the Kubernetes container logs for the proxy, for example, to see a record of all requests.
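A minimal sketch of a Stackdriver Logging filter for those proxy logs, assuming the sidecar container uses Istio's default name:

    # Access logs from the Envoy sidecar in every pod of the mesh
    # ("istio-proxy" is Istio's default sidecar container name).
    resource.type = "k8s_container"
    resource.labels.container_name = "istio-proxy"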
If there are any issues we need to debug, this can be a great source of information.

So thanks to the data being exported by the service mesh to Stackdriver, we've configured alerting on service availability and performance, looked at latency with traces, and dug into logs. Come back next time, when we look at improving observability even further with custom metrics.

That's it for this episode. This is "The Stack Doctor." Stay healthy out there.

[MUSIC PLAYING]

One Comment

  • Abhideep Chakravarty

    Killing Endpoint? Using Endpoint and Istio would create duplicate data. Endpoint monitoring would be the actual duplicate thing. What do you think?
