L7 - Cloud Monitoring 2/2 Flashcards

1
Q

What is Logstash

A

Logstash = light-weight, open-source, server-side data processing pipeline that allows you to collect data from a variety of sources, transform it on the fly, and send it to your desired destination.
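As a rough illustration of the collect, transform, send idea, the sketch below chains three Python generator stages. It is not Logstash's configuration language or API. The stage names and the "LEVEL text" log format are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stages loosely mirroring Logstash's input -> filter -> output model.

def read_input(lines: Iterable[str]) -> Iterator[dict]:
    """Input stage: wrap each raw line in an event dictionary."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def parse_filter(events: Iterable[dict]) -> Iterator[dict]:
    """Filter stage: split a 'LEVEL text' message into structured fields."""
    for event in events:
        level, _, text = event["message"].partition(" ")
        yield {**event, "level": level, "text": text}


def run_pipeline(lines: Iterable[str], output: Callable[[dict], None]) -> None:
    """Output stage: hand every transformed event to the destination callback."""
    for event in parse_filter(read_input(lines)):
        output(event)
```

Passing a list's append method as output collects the structured events in memory. A real deployment would send them to Elasticsearch or another destination instead.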

2
Q

What is X-Pack?

A

X-Pack brought a number of deeply integrated enterprise capabilities to the Elastic Stack, including security, alerting, monitoring, and graph analytics.

3
Q

What is Beats?

A

Lightweight, single-purpose data shippers

agents that collect data
Filebeat for logs
Metricbeat for metrics
Heartbeat for health (uptime) monitoring
Elasticsearch provides integrations for data ingestion from major systems

4
Q

What is Prometheus for Metrics?

A

Prometheus is an open-source monitoring system
https://prometheus.io
Initially built at SoundCloud; now a Cloud Native Computing Foundation (CNCF) project

Features
Metric collection in the form of time series
Storage by a time series database
Query language for accessing the time series
Alerting
Visualization

5
Q

What is the Cloud Native Computing Foundation (CNCF)?

A
  • pushes for a sustainable ecosystem for Cloud Native Computing
  • hosts several fast-growing open source projects including Kubernetes, Prometheus and Envoy
  • runs CloudNativeCon
6
Q

Prometheus Architecture

A
7
Q

Prometheus Scraping

A
  • metrics are retrieved (scraped) over HTTP from each target's /metrics endpoint
  • metric types:
    • counter: cumulative metric that only increases monotonically
    • gauge: numerical value that can arbitrarily go up and down
    • histogram: counts per bucket, total sum of observed values, number of events
  • metric name and labels together define a time series
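The three metric types can be modeled in a few lines of Python. This is a conceptual sketch, not the official prometheus_client library. The class names and the bucket bounds used in the example are assumptions. The histogram reports cumulative bucket counts (observations less than or equal to each upper bound), plus the sum and count of all observations.

```python
import bisect
import math


class Counter:
    """Cumulative value that only increases."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Numerical value that can arbitrarily go up and down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Per-bucket counts plus the sum and count of all observations."""

    def __init__(self, upper_bounds: list) -> None:
        # Inclusive upper bounds; a final +Inf bucket catches everything else.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value.
        self.bucket_counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self) -> list:
        """(upper bound, number of observations <= bound) for every bucket."""
        total, out = 0, []
        for bound, n in zip(self.bounds, self.bucket_counts):
            total += n
            out.append((bound, total))
        return out
```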
8
Q

Prometheus Exporters

A

Exporters provide metrics, in Prometheus format, for services that cannot be instrumented directly
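A minimal exporter sketch using only Python's standard library: it reads raw statistics from a service, translates them into the Prometheus text exposition format, and serves them on /metrics. The read_service_stats function, its values, the myservice prefix, and the port are hypothetical placeholders. Real exporters typically build on an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_service_stats() -> dict:
    """Hypothetical placeholder for querying a service that cannot be
    instrumented directly (e.g. by parsing its status page)."""
    return {"queue_length": 7, "connections_open": 42}


def render_metrics(stats: dict, prefix: str = "myservice") -> str:
    """Translate raw stats into the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers scrapes on /metrics; every other path returns 404."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_service_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever serving /metrics; the port number is arbitrary."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Prometheus would then be configured to scrape this host and port like any instrumented target.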

9
Q

Prometheus Alerting

A
  • split between the Prometheus server and the Alertmanager
  • alerting rules (evaluated by the Prometheus server) determine when an alert is raised
10
Q

What is Prometheus Alertmanager?

A
  • manages alerts, including silencing, inhibition, grouping and sending out notifications
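Grouping and silencing can be illustrated with a small routing function. This is a simplified sketch, not Alertmanager's implementation or configuration format: a silence is modeled as a dictionary of label values that must all match, inhibition is omitted, and the alert and label names are hypothetical.

```python
from collections import defaultdict


def route(alerts: list, group_by: list, silences: list) -> dict:
    """Drop silenced alerts, then group the rest: one notification per group key."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # Silence: a dict of label values; an alert matching all of them is muted.
        if any(all(labels.get(k) == v for k, v in s.items()) for s in silences):
            continue
        # Alerts sharing the same values for the group_by labels are sent together.
        key = tuple(labels.get(k, "") for k in group_by)
        groups[key].append(labels["alertname"])
    return dict(groups)
```

With group_by=["cluster"], two alerts from the same cluster end up in a single notification instead of two.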
11
Q

How does Prometheus scale?

A
  • hierarchical federation of servers: higher-level servers scrape selected (typically aggregated) time series from lower-level servers
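The idea behind hierarchical federation is that each lower-level server scrapes its own instances and exposes only aggregated series, so the top-level server handles one series per lower-level server rather than one per instance. The Python sketch below models that with plain dictionaries. The datacenter names, rates, and the series name job:requests:rate_sum are hypothetical.

```python
# Hypothetical per-instance request rates (requests/s), one group per datacenter.
SCRAPED = {
    "dc-a": {"web-1": 120.0, "web-2": 80.0},
    "dc-b": {"web-3": 50.0, "web-4": 70.0, "web-5": 30.0},
}


def leaf_server(instance_rates: dict) -> dict:
    """Lower-level server: scrapes every instance, exposes one aggregated series."""
    return {"job:requests:rate_sum": sum(instance_rates.values())}


def global_server(datacenters: dict) -> tuple:
    """Top-level server: federates only the aggregated series of each leaf."""
    federated = {
        dc: leaf_server(rates)["job:requests:rate_sum"]
        for dc, rates in datacenters.items()
    }
    return federated, sum(federated.values())
```

The top-level server sees two series here (one per datacenter) instead of five instance-level series.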
12
Q

How does visualization with Grafana work?

A
  • open-source visualization and dashboarding tool
  • Prometheus can be used as a data source (connect via its IP address and port), then build your own dashboards
13
Q

What do Borgmon instances do?

A
  • receive a list of target services, e.g. from a discovery service
  • periodically collect each target's monitoring interface
    (the collection is spread out over the period;
    results are decoded and stored in memory as time series)
  • metrics are counters and gauges
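Spreading the collection over the period avoids polling every target at the same instant. The Python sketch below shows one possible approach, assuming each target gets a stable, hash-derived offset within the period. The target names are hypothetical, and this is not Borgmon's actual scheduling code.

```python
import zlib


def scrape_offsets(targets: list, period_s: float) -> dict:
    """Give each target a stable start offset in [0, period_s) derived from a
    hash of its name, so collection is spread over the period."""
    return {
        target: (zlib.crc32(target.encode("utf-8")) % 1000) / 1000 * period_s
        for target in targets
    }


def schedule(targets: list, period_s: float, cycles: int) -> list:
    """List (time, target) collection events for several periods, in time order."""
    offsets = scrape_offsets(targets, period_s)
    events = [
        (cycle * period_s + offset, target)
        for cycle in range(cycles)
        for target, offset in offsets.items()
    ]
    return sorted(events)
```

Because the offset depends only on the target's name, adding or removing other targets does not shift existing collection times.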
14
Q

What are gauges?

A

instantaneous measurements e.g. CPU utilization

15
Q

How does time series storage work?

A
  • each metric is stored in a time series
  • entries are (timestamp, value) pairs
16
Q

What is in-memory buffering for Google Borgmon?

A
  • local buffer is designed to hold the time series for a certain time horizon
  • oldest entries are deleted once the horizon is reached
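A buffer with a fixed time horizon can be sketched with a double-ended queue of (timestamp, value) pairs: new points are appended at the end, and points older than the horizon are dropped from the front. The class name, the use of seconds, and the assumption that timestamps arrive in increasing order are all illustrative choices.

```python
from collections import deque


class TimeSeriesBuffer:
    """In-memory (timestamp, value) pairs for one time series, kept for a fixed horizon."""

    def __init__(self, horizon_s: float) -> None:
        self.horizon_s = horizon_s
        self.points = deque()  # oldest entry on the left, newest on the right

    def append(self, timestamp: float, value: float) -> None:
        self.points.append((timestamp, value))
        # Delete the oldest entries once they fall outside the horizon.
        cutoff = timestamp - self.horizon_s
        while self.points and self.points[0][0] <= cutoff:
            self.points.popleft()
```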
17
Q

How do you compute the ratio of errors to requests?

A
  1. Aggregate the rates of response codes across instances
  2. Compute the error rate for the entire cluster
  3. Compute the ratio of errors to requests
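The three steps can be worked through in Python. The sketch assumes each instance exposes one request counter per response code, that two samples taken interval_s seconds apart are available, and that every 5xx code counts as an error. It ignores counter resets. All names and numbers are hypothetical.

```python
def rate(prev: float, curr: float, interval_s: float) -> float:
    """Per-second rate from two samples of a monotonically increasing counter."""
    return (curr - prev) / interval_s


def error_ratio(samples: dict, interval_s: float) -> float:
    """samples: {instance: {response_code: (previous_count, current_count)}}"""
    # 1. Aggregate the rate of each response code across all instances.
    by_code = {}
    for codes in samples.values():
        for code, (prev, curr) in codes.items():
            by_code[code] = by_code.get(code, 0.0) + rate(prev, curr, interval_s)
    # 2. Cluster-wide error rate (assumption: every 5xx code is an error).
    errors = sum(r for code, r in by_code.items() if code.startswith("5"))
    # 3. Ratio of errors to all requests.
    total = sum(by_code.values())
    return errors / total if total else 0.0
```

For example, if the 200 responses across two instances add up to 95 requests/s and the 5xx responses to 5 requests/s, the ratio is 5 / 100 = 0.05.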
18
Q

Alerting rules in Borgmon

A
  • specify a condition for alerting
  • specify a minimum duration the alerting condition must hold (alerts have a pending and a firing state)
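The pending and firing states can be modeled as a small state machine: the alert becomes pending as soon as the condition holds and fires only once it has held for the minimum duration. This Python sketch assumes a single threshold condition ("value above threshold") evaluated at discrete timestamps. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Alerting condition plus the minimum duration before the alert fires."""
    threshold: float
    min_duration_s: float
    active_since: Optional[float] = None

    def evaluate(self, timestamp: float, value: float) -> str:
        if value <= self.threshold:
            self.active_since = None  # condition cleared: reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = timestamp  # condition just became true
        if timestamp - self.active_since >= self.min_duration_s:
            return "firing"
        return "pending"
```

A short spike that clears before min_duration_s never leaves the pending state, which is the point of the minimum duration.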
19
Q

Borgmon summary concerning monitoring and usage

A

monitoring system:
- provides measurements of metrics
- stores measurements as time series
- rules for aggregation
- hierarchical design for scalability

usage:
- alerting
- dashboard

20
Q

What is Amazon CloudWatch?

A
  • monitoring and management system
  • collects metrics and logs
21
Q

What are some preselected metrics in Amazon CloudWatch?

A
  • CPU utilization
  • read/write latency
  • request counts and latency for load balancers
22
Q

How can you access Amazon CloudWatch besides the management console?

A
  • CLI
  • Web service API
  • Libraries (SDKs) for Java and other languages
23
Q

What is Google Dapper?

A

Google's system for distributed tracing.

24
Q

What is tracing?

A
  • capture the interaction of different services
  • capture the individual events e.g. submit a request, receive the request, start processing, submit answer, receive answer
  • associate events with a given request to be able to analyze the execution of this request
25
Q

What are the pros of Google Dapper?

A
  • continuous and ubiquitous tracing
  • low-overhead
  • application transparency
  • scalability
  • can collect payload data if the application developer adds it (via annotations)
  • can be used to enforce security policies (authentication and encryption)
  • allows for runtime verification that provides greater assurance than source code audits
26
Q

What is a Dapper trace tree?

A
  • nodes are called spans: basic units of work, e.g. the lifetime of one request
  • edges indicate the causal relationship between a span and its parent span
27
Q

What is there to know about spans?

A
  • typically represent a single remote procedure call (RPC)
  • attributes:
    • span id: identifies a span
    • parent id: span id of the triggering span (the root span has none)
    • trace id: identifies the triggering request (shared by all spans of a trace)
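The three ids are enough to rebuild a trace tree: the trace id selects all spans of one request, and each parent id links a span to the span that triggered it. The Python sketch below assumes spans arrive as plain records. The field names mirror the attributes above, while the span names are hypothetical RPC methods.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: int             # shared by every span of the same request
    span_id: int              # identifies this span
    parent_id: Optional[int]  # span id of the triggering span; None for the root
    name: str                 # e.g. the RPC method


def trace_lines(spans: list, trace_id: int) -> list:
    """Return one trace as indented lines, each parent before its children."""
    children = defaultdict(list)
    for s in spans:
        if s.trace_id == trace_id:
            children[s.parent_id].append(s)
    lines = []

    def walk(parent_id: Optional[int], depth: int) -> None:
        for s in children[parent_id]:
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines
```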
28
Q

What are the advantages of Dapper?

A
  • Dapper does not collect RPC payload data by default
  • Dapper can be used to enforce security policies
  • Such runtime verification provides greater assurance than source code audits
  • Continuous and ubiquitous tracing
  • Low-overhead
  • Application transparency
  • Scalability
29
Q

Why is there overhead in Dapper?

A
  • trace generation and collection
  • amount of resources to store and analyze trace data
30
Q

How to reduce overhead in Dapper?

A
  • coalesced writes: multiple trace events are coalesced into a single log file write operation
  • asynchronous writes: writes happen asynchronously to the traced application
  • adaptive sampling at the application (only a certain number of requests per second is captured)
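Adaptive sampling can be sketched as choosing a sampling probability so that roughly a target number of requests per second is traced, whatever the current load. The Python sketch assumes the application knows its observed request rate. The class name and target value are hypothetical, and the coalesced, asynchronous writes are not modeled.

```python
import random
from typing import Optional


class AdaptiveSampler:
    """Traces roughly target_per_s requests per second, whatever the load."""

    def __init__(self, target_per_s: float, rng: Optional[random.Random] = None) -> None:
        self.target_per_s = target_per_s
        self.rng = rng or random.Random()

    def probability(self, observed_per_s: float) -> float:
        """Low traffic: trace everything; high traffic: sample proportionally less."""
        if observed_per_s <= 0:
            return 1.0
        return min(1.0, self.target_per_s / observed_per_s)

    def should_sample(self, observed_per_s: float) -> bool:
        return self.rng.random() < self.probability(observed_per_s)
```

At 1,000 requests/s and a target of 10, about 1 request in 100 is traced; at 5 requests/s, every request is.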
31
Q

What can the Dapper Depot API be used for?

A
  • access to traces via trace id
32
Q

What is an open-source alternative to Dapper?

A

OpenTelemetry