L7ESP K8s Monitoring with Grafana
Introduction
This document provides instructions to help the DevOps support team onboard the monitoring tool and enhance observability across internal and customer assets.
Grafana Architecture
The Grafana architecture for multiple-cluster monitoring consists of one master Thanos deployment on the master cluster together with the Grafana front-end application; this master cluster acts as the single pane of glass for monitoring. Each client cluster needs a deployment of the Thanos stack to ship metrics and logs to the master cluster. When the assets to be monitored are not on Kubernetes, we leverage the numerous data sources Grafana provides to fetch metrics.
For single-cluster monitoring, you can simply run a Prometheus agent scraping metrics from node-exporter/kube-state-metrics on your nodes. Grafana then queries those metrics from Prometheus.
Application logs are collected by Promtail jobs running on the nodes, which ship them to Loki; Grafana queries Loki to display the logs of your Kubernetes applications.
Finally, Grafana can also leverage API data sources to fetch metrics from other assets, such as Azure or AWS cloud environments.
Requirements
To install the Grafana stack on your Kubernetes cluster, you must have:
A Kubernetes cluster to deploy your Helm charts.
The kubectl and helm command-line tools, used respectively to programmatically access the Kubernetes cluster and to install the charts. To install the CLI tools, follow the link below and choose the correct operating system:
Grafana
The Grafana application can be installed using its Helm chart. Download the Helm chart from the following website: Grafana helm chart, then install it using helm.
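A minimal install sketch, assuming the upstream grafana Helm repository and a hypothetical release name and namespace of grafana:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana --namespace grafana --create-namespace -f values.yaml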
The following values.yaml parameters manage your dashboards in Grafana.
Enable automated dashboard uploads so users can add a new dashboard.json to the data folder of a ConfigMap. The ConfigMap label should match the label in the values file specified below.
sidecar:
  dashboards:
    enabled: true
    SCProvider: true
    # label that the configmaps with dashboards are marked with
    label: grafana_dashboard
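A sketch of such a dashboard ConfigMap, assuming a grafana namespace and a hypothetical dashboard name; the grafana_dashboard label is what the sidecar watches for, and the sidecar mounts the collected dashboards under /tmp/dashboards by default:

apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-dashboard
  namespace: grafana
  labels:
    grafana_dashboard: "1"
data:
  k8s-dashboard.json: |
    { "title": "K8s Dashboard", "panels": [] }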
Enable a default dashboard at login by updating the following values parameters:
grafana.ini:
  dashboards:
    default_home_dashboard_path: /tmp/dashboards/k8s-dashboard.json
Set the initial Grafana login username and password:
grafana:
  enabled: true
  # Pass values to Grafana child chart
  adminUser: admin
  adminPassword: admin
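Rather than committing a plaintext password, the Grafana chart can also read the admin credentials from an existing Kubernetes secret; a sketch, with a hypothetical secret name:

grafana:
  admin:
    existingSecret: grafana-admin-credentials
    userKey: admin-user
    passwordKey: admin-password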
Datasources
Data sources are back-end query APIs that pull data from different assets. Below is a list of the data sources currently used by ESP Grafana; the provisioning sketch after the list shows how an entry is registered with Grafana.
blackbox-exporter: A probe, scraped by Prometheus, that checks endpoint health and exposes metrics such as SSL certificate expiry
thanos: Highly available master-client architecture to aggregate metrics from other clusters
prometheus: Time-series database and query engine for metrics
aws cloudwatch: Data source to ingest AWS CloudWatch metrics and logs
azure monitor: Data source to ingest Azure Monitor data and logs
loki: Data source to ingest Knative application logs
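Each of these is registered with Grafana through the chart's datasource provisioning block in values.yaml; the url fields shown in the sections below belong in such entries. A minimal sketch for the Prometheus entry (the name and isDefault flag are assumptions):

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://{{ .Release.Name }}-prometheus-server:80
        isDefault: true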
Install data source dependency Helm charts
The data source Helm charts can be downloaded from the following websites:
* Prometheus
* prometheus-blackbox-exporter
* Thanos
* Loki
* promtail
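Alternatively, a sketch of adding the upstream community repositories and installing each chart, assuming hypothetical release names and a grafana namespace:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install prometheus prometheus-community/prometheus -n grafana
helm install blackbox prometheus-community/prometheus-blackbox-exporter -n grafana
helm install thanos bitnami/thanos -n grafana
helm install loki grafana/loki -n grafana
helm install promtail grafana/promtail -n grafana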
Values parameters configuration for each data source
Enable and configure each data source from the values.yaml:
Configure Prometheus
Grafana data source URL:
url: http://{{ .Release.Name }}-prometheus-server:80
prometheus:
  server:
    extraArgs:
      log.level: debug
      storage.tsdb.min-block-duration: 2h # Don't change this, see docs/components/sidecar.md
      storage.tsdb.max-block-duration: 2h # Don't change this, see docs/components/sidecar.md
    retention: 4h
    service:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    statefulSet:
      enabled: true
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "10902"
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - prometheus
                - key: component
                  operator: In
                  values:
                    - server
            topologyKey: "kubernetes.io/hostname"
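To verify the scrape configuration, you can port-forward the Prometheus service and inspect its targets page; the service name below assumes the l7-devops-monitoring release name used elsewhere in this document:

kubectl -n grafana port-forward svc/l7-devops-monitoring-prometheus-server 9090:80
# then open http://localhost:9090/targets to confirm all targets are up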
Configure Thanos
Grafana data source URL:
url: http://{{ .Release.Name }}-thanos-query-frontend:9090
thanos:
  objstoreConfig: |-
    type: s3
    config:
      bucket:
      endpoint:
      access_key:
      secret_key:
      insecure: true
  query:
    stores:
      -
      # - SIDECAR-SERVICE-IP-ADDRESS-2:10901
  bucketweb:
    enabled: true
  compactor:
    enabled: true
  storegateway:
    enabled: true
  ruler:
    enabled: true
    alertmanagers:
      - http://l7-devops-monitoring-prometheus-alertmanager.grafana.svc.cluster.local:9093
    config: |-
      groups:
        - name: "metamonitoring"
          rules:
            - alert: "PrometheusDown"
              expr: absent(up{prometheus="monitoring/prometheus-operator"})
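Each client cluster's Thanos sidecar is added to query.stores; a sketch with a purely hypothetical address (use your client cluster's sidecar endpoint):

query:
  stores:
    - 10.20.30.40:10901 # gRPC endpoint of a client cluster's Thanos sidecar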
Configure prometheus-blackbox-exporter
prometheus.yml:
  scrape_configs:
    - job_name: "healthchecks"
      scrape_interval: 60s
      scheme: https
      metrics_path: /projects/446fab8e-5a9f-4f41-a90b-f086581f64a5/metrics/lthdUEmLy2xTIbPH5Jngrm_BPMGopHT7
      static_configs:
        - targets: ["healthchecks.io"]
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: ${1}:${2}
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
    - job_name: "snykmetrics"
      static_configs:
        - targets: ["localhost:9090"]
        - targets: ["localhost:9532"]
    - job_name: 'blackbox'
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
            - https://clinops-stage.preview.l7esp.com
            - https://stage.preview.l7esp.com
            - https://250.preview.l7esp.com
            - https://qa-clinops300.preview.l7esp.com
            - https://preview.l7esp.com
            - https://inspiring-jepsen-5082.edgestack.me
            - https://lab7io.atlassian.net
            - https://scipher-qa.l7esp.com
            - https://cdn.l7esp.com
            - https://lab7io.slack.com
            - https://ci.l7informatics.com
            - https://registry.l7informatics.com
            - https://l7devopstest.azurecr.io
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: l7-devops-monitoring-prometheus-blackbox-exporter:9115 # Blackbox exporter scraping address
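To test a probe by hand, you can port-forward the blackbox exporter service and hit its /probe endpoint directly; the service name assumes the release used above:

kubectl -n grafana port-forward svc/l7-devops-monitoring-prometheus-blackbox-exporter 9115:9115
curl "http://localhost:9115/probe?module=http_2xx&target=https://preview.l7esp.com"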
Configure Loki/Promtail
Grafana data source URL:
url: http://{{ .Release.Name }}-loki.{{ .Release.Name }}.svc.cluster.local:3100
loki:
  enabled: true
promtail:
  enabled: true
  scrapeConfigs: |
    # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
    # Pods with a label 'app.kubernetes.io/name'
    - job_name: kubernetes-pods-app-kubernetes-io-name
      pipeline_stages:
        {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
          target_label: app
        - action: drop
          regex: ''
          source_labels:
            - app
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_component
          target_label: component
        {{- if .Values.config.snippets.addScrapeJobLabel }}
        - action: replace
          replacement: kubernetes-pods-app-kubernetes-io-name
          target_label: scrape_job
        {{- end }}
        {{- toYaml .Values.config.snippets.common | nindent 4 }}
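Once logs are flowing, they can be queried from Grafana Explore with the Loki data source; a sample LogQL query using the app and component labels set by the relabel configs above (the label values are hypothetical):

{app="my-app", component="web"} |= "error"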
Azure Monitor
- name: Azure Monitor-1
  type: grafana-azure-monitor-datasource
  access: proxy
  jsonData:
    azureAuthType: clientsecret
    cloudName: azuremonitor
    tenantId:
    clientId:
    subscriptionId: # subscription GUID, not the subscription display name
  secureJsonData:
    clientSecret:
  version: 1
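The tenantId, clientId, and clientSecret come from an Azure service principal; a sketch of creating one with the Azure CLI (the principal name and role scope are assumptions):

az ad sp create-for-rbac --name grafana-monitor \
  --role "Monitoring Reader" \
  --scopes /subscriptions/<subscription-id>
# in the output: appId -> clientId, password -> clientSecret, tenant -> tenantId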
AWS CloudWatch
- name: CloudWatch
  type: cloudwatch
  jsonData:
    authType: keys
    defaultRegion: us-east-1 # must be a region, not an availability zone
  secureJsonData:
    accessKey: ''
    secretKey: ''