3.2.3 Create a Dashboard

From the Dashboard menu, you can also create new dashboards. There are now three choices: New Dashboard, New Timeboard, and New Screenboard. We have always had the last two, but we are merging the best features of each into the Dashboard. Enter a name for your dashboard and click the New Dashboard button.
Scroll through the list of available widgets, then drag and drop the Timeseries widget onto the canvas.
Under Graph your data, click on Metric to see the other types of data that you can graph. Click on the metric name, which defaults to system.cpu.user, to see the list of all the metrics available. There are thousands of them.
Take a look at some of the other options available on this page. You can also combine multiple metrics using the Advanced… link.
Click the Overview tab at the top. This is a nice way to preview some of the functions available, though there are many more on the main Edit tab when you click the plus button to the right of any metric.
In the previous section we saw the Correlations view. When you click the Correlations tab here, you can customize how correlations are found.
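The timeseries widget built in the steps above can also be described as data, which is handy if you later want to keep dashboards in version control. The following is a minimal sketch, not an official recipe: the helper names `timeseries_widget` and `build_dashboard` and the dashboard title are hypothetical, and the field names (`layout_type`, `widgets`, `definition`, `requests`, `q`) reflect the Datadog dashboard JSON format as we understand it, so check them against the current API documentation before relying on them. The block only builds and prints the payload; it does not call any API.

```python
import json


def timeseries_widget(query, title):
    """Build a timeseries widget definition as a plain dictionary (hypothetical helper)."""
    return {
        "definition": {
            "type": "timeseries",
            "title": title,
            # One request per graphed query; "q" holds the metric query string.
            "requests": [{"q": query, "display_type": "line"}],
        }
    }


def build_dashboard(title, widgets):
    """Wrap widgets in a dashboard payload (hypothetical helper, assumed field names)."""
    return {
        "title": title,
        "layout_type": "ordered",
        "widgets": list(widgets),
    }


dashboard = build_dashboard(
    "My Kubernetes dashboard",  # hypothetical title
    [timeseries_widget("avg:system.cpu.user{*}", "CPU user time")],
)

# Print the payload so it can be inspected or saved to a file.
print(json.dumps(dashboard, indent=2))
```

The query uses system.cpu.user because that is the default metric the editor shows; swapping in any other metric name only changes the query string.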
That’s a quick look at creating a dashboard. When you start working with Kubernetes, you will probably start with the dashboards provided and then gradually customize them and build your own. We have a complete course on Monitoring the Kubernetes Platform available on the Datadog Learning Center. Here are some of the key metrics you should consider keeping an eye on:

etcd metric	description
etcd.server.proposals.committed.total, etcd.server.proposals.applied.total, etcd.server.proposals.failed.total, etcd.server.proposals.pending	Proposals occur when configuration changes are sent from the leader of the cluster to the other nodes in the cluster. Most should be successful, but errors are important to keep an eye on.
etcd.disk.wal.fsync.duration.seconds, etcd.disk.backend.commit.duration.seconds	etcd members fsync proposals to the write-ahead log on disk before they can be committed, so disk performance directly affects proposals.
etcd.debugging.mvcc.db.total.size.in_bytes	Database size has a finite max and you need to make sure it always stays below that level.
etcd.grpc.server.msg.received.total, etcd.grpc.server.msg.sent.total, etcd.network.client.grpc.received.bytes.total, etcd.network.client.grpc.sent.bytes.total	Network performance affects how quickly members are notified of a proposal’s success, so it’s important to watch that too.

apiserver metric	description
kube_apiserver.rest_client_requests_total, kube_apiserver.rest_client_requests_total.count, kube_apiserver.rest_client_request_latency_seconds.sum, kube_apiserver.authenticated_user_requests, kube_apiserver.rest_client_request_latency_seconds.count, kube_apiserver.apiserver_request_count, kube_apiserver.apiserver_request_total, kube_apiserver.authenticated_user_requests.count, kube_apiserver.current_inflight_requests, kube_apiserver.apiserver_request_count.count, kube_apiserver.apiserver_request_total.count	The apiserver is at its heart a webserver, so you need to monitor it like you would a webserver.
docker.container.open_fds, the various docker.mem metrics, and kubernetes.cpu	Because the apiserver is a webserver, these are the other resource metrics you would watch for any webserver.

controller manager and scheduler metrics	description
kube_controller_manager.nodes.count, kube_controller_manager.nodes.unhealthy	Making sure all the nodes are available and healthy is a good first step.
kube_controller_manager.queue.depth, kube_controller_manager.queue.retries	The controller manager and scheduler work off of a queue, so it is important to make sure the queue depth isn’t getting too big. A large queue can point to other issues.
kube_controller_manager.client.http.requests	The controller manager also behaves like a webserver, so monitor its HTTP requests.

CoreDNS metrics	Description
coredns.request_count	Shows how many requests are coming into the DNS server.
coredns.cache_hits_count	Divide this by the request count to see the cache hit rate. A low hit rate may show that you should raise the TTL value.
coredns.request_duration.seconds.sum, coredns.request_duration.seconds.count	Understanding how long requests take to resolve is important.
coredns.response_code_count	When CoreDNS encounters an error, an RCODE is generated. This shows how many of each error are occurring.
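
The two ratios implied by the CoreDNS table can be sketched in a few lines. This is an illustration under stated assumptions, not part of the course material: the function names and the sample numbers are hypothetical, and the inputs are assumed to be totals collected over the same time window (for example, values read from coredns.cache_hits_count, coredns.request_count, coredns.request_duration.seconds.sum, and coredns.request_duration.seconds.count).

```python
def cache_hit_rate(cache_hits, requests):
    """Return cache hits divided by requests, or None when there were no requests."""
    if requests <= 0:
        return None
    return cache_hits / requests


def average_request_duration(duration_sum_seconds, duration_count):
    """Return the mean request duration in seconds, or None when nothing was measured."""
    if duration_count <= 0:
        return None
    return duration_sum_seconds / duration_count


# Hypothetical sample values for one time window.
hit_rate = cache_hit_rate(cache_hits=450, requests=600)
avg_seconds = average_request_duration(duration_sum_seconds=3.0, duration_count=600)

print(f"cache hit rate: {hit_rate:.0%}")
print(f"average request duration: {avg_seconds * 1000:.1f} ms")
```

Both helpers return None rather than dividing by zero, since a quiet window with no requests is a normal situation and not an error.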

Again, the course on Monitoring the Kubernetes Platform at https://learn.datadoghq.com goes into a lot of detail on this topic if you want to learn more.