AWS Cloud Developer Associate Certification

17.Monitoring,Logging, Auditing
Index

Monitoring Overview in AWS
CloudWatch
1. CloudWatch Metrics
2. CloudWatch Custom Metrics
3. CloudWatch Logs
4. CloudWatch Logs Hands On
5. CloudWatch Agent & CloudWatch Logs Agent
6. CloudWatch Logs Metric Filters
7. CloudWatch Alarms
8. CloudWatch Alarms Hands On
9. CloudWatch Events
EventBridge Overview
1. EventBridge Hands On
X-Ray Overview
1. X-Ray Hands On
2. X-Ray: Instrumentation and Concepts
3. X-Ray: Sampling Rules
4. X-Ray APIs
5. X-Ray with Beanstalk
6. X-Ray & ECS
CloudTrail
1. CloudTrail Hands On
CloudTrail vs CloudWatch vs X-Ray

Monitoring Overview in AWS

AWS CloudWatch:
1. Metrics: Collect and track key metrics
2. Logs: Collect, monitor, analyze and store log files
3. Events: Send notifications when certain events happen in your AWS
4. Alarms: React in real-time to metrics / events
AWS X-Ray:
1. Troubleshooting application performance and errors
2. Distributed tracing of microservices
AWS CloudTrail:
1. Internal monitoring of API calls being made
2. Audit changes to AWS Resources by your users

AWS CloudWatch Metrics

CloudWatch provides metrics for every services in AWS
Metric is a variable to monitor (CPUUtilization, NetworkIn…)
Metrics belong to namespaces
Dimension is an attribute of a metric (instance id, environment, etc…).
Up to 10 dimensions per metric
Metrics have timestamps
Can create CloudWatch dashboards of metrics

EC2 Detailed monitoring

EC2 instance metrics have metrics “every 5 minutes
With detailed monitoring (for a cost), you get data “every 1 minute”
Use detailed monitoring if you want to scale faster for your ASG!
The AWS Free Tier allows us to have 10 detailed monitoring metrics
Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)

CloudWatch Custom Metrics

Possibility to define and send your own custom metrics to CloudWatch
Example: memory (RAM) usage, disk space, number of logged in users …
Use API call PutMetricData
Ability to use dimensions (attributes) to segment metrics
1. Instance.id
2. Environment.name
Metric resolution (StorageResolution API parameter – two possible value):
1. Standard: 1 minute (60 seconds)
2. High Resolution: 1/5/10/30 second(s) – Higher cost
Important: Accepts metric data points two weeks in the past and two hours in the future (make sure to configure your EC2 instance time correctly)

CloudWatch Logs

Log groups: arbitrary name, usually representing an application
Log stream: instances within application / log files / containers
Can define log expiration policies (never expire, 30 days, etc..)
CloudWatch Logs can send logs to
1. Amazon S3 (exports)
2. • Kinesis Data Streams
3. • Kinesis Data Firehose
4. • AWS Lambda
5. • ElasticSearch

CloudWatch Logs – Sources

SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
• Elastic Beanstalk: collection of logs from application
• ECS: collection from containers
• AWS Lambda: collection from function logs
• VPC Flow Logs: VPC specific logs
• API Gateway
• CloudTrail based on filter
• Route53: Log DNS queries

CloudWatch Logs Metric Filter & Insights

CloudWatch Logs can use filter expressions
1. For example, find a specific IP inside of a log
2. Or count occurrences of “ERROR” in your logs
Metric filters can be used to trigger CloudWatch alarms
CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards

CloudWatch Logs – S3 Export

Log data can take up to 12 hours to become available for export
The API call is CreateExportTask
Not near-real time or real-time… use Logs Subscriptions instead

CloudWatch Logs for EC2

By default, no logs from your EC2 machine will go to CloudWatch
You need to run a CloudWatch agent on EC2 to push the log files
Make sure IAM permissions are coorect
The CloudWatch log agent can be setup on-premises too

CloudWatch Logs Agent & Unified Agent

For virtual servers (EC2 instances, on-premise servers…)
CloudWatch Logs Agent
1. Old version of the agent
2. Can only send to CloudWatch Logs
CloudWatch Unified Agent
1. Collect additional system-level metrics such as RAM, processes, etc…
2. Collect logs to send to CloudWatch Logs
3. Centralized configuration using SSM Parameter Store

CloudWatch Unified Agent – Metrics

Collected directly on your Linux server / EC2 instance
1. CPU (active, guest, idle, system, user, steal)
2. • Disk metrics (free, used, total), Disk IO (writes, reads, bytes, iops)
3. • RAM (free, inactive, used, total, cached)
4. • Netstat (number of TCP and UDP connections, net packets, bytes)
5. • Processes (total, dead, bloqued, idle, running, sleep)
6. • Swap Space (free, used, used %)
Reminder: out-of-the box metrics for EC2 – disk, CPU, network (high level)

CloudWatch Logs Metric Filter

CloudWatch Logs can use filter expressions
1. • For example, find a specific IP inside of a log
2. • Or count occurrences of “ERROR” in your logs
3. • Metric filters can be used to trigger alarms
• Filters do not retroactively filter data. Filters only publish the metric datapoints for events that happen after the filter was created.

CloudWatch Alarms

• Alarms are used to trigger notifications for any metric
• Various options (sampling, %, max, min, etc…)
• Alarm States:
1. • OK
2. • INSUFFICIENT_DATA
3. • ALARM
• Period:
• Length of time in seconds to evaluate the metric
• High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec

CloudWatch Alarm Targets

• Stop, Terminate, Reboot, or Recover an EC2 Instance
• Trigger Auto Scaling Action
• Send notification to SNS (from which you can do pretty much anything)

EC2 Instance Recovery

Status Check:
1. Instance status = check the EC2 VM
2. System status = check the underlying hardware
Recovery: Same Private, Public, Elastic IP, metadata, placement group

CloudWatch Alarm: good to know

Alarms can be created based on CloudWatch Logs Metrics Filters
To test alarms and notifications, set the alarm state to Alarm using CLI

aws cloudwatch set-alarm-state –alarm-name “myalarm” –state-value
ALARM –state-reason “testing purposes

CloudWatch Events

Event Pattern: Intercept events from AWS services (Sources
1. Example sources: EC2 Instance Start, CodeBuild Failure, S3, Trusted Advisor
2. Can intercept any API call with CloudTrail integration
Schedule or Cron (example: create an event every 4 hours
A JSON payload is created from the event and passed to a target
1. Compute: Lambda, Batch, ECS task
2. • Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose
3. • Orchestration: Step Functions, CodePipeline, CodeBuild
4. • Maintenance: SSM, EC2 Actions

EventBridge Overview

EventBridge is the next evolution of CloudWatch Events
• Default event bus: generated by AWS services (CloudWatch Events)
• Partner event bus: receive events from SaaS service or applications
(Zendesk, DataDog, Segment, Auth0…)
• Custom Event buses: for your own applications
• Event buses can be accessed by other AWS accounts
• Rules: how to process the events (similar to CloudWatch Events)

Amazon EventBridge Schema Registry

EventBridge can analyze the events in your bus and infer the schema
The Schema Registry allows you to generate code for your application, that will know in advance how data is structured in the event bus
Schema can be versioned

Amazon EventBridge vs CloudWatch Events

• Amazon EventBridge builds upon and extends CloudWatch Events.
• It uses the same service API and endpoint, and the same underlying service infrastructure.
• EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps.
• Event Bridge has the Schema Registry capability
• EventBridge has a different name to mark the new capabilities
• Over time, the CloudWatch Events name will be replaced with EventBridge

AWS X-Ray Overview

Debugging in Production, the good old way:
1. • Test locally
2. • Add log statements everywhere
3. • Re-deploy in production
• Log formats differ across applications using CloudWatch and analytics is hard.
• Debugging: monolith “easy”, distributed services “hard”
• No common views of your entire architecture!
• Enter… AWS X-Ray!

AWS X-Ray advantages

Troubleshooting performance (bottlenecks)
• Understand dependencies in a microservice architecture
• Pinpoint service issues
• Review request behavior
• Find errors and exceptions
• Are we meeting time SLA?
• Where I am throttled?
• Identify users that are impacted

X-Ray: In

AWS X-Ray Leverages Tracing

• Tracing is an end to end way to following a “request”
• Each component dealing with the request adds its own “trace”
• Tracing is made of segments (+ sub segments)
• Annotations can be added to traces to provide extra-information
• Ability to trace:
1. • Every request
2. • Sample request (as a % for example or a rate per minute)
• X-Ray Security:
1. • IAM for authorization
2. • KMS for encryption at rest

AWS X-Ray How to enable it?

Your code (Java, Python, Go, Node.js, .NET) must import the AWS X-Ray SDK
1. • Very little code modification needed
2. • The application SDK will then capture:
3. • Calls to AWS services
4. • HTTP / HTTPS requests
5. • Database Calls (MySQL, PostgreSQL, DynamoDB)
6. • Queue calls (SQS)
Install the X-Ray daemon or enable X-Ray AWS Integration
1. X-Ray daemon works as a low level UDP packet interceptor (Linux / Windows / Mac…)
2. AWS Lambda / other AWS services already run the X-Ray daemon for you
3. Each application must have the IAM rights to write data to X-Ray

X-Ray Instrumentation in your code

Instrumentation means the measure of product’s performance, diagnose errors, and to write trace information.
• To instrument your application code, you use the X-Ray SDK
• Many SDK require only configuration changes
• You can modify your application code to customize and annotation the data that the SDK sends to XRay, using interceptors, filters, handlers, middleware

X-Ray Concepts

• Segments: each application / service will send them
• Subsegments: if you need more details in your segment
• Trace: segments collected together to form an end-to-end trace
• Sampling: decrease the amount of requests sent to X-Ray, reduce cost
• Annotations: Key Value pairs used to index traces and use with filters
• Metadata: Key Value pairs, not indexed, not used for searching

The X-Ray daemon / agent has a config to send traces cross account:
• make sure the IAM permissions are correct – the agent will assume the role
• This allows to have a central account for all your application tracing

X-Ray: Sampling Rules

With sampling rules, you control the amount of data that you record
You can modify sampling rules without changing your code
By default, the X-Ray SDK records the first request each second, and five percent of any additional requests.
One request per second is the reservoir, which ensures that at least one trace is recorded each second as long the service is serving requests
Five percent is the rate at which additional requests beyond the reservoir size are sampled

X-Ray Write APIs (used by the X-Ray daemon)

PutTraceSegments: Uploads segment documents to AWS X-Ray
PutTelemetryRecords: Used by the AWS X-Ray daemon to upload telemetry.
1. SegmentsReceivedCount,
2. SegmentsRejectedCounts,
3. BackendConnectionErrors
GetSamplingRules: Retrieve all sampling rules (to know what/when to send)
GetSamplingTargets & GetSamplingStatisticSummaries: advanced
The X-Ray daemon needs to have an IAM policy authorizing the correct API calls

X-Ray Read APIs –

GetServiceGraph: main graph
BatchGetTraces: Retrieves a list of traces specified by ID. Each trace is a collection of segment documents that originates from a single request.
GetTraceSummaries: Retrieves IDs and annotations for traces available for a specified time frame using an optional filter. To get the full traces, pass the trace IDs to BatchGetTraces
GetTraceGraph: Retrieves a service graph for one or more specific traceIDs

X-Ray with Beanstalk

You can run the daemon by setting an option in the Elastic Beanstalk console or with a configuration file (in .ebextensions/xray-daemon.config)
Note: The X-Ray daemon is not provided for Multicontainer Docker

X-Ray & ECS

AWS CloudTrail

Provides governance, compliance and audit for your AWS Account
• CloudTrail is enabled by default!
• Get an history of events / API calls made within your AWS Account by:
1. • Console
2. • SDK
3. • CLI
4. • AWS Services
• Can put logs from CloudTrail into CloudWatch Logs or S3
• A trail can be applied to All Regions (default) or a single Region.
• If a resource is deleted in AWS, investigate CloudTrail first!

CloudTrail Events

• Management Events:
1. • Operations that are performed on resources in your AWS account
2. • Examples:
  1. • Configuring security (IAM AttachRolePolicy)
  2. • Configuring rules for routing data (Amazon EC2 CreateSubnet)
  3. • Setting up logging (AWS CloudTrail CreateTrail)
3. • By default, trails are configured to log management events.
4. • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
• Data Events:
1. • By default, data events are not logged (because high volume operations)
2. • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
3. • AWS Lambda function execution activity (the Invoke API)
• CloudTrail Insights Events:
1. Enable CloudTrail Insights to detect unusual activity in your account:
  1. • inaccurate resource provisioning
  2. • hitting service limits
  3. • Bursts of AWS IAM actions
  4. • Gaps in periodic maintenance activity
  5. • CloudTrail Insights analyzes normal management events to create a baseline
2. • And then continuously analyzes write events to detect unusual patterns
  1. • Anomalies appear in the CloudTrail console
  2. • Event is sent to Amazon S3
  3. • An EventBridge event is generated (for automation needs)

CloudTrail Events Retention

Events are stored for 90 days in CloudTrail
To keep events beyond this period, log them to S3 and use Athena

CloudTrail vs CloudWatch vs X-Ray

CloudTrail:
1. • Audit API calls made by users / services / AWS console
2. • Useful to detect unauthorized calls or root cause of changes
• CloudWatch:
1. • CloudWatch Metrics over time for monitoring
2. • CloudWatch Logs for storing application log
3. • CloudWatch Alarms to send notifications in case of unexpected metrics
• X-Ray:
1. • Automated Trace Analysis & Central Service Map Visualization
2. • Latency, Errors and Fault analysis
3. • Request tracking across distributed systems

Pages: 1234567891011121314151617181920212223

AWS Cloud Developer Associate Certification

Published by

sevanand yadav

Leave a comment Cancel reply

Share this:

Related

Published by

sevanand yadav

Leave a comment Cancel reply