
SysOps Admin Exam Notes

Monitoring & Metrics


● CloudWatch - A monitoring service
o Monitors the following:
▪ Autoscaling groups
▪ ELBs
▪ Route 53 Health Checks
▪ EBS Volumes
▪ Storage Gateways
▪ CloudFront
▪ DynamoDB
▪ Elasticache Nodes
▪ RDS Instances
▪ EMR Job Flows
▪ Redshift
▪ SNS Topics
▪ SQS Queues
▪ OpsWorks
▪ CloudWatch Logs

● CloudWatch & EC2


o Host level Metrics: CPU, Disk, Network, Status Check
o RAM utilization is a custom metric. By default, EC2 monitoring is at 5-minute intervals unless you enable detailed monitoring (1-minute intervals). A sketch of publishing a custom metric follows below.
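Since RAM utilization is not collected by CloudWatch automatically, a script on the instance has to push it as a custom metric. A minimal boto3 sketch, assuming a hypothetical namespace, metric name, and instance ID:

```python
# Minimal sketch: publish RAM utilization as a custom CloudWatch metric.
# Namespace, metric name, value, and instance ID below are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="Custom/EC2",                      # hypothetical custom namespace
    MetricData=[{
        "MetricName": "MemoryUtilization",       # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        "Value": 62.5,                           # percent used, gathered by your own script
        "Unit": "Percent",
    }],
)
```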

● Storing CloudWatch Metrics


o Use the GetMetricStatistics API to retrieve data (see the sketch below)
o You can still retrieve metrics for an EC2 instance or ELB after it has been terminated/deleted
o For custom metrics, the minimum granularity that you can have is 1 minute.
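A minimal boto3 sketch of the GetMetricStatistics call mentioned above, pulling 5-minute average CPU for an instance (the instance ID and time window are placeholders):

```python
# Minimal sketch: retrieve CPUUtilization datapoints with GetMetricStatistics.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder ID
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
    EndTime=datetime.datetime.utcnow(),
    Period=300,                     # 5-minute granularity (basic monitoring)
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))
```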

● CloudWatch Alarms
o Can be created to monitor any metric in your account (CPU Utilization, ELB latency, etc.)
o Alarms are triggered when a metric crosses a threshold you define (see the sketch below).
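A minimal boto3 sketch of a threshold alarm on CPUUtilization (the alarm name, instance ID, SNS topic ARN, and threshold are assumptions):

```python
# Minimal sketch: alarm when average CPU stays above 80% for two 5-minute periods.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-web-01",                 # hypothetical alarm name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical topic
)
```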

● EC2 Status Checks


o 2 types: System Status Checks, Instance Status Checks
▪ These can be seen in the console (instances>status checks)
o System Status Check (checks host)
▪ Loss of network connectivity
▪ loss of system power
▪ software issues on the physical host
▪ hardware issues on the physical host
▪ Resolution: The best way is to stop and then start the VM again (this is not the same as rebooting). This brings it back up on another physical host.
o Instance Status Check (checks VM/EC2 instance)
▪ Failed system status checks
▪ misconfigured networking or startup configuration
▪ exhausted memory
▪ corrupted file system
▪ incompatible kernel

▪ Resolution: Reboot the instance or make modifications within the OS

● Monitoring EBS
o Links:
▪ EBS status checks
▪ EBS volume types
▪ Monitoring Status of EBS volumes
o 4 Different types of EBS volumes
▪ general purpose (SSD) - gp2
● Used for most workloads: System boot volumes, virtual desktops, low latency apps, dev and
test environments.
▪ provisioned IOPS (SSD) - io1
● Used for database workloads requiring more than 10,000 IOPS or 160 MiB/s of throughput: MongoDB, Cassandra, MS SQL Server, MySQL, PostgreSQL, Oracle
▪ throughput optimized (HDD) - st1
● Used for streaming workloads. Big Data, Data Warehouses, log processing
▪ cold (HDD) - sc1
● Used for infrequently accessed data. Low storage cost.
o Remember API Names for exam
o IOPS & Volumes
▪ General Purpose SSD volumes have a baseline of 3 IOPS per GiB of volume size
▪ Max volume size is 16,384 GiB
▪ Max of 10,000 IOPS per volume (after that you need to move to Provisioned IOPS)
▪ You can burst up to 3,000 IOPS.
o I/O Credits
▪ When your volume requires more than its baseline I/O performance level, it starts to use I/O credits. This is what allows bursting up to 3,000 IOPS.
▪ Each volume starts with an initial balance of 5,400,000 I/O credits, enough to sustain the maximum burst of 3,000 IOPS for 30 minutes.
▪ When you are not bursting above the baseline I/O level, you earn credits back (see the worked example below).
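A small worked sketch in Python of the gp2 arithmetic quoted above (3 IOPS/GiB baseline, 5,400,000 initial credits, 3,000 IOPS burst); these are the figures from these notes, used here for illustration only:

```python
# Worked example of gp2 baseline IOPS and burst duration, using the figures above.
def gp2_baseline_iops(size_gib: int) -> int:
    # 3 IOPS per GiB, capped at the 10,000 IOPS maximum quoted in these notes
    return min(3 * size_gib, 10_000)

def burst_seconds(size_gib: int, initial_credits: int = 5_400_000, burst_iops: int = 3_000) -> float:
    # Credits drain at (burst - baseline) IOPS while bursting above the baseline
    drain_rate = burst_iops - gp2_baseline_iops(size_gib)
    return float("inf") if drain_rate <= 0 else initial_credits / drain_rate

print(gp2_baseline_iops(100))      # 300 IOPS baseline for a 100 GiB volume
print(burst_seconds(1) / 60)       # roughly 30 minutes of 3,000 IOPS burst for a tiny volume
```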
o EBS CloudWatch Metrics
▪ VolumeReadBytes & VolumeWriteBytes - Provides information on the I/O operations
in a specified period of time. (statistics: sum, average, minimum, maximum,
SampleCount)
▪ VolumeReadOps & VolumeWriteOps - Total number of I/O operations in a specified
period of time.
▪ VolumeTotalReadTime & VolumeTotalWriteTime - The total number of seconds
spent by all operations that completed in a specified period of time.
▪ VolumeIdleTime - Total number of seconds in a specified period of time when no read or write operations were submitted.
▪ VolumeQueueLength
● The number of read and write operations requests waiting to be completed in a
specified period of time.
● A consistently high queue length signifies the disk is busy and not delivering good performance; I/O demand is exceeding what the volume can handle
o Volume Status Checks
▪ OK
● Normal (volume running as expected)

▪ WARNING
● Degraded (below expectations),
● Severely Degraded (well below expectations)
▪ IMPAIRED
● Disabled (volume is offline and pending recovery, or is waiting for the user to
enable I/O)
● Stalled (volume performance is severely impacted)
● Not Available (unable to determine I/O performance because I/O is disabled)
▪ INSUFFICIENT DATA
● Insufficient Data (no data about volume)
o Modifying EBS Volumes
▪ EBS volumes can be modified (e.g. increased in size) while still attached to an EC2 instance (see the sketch below)
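A minimal boto3 sketch of modifying an attached volume in place (the volume ID and new size are placeholders):

```python
# Minimal sketch: grow an EBS volume while it stays attached to its instance.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",   # placeholder volume ID
    Size=200,                           # new size in GiB
)

# The file system still has to be extended inside the OS afterwards.
# Progress of the modification can be tracked with:
print(ec2.describe_volumes_modifications(VolumeIds=["vol-0123456789abcdef0"]))
```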

● Monitoring RDS
o 2 Types of Monitoring: Metrics & Events
o Metrics - Can be viewed in the console; the graphs can be adjusted from 1 hour up to 1 week of visibility.
o Events - These subscriptions (event subscriptions) can be set up in the console (RDS
Dashboard) to get notifications on things like failing over RDS from one AZ to another.
● RDS Metrics
o DatabaseConnections - The number of database connections in use.
o DiskQueueDepth - The number of outstanding IOs (read/write requests) waiting to access the
disk.
o FreeStorageSpace - The amount of available storage space.
o ReplicaLag (seconds) - The amount of time a Read Replica DB instance lags behind the
source DB instance. Applies to MySQL, MariaDB, and PostgreSQL Read Replicas.
o ReadIOPS - The average number of disk read I/O operations per second.
o WriteIOPS - The average number of disk write I/O operations per second.
o ReadLatency - The average amount of time taken per disk read I/O operation.
o WriteLatency - The average amount of time taken per disk write I/O operation.
o SwapUsage - The amount of swap space used on the DB instance.
o **Have a general idea of what each metric does**

● Monitoring ELB
o Monitored every 60 seconds (provided there is traffic)
o If there are no requests for data for a given metric, the metric will not be reported to
CloudWatch
o ELB Metrics
▪ HealthyHostCount - Number of healthy hosts inside the pool
▪ UnHealthyHostCount - Number of unhealthy hosts inside the pool
▪ RequestCount - Number of completed requests that were received and routed to the registered back-end instances
▪ Latency - Measures the time elapsed, in seconds, from when the request leaves the load balancer until the response is received

▪ SurgeQueueLength - A count of the total number of requests that are pending submission to a registered instance
▪ SpilloverCount - A count of the total number of requests that were rejected due to the
queue being full.

● Monitoring Elasticache
o 4 important monitoring items to look at
▪ CPU Utilization
▪ Swap Usage
▪ Evictions
▪ Concurrent Connections

o CPU Utilization
Memcached
▪ Multi-threaded
▪ Can handle loads up to 90%. If it exceeds 90%, then add more nodes to the cluster.
Redis
▪ Not multi-threaded
▪ To determine the scale point, take 90 and divide by the number of cores (e.g. for a 4-core node, scale once CPU exceeds 90 / 4 = 22.5%).

o Swap Usage - The amount of the swap file that is used. Size of swap file equals size of RAM.
Memcached
▪ Should be 0 most of the time.
▪ If it exceeds 50 MB, you should increase the memcached_connections_overhead parameter.
▪ The memcached_connections_overhead defines the amount of memory to be reserved
for memcached connections and other miscellaneous overhead.
Redis
▪ No SwapUsage metric; instead use reserved-memory

o Evictions - Think of evictions like tenants in an apartment building. A number of empty apartments slowly fill up with tenants; eventually the building is full, but more tenants still need to be added, so an existing tenant must be evicted to make room.
▪ An eviction occurs when a new item is added and an old item must be removed due to
lack of free space in the system.
Memcached
▪ No recommended setting.
▪ Either scale up (add memory) or scale out (add nodes)
Redis
▪ No recommended setting
▪ Only scale out (add replicas)


o Concurrent Connections
Memcached & Redis
▪ No recommended setting. Choose a threshold based off application.
▪ If there are large and sustained spikes in the number of concurrent connections, this can mean either a large traffic spike OR that your application is not releasing connections as it should.
▪ NOTE: Set an alarm on the number of concurrent connections for Elasticache

High Availability
● Elasticity & Scalability 101
o Elasticity - Allows you to stretch out and retract back (think rubber band) your infrastructure,
based on your demand. Under this model you only pay for what you need. Elasticity is used
during a short time period, such as hours or days.
o Scalability - Used to talk about building out the infrastructure to meet your long-term demands. Scalability applies over longer time periods, such as days, weeks, months, and years.
● Scaling Up - Increases the instance type, from say a t1.micro to a t2.small, t2.medium, etc.
● Scaling Out - Adding additional EC2 instances and using auto scaling.
● The larger the EC2 instance, the better network performance you get.
● RDS and Multi-AZ Failover
o Multi-AZ deployments for the MySQL, Oracle, and PostgreSQL engines use synchronous physical replication to keep data on the standby up to date with the primary.
o Multi-AZ deployments for Microsoft SQL Server use the engine's built-in synchronous logical replication (a mirroring technology) to achieve the same result.
o RDS Multi-AZ Failover Advantages
▪ High availability
▪ backups are taken from secondary which avoids I/O suspension to the primary
▪ restores are taken from secondary which avoids I/O suspension to the primary
▪ Exam Tip: You can force a failover from one AZ to another by rebooting your instance (see the sketch below).
▪ Multi-AZ is not a scaling solution
▪ Read Replicas are used for scaling
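A minimal boto3 sketch of forcing the Multi-AZ failover mentioned in the exam tip (the DB identifier is a placeholder):

```python
# Minimal sketch: force a Multi-AZ failover by rebooting with ForceFailover=True.
import boto3

rds = boto3.client("rds")

rds.reboot_db_instance(
    DBInstanceIdentifier="prod-mysql-01",   # placeholder identifier
    ForceFailover=True,                     # only valid for Multi-AZ instances
)
```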
● RDS and Using Read Replicas
o Use Cases
▪ Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy
database workloads.
▪ Serving read traffic while the source DB instance is unavailable.
▪ Business reporting or data warehousing scenarios.
▪ NOTE: You can have up to 5 read replicas per master database
o Creating Read Replicas
▪ When creating a new Read Replica, AWS will take a snapshot of your database.
▪ If Multi-AZ is not enabled, the snapshot will be of your primary database and can cause
brief I/O suspension for around 1 minute.

▪ If Multi-AZ is enabled, the snapshot will be of your secondary database and you will
not experience a performance hit.
▪ When a new read replica is created you will be able to connect to it using a new
endpoint DNS address.
▪ A Read Replica can be promoted to become its own standalone database (see the sketch after the exam tips below).
▪ Exam Tips:
● You can have 5 read replicas
● You can have read replicas in different regions for all engines
● Replication is Asynchronous only, not synchronous
o When you execute something synchronously, you wait for it to finish
before moving on to another task. When you execute something
asynchronously, you can move on to another task before it finishes
● Read Replicas can be built off Multi-AZ databases
● Read Replicas themselves cannot be Multi-AZ
● Beware of latency with read replicas
● DB snapshots and automated backups cannot be taken of read replicas
● key metric to look for is Replica Lag
● **Know the difference between read replicas and Multi-AZ**
▪ More Exam Tips:
● If you can’t create a read replica, you most likely have disabled database
backups. Modify the database and turn them on.
● You can create read replicas of read replicas in multiple Regions
● You can either modify the database itself or create a new database from a
snapshot
● endpoints will not change if you modify a database, they will change if you
create a new database from a snapshot or if you create a read replica
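A minimal boto3 sketch of creating the read replica described above (identifiers, region, and instance class are placeholders; a cross-region replica references the source by ARN):

```python
# Minimal sketch: create a read replica, optionally in another region.
import boto3

rds = boto3.client("rds", region_name="us-west-2")    # region of the new replica (placeholder)

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-mysql-01-replica",     # placeholder replica name
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:prod-mysql-01",
    DBInstanceClass="db.t2.medium",                   # optional; defaults to the source's class
)
```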
o Troubleshooting Autoscaling - Things to look for if your instances are not launching:
▪ Associated Key Pair does not exist
▪ security group does not exist
▪ Autoscaling config is not working correctly
▪ Autoscaling group not found
▪ instance type specified is not supported in the AZ
▪ AZ is no longer supported
▪ invalid EBS device mapping
▪ Autoscaling service is not enabled on your account
▪ attempting to attach an EBS block device to an instance-store AMI
● Deployment & Provisioning
o Root Access to AWS Services
▪ Elastic Beanstalk
▪ Elastic MapReduce
▪ OpsWorks
▪ EC2
● ELB Configuration
o EXAM TIPS
▪ You can use ELBs to load balance across different AZs within the same region, but not
to different regions (or different VPCs).

▪ ELB and a NAT are totally different
o 2 Types of ELBS
▪ External (with external DNS names)
▪ Internal (with internal DNS names)
o Health Checks
▪ HTTP protocol
▪ Port 80
▪ Ping Path (/index.html)
▪ Response Timeout
▪ Health Check Interval
▪ Unhealthy Threshold
▪ Healthy Threshold
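A minimal boto3 sketch of configuring the Classic ELB health check fields listed above (the load balancer name and values are placeholders):

```python
# Minimal sketch: set a Classic ELB health check matching the fields above.
import boto3

elb = boto3.client("elb")   # Classic Load Balancer API

elb.configure_health_check(
    LoadBalancerName="web-elb",                # placeholder ELB name
    HealthCheck={
        "Target": "HTTP:80/index.html",        # protocol, port, and ping path
        "Interval": 30,                        # health check interval (seconds)
        "Timeout": 5,                          # response timeout (seconds)
        "UnhealthyThreshold": 2,
        "HealthyThreshold": 10,
    },
)
```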
● Sticky Sessions
o You can use the sticky session feature (also known as session affinity), which enables the load
balancer to bind a user's session to a specific instance. This ensures that all requests from the
user during the session are sent to the same instance.
o 2 Types: Application Based, Duration Based
o Duration Based Stickiness
▪ Most commonly used. The load balancer creates the cookie.
▪ When the load balancer receives a request, it first checks to see if this cookie is present
in the request. If so, the request is sent to the application instance specified in the
cookie. If there is no cookie, the load balancer chooses an application instance based
on the existing load balancing algorithm and adds a new cookie in to the response.
▪ The stickiness policy configuration defines a cookie expiration (in seconds) which
establishes the duration of validity for each cookie. The cookie is automatically
updated after its duration expires.
▪ If an application instance fails or becomes unhealthy, the load balancer stops routing
requests to that instance. It instead chooses a new instance based on the existing load
balancing algorithm. The request is routed to the new instance as if there is no cookie
and the session is no longer sticky.
o Application Based Stickiness
▪ The load balancer uses a special cookie to associate the session with the original server
that handled the request, but follows the lifetime of the application-generated cookie
corresponding to the cookie name specified in the policy configuration.
▪ The load balancer only inserts a new stickiness cookie if the application response
includes a new application cookie. The load balancer stickiness cookie does not update
with each request. If the application cookie is explicitly removed or expires, the
session stops being sticky until a new application cookie is issued.
▪ If an application instance fails or becomes unhealthy, the load balancer stops routing requests to that instance and instead chooses a new healthy instance based on the existing load balancing algorithm. The load balancer treats the session as now "stuck" to the new healthy instance and continues routing requests to it even if the failed instance comes back. However, it is up to the new application instance whether and how to respond to a session it has not previously seen.
o ELB Algorithm when Cookie IS present:
▪ First it checks to see if the cookie is present in the service request

▪ Since the cookie is found in the request, it will then decide which instance the service
request should be routed to based on the already present cookie
▪ Finally the cookie is inserted in the response
o ELB Algorithm when Cookie IS NOT present:
▪ First it checks to see if the cookie is present in the service request
▪ Since the cookie is not found in the request, it will then decide which instance the
service request should be routed to
▪ Finally the cookie is inserted in the response
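A minimal boto3 sketch of enabling both stickiness types on a Classic ELB listener (the load balancer name, policy names, port, and cookie name are placeholders):

```python
# Minimal sketch: duration-based and application-based stickiness on a Classic ELB.
import boto3

elb = boto3.client("elb")

# Duration-based: the load balancer issues its own cookie, valid here for 1 hour.
elb.create_lb_cookie_stickiness_policy(
    LoadBalancerName="web-elb",                # placeholder ELB name
    PolicyName="duration-sticky",
    CookieExpirationPeriod=3600,
)

# Application-based: follow the lifetime of an application cookie named JSESSIONID.
elb.create_app_cookie_stickiness_policy(
    LoadBalancerName="web-elb",
    PolicyName="app-sticky",
    CookieName="JSESSIONID",
)

# Attach one of the policies to the port-80 listener.
elb.set_load_balancer_policies_of_listener(
    LoadBalancerName="web-elb",
    LoadBalancerPort=80,
    PolicyNames=["duration-sticky"],
)
```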

Data Management
● Disaster Recovery, Backup & AWS
o Services include: Regions, Storage, S3, Glacier, EBS, Direct Connect, Storage Gateway
(Cached Volumes, Stored Volumes, Virtual Tape Library)
o Compute Services
▪ EC2
▪ EC2 VM Import Connector
o Networking
▪ Route 53
▪ ELB
▪ VPC
▪ Direct Connect
o Databases
▪ RDS
▪ Dynamo DB
▪ RedShift
o Orchestration
▪ CloudFormation
▪ ElasticBeanstalk
▪ OpsWorks
o RTO vs RPO
▪ Recovery Time Objective (RTO) is the length of time within which you must recover from a disaster. It is measured from when the disaster first occurs to when you have fully recovered from it.
▪ Recovery Point Objective (RPO) is the amount of data your organization is prepared to lose in the event of a disaster.
▪ The lower the RTO and RPO, the more costly the solution will be.
o DR Scenarios
▪ Backup & Restore
● In traditional environments, data is backed up to tape and sent off site regularly.
Amazon S3 is the ideal destination for backup data.
● You can use AWS Import/Export to transfer very large data sets by shipping
storage devices to AWS.
● You can also use Glacier in conjunction with S3 for a tiered backup solution.



▪ Pilot Light
● A minimal version of an environment is always running in the cloud. The idea of the pilot light is an analogy to a gas heater: a small flame that is always on can quickly ignite the entire furnace to heat up a house.
● This scenario is similar to a backup-and-restore scenario. For example, with AWS you can maintain a pilot light by configuring and running the most critical core elements of your system in AWS. When the time comes for recovery you can rapidly provision a full-scale production environment around that critical core.
● Infrastructure elements for the pilot light itself typically include your database
servers, which would replicate data to Amazon EC2 or Amazon RDS.
Depending on the system, there might be other critical data outside of the
database that needs to be replicated to AWS. This is the critical core of the
system (the pilot light) around which all other infrastructure pieces in AWS (the
rest of the furnace) can quickly be provisioned to restore the complete system.
● Exam Note: Use pre-allocated Elastic IP addresses and associate them with your instances when invoking DR. You can also use pre-allocated elastic network interfaces (ENIs) with pre-allocated MAC addresses for applications with special licensing requirements.
▪ Warm Standby
● A scaled down version of a fully functional environment is always running in
the cloud. A warm standby solution extends the pilot light elements and
preparation. It further decreases the recovery time because some services are
always running. By identifying your business-critical systems, you can fully
duplicate these systems on AWS and have them always on.
● These servers can be running on a minimum-sized fleet of EC2 instances on the
smallest sizes possible. This solution is not scaled to take a full-production
load, but it is fully functional. It can be used for non-production work, such as
testing, QA and internal use.
● In a disaster, the system is scaled up quickly to handle the production load. In AWS, this can be done by adding more instances to the load balancer and by resizing the small-capacity servers to run on larger EC2 instance types.
● Horizontal scaling is preferred over vertical scaling.
▪ Multi-Site
● A multi-site solution runs in AWS as well as on your existing on-site
infrastructure, in an active-active configuration. The data replication method
that you employ will be determined by the recovery point that you choose.
● You can use Route 53 to route traffic to both sites either symmetrically or
asymmetrically.
o AWS Services and Automated Backups
▪ RDS
● There is a performance hit if Multi-AZ is not enabled
● If you delete an instance, then ALL automated backups are deleted
● Manual DB snapshots will NOT be deleted
● All stored on S3

● When you do a restore, you can change the engine type, provided you have enough storage space.
▪ Elasticache
● Available for Redis Cache Cluster only
● the entire cluster is snapshotted
● snapshot will degrade performance
● therefore only set your snapshot window during the least busy part of the day
● stored on S3
▪ Redshift
● By default, Redshift enables automated backups of your data warehouse cluster
with a 1-day retention period
● Redshift only backs up data that has changed so most snapshots only use up a
small amount of your free backup storage
● stored on S3
o EC2 Types - EBS vs Instance Store
▪ Instance Store (Ephemeral) - Temporary
▪ Elastic Block Storage - Allows users to have data persistence and to save their data
permanently.
▪ Confusion
● Root Volume - Where OS is installed
● Additional Volumes - Not where OS is installed
▪ Root Volume Sizes
● Root device volumes can either be EBS volumes or Instance Store volumes
● An instance store root device volume has a maximum size of 10 GiB
● An EBS root device volume can be up to 1 or 2 TiB depending on the OS
▪ Terminating an EBS backed Instance
● EBS root device volumes are deleted by default when the EC2 instance is terminated. You can prevent this by unselecting the "Delete on Termination" option when creating the instance, or by setting the DeleteOnTermination flag to false using the command line or API (see the sketch below).
● Other (non-root) EBS volumes attached to the instance are preserved by default if you terminate the instance
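A minimal boto3 sketch of flipping the DeleteOnTermination flag on a running instance's root volume (the instance ID is a placeholder; the root device name varies by AMI):

```python
# Minimal sketch: keep the root EBS volume when the instance is terminated.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",          # placeholder instance ID
    BlockDeviceMappings=[{
        "DeviceName": "/dev/xvda",             # root device name varies by AMI
        "Ebs": {"DeleteOnTermination": False},
    }],
)
```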
o Security Token Service (STS) - Grants users temporary access to AWS resources. Users can
come from 3 sources:
▪ Federation (typically Active Directory)
● Uses SAML
● Grants temporary access based on the user's Active Directory credentials. The user does not need to be an IAM user.
● Single sign on allows users to log in to AWS console without assigning IAM
credentials
▪ Federation with Mobile Apps (typically OpenID)
● Use Facebook/Amazon/Google or other OpenID providers to log in.
▪ Cross Account Access
● Lets users from one AWS account access resources in another
▪ Understanding Key Terms

● Federation - combining or joining a list of users in one domain (such as IAM)
with a list of users in another domain (such as Active Directory, Facebook,
LinkedIn, etc.)
● Identity Broker: a service that allows you to take an identity from point A and
join it (federate it) to point B
● Identity Store - services like Active Directory, Facebook, Google, etc.
● Identities - a user of a service like Facebook etc.
▪ Federation Distilled Down
● 1. Develop an Identity Broker to communicate with LDAP and AWS STS.
● 2. The Identity Broker always authenticates with LDAP first, then with AWS STS (see the sketch below).
● 3. The application then gets temporary access to AWS resources.
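A minimal boto3 sketch of the broker's final step, assuming it has already authenticated the user against LDAP and built a scoped policy (the user name, policy, and bucket are illustrative):

```python
# Minimal sketch: an identity broker hands out temporary credentials via STS
# after it has authenticated the user against LDAP (LDAP check not shown here).
import json
import boto3

sts = boto3.client("sts")

response = sts.get_federation_token(
    Name="jdoe",                                # federated user name from LDAP (placeholder)
    DurationSeconds=3600,
    Policy=json.dumps({                         # scope of the temporary credentials
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "s3:ListBucket",
                       "Resource": "arn:aws:s3:::example-bucket"}],
    }),
)

creds = response["Credentials"]                 # AccessKeyId, SecretAccessKey, SessionToken
print(creds["Expiration"])
```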

● Route 53
o DNS (Domain Name Service) - Used to convert human friendly domain names (such as
http://acloud.guru) into an Internet Protocol (IP) address (such as http://82.124.53.1).
▪ IPv4 - a 32-bit address space with over 4 billion different IP addresses
▪ IPv6 - created to solve the depletion issue with IPv4; has an address space of 128 bits
▪ NS Records - NS stands for Name Server records and are used by Top Level Domain
servers to direct traffic to the content DNS server.
▪ TTL (time to live) - The length of time, in seconds, that a DNS record is cached on either the resolving server or the user's own local PC. The lower the TTL, the faster changes to DNS records propagate throughout the internet.
▪ A Records
● An “A” record is the fundamental type of DNS record and the “A” in A record
stands for “Address”. The A record is used by a computer to translate the name
of the domain to the IP address.
▪ CNAME
● A Canonical Name (CNAME) can be used to resolve one domain name to
another. For example, you may have a mobile website with the domain name
http://m.acloud.guru that is used for when users browse to your domain name
on their mobile devices. You may also want the name http://mobile.acloud.guru
to resolve to this address.
▪ Alias Records (only used for AWS) -
● Alias records are used to map resource record sets in your hosted zone to ELBs,
CloudFront distributions, or S3 buckets that are configured as websites.
● Alias records work like a CNAME record in that you can map one DNS name
(www.example.com) to another ‘target’ DNS name
(elb1234.elb.amazonaws.com).
● Key Difference - A CNAME can’t be used for naked domain names (zone
apex). You can’t have a CNAME for http://acloud.guru, it must be either an A
record or an Alias.
o Naked Domain Name - An address without the “www.” in front of it.
Example: acloud.guru
▪ Exam Tips

● ELBs do not have a pre-defined IPv4 address, you resolve to them using a DNS
name.
● Understand the difference between an Alias Record and a CNAME.
● Given the choice, always choose Alias Record over CNAME.
o Route53 Routing Policies
▪ Simple
● This is the default routing policy. Most commonly used when you have a single
resource that performs a given function for your domain.
▪ Weighted
● Allows you to split your traffic based on the weights assigned. For example, you can send 10% of your traffic to us-east-1 and 90% to us-west-1 (see the sketch after this list).
▪ Latency
● Allows you to route your traffic based on the lowest network latency for your
end user (i.e. which region will give them the fastest response time).
▪ Failover
● Policies used when creating an active/passive setup.
● Ex: You may want your primary site to be in eu-west-2 and your secondary DR
site in ap-southeast-2
▪ Geolocation
● Allows you to route traffic based on the geographic location of your users.
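A minimal boto3 sketch of the weighted policy from the example above (hosted zone ID, domain, and ELB targets are placeholders); the two record sets share a name and are distinguished by SetIdentifier and Weight:

```python
# Minimal sketch: 10% / 90% weighted routing between two ELBs via Route 53.
import boto3

route53 = boto3.client("route53")

def weighted_record(identifier: str, weight: int, target: str) -> dict:
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com.",          # placeholder domain
            "Type": "CNAME",
            "SetIdentifier": identifier,         # must be unique per weighted record
            "Weight": weight,
            "TTL": 60,
            "ResourceRecords": [{"Value": target}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z0123456789ABCDEFGHIJ",        # placeholder hosted zone ID
    ChangeBatch={"Changes": [
        weighted_record("us-east-1", 10, "elb-east.us-east-1.elb.amazonaws.com"),
        weighted_record("us-west-1", 90, "elb-west.us-west-1.elb.amazonaws.com"),
    ]},
)
```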

Additional Information
● AWS Systems Manager
o What is AWS Systems Manager?
▪ AWS Systems Manager allows you to centralize operational data from multiple AWS services
and automate tasks across your AWS resources. You can create logical groups of resources
such as applications, different layers of an application stack, or production versus development
environments. With Systems Manager, you can select a resource group and view its recent API
activity, resource configuration changes, related notifications, operational alerts, software
inventory, and patch compliance status. You can also take action on each resource group
depending on your operational needs. Systems Manager provides a central place to view and
manage your AWS resources, so you can have complete visibility and control over your
operations.
o Who should use AWS Systems Manager?
▪ If you use multiple AWS services, AWS Systems Manager provides you with a centralized and
consistent way to gather operational insights and carry out routine management tasks. You can
use AWS Systems Manager to perform routine operations, track your development, test, and
production environments, and proactively act on events or other operational incidents. AWS
Systems Manager provides an operations complement to the more developer-focused tools you
use, such as code editors and integrated development environments (IDEs). Similar to an IDE,
AWS Systems Manager integrates a broad range of operations tools.
● AWS Config
o AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS
resources. Config continuously monitors and records your AWS resource configurations and allows
you to automate the evaluation of recorded configurations against desired configurations. With Config,

you can review changes in configurations and relationships between AWS resources, dive into detailed
resource configuration histories, and determine your overall compliance against the configurations
specified in your internal guidelines. This enables you to simplify compliance auditing, security
analysis, change management, and operational troubleshooting.

● OpsWorks
o AWS OpsWorks is a configuration management service that provides managed instances of
Chef and Puppet. Chef and Puppet are automation platforms that allow you to use code to
automate the configurations of your servers.
o OpsWorks lets you use Chef and Puppet to automate how servers are configured, deployed,
and managed across your Amazon EC2 instances or on-premises compute environments.
OpsWorks has three offerings, AWS OpsWorks for Chef Automate, AWS OpsWorks for
Puppet Enterprise, and AWS OpsWorks Stacks.
o Provisioning an RDS instance (exam feedback from forum):
https://aws.amazon.com/blogs/aws/aws-opsworks-with-amazon-rds/

● AWS Elastic Beanstalk


o AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications
and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on
familiar servers such as Apache, Nginx, Passenger, and IIS.
o You can simply upload your code and Elastic Beanstalk automatically handles the deployment,
from capacity provisioning, load balancing, auto-scaling to application health monitoring. At
the same time, you retain full control over the AWS resources powering your application and
can access the underlying resources at any time

● Autoscaling Lifecycle Hooks


o Lifecycle hooks enable you to perform custom actions by pausing instances as an Auto Scaling
group launches or terminates them. For example, while your newly launched instance is
paused, you could install or configure software on it.
o Each Auto Scaling group can have multiple lifecycle hooks. However, there is a limit on the
number of hooks per Auto Scaling group.
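A minimal boto3 sketch of adding a launch lifecycle hook so new instances pause for configuration before entering service (the hook name, group name, and timeout are placeholders):

```python
# Minimal sketch: pause newly launched instances until configuration completes.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_lifecycle_hook(
    LifecycleHookName="install-agent",                       # placeholder hook name
    AutoScalingGroupName="web-asg",                          # placeholder ASG name
    LifecycleTransition="autoscaling:EC2_INSTANCE_LAUNCHING",
    HeartbeatTimeout=300,          # seconds to wait before DefaultResult applies
    DefaultResult="ABANDON",       # terminate the instance if the hook is never completed
)

# Your bootstrap process later signals completion with:
# autoscaling.complete_lifecycle_action(..., LifecycleActionResult="CONTINUE")
```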

● AWS Artifact
o An audit and compliance portal for on-demand access to download AWS’ compliance reports
and manage select agreements.
o AWS Artifact provides on-demand access to AWS’ security and compliance reports and select
online agreements. Reports available in AWS Artifact include our Service Organization
Control (SOC) reports, Payment Card Industry (PCI) reports, and certifications from
accreditation bodies across geographies and compliance verticals that validate the
implementation and operating effectiveness of AWS security controls. Agreements available in

AWS Artifact include the Business Associate Addendum (BAA) and the Nondisclosure
Agreement (NDA).

● Direct Connect
o Key Components
▪ Connection - Create a connection in an AWS Direct Connect location to establish a
network connection from your premises to an AWS region.
▪ Virtual Interface - Create a virtual interface to enable access to AWS services. A
public virtual interface enables access to public-facing services, such as Amazon S3. A
private virtual interface enables access to your VPC.
o Autonomous System Number (ASN)
▪ Autonomous System Numbers are used to identify networks that present a clearly defined external routing policy to the Internet. AWS Direct Connect requires an ASN to create a public or private virtual interface. You may use a public ASN which you own, or you can pick any private ASN in the 64512 to 65535 range.

CloudFormation
You can use the following intrinsic functions to define conditions:
● Fn::And
● Fn::Equals
● Fn::If
● Fn::Or
● Fn::Not
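A minimal sketch of how these functions appear in a template, written here as a Python dictionary and serialized to JSON template syntax (the parameter, condition, and resource names are illustrative):

```python
# Minimal sketch: a CloudFormation condition built from Fn::Equals / Fn::Not,
# expressed as a Python dict and dumped to JSON.
import json

template = {
    "Parameters": {
        "EnvType": {"Type": "String", "AllowedValues": ["prod", "dev"]},
    },
    "Conditions": {
        "IsProd": {"Fn::Equals": [{"Ref": "EnvType"}, "prod"]},
        "IsNotProd": {"Fn::Not": [{"Condition": "IsProd"}]},
    },
    "Resources": {
        "ProdOnlyBucket": {                      # created only when IsProd evaluates to true
            "Type": "AWS::S3::Bucket",
            "Condition": "IsProd",
        },
    },
}

print(json.dumps(template, indent=2))
```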

JSON Policies
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html
https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html

Introduction to JSON Policies


To assign permissions to a user, group, role, or resource, you create a JSON policy, which is a document
that defines permissions. The policy document includes the following elements:
● Effect – whether the policy allows or denies access
● Action – the list of actions that are allowed or denied by the policy
● Resource – the list of resources on which the actions can occur
● Condition (Optional) – the circumstances under which the policy grants permission
● Principal - states who is allowed to access the resource
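A minimal sketch of a policy document using the elements above, built as a Python dictionary and attached as a customer managed policy (the policy name, bucket ARNs, and IP range are placeholders). Note that Principal appears in resource-based policies (e.g. S3 bucket policies), not in an identity-based policy like this one:

```python
# Minimal sketch: an identity-based policy using Effect / Action / Resource / Condition.
import json
import boto3

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"],
        "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},   # optional element
    }],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="example-s3-read",            # placeholder policy name
    PolicyDocument=json.dumps(policy_document),
)
```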

Deny/Explicit Deny/Allow -
https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html

Auto Scaling Processes


https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html

● Launch - Adds a new EC2 instance to the group, increasing its capacity.
● Terminate - Removes an EC2 instance from the group, decreasing its capacity.
● HealthCheck - Checks the health of the instances. Amazon EC2 Auto Scaling marks an instance
as unhealthy if Amazon EC2 or Elastic Load Balancing tells Amazon EC2 Auto Scaling that the
instance is unhealthy. This process can override the health status of an instance that you set
manually.
● ReplaceUnhealthy - Terminates instances that are marked as unhealthy and later creates new
instances to replace them.
● AZRebalance - Balances the number of EC2 instances in the group across the Availability Zones
in the region.
● AlarmNotification - Accepts notifications from CloudWatch alarms that are associated with the
group.
● ScheduledActions - Performs scheduled actions that you create.
● AddToLoadBalancer - Adds instances to the attached load balancer or target group when they are
launched.
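A minimal boto3 sketch of suspending and later resuming two of these processes during maintenance (the group name is a placeholder):

```python
# Minimal sketch: temporarily stop Auto Scaling from replacing "unhealthy" instances
# while you perform maintenance on them, then resume normal behaviour.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.suspend_processes(
    AutoScalingGroupName="web-asg",                 # placeholder ASG name
    ScalingProcesses=["HealthCheck", "ReplaceUnhealthy"],
)

# ... perform maintenance ...

autoscaling.resume_processes(
    AutoScalingGroupName="web-asg",
    ScalingProcesses=["HealthCheck", "ReplaceUnhealthy"],
)
```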

Swagat Feedback (ACG link)


I also passed my sysops administrator associate just now with 100% score (Even I am amazed). Putting down the
questions which I can think of:
(i) Couple of questions on AWS Direct Connect: How to access S3 from direct connect (make the ASN public), and
create a private gateway.
(ii) Steps to add database server in OpsWorks: Create a database layer, get the attributes (hostname, db name, etc. from a JSON file), configurable items between database layer and application layer.
(iii)EC2 Auto recovery: EBS Backed instances and (don't recall the other options - had 2 answers).
(iv) Weighted routing policy

(v) LDAP on premise, need to establish VPN and replicate a LDAP server in AWS and use that.
(vi)How to restrict users to a geographical region : Use a 3rd party software to evaluate geographical region of user
(latency based routing will not work in this case).
(vi) Throttling error in CloudFormation: Answer is to use exponential backoff.
● Exponential backoff – back off when Amazon SES responds with a "Throttling – Maximum sending rate exceeded" error. The idea behind this kind of algorithm is to reduce the rate at which you are executing an operation by introducing delays, so when you are "backing off" you are waiting for a period of time before attempting to execute the operation again. A common backoff algorithm is "exponential backoff". With exponential backoff, you exponentially increase the backoff duration on each consecutive failure (a minimal sketch follows below).
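The Java snippet referenced in the AWS blurb above is not reproduced in these notes; a minimal Python sketch of the same idea (the wrapped call is a stand-in for any throttled API operation):

```python
# Minimal sketch: retry a throttled operation with exponential backoff and jitter.
import random
import time

def call_with_backoff(operation, max_retries=5, base_delay=0.5):
    """Run operation(), backing off exponentially on throttling errors."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception as error:                 # in practice, catch the SDK's throttling error
            if "Throttling" not in str(error) or attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)                      # wait longer after each consecutive failure

# Example usage, wrapping a hypothetical throttled call:
# call_with_backoff(lambda: ses.send_email(...))
```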
(vii) ELB and EMR : Root Access to underlying OS

(viii) Couple of questions on Placement groups


You can launch or start instances in a placement group, which determines how instances are placed on underlying
hardware. When you create a placement group, you specify one of the following strategies for the group:

● Cluster—clusters instances into a low-latency group in a single Availability Zone


● Spread—spreads instances across underlying hardware
There is no charge for creating a placement group.

(ix) S3: ACL, Bucket Policy: 3-4 questions


(iv) S3 Backed AMI: Data is deleted when instance is terminated.
(x) One question on Using which instance type for a high memory intensive app (100000 IOPS) situation. I
answered hi instance.

