Module 5 - S3
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security,
and performance.
Store and protect any amount of data for a range of use cases, such as data lakes, websites, cloud-native
applications, backups, archive, machine learning, and analytics.
Amazon S3 is designed for 99.999999999% (11 nines) of data durability, and stores data for millions of
customers around the world.
Run big data analytics, artificial intelligence (AI), machine learning (ML), and high-performance
computing (HPC) applications to unlock data insights.
Build fast, powerful mobile and web-based cloud-native apps that scale automatically in a highly
available configuration.
Meet Recovery Time Objectives (RTO), Recovery Point Objectives (RPO), and compliance requirements
with S3’s robust replication features.
Move data archives to the Amazon S3 Glacier storage classes to lower costs, eliminate operational
complexities, and gain new insights.
Amazon S3 stores data as objects within buckets. An object consists of the data itself (a file) and,
optionally, metadata that describes it. To store an object in Amazon S3, you upload the file to a
bucket. At upload time, you can set permissions on the object and attach metadata.
Buckets are the containers for objects. You can have one or more buckets. For each bucket, you can
control access to it (who can create, delete, and list objects in the bucket), view access logs for it and its
objects, and choose the geographical region where Amazon S3 will store the bucket and its contents.
Storing Objects: Users can upload objects (data files) to their S3 buckets using the AWS Management
Console, AWS CLI, SDKs, or other tools. Each object is associated with a unique key (a string that
identifies the object within the bucket).
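As a sketch of how bucket names and keys fit together (the bucket name, key, and region below are hypothetical), an object's location is simply bucket + key, and S3 exposes it at a predictable virtual-hosted-style URL:

```python
# Sketch: how a bucket name and an object key map to an S3 URL.
# "my-example-bucket", the key, and the region are hypothetical values.
bucket = "my-example-bucket"
key = "reports/2024/sales.csv"  # keys may contain "/" to mimic folder paths

# Virtual-hosted-style URL pattern used by S3 (us-east-1 assumed)
url = f"https://{bucket}.s3.us-east-1.amazonaws.com/{key}"
print(url)
```

Note that the "folders" in a key are purely a naming convention; S3's namespace within a bucket is flat.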
Data Durability: Amazon S3 provides high durability for stored data. When you upload an object to S3, it
automatically replicates the data across multiple Availability Zones within a region to ensure data
redundancy and fault tolerance.
Data Availability: Amazon S3 is designed for high availability, so that objects remain accessible and
retrievable when needed. Users access objects over HTTP or HTTPS.
Access Control: Amazon S3 offers fine-grained access control options. You can set permissions on
buckets and objects to control who can read or write data. Common access control mechanisms include
bucket policies, Access Control Lists (ACLs), and Identity and Access Management (IAM) roles.
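To make the bucket-policy mechanism concrete, here is a minimal sketch of a policy document that grants public read access to every object in a bucket (the bucket name is hypothetical; in practice this JSON would be attached to the bucket via the console, CLI, or an SDK):

```python
import json

# Sketch: a bucket policy allowing anonymous read of all objects.
# "my-example-bucket" is a hypothetical bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",              # any principal (public access)
            "Action": "s3:GetObject",      # read objects only
            "Resource": "arn:aws:s3:::my-example-bucket/*",
        }
    ],
}
policy_json = json.dumps(policy)
```

The same JSON grammar (Effect, Principal, Action, Resource) underlies IAM policies as well, which is why bucket policies and IAM roles compose naturally.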
Data Versioning: S3 allows versioning of objects, which means you can preserve, retrieve, and restore
every version of every object stored in a bucket.
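Versioning is enabled per bucket with a small configuration document. A minimal sketch (the bucket name is hypothetical, and the boto3 call is shown only as a comment since it needs live credentials):

```python
# Sketch: the configuration document that enables versioning on a bucket.
versioning_config = {"Status": "Enabled"}  # use "Suspended" to pause versioning

# Example boto3 call (requires AWS credentials; illustration only):
# import boto3
# boto3.client("s3").put_bucket_versioning(
#     Bucket="my-example-bucket",
#     VersioningConfiguration=versioning_config,
# )
```

Once enabled, every overwrite or delete creates a new version (or a delete marker) rather than destroying the previous data.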
Data Encryption: S3 provides data encryption options for securing objects at rest and during transmission.
This includes server-side encryption (SSE) and client-side encryption.
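As a sketch of server-side encryption in practice: with boto3's `upload_file`, encryption is requested per upload through `ExtraArgs` (the file, bucket, and key names below are hypothetical, and the call itself is commented out since it needs live credentials):

```python
# Sketch: requesting server-side encryption at upload time.
extra_args = {"ServerSideEncryption": "AES256"}  # SSE-S3 (S3-managed keys)

# For SSE-KMS, a KMS key would be named instead (key id is a placeholder):
# extra_args = {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "<key-id>"}

# Example boto3 call (requires AWS credentials; illustration only):
# import boto3
# boto3.client("s3").upload_file(
#     "local-file.txt", "my-example-bucket", "docs/file.txt", ExtraArgs=extra_args
# )
```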
Storage Classes: Amazon S3 offers different storage classes to optimize costs based on data access
patterns and retrieval times. For example, you can use Standard, Intelligent-Tiering, Glacier, and others.
Lifecycle Policies: Users can define rules to transition or expire objects automatically based on the data's
age, access frequency, and other criteria.
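A lifecycle rule is expressed as a small JSON document. The sketch below (prefix, rule ID, and day counts are hypothetical choices) transitions objects under `logs/` to Glacier after 90 days and deletes them after a year; with boto3 it would be passed to `put_bucket_lifecycle_configuration`:

```python
# Sketch: one lifecycle rule for the hypothetical "logs/" prefix.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},   # rule applies to these keys only
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"}  # archive after 90 days
            ],
            "Expiration": {"Days": 365},     # delete after one year
        }
    ]
}
```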
Event Notifications: Amazon S3 can trigger events when objects are created, modified, or deleted in a
bucket. These events can be used to automate workflows and trigger actions through AWS services like
AWS Lambda.
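A sketch of the notification configuration that wires object-created events to a Lambda function (the function ARN and account ID below are placeholders; the document would be applied with `put_bucket_notification_configuration`):

```python
# Sketch: invoke a Lambda function whenever an object is created.
# The Lambda ARN is a hypothetical placeholder.
notification_config = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "on-object-created",
            "LambdaFunctionArn": (
                "arn:aws:lambda:us-east-1:123456789012:function:process-upload"
            ),
            "Events": ["s3:ObjectCreated:*"],  # all create operations (Put, Copy, ...)
        }
    ]
}
```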
Data Analytics: S3 is often used in combination with Amazon S3 Select, Amazon Athena, and Amazon
Redshift Spectrum to query and analyze data directly in S3 without the need to move it to a separate
database.
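For S3 Select specifically, the query is plain SQL evaluated against a single object. A sketch of the parameter set (bucket, key, and column names are hypothetical) that boto3's `select_object_content` would receive:

```python
# Sketch: parameters for an S3 Select query that filters rows of a CSV
# object server-side, so only matching rows are returned to the client.
# Bucket, key, and column names are hypothetical.
select_params = {
    "Bucket": "my-example-bucket",
    "Key": "reports/2024/sales.csv",
    "Expression": "SELECT s.region, s.total FROM s3object s WHERE s.region = 'EU'",
    "ExpressionType": "SQL",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # first row = headers
    "OutputSerialization": {"CSV": {}},
}
# With boto3: boto3.client("s3").select_object_content(**select_params)  # not run here
```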
Data Transfer Acceleration: Amazon S3 Transfer Acceleration enables faster uploads and downloads of
objects using Amazon CloudFront's globally distributed edge locations.
Logging and Monitoring: Amazon S3 provides detailed access logs and metrics that help users monitor
and track access to their data.
Cross-Region Replication: For enhanced data protection and disaster recovery, users can set up cross-
region replication to replicate objects to a different AWS region.
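A replication setup is also declared as a configuration document. The sketch below (role ARN and destination bucket are placeholders) replicates all new objects to a bucket in another region; note that versioning must be enabled on both buckets, and the document would be applied with `put_bucket_replication`:

```python
# Sketch: replicate all new objects to a bucket in another region.
# The IAM role ARN and destination bucket ARN are hypothetical.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-backup-bucket-eu-west-1"},
        }
    ],
}
```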
S3 Storage Classes:
Amazon S3 provides several storage classes that allow users to optimize costs and access patterns for
their data. Each storage class is designed to meet different performance, durability, and cost requirements.
Here are some of the key S3 storage classes:
S3 Standard: This is the default storage class for frequently accessed data. It offers low latency and high
throughput performance. S3 Standard provides high durability and availability, making it suitable for a
wide range of use cases.
S3 Intelligent-Tiering: This storage class is designed for data with unknown or changing access patterns.
It automatically moves objects between access tiers (frequent and infrequent access) based on usage. It
helps users save costs by charging lower storage fees for objects in the infrequent access tier.
S3 Standard-IA (Infrequent Access): S3 Standard-IA is suitable for infrequently accessed data. It offers
the same low latency and high throughput performance as S3 Standard but at a lower storage cost.
Retrieval fees apply when accessing objects.
S3 One Zone-IA: Similar to Standard-IA but stores data in a single Availability Zone, which makes it less
expensive. However, it is less resilient than S3 Standard or S3 Standard-IA: because data is not replicated
across multiple zones, it can be lost if that Availability Zone is destroyed.
S3 Glacier: S3 Glacier is designed for long-term archival and data backup. It offers very low storage
costs but longer retrieval times, ranging from minutes to several hours depending on the retrieval option.
S3 Glacier Deep Archive: This storage class is for data archiving with the lowest storage costs but
extended retrieval times, with standard retrievals typically completing within 12 hours.
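The storage class is chosen per object at upload time via the `StorageClass` parameter. A sketch mapping the classes above to their API names (the bucket and key are hypothetical, and the boto3 call is shown only as a comment):

```python
# Sketch: API names for the storage classes described above.
storage_classes = [
    "STANDARD",
    "INTELLIGENT_TIERING",
    "STANDARD_IA",
    "ONEZONE_IA",
    "GLACIER",
    "DEEP_ARCHIVE",
]

# Hypothetical upload parameters selecting the cheapest, slowest class:
put_args = {
    "Bucket": "my-example-bucket",
    "Key": "archive/2020-backup.tar.gz",
    "StorageClass": "DEEP_ARCHIVE",
}
# With boto3: boto3.client("s3").put_object(Body=data, **put_args)  # not run here
```

Lifecycle policies (covered earlier) can later transition an object between these classes without re-uploading it.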