BACHELOR OF TECHNOLOGY IN
MECHANICAL ENGINEERING
By
BHADRI SAI VAMSI
ROLL NO.21131A0306
This report on
“AICTE AWS AI-ML VIRTUAL INTERNSHIP”
is a bonafide record of the Internship work submitted
By
B.SAI VAMSI
21131A0306
AWS Academy Cloud Foundations is intended for students who seek an overall understanding of cloud
computing concepts, independent of specific technical roles. It provides a detailed overview of cloud
concepts, AWS core services, security, architecture, pricing, and support.
We learned the basics of AWS Cloud services and how the AWS Cloud Adoption Framework works, along with computational services such as Amazon EC2 and AWS Lambda. We learned about different storage types such as Amazon EBS, Amazon S3, and Amazon EFS. This course also introduced AWS databases such as DynamoDB, Redshift, and RDS.
Machine learning is the use and development of computer systems that can learn and adapt without
following explicit instructions, by using algorithms and statistical models to analyse and draw
inferences from patterns in data.
In this course, we learn how to describe machine learning (ML): how machine learning and deep learning fit within artificial intelligence, and the terminology of both fields. Through this we can identify how machine learning can be used to solve a business problem, describe the machine learning process in detail, list the tools available to data scientists, and identify when to use machine learning instead of traditional software development methods. The course covers the implementation of a machine learning pipeline, which includes learning how to formulate a problem from a business request, obtain and secure data for machine learning, use Amazon SageMaker to build a Jupyter notebook, outline the process for evaluating data, and explain why data must be pre-processed. We use open-source tools to examine and pre-process data, and Amazon SageMaker to train and host a machine learning model.
It also includes using cross-validation to test the performance of a machine learning model, using a hosted model for inference, and creating an Amazon SageMaker hyperparameter tuning job to optimize a model's effectiveness. Finally, we learn how to use managed Amazon ML services to solve specific machine learning problems in forecasting, computer vision, and natural language processing.
For this course we took a case study, “Unlocking Clinical Data from Narrative Reports”. The objective of this case study is to evaluate the automated detection of clinical conditions described in narrative reports using natural language processing.
ACKNOWLEDGEMENT
We would like to express our deep sense of gratitude to our esteemed institute
Gayatri Vidya Parishad College of Engineering (Autonomous), which has
provided us an opportunity to fulfill our cherished desire.
We are highly indebted to Dr. B. Govinda Rao, Professor and Head of the Department of Mechanical Engineering, Gayatri Vidya Parishad College of Engineering (Autonomous), for giving us an opportunity to do the internship in college.
We express our sincere thanks to our Principal Dr. A. B. KOTESWARA RAO, Gayatri Vidya Parishad College of Engineering (Autonomous), for his encouragement during this project and for giving us a chance to explore and learn new technologies in the form of a mini project.
We are also thankful and grateful to EduSkills, AICTE, and SS&C Blue Prism Foundation for providing us with this opportunity. Finally, we are indebted to the teaching and non-teaching staff of the Mechanical Engineering Department for all their support in the completion of our project.
B. SAI VAMSI
21131A0306
INDEX
COURSE: AWS CLOUD FOUNDATIONS
7. Storage (Module 7)
● Amazon Elastic Block Store (Amazon EBS)
● Amazon Simple Storage Service (Amazon S3)
● Amazon Elastic File System (Amazon EFS)
● Amazon Simple Storage Service Glacier
8. Databases (Module 8)
● Amazon Relational Database Service (Amazon RDS)
● Amazon DynamoDB
● Amazon Redshift
● Amazon Aurora
9. Cloud Architecture (Module 9)
● AWS Well-Architected Framework
● Reliability and high availability
● AWS Trusted Advisor
10. Auto Scaling and Monitoring (Module 10)
● Elastic Load Balancing
● Amazon CloudWatch
● Amazon EC2 Auto Scaling
6. Case Study
● Case study on Natural Language Processing – Unlocking Clinical Data from Narrative Reports
7. Conclusion and References
COURSE: AWS CLOUD FOUNDATIONS
MODULE: 1 CLOUD CONCEPTS OVERVIEW
CLOUD SERVICES: There are three main cloud service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Figure-1.1 below shows the level of control over IT resources that each model offers.
Figure-1.3-Fundamental Pricing
Total Cost of Ownership (TCO) is a financial estimate that helps identify the direct and indirect costs of a system. It is used:
● To compare the costs of running an entire infrastructure environment or a specific workload on-premises versus on AWS
3. AWS Organizations
AWS Organizations is a free account management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage. AWS Organizations includes consolidated billing and account management capabilities that help you better meet the budgetary, security, and compliance needs of your business. The main benefits of AWS Organizations are:
● Centrally managed access policies across multiple AWS accounts.
● Controlled access to AWS services.
Table-1.2
2. Amazon Simple Storage Service (Amazon S3)
Common use cases include:
● Backup and storage – Provide data backup and storage services for others
● Application hosting – Provide services that deploy, install, and manage web applications
● Software delivery – Host your software applications that customers can download
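As an illustration of the backup-and-storage use case, here is a minimal boto3 sketch that uploads an object to Amazon S3 and retrieves it again; the bucket name, keys, and file names are hypothetical placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local file as an object (bucket and key names are placeholders).
s3.upload_file("backup.tar.gz", "example-backup-bucket", "backups/backup.tar.gz")

# Download the same object back to disk.
s3.download_file("example-backup-bucket", "backups/backup.tar.gz", "restored.tar.gz")
```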
3. Amazon Elastic File System (EFS)
File storage in the AWS Cloud:
● Works well for big data and analytics, media processing workflows, content management,
web serving, and home directories.
● Petabyte-scale, low-latency file system.
● Shared storage.
● Elastic capacity.
● Compatible with all Linux-based AMIs for Amazon EC2.
4. Amazon Simple Storage Service Glacier
● Amazon S3 Glacier is a data archiving service that is designed for security, durability, and an
extremely low cost.
● Amazon S3 Glacier is designed to provide 11 9s of durability for objects.
● It supports the encryption of data in transit and at rest through Secure Sockets Layer (SSL) or
Transport Layer Security (TLS).
● The Vault Lock feature enforces compliance through a policy.
● Extremely low-cost design works well for long-term archiving.
● Provides three options for access to archives (expedited, standard, and bulk); retrieval times range from a few minutes to several hours.
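As a hedged sketch of those retrieval options, the boto3 call below requests a temporary restore of an archived S3 object; the bucket name and key are placeholders, and the Tier field selects the expedited, standard, or bulk option.

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary restore of an archived object; Tier selects the
# retrieval option (Expedited, Standard, or Bulk).
s3.restore_object(
    Bucket="example-archive-bucket",   # placeholder bucket
    Key="logs/2021/archive.zip",       # placeholder key
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)
```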
MODULE: 9 CLOUD ARCHITECTURE
AWS TRUSTED ADVISOR: AWS Trusted Advisor inspects your AWS environment and makes recommendations in five categories:
● Cost Optimization – AWS Trusted Advisor looks at your resource use and makes recommendations to help you optimize cost by eliminating unused and idle resources, or by making commitments to reserved capacity.
● Performance – Improve the performance of your service by checking your service limits, ensuring you take advantage of provisioned throughput, and monitoring for overutilized instances.
● Security – Improve the security of your application by closing gaps, enabling various AWS security features, and examining your permissions.
● Fault Tolerance – Increase the availability and redundancy of your AWS application by taking advantage of automatic scaling, health checks, multi-AZ deployments, and backup capabilities.
● Service Limits – AWS Trusted Advisor checks for service usage that is more than 80 percent of the service limit. Values are based on a snapshot.
RELIABILITY
Figure-1.5-Reliability of System
● Reliability is a measure of your system's ability to provide functionality when desired by the user, as shown in figure-1.5.
● System includes all system components: Hardware and Software.
● Probability that your entire system will function as intended for a specified period.
● Mean Time Between Failures (MTBF) = total time in service / number of failures.
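A small worked example of the MTBF formula above, with hypothetical numbers:

```python
def mtbf(total_time_in_service_hours: float, number_of_failures: int) -> float:
    """Mean Time Between Failures = total time in service / number of failures."""
    return total_time_in_service_hours / number_of_failures

# Example: a system with 9,000 hours of service and 3 failures.
print(mtbf(9000, 3))  # 3000.0 hours between failures, on average
```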
MODULE: 10 AUTO SCALING AND MONITORING
1. Elastic Load Balancing: Elastic Load Balancing automatically distributes your incoming traffic
across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones
as shown in figure-1.6. It monitors the health of its registered targets, and routes traffic only to the healthy
targets.
Types of Elastic Load Balancing: There are three types of Elastic Load Balancing: the Application Load Balancer, the Network Load Balancer, and the Classic Load Balancer. Each newer type offers improved features over the previous one, as shown in table-1.3.
2. Amazon CloudWatch:
CloudWatch enables you to –
● Collect and track standard and custom metrics.
● Define rules that match changes in your AWS environment and route these events to targets for processing.
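As a minimal sketch of the custom-metrics capability, the boto3 call below publishes one data point; the namespace and metric name are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a single custom metric data point (namespace and metric are placeholders).
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "ActiveSessions",
        "Value": 42,
        "Unit": "Count",
    }],
)
```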
3. Amazon EC2 Auto Scaling:
● Helps you maintain application availability.
● Enables you to automatically add or remove EC2 instances according to conditions that you define.
● Detects impaired EC2 instances and unhealthy applications and replaces the instances without your
intervention.
COURSE: MACHINE LEARNING FOUNDATIONS
● Artificial intelligence is the broad field of building machines to perform human tasks.
● Machine learning is a subset of AI. It focuses on using data to train ML models so the models can
make predictions.
● Deep learning is a technique that was inspired by human biology. It uses layers of neurons to build networks that solve problems.
2. Business Problems Solved with Machine Learning
Machine learning is used throughout a person’s digital life. Here are some examples:
● Spam –Your spam filter is the result of an ML program that was trained with examples of
spam and regular email messages.
● Recommendations –Based on books that you read or products that you buy, ML programs
predict other books or products that you might want. Again, the ML program was trained
with data from other readers’ habits and purchases.
Machine learning problems can be grouped into –
● Supervised learning: You have training data for which you know the answer.
● Unsupervised learning: You have data, but you are looking for insights within the data.
● Reinforcement learning: The model learns in a way that is based on experience and feedback.
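As a small illustration of supervised learning, the scikit-learn sketch below trains a classifier on labelled data (the built-in Iris dataset) and scores it on held-out examples; this is an illustrative example, not part of the course labs.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Supervised learning: the training data includes the known answer (labels).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```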
3. Machine Learning Process
The machine learning pipeline can guide you through the process of training and evaluating a model.
The iterative process can be broken into three broad steps –
● Data processing
● Model training
● Model evaluation
ML PIPELINE: A machine learning pipeline is the end-to-end construct that orchestrates the flow of data into, and output from, a machine learning model (or set of multiple models), as in figure-2.2. It includes raw data input, features, outputs, the machine learning model and model parameters, and prediction outputs.
Figure-2.2-ML Pipeline
● Jupyter Notebook is an open-source web application that enables you to create and share
documents that contain live code, equations, visualizations, and narrative text.
● JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible.
● pandas is an open-source Python library. It's used for data handling and analysis. It represents data in a table that is similar to a spreadsheet. This table is known as a pandas DataFrame.
● Matplotlib is a library for creating static, animated, and interactive visualizations in Python. You use it to generate plots of your data later in this course.
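A minimal sketch showing the two libraries together: pandas holds the data in a DataFrame, and Matplotlib renders a plot. The data values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A small pandas DataFrame: data in a spreadsheet-like table.
df = pd.DataFrame({"day": [1, 2, 3, 4], "sales": [10, 12, 9, 15]})

df.plot(x="day", y="sales", kind="line")  # Matplotlib renders the plot
plt.show()
```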
3. Evaluating Data
● Descriptive statistics can be organized into different categories. Overall statistics include the
number of rows (instances) and the number of columns (features or attributes) in your dataset.
This information, which relates to the dimensions of your data, is important. For example, it can
indicate that you have too many features, which can lead to high dimensionality and poor model
performance.
● Attribute statistics are another type of descriptive statistic, specifically for numeric attributes.
They give a better sense of the shape of your attributes, including properties like the mean,
standard deviation, variance, minimum value, and maximum value.
● Multivariate statistics look at relationships between more than one variable, such as correlations
and relationships between your attributes.
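The three kinds of descriptive statistics map directly onto common pandas calls, as in this sketch (the CSV file name is a placeholder):

```python
import pandas as pd

df = pd.read_csv("dataset.csv")  # placeholder file name

print(df.shape)       # overall statistics: rows (instances) x columns (features)
print(df.describe())  # attribute statistics: mean, std, min, max per numeric column
print(df.corr(numeric_only=True))  # multivariate statistics: pairwise correlations
```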
4. Feature Engineering
Feature selection is about selecting the features that are most relevant and discarding the rest. Feature
selection is applied to prevent either redundancy or irrelevance in the existing features, or to get a
limited number of features to prevent overfitting.
Feature extraction is about building up valuable information from raw data by reformatting,
combining, and transforming primary features into new ones. This transformation continues until it
yields a new set of data that can be consumed by the model to achieve the goals.
Outliers:
During feature engineering, you can handle outliers with several different approaches. They include, but are not limited to:
● Deleting the outlier: This approach might be a good choice if your outlier is based on an
artificial error. Artificial error means that the outlier isn’t natural and was introduced because
of some failure—perhaps incorrectly entered data.
● Imputing a new value for the outlier: You can use the mean of the feature, for instance, and
impute that value to replace the outlier value. Again, this would be a good approach if an
artificial error caused the outlier.
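As a sketch of the imputation approach, the snippet below flags an outlier with the common 1.5 × IQR rule and replaces it with the mean of the remaining values; the data and the choice of rule are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 25, 27, 24, 230]})  # 230 looks like an artificial error

# Flag outliers with the 1.5 * IQR (interquartile range) rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)

# Impute the mean of the non-outlier values in place of the outlier.
df.loc[outliers, "age"] = df.loc[~outliers, "age"].mean()
print(df)
```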
Feature Selection: Filter Methods
Filter methods (figure-2.4) use a proxy measure instead of the actual model's performance. Filter methods are fast to compute while still capturing the usefulness of the feature set. Common measures include:
● Pearson’s correlation coefficient –Measures the statistical relationship or association between
two continuous variables.
● Linear discriminant analysis (LDA) –Is used to find a linear combination of features that
separates two or more classes.
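A minimal sketch of a filter method using Pearson's correlation as the proxy measure; the toy DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [5, 3, 4, 1, 2],
    "target":    [2, 4, 6, 8, 10],
})

# Pearson's correlation of each feature with the target acts as the proxy measure.
correlations = df.drop(columns="target").corrwith(df["target"]).abs()
print(correlations.sort_values(ascending=False))  # keep the top-ranked features
```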
Feature Selection: Wrapper Methods
● Forward selection starts with no features and adds them until the best model is found. (figure-
2.5)
● Backward selection starts with all features, drops them one at a time, and selects the best
model.
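A sketch of forward selection using scikit-learn's SequentialFeatureSelector; the estimator and the number of features to select are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start with no features and add them one at a time,
# keeping the additions that most improve the cross-validated score.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",  # use "backward" for backward selection
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features
```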
Feature Selection: Embedded Methods
Embedded methods (figure-2.6) combine the qualities of filter and wrapper methods. They are implemented from algorithms that have their own built-in feature selection methods.
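A sketch of an embedded method: Lasso regression's L1 penalty performs feature selection during training by shrinking the coefficients of unhelpful features to zero. The dataset and alpha value are illustrative.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# The L1 penalty drives some coefficients to exactly zero, so feature
# selection is built into the training algorithm itself.
model = Lasso(alpha=1.0).fit(X, y)
print(model.coef_)  # zero coefficients mark the discarded features
```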
LINEAR LEARNER: The Amazon SageMaker linear learner algorithm provides a solution for both
classification and regression problems. The Amazon SageMaker linear learner algorithm compares
favourably with methods that provide a solution for only continuous objectives. It provides a
significant increase in speed over naive hyperparameter optimization techniques.
● You can deploy your trained model by using Amazon SageMaker to handle API calls from
applications, or to perform predictions by using a batch transformation.
● Use Single-model endpoints for simple use cases and use multi-model endpoint support to
save resources when you have multiple models to deploy.
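A hedged sketch of training and deploying the linear learner with the SageMaker Python SDK; the IAM role ARN, S3 paths, instance types, and hyperparameter are all placeholder assumptions, and a deployed endpoint incurs charges until it is deleted.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

# Train the built-in linear learner on CSV data already staged in S3.
container = image_uris.retrieve("linear-learner", session.boto_region_name)
estimator = Estimator(container, role, instance_count=1,
                      instance_type="ml.m5.large", sagemaker_session=session)
estimator.set_hyperparameters(predictor_type="binary_classifier")
estimator.fit({"train": TrainingInput("s3://example-bucket/train/",  # placeholder path
                                      content_type="text/csv")})

# Deploy to a real-time endpoint that applications can call for predictions.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```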
Figure-2.9-Confusion Matrix
Figure-2.10- Specificity
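Figures 2.9 and 2.10 illustrate the confusion matrix and specificity. As a sketch, both can be computed with scikit-learn; the label vectors below are made up for illustration, and specificity = TN / (TN + FP).

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # illustrative ground-truth labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # illustrative model predictions

# For binary labels, ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)  # true-negative rate
print(f"TN={tn} FP={fp} FN={fn} TP={tp}, specificity={specificity:.2f}")
```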
8. Hyperparameter and model tuning
HYPERPARAMETER TUNING:
● Tuning hyperparameters can be labour-intensive. Traditionally, this kind of tuning was done manually.
● Data scientists would choose a set of hyperparameter values, train the model, and score it on the validation data. This process would be repeated until satisfactory results were achieved.
● This manual process is not always the most thorough or efficient way of tuning your hyperparameters. Automated model tuning instead searches over hyperparameter values for you, finding patterns and defining the attributes of the data by itself (figure-2.11).
Figure-2.11-Model tuning
LAB: Implementing a Machine Learning pipeline with Amazon SageMaker.
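A hedged sketch of an Amazon SageMaker hyperparameter tuning job with the SageMaker Python SDK; `estimator` is assumed to be a configured estimator such as the linear learner shown earlier, and the metric name, range, and S3 paths are placeholder assumptions.

```python
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

# Search the learning-rate range for the value that minimizes validation loss.
tuner = HyperparameterTuner(
    estimator,  # assumed: a configured Estimator, as in the earlier sketch
    objective_metric_name="validation:objective_loss",
    objective_type="Minimize",
    hyperparameter_ranges={"learning_rate": ContinuousParameter(0.0001, 0.1)},
    max_jobs=10,          # total training jobs to run
    max_parallel_jobs=2,  # jobs run at the same time
)
tuner.fit({
    "train": TrainingInput("s3://example-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://example-bucket/validation/", content_type="text/csv"),
})
```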
1. OVERVIEW OF FORECASTING
Forecasting is an important area of machine learning. It is important because so many opportunities
for predicting future outcomes are based on historical data. It’s based on time series of data.
Time series data falls into two broad categories. The first type is univariate, which means that it has only one variable. The second type is multivariate, which has more than one variable.
In addition to these two categories, most time series datasets also follow one of the following
patterns:
● Trend –A pattern that shows the values as they increase, decrease, or stay the same over time.
● Seasonal –A repeating pattern that is based on the seasons in a year.
● Cyclical –Some other form of a repeating pattern.
● Irregular –Changes in the data over time that appear to be random or that have no discernible
pattern.
Time Series Data Handling – Smoothing of Data: Smoothing your data can help you deal with outliers and other anomalies. You might consider smoothing for the following reasons:
● Data preparation –Removing error values and outliers.
● Visualization –Reducing noise in a plot.
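A minimal smoothing sketch with a pandas rolling mean; the series values and the window size are illustrative.

```python
import pandas as pd

series = pd.Series([3, 4, 50, 5, 6, 7, 5, 6])  # 50 is a noisy spike

# A rolling mean is a simple smoothing technique: each point becomes the
# average of a small window, which damps outliers and reduces plot noise.
smoothed = series.rolling(window=3, center=True).mean()
print(smoothed)
```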
Time Series Data Algorithms: There are five types of time series data algorithms: ARIMA, DeepAR+, ETS, NPTS, and Prophet, as shown in figure-2.13.
Figure-2.13-Time series data algorithms
● Autoregressive Integrated Moving Average (ARIMA): This algorithm removes
autocorrelations, which might influence the pattern of observations.
● DeepAR+: A supervised learning algorithm for forecasting one-dimensional time series. It uses a recurrent neural network to train a model over multiple time series.
● Exponential Smoothing (ETS): This algorithm is useful for datasets with seasonality. It uses
a weighted average for all observations. The weights are decreased over time.
● Non-Parametric Time Series (NPTS): Predictions are based on sampling from past observations. Specialized versions are available for seasonal and climatological datasets.
● Prophet: A Bayesian time series model. It’s useful for datasets that span a long time period,
have missing data, or have large outliers.
Import your data – You must import as much data as you have, both historical data and related data. You should do some basic evaluation and feature engineering before you use the data to train a model.
Train a predictor – To train a predictor, you must choose an algorithm. If you are not sure which algorithm is best for your data, you can let Amazon Forecast choose by selecting AutoML as your algorithm. You also must select a domain for your data; if you're not sure which domain fits best, you can select a custom domain. Domains have specific types of data that they require. For more information, see Predefined Dataset Domains and Dataset Types in the Amazon Forecast documentation.
Generate forecasts – As soon as you have a trained model, you can use the model to make a forecast by using an input dataset group. After you generate a forecast, you can query the forecast, or you can export it to an Amazon Simple Storage Service (Amazon S3) bucket. You also have the option to encrypt the data in the forecast before you export it.
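A hedged boto3 sketch of the train-and-forecast steps above; the dataset group ARN, predictor ARN, names, and frequency are placeholder assumptions, and in practice these calls are asynchronous, so you must wait for each resource to become ACTIVE before the next step.

```python
import boto3

forecast = boto3.client("forecast")

# Train a predictor; PerformAutoML lets Amazon Forecast choose the algorithm.
forecast.create_predictor(
    PredictorName="demo_predictor",
    ForecastHorizon=14,  # number of future time steps to predict
    PerformAutoML=True,
    InputDataConfig={"DatasetGroupArn": "arn:aws:forecast:..."},  # placeholder ARN
    FeaturizationConfig={"ForecastFrequency": "D"},  # daily data
)

# Once the predictor is ACTIVE, generate a forecast from it.
forecast.create_forecast(
    ForecastName="demo_forecast",
    PredictorArn="arn:aws:forecast:...",  # placeholder ARN of the trained predictor
)
```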
Computer vision with image and facial recognition can help to quickly identify unlawful entries or
persons of interest. This process can result in safer communities and a more effective way of deterring
crimes.
Authentication and enhanced computer-human interaction:
Enhanced human-computer interaction can improve customer satisfaction. Examples include products
that are based on customer sentiment analysis in retail outlets or faster banking services with quick
authentication that is based on customer identity and preferences.
Content management and analysis:
Millions of images are added every day to media and social channels. The use of computer vision
technologies—such as metadata extraction and image classification—can improve efficiency and
revenue opportunities.
Autonomous driving:
By using computer-vision technologies, auto manufacturers can provide improved and safer self-
driving car navigation, which can help realize autonomous driving and make it a reliable transportation
option.
Medical imaging:
Medical image analysis with computer vision can improve the accuracy and speed of a patient's medical diagnosis, which can result in better treatment outcomes and life expectancy.
Manufacturing process control:
Well-trained computer vision that is incorporated into robotics can improve quality assurance and operational efficiencies in manufacturing applications. This process can result in more reliable and cost-effective products.
Amazon Rekognition makes images and videos searchable so that you can discover the objects and scenes that appear in them.
Figure-2.16-Problem 1
Figure-2.17-Problem 2
CASE 01: Searchable Image Library
Figure-2.18
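For a searchable image library, a minimal Amazon Rekognition sketch detects the labels (objects and scenes) to index for each image; the bucket and object names are placeholders.

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect labels in an image stored in S3, so the library can be
# indexed and searched by label (bucket/key are placeholders).
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "example-photos", "Name": "beach.jpg"}},
    MaxLabels=10,
)
for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```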
CASE 02: Sentiment Analysis
Figure-2.19
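For image-based sentiment analysis, a hedged Rekognition sketch reads the predicted emotions from detected faces, assuming the sentiment here is derived from facial expressions; the bucket and object names are placeholders.

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect faces and their predicted emotions, a building block for
# image-based sentiment analysis (bucket/key are placeholders).
response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "example-photos", "Name": "customer.jpg"}},
    Attributes=["ALL"],  # include emotion predictions
)
for face in response["FaceDetails"]:
    top = max(face["Emotions"], key=lambda e: e["Confidence"])
    print(top["Type"], round(top["Confidence"], 1))
```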
4. Preparing a Custom Dataset for Computer Vision
There are six steps involved in preparing a custom dataset, each with its own functionality: collect images, create a training dataset, create a test dataset, train the model, evaluate the model, and then use the model.
Figure-2.20-Structure of NLP
● Discovering the structure of the text –One of the first tasks of any NLP application is
to break the text into meaningful units, such as words, phrases, and sentences.
● Labelling data –After the system converts the text to data, the next challenge is to apply
labels that represent the various parts of speech. Every language requires a different labelling
scheme to match the language’s grammar.
● Representing context –Because word meaning depends on context, any NLP system needs a
way to represent context. It is a big challenge because of the large number of contexts.
● Applying grammar –Dealing with the variation in how humans use language is a major
challenge for NLP systems.
NLP FLOW CHART: The NLP flow chart starts with the collection of a text database, as shown in figure-2.21. The text data is then tokenized using word-vector encoding, analysed further, and used by a model to predict results.
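A minimal sketch of that tokenize-and-encode step using scikit-learn's CountVectorizer; the two example sentences are made up to echo the clinical-reports case study.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The patient denies chest pain.",
        "Chest pain reported on admission."]

# Tokenize the text and encode each document as a word-count vector,
# the kind of numeric representation a downstream model can consume.
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(vectors.toarray())
```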
Common NLP use cases include:
● Interactive Assistants
● Database Queries
LAB: Natural Language Processing
To create an Amazon Lex bot (console):
1. Sign in to the AWS Management Console and open the Amazon Lex console at https://console.aws.amazon.com/lex/.
2. If this is your first bot, choose Get Started; otherwise, on the Bots page, choose Create.
3. On the Create your Lex bot page, provide the following information, and then choose Create:
● Keep the default bot name (OrderFlowers).
● For COPPA, choose No.
● For User utterance storage, choose the appropriate response.
4. The console makes the necessary requests to Amazon Lex to save the configuration.
5. The console then displays the bot editor window.
6. Wait for confirmation that your bot was built.
7. Test the bot.
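Step 7 can also be done programmatically. A hedged boto3 sketch sends one test utterance to the bot through the Lex (V1) runtime; the bot name, alias, and user ID are placeholder assumptions.

```python
import boto3

lex = boto3.client("lex-runtime")

# Send a test utterance to the bot; name, alias, and user ID are placeholders.
response = lex.post_text(
    botName="OrderFlowers",
    botAlias="$LATEST",
    userId="test-user-1",
    inputText="I would like to order some flowers",
)
print(response["intentName"], "->", response["message"])
```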
CASE STUDY
Multi Channel Alert System for Organizations
Problem Statement:
Designing a Multi Channel Alert System for Organisations
Domain used: Amazon SNS
An alerting system is one of the important mechanisms that an organization needs. A good alert system should offer:
● Multi Channel Support
● User-Friendly
● Effective Delivery
● Scalability
● Fully configurable
● Global Support
A Multi Channel Alert System is an automated delivery mechanism that sends a message, as a notification, to multiple users through multiple channels and multiple services.
Prerequisites:
● AWS account
● General understanding of SNS concepts
● No charges, as this SNS usage falls within the free tier
Introduction:
An organised alert system is a very impactful tool that every organisation, from small to big, should have: every organisation must be able to grab its users' attention when it needs to deliver the information they want.
Working:
● Sign in to the AWS Management Console and select the SNS service.
Creating SNS Topics:
● Create an SNS topic by selecting Topics and clicking Create topic.
● Fill out the necessary fields, like the topic name and the owner of the topic.
● If you wish to fill in other information, like encryption details, tags, or the delivery retry policy, you can do so by selecting the appropriate section.
● After filling in the desired fields, hit Create topic and take note of the ARN that is generated.
Creating SNS Subscriptions:
● To create SNS subscriptions, follow step 1 and select Subscriptions.
● Hit Create subscription and choose the topic's ARN.
● Choose the type of protocol you want to send through, like email, AWS Lambda, HTTP, or email-JSON.
● Specify the endpoint, like an email address or phone number.
● Choose Create subscription.
Note: Email and HTTP endpoints require the subscriber to confirm the subscription, but an AWS Lambda endpoint doesn't require confirmation.
Publishing Alerts:
● To publish an alert, follow step 1, choose a topic, and hit Publish message.
● In the Message subject section, choose your title.
● In the Time to Live (TTL) section, you can optionally set how long SNS has to deliver the message to mobile push endpoints.
● In the message body section, write your content for all protocols, or choose different payloads for different protocols if you desire by clicking Custom payload.
● If you want to add message attributes, go on and add them (see the scripted sketch after this list).
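The console workflow above (topic, subscription, publish) can also be scripted. Here is a minimal boto3 sketch, assuming an email subscriber; the topic name and address are placeholders, and the email endpoint must confirm the subscription before it receives alerts.

```python
import boto3

sns = boto3.client("sns")

# Create the topic (idempotent: returns the existing ARN if it already exists).
topic_arn = sns.create_topic(Name="org-alerts")["TopicArn"]

# Subscribe an email endpoint; the recipient must confirm the subscription.
sns.subscribe(TopicArn=topic_arn, Protocol="email",
              Endpoint="alerts@example.com")  # placeholder address

# Publish an alert to every confirmed subscriber on the topic.
sns.publish(TopicArn=topic_arn,
            Subject="Service degradation",
            Message="API latency is above the alert threshold.")
```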
View Analytics:
● You can view the status of your alerts by going to the dashboard, where you can view various metrics, such as failed, successful, and retried messages.
CONCLUSION
Amazon SNS can be used to create a Multi Channel Alert System, as it is highly scalable, very cost effective (almost no cost to the publisher), and very reliable. An alert system is vital to organizations, but it can be a challenge for both small-scale and large-scale organizations: small-scale organizations can't spare much to spend on an alert system, and in large-scale organizations the alert system can get quite complex over time yet not be productive and effective. Through Amazon SNS, a single individual is able to create and manage an alert system without spending a penny on it.
REFERENCES
1. Machine learning on AWS: https://aws.amazon.com/machine-learning/?nc2=h_ql_sol_use_ml
2. Amazon AWS EC2: https://aws.amazon.com/ec2/
3. Amazon AWS S3: https://aws.amazon.com/s3/
4. Amazon AWS SageMaker: https://aws.amazon.com/sagemaker/
5. GitHub, machine learning with scikit-learn: https://github.com/scikit-learn/scikit-learn.git
6. AWS Forecast: https://aws.amazon.com/forecast
7. Case study on ML architecture (Uber): https://pantelis.github.io/cs634/docs/common/lectures/uber-ml-arch-case-study/
8. AWS Global Infrastructure Map: https://aws.amazon.com/about-aws/global-infrastructure/#AWS_Global_Infrastructure_Map (choose a circle on the map to view summary information about the Region represented by the circle)
9. Regions and Availability Zones: https://aws.amazon.com/about-aws/global-infrastructure/regions_az/ (choose a tab to view a map of the selected geography and a list of Regions, Edge locations, Local Zones, and Regional Caches)