1. Every project, regardless of its size, starts with business understanding, which lays the foundation for successful resolution of the business problem.
2. If the problem is to determine probabilities of an action, then a predictive model might be used.
4. If the problem requires a yes/no answer, then a classification approach to predicting a response would be suitable.
5. Techniques such as descriptive statistics and visualization can be applied to the data set to assess its content, quality, and initial insights about the data.
A capstone project is a project where students must research a topic independently to develop a deep understanding of the subject matter. It gives students an opportunity to integrate all their knowledge and demonstrate it through a comprehensive project.
The premise that underlies all Machine Learning disciplines is that there needs to be a pattern. If there is no pattern, then the problem cannot be solved with AI technology. It is fundamental that this question is asked before deciding to embark on an AI development journey.
4. List the different problem categories that come under predictive analysis. Write one example for each.
5. What is design thinking? Draw the diagram and briefly explain each stage of design thinking.
1. Empathize
2. Define
3. Ideate
4. Prototype
5. Test
6. What is problem decomposition? Write down the steps involved in problem decomposition.
Problem decomposition is the process of breaking down a problem into smaller units before coding.
• 1. Understand the problem, then restate it in your own words.
• 2. Break the problem down into a few large pieces.
• 3. Break complicated pieces down into smaller pieces.
• 4. Code one small piece at a time: think about how to implement it, write the code/query, test it on its own, and fix any problems.
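The steps above can be sketched on a toy problem (chosen here only for illustration): "report the average word length in a text," decomposed into small pieces that can each be coded and tested on their own.

```python
def split_words(text):
    """Piece 1: break the text into words."""
    return text.split()

def word_lengths(words):
    """Piece 2: measure each word."""
    return [len(w) for w in words]

def average(numbers):
    """Piece 3: combine the measurements."""
    return sum(numbers) / len(numbers)

def average_word_length(text):
    """Assemble the tested pieces into the full solution."""
    return average(word_lengths(split_words(text)))

print(average_word_length("break problems into small pieces"))  # 5.6
```

Each piece is simple enough to verify in isolation before the full solution is assembled, which is the point of decomposition.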
• The train-test split is a technique for evaluating the performance of a machine learning algorithm.
• It can be used for classification or regression problems and with any supervised learning algorithm.
• The procedure involves taking a dataset and dividing it into two subsets.
• The first subset is used to fit the model and is referred to as the training dataset.
• The second subset is not used to train the model; it is used to evaluate the fitted machine learning model. It is referred to as the testing dataset.
8.
The split is most commonly expressed as a proportion between 0 and 1 for either the train or test dataset.
For example, a training set size of 0.67 (67 percent) means that the remaining 0.33 (33 percent) is assigned to the test set.
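The split described above can be sketched in plain Python (a minimal illustration; in practice a library routine such as scikit-learn's train_test_split is typically used):

```python
import random

def train_test_split(data, train_size=0.67, seed=42):
    """Shuffle the data and divide it into train and test subsets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data, train_size=0.67)
print(len(train), len(test))  # 67 33
```

Shuffling before cutting matters: if the data is ordered (e.g. by class), an unshuffled split would give the model an unrepresentative training set.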
• On small datasets, the extra computational burden of running cross-validation isn't a big deal, so if your dataset is smaller, you should run cross-validation (for example, K=10 for 10-fold cross-validation). It is more reliable, though it takes longer to run.
• If your dataset is larger, you can use the train-test split method.
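The K-fold idea referenced above can be sketched by generating the fold indices directly (a minimal illustration; libraries such as scikit-learn provide a KFold utility for this):

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each of the k folds serves as the test set exactly once, and the
    remaining k-1 folds form the training set for that round.
    """
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n  # last fold takes the remainder
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx

folds = list(k_fold_indices(20, k=10))
print(len(folds))    # 10 train/test pairs
print(folds[0][1])   # first test fold: [0, 1]
```

Averaging a model's score over all k rounds is what makes the estimate more reliable than a single train-test split.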
Hyperparameters are parameters whose values govern the learning process; they influence the values of the model parameters that a learning algorithm learns. Eg: the ratio of the train-test split, the number of hidden layers in a neural network, the number of clusters in a clustering task.
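The distinction can be made concrete with a tiny gradient-descent example (a sketch; the data and values are invented for illustration): the learning rate and epoch count are set by hand before training, while the weight w is learned from the data.

```python
# Hyperparameters: chosen before training, they govern the learning process.
learning_rate = 0.1   # step size of each gradient-descent update
epochs = 50           # number of passes over the training data

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # underlying relationship: y = 2x

# Model parameter: learned by the algorithm, not set by hand.
w = 0.0
for _ in range(epochs):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 3))  # converges close to 2.0
```

Changing the hyperparameters (say, a much larger learning rate) changes how, and whether, w converges, which is exactly the sense in which hyperparameters govern the learned parameters.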
12. What is a loss function? What are the different categories of loss functions?
A loss function is a measure of how well a prediction model does in terms of being able to predict the expected outcome. Loss functions can be broadly categorized into two types: classification loss and regression loss. Regression loss functions are used when the model predicts a quantity, and classification loss functions when it predicts a label.
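One common loss from each category can be computed directly (a minimal sketch: mean squared error for regression, binary cross-entropy for classification; the sample values are invented):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: a typical regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    """Binary cross-entropy: a typical classification loss on predicted probabilities."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob)) / len(y_true)

print(mse([3.0, 5.0], [2.5, 5.5]))                              # 0.25
print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))       # low loss: confident, correct
```

In both cases a lower value means better predictions, which is what lets training minimize the loss.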
14. Draw the diagram of the Analytic Approach and explain each stage / Explain the foundational methodology of data science.
1. Business understanding
2. Analytic approach
3. Data requirements
4. Data collection
• Where is the data coming from (identify all sources), and how will you get it?
• The Data Scientist identifies and collects data resources (structured, unstructured, and semi-structured) that are relevant to the problem area.
• If the data scientist finds gaps in the data collection, he may need to review the data requirements and collect more data.
5. Data understanding
• Is the data that you collected representative of the problem to be solved?
• Descriptive statistics and visualization techniques can help a data scientist understand the content of the data, assess its quality, and obtain initial information about the data.
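As a minimal sketch of the descriptive-statistics idea above (the column name and values are hypothetical), Python's standard library is enough for a first quality check:

```python
import statistics

# Hypothetical column of collected values; the goal is a quick quality check.
ages = [23, 25, 25, 29, 31, 35, 35, 35, 41, 120]  # 120 looks like a data-entry error

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": round(statistics.stdev(ages), 1),
}
print(summary)
```

A mean well above the median, as here, is the kind of initial insight that flags outliers worth investigating before modeling.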
6. Data preparation
• What additional work is required to manipulate and work with the data?
• The data preparation step includes all the activities used to create the data set used during the modeling phase.
• This includes cleansing data, combining data from multiple sources, and transforming data into more useful variables.
• In addition, feature engineering and text analysis can be used to derive new structured variables that enrich the set of predictors and improve model accuracy.
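A minimal Python sketch of the cleansing and feature-engineering activities above, assuming a hypothetical record layout (the field names "income" and "spend" are invented for illustration):

```python
# Raw collected records, including a row with a missing value.
raw = [
    {"income": 50000, "spend": 20000},
    {"income": None,  "spend": 15000},   # missing income: dropped during cleansing
    {"income": 80000, "spend": 30000},
]

# 1. Cleansing: drop rows with a missing income value.
clean = [row for row in raw if row["income"] is not None]

# 2. Feature engineering: derive a spend-to-income ratio as a new predictor.
for row in clean:
    row["spend_ratio"] = row["spend"] / row["income"]

print([row["spend_ratio"] for row in clean])  # [0.4, 0.375]
```

The derived ratio is an example of transforming raw fields into a more useful variable for the modeling phase.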
7. Model training
• In what way can the data be visualized to get the answer that is required?
• From the first version of the prepared data set, data scientists use a training data set (historical data in which the desired result is known) to develop predictive or descriptive models.
• The modeling process is very iterative.
8. Model evaluation
• Does the model used really answer the initial question, or does it need to be adjusted?
• The Data Scientist evaluates the quality of the model and verifies that the business problem is handled in a complete and adequate manner.
9. Deployment
10. Feedback