Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Assignment

Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

Assignment 1

Big Data Analysis


Submission date: 28-November-2023, 10:00 PM

1. Definition, Importance, and role of Machine Learning in traditional data analysis and big
data analysis.
Answer:
Definition: Machine le­arning falls under artificial intelligence­. It centres on creating
algorithms and statistical mode­ls. These let compute­rs do tasks without clear­cut
programming. When we talk about data analysis, machine­ learning algorithms learn
from data. They make­ forecasts or choices without having a specific program for the­
job.

Importance and Role of Machine Learning in Traditional Data Analysis: Traditional data
analysis heavily relies on machine learning to extract insights, from datasets. By
automating the discovery of patterns, correlations and trends machine learning
empowers data analysts to enhance their capabilities and decision making processes. In
the realm of data analysis machine learning finds its application in tasks such, as
regression analysis, clustering and classification.

Importance and Role of Machine Learning in Big Data Analysis: When it comes to
analyzing data, which consists of intricate datasets machine learning becomes crucial,
in handling the sheer volume, speed and diversity of the data. Machine learning
algorithms are capable of scaling up to examine datasets uncovering concealed patterns
and offering insights. They play a role in extracting information, from big data enabling
organizations to make informed decisions based on the data.

2. Advantages and Limitations of using Machine learning in traditional data Analysis and Big
data analysis.
Answer:
Advantages:
 Improved efficiency: Machine learning helps automate repetitive tasks resulting in
a faster analysis process.
 Recognizing patterns: Machine learning excels, at identifying patterns within data
that may pose difficulties for methods.
 Handling datasets: Machine learning algorithms are designed to scale enabling the
analysis of vast amounts of data, in big data scenarios.

Limitations:
• Data Dependency: The quantity and quality of training data have a significant
impact on machine learning performance.
• Interpretability: The decision-making process of certain machine learning models
might be difficult to understand due to their lack of openness.
• Resource-intensive: It can require a lot of computing power to train intricate
models on huge datasets.
3. Different types of Machine Learning Algorithms used in traditional data analysis.
Answer:
In data analysis there are machine learning algorithms that are commonly used. These
include:
• Linear Regression: This algorithm helps in predicting a variable by establishing
relationships.
• Decision Trees: These models, resembling trees can be used for both classification
and regression purposes.
• K Nearest Neighbors (KNN): By considering the majority class of their neighbors
this algorithm classifies data points effectively.
• Clustering Algorithms (K Means): Similar data points are grouped together using
these algorithms.
• Support Vector Machines (SVM): This algorithm separates classes by identifying
the hyperplane that works best.

These various machine learning algorithms play roles in the field of data analysis.

4. In which scenarios traditional data analysis is better than big data analysis.
Answer:
Traditional data analysis may be preferable when:
• When the dataset is tiny and manageable with conventional technologies,
traditional data analysis might be better.
• The objectives of the analysis are clearly stated and don't call for intricate
algorithms.
• Processing in real time is not absolutely necessary.

5. In which scenarios big data analysis is better than traditional data analysis
Answer:
Big data analysis is most appropriate, in the following scenarios:
• When working with intricate datasets that cannot be managed by tools.
• When real time processing and analysis are of importance.
• When the analysis entails discovering concealed patterns within quantities of data.

6. Challenges associated with machine learning in traditional data analysis and what should
be the possible solutions to overcome these challenges.
Answer:
Challenges:
• Dealing with datasets can be a challenge, due to scalability.
• Unstructured data poses difficulties in handling and analyzing effectively.
• Complex models often lack interpretability.

Possible Solutions:
• To address scalability concerns consider implementing models that require
computational resources.
• Prioritize pre-processing and structuring the data before applying machine
learning algorithms.
• Focus on explainable AI models for better interpretability.
7. Challenges associated with machine learning in big data architecture and what should be
the possible solutions to overcome these challenges.
Answer:
Challenges:
• Dealing with the complexities of handling and analyzing amounts of data.
• Ensuring that data is reliable and consistent.
• Overcoming the hurdles associated with distributed computing.

Possible Solutions:
• Utilizing distributed computing frameworks such, as Apache Hadoop and Spark to
streamline operations.
• Employing data quality tools, for cleaning and pre-processing.
• Implementing parallel processing and storage solutions.

8. Supervised and Unsupervised Machine learning task that can be applied in big data
analysis. Also write down in which application area that can be applied in big data analysis.
Answer:
Supervised
• Supervised learning involves classifying data into categories, such, as identifying
spam emails.
• On the hand regression focuses on predicting values like forecasting sales.

Unsupervised
• In learning clustering is used to group data points together.
• For example it can be used for customer segmentation. Association on the hand
helps discover relationships, between variables.
• An example of this is market basket analysis.

9. Supervised and Unsupervised Machine learning algorithms that can be applied/used in


big data analysis. Also write down in which application area that can be applied in big data
analysis
Answer:
Supervised:

• Supervised learning techniques such, as Random Forest are commonly used for
both classification and regression tasks.
• They provide a way to improve the accuracy of predictions.
• Deep Learning, which involves Neural Networks is particularly effective in
recognizing patterns, like image and speech recognition.

Unsupervised:

• On the hand unsupervised learning methods like K Means are employed for
clustering data points together.
• The Apriori Algorithm is utilized in market basket analysis to mine association
rules.
• Additionally Principal Component Analysis (PCA) is useful, in reducing the
dimensionality of datasets.

You might also like