Machine Learning
At its core, machine learning is about creating and implementing algorithms that make
decisions and predictions from data. These algorithms are designed to improve their performance
over time, becoming more accurate and effective as they process more data.
For instance, if we want a computer to recognize images of cats, we don't provide it with
specific instructions on what a cat looks like. Instead, we give it thousands of images of cats
and let the machine learning algorithm figure out the common patterns and features that
define a cat. Over time, as the algorithm processes more images, it gets better at recognizing
cats, even when presented with images it has never seen before.
This ability to learn from data and improve over time makes machine learning incredibly
powerful and versatile. It's the driving force behind many of the technological advancements
we see today, from voice assistants and recommendation systems to self-driving cars and
predictive analytics.
AI refers to the development of programs that behave intelligently and mimic human
intelligence through a set of algorithms. The field focuses on three skills: learning, reasoning,
and self-correction to obtain maximum efficiency. AI can refer to either machine
learning-based programs or explicitly programmed computer programs.
Machine learning is a subset of AI that uses algorithms that learn from data to make
predictions. These predictions can be generated through supervised learning, where
algorithms learn patterns from labeled examples, or unsupervised learning, where they discover
structure in unlabeled data. ML models can predict numerical values based on historical data,
categorize events as true or false, and cluster data points based on commonalities.
Deep learning, on the other hand, is a subfield of machine learning dealing with algorithms
based essentially on multi-layered artificial neural networks (ANN) that are inspired by the
structure of the human brain.
Unlike conventional machine learning algorithms, deep learning algorithms are highly
non-linear and hierarchical: each layer builds on the features learned by the previous one.
They can learn from enormous amounts of data and often produce highly accurate results.
Language translation, image recognition, and personalized medicine are some examples of
deep learning applications.
Comparing different industry terms
Here are some reasons why it’s so essential in the modern world:
Once collected, the data needs to be prepared for machine learning. This process involves
organizing the data in a suitable format, such as a CSV file or a database, and ensuring that
the data is relevant to the problem you're trying to solve.
Preprocessing improves the quality of your data and ensures that your machine learning
model can interpret it correctly. This step can significantly improve the accuracy of your
model. Our course, Preprocessing for Machine Learning in Python, explores how to get
your cleaned data ready for modeling.
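As a minimal sketch of what preparation and preprocessing can look like, the example below
parses a small CSV, fills a missing value with the column mean, and rescales each column to
the range 0 to 1 (min-max scaling). The column names and values are made up for illustration,
and only the Python standard library is used; in practice, libraries such as Pandas or
Scikit-learn are commonly used for these steps.

```python
import csv
import io

# Hypothetical raw data: the second row has a missing "age" value.
RAW_CSV = """age,income
25,30000
,45000
35,60000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, turning empty fields into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (float(value) if value != "" else None) for key, value in row.items()}
        for row in reader
    ]

def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    mean = sum(known) / len(known)
    for row in rows:
        if row[column] is None:
            row[column] = mean

def min_max_scale(rows, column):
    """Rescale `column` in place to the range 0..1 (assumes values are not all equal)."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    for row in rows:
        row[column] = (row[column] - low) / (high - low)

rows = load_rows(RAW_CSV)
impute_mean(rows, "age")  # the missing age becomes (25 + 35) / 2 = 30
for name in ("age", "income"):
    min_max_scale(rows, name)

print(rows)
```

Scaling matters mostly for models that compare distances or use gradient-based training;
mean imputation is only one of several ways to handle missing values.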
Factors to consider when choosing a model include the size and type of your data, the
complexity of the problem, and the computational resources available. You can read more
about the different machine learning models in a separate article.
During training, it's important to avoid overfitting (where the model performs well on the
training data but poorly on new data) and underfitting (where the model performs poorly on
both the training data and new data). You can learn more about the full machine learning
process in our Machine Learning Fundamentals with Python skill track, which explores
the essential concepts and how to apply them.
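To make overfitting and underfitting concrete, here is a small sketch using a
k-nearest-neighbors regressor written in plain Python. The data points are invented for
illustration. With k = 1 the model memorizes the training set (zero training error, larger
error on new data), while with k equal to the size of the training set it always predicts the
overall average and does poorly everywhere.

```python
# Hypothetical data following roughly y = x, with a little noise.
TRAIN = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 5.9)]
TEST = [(1.5, 1.4), (2.5, 2.7), (3.5, 3.4), (4.5, 4.6), (5.5, 5.4)]

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on `train`) over `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

for k in (1, 2, len(TRAIN)):
    print(f"k={k}: train MSE={mse(TRAIN, TRAIN, k):.3f}, test MSE={mse(TRAIN, TEST, k):.3f}")
```

Comparing the error on the training data with the error on held-out data, as done here, is
the usual way to spot both problems.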
Common metrics for evaluating a model's performance include accuracy (for classification
problems), precision and recall (for binary classification problems), and mean squared error
(for regression problems). We cover this evaluation process in more detail in
our Responsible AI webinar.
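The metrics mentioned above are simple to compute by hand. The sketch below does so in
plain Python for a made-up set of labels and predictions (1 = positive class, 0 = negative
class); libraries such as Scikit-learn provide equivalent, more complete functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    """Share of predicted positives that are actually positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    """Share of actual positives that the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical classification results.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(labels, predictions))   # 5 of 8 correct
print(precision(labels, predictions))  # 3 of 5 predicted positives are right
print(recall(labels, predictions))     # 3 of 4 actual positives are found

# Hypothetical regression results.
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```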
Techniques for hyperparameter tuning include grid search (where you try out different
combinations of hyperparameter values) and cross-validation (where you divide your data into
subsets, hold out each subset in turn for validation, and train on the rest, to check that a
given combination performs well on different data).
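Grid search and cross-validation are usually combined: each candidate value is scored by
cross-validation, and the best-scoring one is kept. The sketch below illustrates this in
plain Python for a single hyperparameter, the number of neighbors k in a simple
k-nearest-neighbors regressor. The data, the grid, and the fold layout are assumptions chosen
for illustration.

```python
def k_fold_splits(data, n_folds):
    """Yield (train, validation) pairs; every fold is held out exactly once."""
    for fold in range(n_folds):
        validation = [p for i, p in enumerate(data) if i % n_folds == fold]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        yield train, validation

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cross_val_mse(data, k, n_folds=4):
    """Mean validation error of k-NN across all folds."""
    fold_errors = []
    for train, validation in k_fold_splits(data, n_folds):
        errors = [(y - knn_predict(train, x, k)) ** 2 for x, y in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Hypothetical data following roughly y = x.
DATA = [(1, 1.1), (2, 2.3), (3, 2.8), (4, 4.2), (5, 4.9), (6, 6.3),
        (7, 6.8), (8, 8.1), (9, 9.2), (10, 9.7), (11, 11.2), (12, 11.9)]
GRID = [1, 2, 3, 5]  # candidate values for the hyperparameter k

scores = {k: cross_val_mse(DATA, k) for k in GRID}
best_k = min(scores, key=scores.get)  # lowest cross-validated error wins
print(scores)
print("best k:", best_k)
```

With several hyperparameters, the grid becomes every combination of their candidate values,
which is why grid search gets expensive quickly.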
Deploying the model involves integrating it into a production environment where it can
process real-world data and provide real-time insights. Deployment, together with monitoring
and maintaining the model in production, is often referred to as MLOps. Discover more about
MLOps in a separate tutorial.
Supervised learning
Supervised learning is the most common type of machine learning. In this approach, the
model is trained on a labeled dataset. In other words, the data is accompanied by a label that
the model is trying to predict. This could be anything from a category label to a real-valued
number.
The model learns a mapping between the input (features) and the output (label) during the
training process. Once trained, the model can predict the output for new, unseen data.
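As a small illustration of learning a mapping from features to labels, the sketch below
fits a straight line to labeled examples using ordinary least squares and then predicts the
label for an unseen input. The data is invented (it follows y = 2x + 1 exactly), and simple
linear regression is just one of many supervised learning methods.

```python
def fit_line(features, labels):
    """Least-squares fit of label = slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    variance = sum((x - mean_x) ** 2 for x in features)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Labeled training data: each feature value comes with the label to predict.
features = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(features, labels)
print(model)                # learned slope and intercept
print(predict(model, 6.0))  # prediction for an input not seen during training
```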
Unsupervised learning
Unsupervised learning, on the other hand, involves training the model on an unlabeled
dataset. The model is left to find patterns and relationships in the data on its own.
This type of learning is often used for clustering and dimensionality reduction. Clustering
involves grouping similar data points together, while dimensionality reduction involves
reducing the number of features under consideration by deriving a smaller set of variables
that retains most of the information in the data.
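Clustering can be sketched with k-means, one common algorithm for it. The example below
groups one-dimensional points into two clusters without using any labels. The points and the
choice of starting centroids (the first two points) are assumptions for illustration; real
implementations typically use smarter initialization and work in many dimensions.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, labels = k_means_1d(points, centroids=points[:2])
print(centroids)  # one centroid per discovered group
print(labels)     # cluster index assigned to each point
```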
Reinforcement learning
Reinforcement learning is a type of machine learning where an agent learns to make
decisions by interacting with its environment. The agent is rewarded or penalized (with
points) for the actions it takes, and its goal is to maximize the total reward.
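The reward-driven loop described above can be sketched with tabular Q-learning, one
well-known reinforcement learning algorithm. The environment here is a hypothetical corridor
of five cells in which the agent starts at cell 0 and earns a reward of 1 for reaching cell 4.
The learning rate, discount factor, and the purely random exploration policy are assumptions
chosen to keep the example short.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right
ALPHA = 0.5         # learning rate
GAMMA = 0.9         # discount factor for future rewards

def step(state, action):
    """Environment: move within the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, seed=0):
    """Learn action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))  # explore uniformly at random
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the highest learned value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy should move right in every cell, since that is the
shortest path to the reward.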
“Machine learning is the most transformative technology of our time. It’s going to
transform every single vertical.”
Healthcare
In healthcare, machine learning is used to predict disease outbreaks, personalize patient
treatment plans, and improve medical imaging accuracy. For instance, Google's DeepMind
Health is working with doctors to build machine learning models to detect diseases earlier
and improve patient care.
Finance
The finance sector has also greatly benefited from machine learning. It's used for credit
scoring, algorithmic trading, and fraud detection. A recent survey found that 56% of global
executives said that artificial intelligence (AI) and machine learning have been implemented
into financial crime compliance programs.
Transportation
Machine learning is at the heart of the self-driving car revolution. Companies like Tesla and
Waymo use machine learning algorithms to interpret sensor data in real-time, allowing their
vehicles to recognize objects, make decisions, and navigate roads autonomously. Similarly,
the Swedish Transport Administration recently started working with computer vision and
machine learning specialists to optimize the country’s road infrastructure management.
Recommendation systems
Recommendation systems are one of the most visible applications of machine learning.
Companies like Netflix and Amazon use machine learning to analyze your past behavior and
recommend products or movies you might like. Learn how to build a recommendation
engine in Python with our online course.
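One simple way to turn past behavior into recommendations is user-based collaborative
filtering: find users with similar ratings and suggest what they liked. The sketch below is a
toy version with invented users, titles, and ratings. It is not how any particular company's
system works, and production systems are far more elaborate.

```python
import math

# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "ana": {"Alien": 5, "Blade Runner": 4, "Heat": 1},
    "ben": {"Alien": 5, "Blade Runner": 5, "Arrival": 5},
    "cleo": {"Heat": 5, "Casino": 4, "Alien": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two rating dicts (missing items count as 0)."""
    dot = sum(a[item] * b[item] for item in a if item in b)
    norm_a = math.sqrt(sum(value ** 2 for value in a.values()))
    norm_b = math.sqrt(sum(value ** 2 for value in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ana", RATINGS))
```

Here "ana" rates films much like "ben" does, so the title that "ben" liked and "ana" has
not seen is ranked first.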
Voice assistants
Voice assistants like Siri, Alexa, and Google Assistant use machine learning to understand
your voice commands and provide relevant responses. They continually learn from your
interactions to improve their performance.
Fraud detection
Banks and credit card companies use machine learning to detect fraudulent transactions. By
analyzing patterns of normal and abnormal behavior, they can flag suspicious activity in
real time. We have a fraud detection in Python course, which explores the concept in more
detail.
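A very simple form of this idea is to learn what "normal" looks like from past transactions
and flag new ones that deviate strongly from it. The sketch below uses a z-score on the
transaction amount. The amounts and the threshold of 3 standard deviations are assumptions
for illustration, and real fraud models use many more signals than the amount alone.

```python
import statistics

def flag_suspicious(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard deviations from normal."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [
        amount for amount in new_amounts
        if abs(amount - mean) / stdev > threshold
    ]

# Hypothetical past transactions considered normal for one customer.
history = [12.5, 40.0, 23.0, 18.0, 31.0, 27.5, 22.0, 26.0]
# Hypothetical incoming transactions to check.
incoming = [19.0, 950.0, 33.5]

print(flag_suspicious(history, incoming))
```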
Social media
Social media platforms use machine learning for a variety of tasks, from personalizing your
feed to filtering out inappropriate content.
Our machine learning cheat sheet covers different algorithms and their uses
In Python, libraries such as NumPy and Pandas are used for data manipulation and analysis,
while Matplotlib is used for data visualization. Scikit-learn provides a wide range of machine
learning algorithms, and TensorFlow and PyTorch are used for building and training neural
networks.
In R, packages like caret, mlr, and randomForest provide a variety of machine learning
algorithms, from regression and classification to clustering and dimensionality reduction.
TensorFlow
TensorFlow is a powerful open-source library for numerical computation, particularly
well-suited for large-scale machine learning. It was developed by the Google Brain team and
supports both CPUs and GPUs.
TensorFlow allows you to build and train complex neural networks, making it a popular
choice for deep learning applications.
Scikit-learn
Scikit-learn is a Python library that provides a wide range of machine learning algorithms for
both supervised and unsupervised learning. It's known for its clear API and detailed
documentation.
Scikit-learn is often used for data mining and data analysis, and it integrates well with other
Python libraries like NumPy and Pandas.
Keras
Keras is a high-level neural networks API written in Python. Earlier versions could run on top
of TensorFlow, CNTK, or Theano; Keras 3 supports TensorFlow, JAX, and PyTorch as backends. It
was developed with a focus on enabling fast experimentation.
Keras provides a user-friendly interface for building and training neural networks, making it a
great choice for beginners in deep learning.
PyTorch
PyTorch is an open-source machine learning library based on the Torch library. It's known
for its flexibility and efficiency, making it popular among researchers.
PyTorch supports a wide range of applications, from computer vision to natural language
processing. One of its key features is the dynamic computational graph, which is built as the
code runs, so models can use ordinary Python control flow and are easier to debug.
Data scientist
A data scientist uses scientific methods, processes, algorithms, and systems to extract
knowledge and insights from structured and unstructured data. Machine learning is a key tool
in a data scientist's arsenal, allowing them to make predictions and uncover patterns in data.
Key skills:
Statistical analysis
Programming (Python, R)
Machine learning
Data visualization
Problem-solving
Essential tools:
Python
R
SQL
Hadoop
Spark
Tableau
Key skills:
Python
TensorFlow
Scikit-learn
PyTorch
Keras