The document provides an overview of various machine learning algorithms and methods. It begins with an introduction to predictive modeling and supervised vs. unsupervised learning. It then describes several supervised learning algorithms in detail including linear regression, K-nearest neighbors (KNN), decision trees, random forest, logistic regression, support vector machines (SVM), and naive Bayes. It also briefly discusses unsupervised learning techniques like clustering and dimensionality reduction methods.
This presentation introduces clustering analysis and the k-means clustering technique. It defines clustering as an unsupervised method to segment data into groups with similar traits. The presentation outlines different clustering types (hard vs soft), techniques (partitioning, hierarchical, etc.), and describes the k-means algorithm in detail through multiple steps. It discusses requirements for clustering, provides examples of applications, and reviews advantages and disadvantages of k-means clustering.
Machine learning is a method of data analysis that uses algorithms to iteratively learn from data without being explicitly programmed. It allows computers to find hidden insights in data and become better at tasks via experience. Machine learning has many practical applications and is important due to growing data availability, cheaper and more powerful computation, and affordable storage. It is used in fields like finance, healthcare, marketing and transportation. The main approaches are supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Each has real-world examples like loan prediction, market basket analysis, webpage classification, and marketing campaign optimization.
A PPT which gives a brief introduction to Machine Learning and to products developed using Machine Learning algorithms. It presents the introduction through text as well as a few images in the slides. It includes examples of cool products like Google Cloud Platform, Cozmo (a tiny robot built using Artificial Intelligence), IBM Watson and many more.
This document provides an introduction to machine learning. It discusses how machine learning allows computers to learn from experience to improve their performance on tasks. Supervised learning is described, where the goal is to learn a function that maps inputs to outputs from a labeled dataset. Cross-validation techniques like the test set method, leave-one-out cross-validation, and k-fold cross-validation are introduced to evaluate model performance without overfitting. Applications of machine learning like medical diagnosis, recommendation systems, and autonomous driving are briefly outlined.
Machine learning works by processing data to discover patterns that can be used to analyze new data. Popular programming languages for machine learning include Python, R, and SQL. There are several types of machine learning including supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and deep learning. Common machine learning tasks involve classification, regression, clustering, dimensionality reduction, and model selection. Machine learning is widely used for applications such as spam filtering, recommendations, speech recognition, and machine translation.
The document introduces artificial intelligence, machine learning, and deep learning. It discusses supervised, unsupervised, and reinforced learning techniques. Examples of applications discussed include image recognition, natural language processing, and virtual assistants. The document also notes that some AI systems have developed their own internal languages when interacting without human supervision.
1. Machine learning is a set of techniques that use data to build models that can make predictions without being explicitly programmed.
2. There are two main types of machine learning: supervised learning, where the model is trained on labeled examples, and unsupervised learning, where the model finds patterns in unlabeled data.
3. Common machine learning algorithms include linear regression, logistic regression, decision trees, support vector machines, naive Bayes, k-nearest neighbors, k-means clustering, and random forests. These can be used for regression, classification, clustering, and dimensionality reduction.
Here are the key calculations:
1) Probability that persons p and q will be at the same hotel on a given day d is (1/100) × (1/100) × 10^-5 = 10^-9, since each person stays in a hotel with probability 1/100 on any given day and there are 10^5 hotels to choose from.
2) Probability that p and q will be at the same hotel on given days d1 and d2 is (10^-9) × (10^-9) = 10^-18, since the events are independent.
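A quick arithmetic check of the two figures above, assuming the stated model of 10^5 hotels and a 1/100 chance that a given person is at some hotel on a given day:

```python
# Minimal sanity check of the hotel example (assumed figures: each person is
# at some hotel with probability 1/100 on a given day, and there are 10**5
# hotels to choose from).
p_visit = 1 / 100            # P(a given person stays at a hotel on day d)
n_hotels = 10 ** 5           # number of hotels

p_same_day = p_visit * p_visit * (1 / n_hotels)   # both at a hotel, same hotel
p_two_days = p_same_day ** 2                      # independent days d1 and d2

print(p_same_day)   # 1e-09
print(p_two_days)   # 1e-18
```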
Machine Learning: Applications, Process and Techniques - Rui Pedro Paiva
Machine learning can be applied across many domains such as business, entertainment, medicine, and software engineering. The document outlines the machine learning process which includes data collection, feature extraction, model learning, and evaluation. It also provides examples of machine learning applications in various domains, such as using decision trees to make credit decisions in business, classifying emotions in music for playlist generation in entertainment, and detecting heart murmurs from audio data in medicine.
This document provides an overview of machine learning concepts including supervised learning, unsupervised learning, and reinforcement learning. It explains that supervised learning involves learning from labeled examples, unsupervised learning involves categorizing without labels, and reinforcement learning involves learning behaviors to achieve goals through interaction. The document also discusses regression vs classification problems, the learning and testing process, and examples of machine learning applications like customer profiling, face recognition, and handwritten character recognition.
Scikit-Learn is a powerful machine learning library implemented in Python, built on the numeric and scientific computing powerhouses NumPy, SciPy, and matplotlib, for extremely fast analysis of small to medium sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a data scientist's toolkit for machine learning on incoming data sets.
The purpose of this one day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms; rather than as simply a research or investigation methodology.
This Machine Learning presentation is ideal for beginners who want to learn Machine Learning from scratch. By the end of this presentation, you will learn why Machine Learning is so important in our lives, what Machine Learning is, the various types of Machine Learning (Supervised, Unsupervised and Reinforcement Learning), how to choose the right Machine Learning solution, and what the different Machine Learning algorithms are and how they work (with simple examples and use cases).
This Machine Learning presentation will cover the following topics:
1. Life without Machine Learning
2. Life with Machine Learning
3. What is Machine Learning
4. Machine Learning Process
5. Types of Machine Learning
6. Supervised Vs Unsupervised
7. The right Machine Learning solutions
8. Machine Learning Algorithms
9. Use case - Predicting the price of a house using Linear Regression
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people's digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world - and with that, there is a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - - -
Who should take this Machine Learning Training Course?
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
- - - - - -
Deep learning is a type of machine learning that uses neural networks inspired by the human brain. It has been successfully applied to problems like image recognition, speech recognition, and natural language processing. Deep learning requires large datasets, clear goals, computing power, and neural network architectures. Popular deep learning models include convolutional neural networks and recurrent neural networks. Researchers like Geoffrey Hinton and companies like Google have advanced the field through innovations that have won image recognition challenges. Deep learning will continue solving harder artificial intelligence problems by learning from massive amounts of data.
K-means clustering is an algorithm that groups data points into k clusters based on their attributes and distances from initial cluster center points. It works by first randomly selecting k data points as initial centroids, then assigning all other points to the closest centroid and recalculating the centroids. This process repeats until the centroids are stable or a maximum number of iterations is reached. K-means clustering is widely used for machine learning applications like image segmentation and speech recognition due to its efficiency, but it is sensitive to initialization and assumes spherical clusters of similar size and density.
This document discusses unsupervised learning approaches including clustering, blind signal separation, and self-organizing maps (SOM). Clustering groups unlabeled data points together based on similarities. Blind signal separation separates mixed signals into their underlying source signals without information about the mixing process. SOM is an algorithm that maps higher-dimensional data onto lower-dimensional displays to visualize relationships in the data.
This is a deep learning presentation based on Deep Neural Networks. It reviews the deep learning concept, related works and specific application areas. It describes a use case scenario of deep learning and highlights the current trends and research issues of deep learning.
This document provides an introduction to machine learning for data science. It discusses the applications and foundations of data science, including statistics, linear algebra, computer science, and programming. It then describes machine learning, including the three main categories of supervised learning, unsupervised learning, and reinforcement learning. Supervised learning algorithms covered include logistic regression, decision trees, random forests, k-nearest neighbors, and support vector machines. Unsupervised learning methods discussed are principal component analysis and cluster analysis.
How Machine Learning Helps Organizations to Work More Efficiently? - Tuan Yang
Data is increasing day by day, and so is the cost of data storage and handling. However, by understanding the concepts of machine learning, one can easily handle the excessive data and process it in an affordable manner.
The process includes building models using several kinds of algorithms. If a model is created precisely for a certain task, then organizations have a very good chance of making use of profitable opportunities and avoiding the risks lurking behind the scenes.
Learn more about:
» Understanding Machine Learning Objectives.
» Data dimensions in Machine Learning.
» Fundamentals of Algorithms and Mapping from Input/Output.
» Parametric and Non-parametric Machine Learning Algorithms.
» Supervised, Unsupervised and Semi-Supervised Learning.
» Estimating Over-fitting and Under-fitting.
» Use Cases.
Basic machine learning background with Python scikit-learn
This document provides an overview of machine learning and the Python scikit-learn library. It introduces key machine learning concepts like classification, linear models, support vector machines, decision trees, bagging, boosting, and clustering. It also demonstrates how to perform tasks like SVM classification, decision tree modeling, random forest, principal component analysis, and k-means clustering using scikit-learn. The document concludes that scikit-learn can handle large datasets and recommends Keras for deep learning.
This document provides an overview of decision tree algorithms for machine learning. It discusses key concepts such as:
- Decision trees can be used for classification or regression problems.
- They represent rules that can be understood by humans and used in knowledge systems.
- The trees are built by splitting the data into purer subsets based on attribute tests, using measures like information gain.
- Issues like overfitting are addressed through techniques like reduced error pruning and rule post-pruning.
Predictive analytics uses data mining, statistical modeling and machine learning techniques to extract insights from existing data and use them to predict unknown future events. It involves identifying relationships between variables in historical data and applying those patterns to unknowns. Predictive analytics is more sophisticated than descriptive analytics, which has a retrospective focus on understanding trends; predictive analytics focuses on gaining insights for decision making. Common predictive analytics techniques include regression, classification, time series forecasting, association rule mining and clustering. Ensemble methods like bagging, boosting and stacking combine multiple predictive models to improve performance.
Instance-based learning, also known as lazy learning, is a non-parametric learning method where the training data is stored and a new instance is classified based on its similarity to the nearest stored instances. It is similar to a desktop in that all data is kept in memory. The key aspects are setting the K value for the K-nearest neighbors algorithm and the distance metric such as Euclidean distance. Training involves storing all input data, finding the K nearest neighbors of each test instance, and classifying based on the majority class of those neighbors.
This document discusses k-nearest neighbors (KNN) classification, an instance-based machine learning algorithm. KNN works by finding the k training examples closest in distance to a new data point, and assigning the most common class among those k neighbors as the prediction for the new point. The document notes that KNN has high variance, since each data point acts as its own hypothesis. It suggests ways to reduce overfitting, such as using KNN with multiple neighbors (k>1), weighting neighbors by distance, and approximating KNN with data structures like k-d trees.
This document discusses unsupervised machine learning techniques for clustering unlabeled data. It covers k-means clustering, which partitions data into k groups based on minimizing distance between points and cluster centroids. It also discusses agglomerative hierarchical clustering, which successively merges clusters based on their distance. As an example, it shows hierarchical clustering of texture images from five classes to group similar textures.
An introduction to variable and feature selection - Marco Meoni
Presentation of a great paper from Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute) back in 2003, which outlines the main techniques for feature selection and model validation in machine learning systems
Cluster analysis is an unsupervised learning technique used to group similar objects together. It identifies clusters of objects such that objects within a cluster are more closely related to each other than objects in different clusters. Common applications of cluster analysis include document clustering, market segmentation, and identifying types of customers or animals. Popular clustering algorithms include k-means, k-medoids, hierarchical clustering, density-based clustering, and grid-based clustering.
Hierarchical clustering methods create a hierarchy of clusters based on distance or similarity measures. They do not require specifying the number of clusters k in advance. Hierarchical methods either merge smaller clusters into larger ones (agglomerative) or split larger clusters into smaller ones (divisive) at each step. This continues recursively until all objects are linked or placed into individual clusters.
This document provides an overview of decision trees, including:
- Decision trees can classify data quickly, achieve accuracy similar to other models, and are simple to understand.
- A decision tree has root, internal, and leaf nodes organized in a top-down structure to partition data based on attribute tests.
- To classify a record, the attribute tests are applied from the root node down until a leaf node is reached, which assigns the record's class.
- Decision trees require attribute-value data, predefined target classes, and sufficient training data to learn the model.
The document discusses various topics related to active learning and optimal experimental design. It begins with motivations for active learning such as collecting higher quality data being more useful than simply more data, and data collection often being expensive. It provides examples of applying active learning techniques to problems like classification with gene expression data, collaborative filtering for movie recommendations, sequencing genomes, and improving cell culture conditions. It then covers topics like uncertainty sampling, query by committee, and information-based loss functions for active learning. For optimal experimental design, it discusses techniques like A-optimal, D-optimal, and E-optimal design and how they can be applied to problems with linear models. It also covers extensions to non-linear models using techniques like sequential experimental design
This document discusses decision trees and random forests for classification problems. It explains that decision trees use a top-down approach to split a training dataset based on attribute values to build a model for classification. Random forests improve upon decision trees by growing many de-correlated trees on randomly sampled subsets of data and features, then aggregating their predictions, which helps avoid overfitting. The document provides examples of using decision trees to classify wine preferences, sports preferences, and weather conditions for sport activities based on attribute values.
Cluster analysis, or clustering, is the process of grouping data objects into subsets called clusters so that objects within a cluster are similar to each other but dissimilar to objects in other clusters. There are several approaches to clustering, including partitioning, hierarchical, density-based, and grid-based methods. The k-means and k-medoids algorithms are popular partitioning methods that aim to partition observations into k clusters by minimizing distances between observations and cluster centroids or medoids. K-medoids is more robust to outliers as it uses actual observations as cluster representatives rather than centroids. Both methods require specifying the number of clusters k in advance.
This document discusses various techniques for data classification including decision tree induction, Bayesian classification methods, rule-based classification, and classification by backpropagation. It covers key concepts such as supervised vs. unsupervised learning, training data vs. test data, and issues around preprocessing data for classification. The document also discusses evaluating classification models using metrics like accuracy, precision, recall, and F-measures as well as techniques like holdout validation, cross-validation, and bootstrap.
This document summarizes the analysis of a movie review sentiment dataset using various classification algorithms. It describes extracting features from the dataset, loading it into a dataframe, and applying logistic regression, decision trees, random forests, SVM, k-NN, and Naive Bayes classifiers. Logistic regression achieved the highest accuracy at 0.6705, with random forest close behind at 0.6611. The document also discusses counting words by sentiment and visualizing the results.
This document discusses WEKA, an open-source data mining and machine learning tool. It summarizes how WEKA was used to analyze a bike sharing dataset from Washington D.C. to predict bike usage. Different WEKA techniques were explored, including classification algorithms like J48 and Naive Bayes. J48 performed best by visualizing decision trees. Clustering was also attempted but seasonal patterns were only partially distinguished. Overall, the dataset seemed better suited to classification than clustering for predicting bike usage.
Spark Streaming allows processing live data streams using small batch sizes to provide low latency results. It provides a simple API to implement complex stream processing algorithms across hundreds of nodes. Spark SQL allows querying structured data using SQL or the Hive query language and integrates with Spark's batch and interactive processing. MLlib provides machine learning algorithms and pipelines to easily apply ML to large datasets. GraphX extends Spark with an API for graph-parallel computation on property graphs.
Spark is an open-source cluster computing framework that uses in-memory processing to allow data sharing across jobs for faster iterative queries and interactive analytics. It uses Resilient Distributed Datasets (RDDs) that can survive failures through lineage tracking, and supports programming in Scala, Java, and Python for batch, streaming, and machine learning workloads.
This document provides an overview of effective big data visualization. It discusses information visualization and data visualization, including common chart types like histograms, scatter plots, and dashboards. It covers visualization goals, considerations, processes, basics, and guidelines. Examples of good visualization are provided. Tools for creating infographics are listed, as are resources for learning more about data visualization and references. Overall, the document serves as a comprehensive introduction to big data visualization.
Graph databases store data in graph structures with nodes, edges, and properties. Neo4j is a popular open-source graph database that uses a property graph model. It has a core API for programmatic access, indexes for fast lookups, and Cypher for graph querying. Neo4j provides high availability through master-slave replication and scales horizontally by sharding graphs across instances through techniques like cache sharding and domain-specific sharding.
This document discusses information retrieval techniques. It begins by defining information retrieval as selecting the most relevant documents from a large collection based on a query. It then discusses some key aspects of information retrieval including document representation, indexing, query representation, and ranking models. The document also covers specific techniques used in information retrieval systems like parsing documents, tokenization, removing stop words, normalization, stemming, and lemmatization.
This document provides an overview of natural language processing (NLP). It discusses topics like natural language understanding, text categorization, syntactic analysis including parsing and part-of-speech tagging, semantic analysis, and pragmatic analysis. It also covers corpus-based statistical approaches to NLP, measuring performance, and supervised learning methods. The document outlines challenges in NLP like ambiguity and knowledge representation.
This document provides an overview of the Natural Language Toolkit (NLTK), a Python library for natural language processing. It discusses NLTK's modules for common NLP tasks like tokenization, part-of-speech tagging, parsing, and classification. It also describes how NLTK can be used to analyze text corpora, frequency distributions, collocations and concordances. Key functions of NLTK include tokenizing text, accessing annotated corpora, analyzing word frequencies, part-of-speech tagging, and shallow parsing.
This document provides an overview of NoSQL databases and summarizes key information about several NoSQL databases, including HBase, Redis, Cassandra, MongoDB, and Memcached. It discusses concepts like horizontal scalability, the CAP theorem, eventual consistency, and data models used by different NoSQL databases like key-value, document, columnar, and graph structures.
This document provides an overview of recommender systems for e-commerce. It discusses various recommender approaches including collaborative filtering algorithms like nearest neighbor methods, item-based collaborative filtering, and matrix factorization. It also covers content-based recommendation, classification techniques, addressing challenges like data sparsity and scalability, and hybrid recommendation approaches.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It implements Google's MapReduce programming model and the Hadoop Distributed File System (HDFS) for reliable data storage. Key components include a JobTracker that coordinates jobs, TaskTrackers that run tasks on worker nodes, and a NameNode that manages the HDFS namespace and DataNodes that store application data. The framework provides fault tolerance, parallelization, and scalability.
This document provides an overview of the statistical programming language R. It discusses key R concepts like data types, vectors, matrices, data frames, lists, and functions. It also covers important R tools for data analysis like statistical functions, linear regression, multiple regression, and file input/output. The goal of R is to provide a large integrated collection of tools for data analysis and statistical computing.
This document provides an overview of the Python programming language. It discusses Python's history and evolution, its key features like being object-oriented, open source, portable, having dynamic typing and built-in types/tools. It also covers Python's use for numeric processing with libraries like NumPy and SciPy. The document explains how to use Python interactively from the command line and as scripts. It describes Python's basic data types like integers, floats, strings, lists, tuples and dictionaries as well as common operations on these types.
The document provides an overview of functional programming, including its key features, history, differences from imperative programming, and examples using Lisp and Scheme. Some of the main points covered include:
- Functional programming is based on evaluating mathematical functions rather than modifying state through assignments.
- It uses recursion instead of loops and treats functions as first-class objects.
- Lisp was the first functional language in 1960 and introduced many core concepts like lists and first-class functions. Scheme was developed in 1975 as a simpler dialect of Lisp.
- Functional programs are more focused on what to compute rather than how to compute it, making them more modular and easier to reason about mathematically.
3. Machine Learning and Pattern Classification
• Predictive modelling is building a model capable of making predictions
• Such a model includes a machine learning algorithm that learns certain properties from a
training dataset in order to make those predictions
• Predictive modelling types - Regression and pattern classification
• Regression models analyze relationships between variables and trends in order to make
predictions about continuous variables
– Prediction of the maximum temperature for the upcoming days in weather forecasting
• Pattern classification assigns discrete class labels to particular observations as outcomes of
a prediction
– Prediction of a sunny, rainy or snowy day
4. Machine Learning Methodologies
• Supervised learning
– Learning from labelled data
– Classification, Regression, Prediction, Function Approximation
• Unsupervised learning
– Learning from unlabelled data
– Clustering, Visualization, Dimensionality Reduction
5. Machine Learning Methodologies
• Semi-supervised learning
– mix of Supervised and Unsupervised learning
– usually small part of data is labelled
• Reinforcement learning
– Model learns from a series of actions by maximizing a reward function
– The reward function can either be maximized by penalizing bad actions
and/or rewarding good actions
– Example - training of self-driving car using feedback from the environment
6. Applications
• Speech recognition
• Effective web search
• Recommendation systems
• Computer vision
• Information retrieval
• Spam filtering
• Computational finance
• Fraud detection
• Medical diagnosis
• Stock market analysis
• Structural health monitoring
9. Learning Process
• Supervised learning algorithms are used in classification and prediction
• Training set - each record contains a set of attributes, one of the
attributes is the class
• Classification or prediction algorithm learns from training data about
relationship between predictor variables and outcome variable
• This process results in
– Classification model
– Predictive model
12. Supervised Learning Model
• The class labels in the dataset used to build the classification model are
known
• Example - a dataset for spam filtering would contain spam messages as
well as "ham" (= not-spam) messages
• In a supervised learning problem, it is known which message in the
training set is spam or ham and this information is used to train our model
in order to classify new unseen messages
15. Linear Regression
• A standard and simple mathematical technique for predicting numeric outcome
• Oldest and most widely used predictive model
• Goal - minimize the sum of the squared errors to fit a straight line to a set of data points
• Fits a linear function to a set of data points
• Form of the function
– Y = β0 + β1*X1 + β2*X2 + … + βn*Xn
– Y is the target variable and X1, X2, ... Xn are the predictor variables
– β1, β2, … βn are the coefficients that multiply the predictor variables
– β0 is constant
• Linear regression with multiple variables
– Scale the data, and implement the gradient descent and the cost function
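A minimal linear regression sketch in Python with scikit-learn, illustrating the Y = β0 + β1*X1 + β2*X2 form above; the synthetic data, the true coefficients and the use of scikit-learn's built-in fit (rather than a hand-written gradient descent) are illustrative assumptions:

```python
# Minimal linear-regression sketch with scikit-learn (synthetic data and
# coefficient values are illustrative assumptions, not from the slides).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # predictors X1, X2
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)             # least-squares fit
print(model.intercept_)                 # estimate of beta_0 (close to 3.0)
print(model.coef_)                      # estimates of beta_1, beta_2
print(model.predict([[1.0, 2.0]]))      # predicted Y for a new observation
```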
17. K Nearest Neighbors - KNN
• A simple algorithm that stores all available cases and classifies new cases based on a
similarity measure
• Extremely simple to implement
• Lazy Learning - function is only approximated locally and all computation is deferred until
classification
• Has a weighted version and can also be used for regression
• Usually works very well when there is a distance between examples (Euclidean, Manhattan)
• Slow speed when training set is large (say 10^6 examples) and distance calculation is non-
trivial
• Only a single hyper-parameter – K (usually optimized using cross-validation)
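A minimal KNN sketch with scikit-learn; the Iris data set and the candidate values of K are illustrative assumptions, with K picked by cross-validation as the slide suggests:

```python
# Minimal K-nearest-neighbors sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a few values of K and keep the cross-validated accuracy for each.
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k)      # Euclidean distance by default
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(k, round(score, 3))
```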
20. Decision Tree Learning
• Decision trees classify instances or examples by starting at the root of the
tree and moving through it until a leaf node is reached
• A method for approximating discrete-valued functions
• Decision tree is a classifier in the form of a tree structure
– Decision node - specifies a test on a single attribute
– Leaf node - indicates the value of the target attribute
– Branch - split of one attribute
– Path - a disjunction of tests to make the final decision
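A minimal decision tree sketch with scikit-learn; the Iris data set, the entropy criterion and the depth limit are illustrative assumptions. export_text prints the fitted tree as human-readable rules, matching the structure described above:

```python
# Minimal decision-tree sketch with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Decision nodes test a single attribute; leaf nodes carry the class label.
print(export_text(tree, feature_names=list(iris.feature_names)))
```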
21. When to Consider Decision Trees
• Attribute-value description - object or case must be expressible in terms
of a fixed collection of properties or attributes
– hot, mild, cold
• Predefined classes (target values) - the target function has discrete
output values
– Boolean or multiclass
• Sufficient data - enough training cases should be provided to learn the model
• Possibly noisy training data
• Missing attribute values
22. Decision Tree Applications
• Credit risk analysis
• Manufacturing – chemical material evaluation
• Production – Process optimization
• Biomedical Engineering – identify features to use in implantable devices
• Astronomy – filter noise from Hubble telescope images
• Molecular biology – analyze amino acid sequences in Human Genome project
• Pharmacology – drug efficacy analysis
• Planning – scheduling of PCB assembly lines
• Medicine – analysis of syndromes
23. Strengths
• Trees are inexpensive to construct
• Extremely fast at classifying unknown records
• Easy to interpret for small-sized trees
• Accuracy is comparable to other classification techniques for many simple
data sets
• Generates understandable rules
• Handles continuous and categorical variables
• Provides a clear indication of which fields are most important for
prediction or classification
24. Weaknesses
• Not suitable for prediction of continuous attribute
• Perform poorly with many classes and small data
• Computationally expensive to train
– At each node each candidate splitting field must be sorted before its best split can
be found
– In some algorithms combinations of fields are used and a search must be made for
optimal combining weights
– Pruning algorithms can also be expensive since many candidate sub-trees must be
formed and compared
• Not suitable for non-rectangular regions
25. Tree Representation
• Each node in the tree specifies a test for some
attribute of the instance
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification
28. Problems of Random split
• The tree can grow huge
• These trees are hard to understand
• Larger trees are typically less accurate than smaller trees
• So most tree construction methods use a greedy approach
– find the feature that best divides the positive examples from the negative
examples, e.g. using information gain
29. Optimized Tree Induction
• Greedy strategy - Split the records based on an attribute test
that optimizes certain criterion
• Issues
– Determine root node
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
30. Optimized Tree Induction
• Selection of an attribute at each node
– Choose the most useful attribute for classifying training examples
• Information gain
– Measures how well a given attribute separates the training examples
according to their target classification
– This measure is used to select among the candidate attributes at each
step while growing the tree
31. Entropy
• A measure of homogeneity of the set of examples
• Given a set S of positive and negative examples of some target
concept (a 2-class problem), the entropy of set S relative to this
binary classification
– E(S) = -p(P) log2 p(P) - p(N) log2 p(N)
• Example
– Suppose S has 25 examples, 15 positive and 10 negatives [15+, 10-]
– Then entropy of S relative to this classification
• E(S)=-(15/25) log2(15/25) - (10/25) log2 (10/25)
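A minimal Python check of the entropy formula and the [15+, 10-] example above:

```python
# Entropy of a two-class set, reproducing the [15+, 10-] example.
from math import log2

def entropy(p_pos, p_neg):
    """Entropy of a binary split; terms with probability 0 contribute 0."""
    total = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            total -= p * log2(p)
    return total

print(entropy(15 / 25, 10 / 25))   # about 0.971
```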
33. Information Gain
• Information gain measures the expected reduction in entropy or uncertainty
• Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has
value v, Sv = {s in S | A(s) = v}
• First term in the equation is the entropy of the original collection S
• Second term is the expected value of the entropy after S is partitioned using attribute A
• It is the expected reduction in entropy caused by partitioning the examples according to this attribute
• It is the number of bits saved when encoding the target value of an arbitrary member of S by knowing the
value of attribute A
Gain(S, A) = Entropy(S) - Σ v∈Values(A) (|Sv| / |S|) × Entropy(Sv)
34. A simple example
• Guess the outcome of next week's game between the MallRats and the
Chinooks
• Available knowledge / Attribute
– was the game at Home or Away
– was the starting time 5pm, 7pm or 9pm
– Did Joe play center, or forward
– whether that opponent's center was tall or not
– …..
36. Problem Data
• The game will be away at 9pm and Joe will play center on offense…
• A classification problem
• Generalizing the learned rule to new examples
37. Examples
• Before partitioning, the entropy is
– H(10/20, 10/20) = - 10/20 log(10/20) - 10/20 log(10/20) = 1
• Using the where attribute, divide into 2 subsets
– Entropy of the first set H(home) = - 6/12 log(6/12) - 6/12 log(6/12) = 1
– Entropy of the second set H(away) = - 4/8 log(4/8) - 4/8 log(4/8) = 1
• Expected entropy after partitioning
– 12/20 * H(home) + 8/20 * H(away) = 1
38. Examples
• Using the when attribute, divide into 3 subsets
– Entropy of the first set H(5pm) = - 1/4 log(1/4) - 3/4 log(3/4);
– Entropy of the second set H(7pm) = - 9/12 log(9/12) - 3/12 log(3/12);
– Entropy of the third set H(9pm) = - 0/4 log(0/4) - 4/4 log(4/4) = 0
• Expected entropy after partitioning
– 4/20 * H(1/4, 3/4) + 12/20 * H(9/12, 3/12) + 4/20 * H(0/4, 4/4) = 0.65
• Information gain = 1 - 0.65 = 0.35
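A minimal Python check of the worked example above; the per-value class counts for the when attribute are taken from the slides:

```python
# Verify the expected entropy (~0.65) and information gain (~0.35) for the
# "when" attribute in the game example.
from math import log2

def entropy(pos, neg):
    """Entropy of a subset given its positive and negative counts."""
    h = 0.0
    for count in (pos, neg):
        p = count / (pos + neg)
        if p > 0:
            h -= p * log2(p)
    return h

# (positives, negatives) in each "when" subset, as given on the slides
subsets = {"5pm": (1, 3), "7pm": (9, 3), "9pm": (0, 4)}
n_total = sum(p + n for p, n in subsets.values())         # 20 examples

expected = sum((p + n) / n_total * entropy(p, n) for p, n in subsets.values())
gain = entropy(10, 10) - expected
print(round(expected, 2), round(gain, 2))   # 0.65 0.35
```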
39. Decision
• Knowing the when attribute values provides larger information gain than where
• Therefore the when attribute should be chosen for testing prior to the where
attribute
• Similarly we can compute the information gain for other attributes
• At each node choose the attribute with the largest information gain
• Stopping rule
– Every attribute has already been included along this path through the tree or
– The training examples associated with this leaf node all have the same target attribute
value - entropy is zero
40. Continuous Attribute
• Each non-leaf node is a test
• Its edge partitions the attribute into subsets (easy for discrete attribute)
• For continuous attribute
– Partition the continuous value of attribute A into a discrete set of intervals
– Create a new Boolean attribute Ac based on a threshold c
– Ac = true if the value of A is above c, false otherwise
– How to choose c?
41. Evaluation
• Training accuracy
– How many training instances can be correctly classified based on the available data?
– It is high when the tree is deep/large or when there is little conflict in the training
instances
– Higher value does not mean good generalization
• Testing accuracy
– Given a number of new instances how many of them can be correctly classified?
– Cross validation
43. Random Forest
• An ensemble classifier that consists of many decision trees
• Outputs the class that is the mode of the classes output by the
individual trees
• The method combines Breiman's bagging idea and the
random selection of features
• Used for classification and regression
45. Algorithm
• Let the number of training cases be N and number of variables in the classifier M
• The number m of input variables to be used to determine the decision at a node of the tree -
m should be much less than M
• Choose a training set for this tree by choosing n times with replacement from all N available
training cases
• Use the rest of cases to estimate the error of the tree by predicting their classes
• For each node of the tree, randomly choose m variables on which to base the decision at
that node
• Calculate the best split based on these m variables in the training set
• Each tree is fully grown and not pruned
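A minimal random forest sketch with scikit-learn; the breast-cancer data set and the hyper-parameter values are illustrative assumptions. It shows the bagged trees, the m (max_features) variables tried per split, the out-of-bag error estimate and the variable importances mentioned on the surrounding slides:

```python
# Minimal random-forest sketch with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of trees; max_features = m variables tried per split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0)
forest.fit(X_train, y_train)

print(forest.oob_score_)                 # out-of-bag estimate of accuracy
print(forest.score(X_test, y_test))      # accuracy on the held-out test set
print(forest.feature_importances_[:5])   # variable-importance estimates
```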
46. Gini Index
• Random forest uses Gini index taken from CART learning system to
construct decision trees
• The Gini Index of node impurity is the measure most commonly chosen
for classification type problems
• How to select N? - Build trees until the error no longer decreases
• How to select M? - Try the recommended default, half of it and twice it, and
pick the best
48. Working of Random Forest
• For prediction, a new sample is pushed down the tree
• It is assigned the label of the training sample in the terminal node it ends
up in
• This procedure is iterated over all trees in the ensemble
• Average vote of all trees is reported as random forest prediction
49. Random Forest - Advantages
• One of the most accurate learning algorithms
• Produces a highly accurate classifier
• Runs efficiently on large databases
• Handles thousands of input variables without variable deletion
• Gives estimates of what variables are important in classification
• Generates an internal unbiased estimate of the generalization error as
the forest building progresses
• Effective method for estimating missing data and maintains accuracy
when a large proportion of the data are missing
50. Random Forest - Advantages
• Methods for balancing error in class population unbalanced data sets
• Prototypes are computed that give information about the relation
between the variables and the classification
• Computes proximities between pairs of cases that can be used in
clustering, locating outliers or by scaling gives interesting views of data
• Above capabilities can be extended to unlabeled data, leading to
unsupervised clustering, data views and outlier detection
• Offers an experimental method for detecting variable interactions
51. Random Forest - Disadvantages
• Random forests have been observed to overfit for some datasets with
noisy classification/regression tasks
• For data including categorical variables with different numbers of levels,
random forests are biased in favor of those attributes with more levels
• Therefore the variable importance scores from random forest are not
reliable for this type of data
52. Logistic Regression
• Models the relationship between a dependent and one or more independent variables
• Allows one to look at the fit of the model as well as at the significance of the relationships
(between dependent and independent variables) being modelled
• Estimates the probability of an event occurring - the probability of a pupil continuing in
education post 16
• Predicts, from knowledge of the relevant independent variables, the probability (p) that the outcome is 1
(event occurring) rather than 0
• While in linear regression the relationship between the dependent and the independent
variables is linear, this assumption is not made in logistic regression
53. Logistic Regression
• Logistic regression function $$P = \frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}$$
• P is the probability of a 1 and e is the base of the natural logarithm (about 2.718)
• $$\alpha$$ and $$\beta$$ are the parameters of the model
• The value of $$\alpha$$ yields P when x is zero, and $$\beta$$ indicates how the
probability of a 1 changes when x changes by a single unit
• Because the relation between x and P is nonlinear, $$\beta$$ does not have as
straightforward an interpretation in this model as it does in ordinary linear
regression
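A minimal logistic regression sketch; the α and β values and the toy data are illustrative assumptions, not values from the slides. The first part evaluates the logistic function directly, the second fits the same form with scikit-learn:

```python
# Minimal logistic-regression sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic(x, alpha, beta):
    """P(y = 1 | x) under the logistic model above."""
    return np.exp(alpha + beta * x) / (1 + np.exp(alpha + beta * x))

print(logistic(0.0, alpha=-1.0, beta=0.5))   # when x is zero, P depends only on alpha

# Fitting the same functional form from (toy) data with scikit-learn:
X = np.array([[0.5], [1.2], [2.3], [2.9], [3.8], [4.4]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)            # estimated alpha and beta
print(clf.predict_proba([[2.5]])[0, 1])     # estimated P(y = 1) for x = 2.5
```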
55. Support Vector Machine - SVM
• A supervised learning model with associated learning algorithms that analyze
data and recognize patterns
• Given a set of training examples, each marked for belonging to one of two
categories, SVM training algorithm builds a model that assigns new examples into
one category or the other, making it a non-probabilistic binary linear classifier
• An SVM model is a representation of the examples as points in space, mapped so
that the examples of the separate categories are divided by a clear gap that is as
wide as possible
• New examples are then mapped into that same space and predicted to belong to
a category based on which side of the gap they fall on
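A minimal SVM sketch with scikit-learn; the breast-cancer data set, the feature scaling and the linear kernel are illustrative assumptions:

```python
# Minimal SVM sketch with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel looks for the maximum-margin separating hyperplane.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))   # accuracy of the binary classifier
```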
57. Naive Bayes Classifier
• A family of simple probabilistic classifiers based on applying Bayes'
theorem with strong (naive) independence assumptions between the
features
• A popular method for text categorization, the problem of judging
documents as belonging to one category or the other, such as spam or
legitimate, sports or politics, etc., with word frequencies as the features
• Highly scalable, requires a number of parameters linear in the number of
variables (features/predictors) in a learning problem
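A minimal naive Bayes text-categorization sketch with scikit-learn; the toy spam/ham messages and labels are illustrative assumptions, with word frequencies as the features as described above:

```python
# Minimal naive-Bayes text-categorization sketch with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["win money now", "cheap pills offer", "meeting at noon",
            "lunch tomorrow?", "claim your free prize", "project status update"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# CountVectorizer turns each message into word counts (the features);
# MultinomialNB applies Bayes' theorem with the naive independence assumption.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["free money offer", "status of the project"]))
```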
61. Clustering
• A technique to find similar groups in data, called clusters
• Groups data instances that are similar to (near) each other in one cluster and data
instances that are very different (far away) from each other into different clusters
• Called an unsupervised learning task, since no class values denoting an a priori
grouping of the data instances are given, as they are in supervised learning
• One of the most utilized data mining techniques
• A long history and used in almost every field like medicine, psychology, botany,
sociology, biology, archeology, marketing, insurance, libraries and text clustering
63. Applications
• Group people of similar sizes together to make small, medium and large T-shirts
– Tailor-made for each person - too expensive
– One-size-fits-all - does not fit all
• In marketing, segment customers according to their similarities
– Targeted marketing
• Given a collection of text documents, organize them according to their content
similarities
– To produce a topic hierarchy
64. Aspects of clustering
• Clustering algorithms
– Partitional clustering
– Hierarchical clustering
• A distance function - similarity or dissimilarity
• Clustering quality
– Inter-cluster distance maximized
– Intra-cluster distance minimized
• Quality of a clustering process depends on algorithm, distance function and
application
65. K-means Clustering
• A partitional clustering algorithm
• Partitions a given data set into a certain number k of clusters (k is fixed)
• Let the set of data points D be {x1, x2, …, xn}
– xi = (xi1, xi2, …, xir) is a vector in a real-valued space X ⊆ Rr
– r = number of attributes (dimensions) in the data
• Algorithm partitions given data into k clusters
– Each cluster has a cluster center (centroid)
– K is user defined
67. K-Means Algorithm
1. Choose k
2. Randomly choose k data points (seeds) as initial centroids
3. Assign each data point to the closest centroid
4. Re-compute the centroids using the current cluster
memberships
5. If a convergence criterion is not met, go to 3
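A minimal NumPy sketch of the steps above (random seeds, assignment, centroid update, repeat until assignments stop changing); the data set and k below are made up, and a production implementation would also handle empty clusters:
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly choose k data points (seeds) as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: assign each data point to the closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop when no re-assignments occur
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: re-compute each centroid from its current cluster membership
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Made-up 2D data with three visible groups
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [8.8, 1.2]])
labels, centroids = kmeans(X, k=3)
print(labels)
print(centroids)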
68. K-Means Algorithm - Illustration
• k initial means (in this case k=3) are randomly generated within the data domain
• k clusters are created by associating every observation with the nearest mean
• The centroid of each of the k clusters becomes the new mean
• Steps 2 and 3 are repeated until convergence has been reached
69. Stopping / Convergence Criterion
1. No (or minimum) re-assignments of data points to different clusters
2. No (or minimum) change of centroids or minimum decrease in the sum
of squared error (SSE)
– $$ SSE = \sum_{j=1}^{k} \sum_{x \in C_j} dist(x, m_j)^2 $$
– Cj is the jth cluster, mj is the centroid of cluster Cj (the mean vector of all the data
points in Cj), and dist(x, mj) is the distance between data point x and centroid mj
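The SSE above can be computed directly; a small NumPy sketch, assuming labels and centroids as produced by a k-means run such as the sketch earlier:
import numpy as np

def sse(X, labels, centroids):
    # Sum over clusters of squared distances between points and their centroid
    return sum(
        np.sum((X[labels == j] - centroids[j]) ** 2)
        for j in range(len(centroids))
    )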
72. K-Means - Strengths
• Simple to understand and implement
• Efficient: time complexity O(tkn) where
– n is the number of data points
– k is the number of clusters
– t is the number of iterations
• Since both k and t are usually small, k-means is in practice a linear algorithm
• Most popular clustering algorithm
• Terminates at a local optimum if SSE is used
• The global optimum is hard to find due to complexity
73. K-Means - Weaknesses
• Only applicable if mean is defined
– For categorical data, k-mode - the centroid is represented by most frequent
values
• User must specify k
• Sensitive to outliers
– Outliers are data points that are very far away from other data points
– Outliers could be errors in the data recording or some special data points with
very different values
75. Handling Outliers
• Remove data points in the clustering process that are much further away
from the centroids than other data points
• Perform random sampling
– Since in sampling we only choose a small subset of the data points, the
chance of selecting an outlier is very small
– Assign the rest of the data points to the clusters by distance or similarity
comparison or classification
79. K-Means
• Still the most popular algorithm - simplicity, efficiency
• Other clustering algorithms have their own weaknesses
• No clear evidence that any other clustering algorithm performs better in
general
– although other algorithms could be more suitable for some specific types of
data or applications
• Comparing different clustering algorithms is a difficult task
• No one knows the correct clusters
80. Clusters Representation
• Use the centroid of each cluster to represent the cluster
• Compute the radius and standard deviation of the cluster to determine its
spread in each dimension
• Centroid representation alone works well if the clusters are of the hyper-
spherical shape
• If clusters are elongated or are of other shapes, centroids are not
sufficient
81. Cluster Classification
• All the points in a cluster have the same class label - the
cluster ID
• Run a supervised learning algorithm on the data to find a
classification model
84. Single Link Method
• The distance between two clusters is the distance between two closest
data points in the two clusters, one data point from each cluster
• It can find arbitrarily shaped clusters, but
– It may cause the undesirable chain effect due to noisy points
85. Complete Link Method
• Distance between two clusters is the distance between the two furthest data points
in the two clusters, one data point from each cluster
• Sensitive to outliers because they are far away
86. Average Link Method
• Distance between two clusters is the average distance of all pair-wise
distances between the data points in two clusters
• A compromise between
– the sensitivity of complete-link clustering to outliers
– the tendency of single-link clustering to form long chains that do not
correspond to the intuitive notion of clusters as compact, spherical objects
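A small sketch comparing the three linkage criteria, assuming SciPy is available; the point set and the number of clusters are made up:
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.8], [9.0, 0.0]])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                     # hierarchical merge tree
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut into 3 clusters
    print(method, labels)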
88. Algorithmic Complexity
• All the algorithms are at least O(n^2)
– n is the number of data points
• Single link can be done in O(n^2)
• Complete and average links can be done in O(n^2 log n)
• Due to the complexity, hard to use for large data sets
– Sampling
– Scale-up methods (BIRCH)
89. Distance Functions
• Key to clustering
• similarity and dissimilarity are also commonly used terms
• Numerous distance functions
– Different types of data
• Numeric data
• Nominal data
– Different specific applications
90. Distance Functions - Numeric Attributes
• Euclidean distance
• Manhattan (city block) distance
• Denote distance with dist(xi, xj) where xi and xj are data points (vectors)
• They are special cases of Minkowski distance
• h is a positive integer
• $$ dist(x_i, x_j) = \left( |x_{i1}-x_{j1}|^h + |x_{i2}-x_{j2}|^h + \dots + |x_{ir}-x_{jr}|^h \right)^{1/h} $$
91. Distance Formulae
• If h = 2, it is the Euclidean distance
$$ dist(x_i, x_j) = \sqrt{(x_{i1}-x_{j1})^2 + (x_{i2}-x_{j2})^2 + \dots + (x_{ir}-x_{jr})^2} $$
• If h = 1, it is the Manhattan distance
$$ dist(x_i, x_j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \dots + |x_{ir}-x_{jr}| $$
• Weighted Euclidean distance
$$ dist(x_i, x_j) = \sqrt{w_1(x_{i1}-x_{j1})^2 + w_2(x_{i2}-x_{j2})^2 + \dots + w_r(x_{ir}-x_{jr})^2} $$
92. Distance Formulae
• Squared Euclidean distance - used to place progressively greater weight on
data points that are further apart
$$ dist(x_i, x_j) = (x_{i1}-x_{j1})^2 + (x_{i2}-x_{j2})^2 + \dots + (x_{ir}-x_{jr})^2 $$
• Chebychev distance - used when two data points should be considered different if
they differ on any one of the attributes
$$ dist(x_i, x_j) = \max(|x_{i1}-x_{j1}|, |x_{i2}-x_{j2}|, \dots, |x_{ir}-x_{jr}|) $$
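The distance functions above in a few lines of NumPy; the two vectors, the Minkowski order h, and the weights are made up for illustration:
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([2.0, 0.0, 3.5])
w = np.array([1.0, 0.5, 2.0])   # illustrative weights for the weighted variant

euclidean = np.sqrt(np.sum((xi - xj) ** 2))
manhattan = np.sum(np.abs(xi - xj))
minkowski = np.sum(np.abs(xi - xj) ** 3) ** (1 / 3)   # h = 3
squared_euclidean = np.sum((xi - xj) ** 2)
chebychev = np.max(np.abs(xi - xj))
weighted_euclidean = np.sqrt(np.sum(w * (xi - xj) ** 2))
print(euclidean, manhattan, minkowski, squared_euclidean, chebychev, weighted_euclidean)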
93. Curse of Dimensionality
• Various problems that arise when analyzing and organizing data in high-dimensional
spaces do not occur in low-dimensional spaces like 2D or 3D
• In the context of classification/function approximation, performance of
classification algorithm can improve by removing irrelevant features
94. Dimensionality Reduction - Applications
• Information Retrieval – web documents where dimensionality is
vocabulary of words
• Recommender systems – large scale of ratings matrix
• Social networks – social graph with large number of users
• Biology – gene expressions
• Image processing – facial recognition
95. Dimensionality Reduction
• Defying the curse of dimensionality - simpler models result in improved generalization
• Classification algorithm may not scale up to the size of the full feature set either in space or
time
• Improves understanding of domain
• Cheaper to collect and store data based on reduced feature set
• Two Techniques
– Feature Construction
– Feature Selection
97. Techniques
• Linear Discriminant Analysis – LDA
– Tries to identify attributes that account for the most variance between classes
– LDA compared to PCA is a supervised method using known labels
• Principal component analysis – PCA
– Identifies combinations of linearly correlated attributes (principal components or directions in
feature space) that account for the most variance in the data
– Plot the different samples on 2 first principal components
• Singular Value Decomposition – SVD
– Factorization of real or complex matrix
– Closely related to PCA - PCA can be computed via the SVD of the mean-centered data matrix
98. Feature Construction
• Linear methods
– Principal component analysis (PCA)
– Independent component analysis (ICA)
– Fisher Linear Discriminant (LDA)
– ….
• Non-linear methods
– Kernel PCA
– Non linear component analysis (NLCA)
– Local linear embedding (LLE)
– ….
99. Principal component analysis - PCA
• A tool for exploratory data analysis and for building predictive models
• Involves calculating the eigenvalue decomposition of a data covariance matrix,
usually after mean-centering the data for each attribute
• Mathematically defined as an orthogonal linear transformation to map data to a
new coordinate system such that the greatest variance by any projection of the
data comes to lie on the first coordinate (called the first principal component), the
second greatest variance on the second coordinate, and so on
• Theoretically the optimal linear scheme, in terms of least mean square error, for
compressing a set of high dimensional vectors into a set of lower dimensional
vectors and then reconstructing the original set
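A minimal sketch of PCA as described above (mean-center, eigen-decompose the covariance matrix, project onto the top components), assuming NumPy; the data is randomly generated for illustration:
import numpy as np

def pca(X, n_components=2):
    # Mean-center each attribute
    X_centered = X - X.mean(axis=0)
    # Eigenvalue decomposition of the data covariance matrix
    cov = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort directions by decreasing variance and keep the top ones
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # Map the data to the new coordinate system
    return X_centered @ components

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, n_components=2).shape)   # (100, 2)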
102. Fisher Linear Discriminant
• A classification method that projects high-dimensional data
onto a line and performs classification in this one-
dimensional space
• The projection maximizes the distance between the means of
the two classes while minimizing the variance within each
class
104. Linear Discriminant Analysis - LDA
• A generalization of Fisher's linear
discriminant
• A method used to find a linear
combination of features that
characterizes or separates two or more
classes of objects or events
• The resulting combination may be used
as a linear classifier or more commonly
for dimensionality reduction before later
classification
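A minimal sketch of LDA used for dimensionality reduction before classification, assuming scikit-learn; the labeled three-class data is made up:
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Made-up 3-class data in 4 dimensions
X = np.vstack([rng.normal(loc=c, size=(30, 4)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)   # project onto at most (classes - 1) axes
print(X_reduced.shape)                # (90, 2)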
106. Kernel PCA
• Classic PCA approach is a linear projection technique that works well if
the data is linearly separable
• In the case of linearly inseparable data, a nonlinear technique is required
if the task is to reduce the dimensionality of a dataset
107. Kernel PCA
• The basic idea to deal with linearly inseparable data is to project it onto a higher
dimensional space where it becomes linearly separable
• Consider a nonlinear mapping function ϕ so that the mapping of a sample x can be written
as x → ϕ(x)
• The term kernel describes a function that calculates the dot product of the images of the
samples x under ϕ
• κ(x_i, x_j) = ϕ(x_i)^T ϕ(x_j)
• Function ϕ maps the original d-dimensional features into a larger k-dimensional feature
space by creating nonlinear combinations of the original features
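A small sketch, assuming scikit-learn; the concentric-circles dataset is a standard example of linearly inseparable data, and the kernel parameters are illustrative:
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles - not linearly separable in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel computes dot products in an implicit higher-dimensional space
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)   # (200, 2)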
109. Singular Value Decomposition - SVD
• A mechanism to break a matrix into simpler meaningful pieces
• Used to detect groupings in data
• A factorization of a real or complex matrix
• A general rectangular M-by-N matrix A has an SVD into the product of an
M-by-N matrix U with orthonormal columns, an N-by-N diagonal matrix S of
singular values and the transpose of an N-by-N orthogonal matrix V
– A = U S V^T
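A minimal sketch with NumPy; full_matrices=False gives the thin factorization described above, and the matrix is randomly generated for illustration:
import numpy as np

A = np.random.default_rng(0).normal(size=(6, 3))   # M-by-N matrix, M=6, N=3

# Thin SVD: U is M-by-N, s holds the N singular values, Vt is N-by-N
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A = U S V^T
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))   # True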
111. Hidden Markov Models - HMM
• A statistical Markov model in which the system being modelled is
assumed to be a Markov process with unobserved (hidden) states
• Used in pattern recognition, such as handwriting and speech analysis
• In simpler Markov models like a Markov chain, the state is directly visible
to the observer, and therefore the state transition probabilities are the
only parameters
• In HMM, the state is not directly visible, but output, dependent on the
state, is visible
112. Hidden Markov Models - HMM
• Each state has a probability distribution over the possible output tokens
• Therefore the sequence of tokens generated by an HMM gives some
information about the sequence of states
• The adjective 'hidden' refers to the state sequence through which the model
passes, not to the parameters of the model
• The model is still referred to as a 'hidden' Markov model even if these
parameters are known exactly
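A small NumPy sketch of the idea: hidden states emit visible tokens according to per-state output distributions, and only the token sequence is observed. The state names, token names and all probabilities below are made up for illustration:
import numpy as np

rng = np.random.default_rng(0)

states = ["Rainy", "Sunny"]           # hidden states
tokens = ["walk", "shop", "clean"]    # visible output tokens

start_p = np.array([0.6, 0.4])                 # initial state distribution
trans_p = np.array([[0.7, 0.3],                # state transition probabilities
                    [0.4, 0.6]])
emit_p = np.array([[0.1, 0.4, 0.5],            # per-state output distributions
                   [0.6, 0.3, 0.1]])

# Generate a sequence: the observer sees only the emitted tokens, not the states
state = rng.choice(2, p=start_p)
for _ in range(5):
    print(tokens[rng.choice(3, p=emit_p[state])], end=" ")
    state = rng.choice(2, p=trans_p[state])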
115. Model Evaluation
• How accurate is the classifier?
• When the classifier is wrong, how is it wrong?
• Decide on which classifier (which parameters) to use
and to estimate what the performance of the system
will be
116. Testing Set
• Split the available data into a training set and a test set
• Train the classifier in the training set and evaluate based on
the test set
117. Classifier Accuracy
• The accuracy of a classifier on a given test set is the
percentage of test set tuples that are correctly classified by
the classifier
• Often also referred to as recognition rate
• Error rate (or misclassification rate) is the complement of accuracy:
error rate = 1 - accuracy
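A minimal sketch of the train/test split and accuracy computation, assuming scikit-learn; the dataset, labels and choice of classifier are made up:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # made-up labels

# Split the available data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)   # fraction of test tuples correctly classified
print("accuracy:", accuracy, "error rate:", 1 - accuracy)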
118. False Positive vs. Negative
• When is the model wrong?
– False positives vs. false negatives
– Related to type I and type II errors in statistics
• Often there is a different cost associated with false
positives and false negatives
– Diagnosing diseases
119. Confusion Matrix
• Mechanism to illustrate how a model is performing in terms
of false positives and false negatives
• Provides more information than a single accuracy figure
• Allows thinking about the cost of mistakes
• Extendable to any number of classes
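A small sketch, assuming scikit-learn; the true and predicted labels are made up:
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes
# For binary labels 0/1: [[TN, FP],
#                         [FN, TP]]
print(confusion_matrix(y_true, y_pred))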
122. Area Under ROC Curve - AUC
• ROC curves can be used to
compare models
• The bigger the AUC, the more
accurate the model
• ROC index is the area under
the ROC curve
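A small sketch, assuming scikit-learn; the scores stand in for a model's predicted probabilities of the positive class and are made up:
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]   # made-up probabilities

fpr, tpr, thresholds = roc_curve(y_true, scores)     # points on the ROC curve
print("AUC:", roc_auc_score(y_true, scores))         # area under the ROC curve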
124. K-Fold CrossValidation
• Divide the entire data set into k folds
• For each of the k experiments, use one fold for testing and the remaining
folds for training
125. K-Fold CrossValidation
• The accuracy of the system is calculated as the average accuracy (or error) across
the k folds
• The main advantages of k-fold cross validation are that every example is
used in testing at some stage and the problem of an unfortunate split is
avoided
• Any value can be used for k
– 10 is most common
– Depends on the data set
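A minimal sketch of 10-fold cross validation, assuming scikit-learn; the data, labels and choice of model are made up:
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # made-up labels

# Each example is used for testing exactly once across the 10 folds
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
print(scores.mean())   # average accuracy across the folds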
127. Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode