Random forest
• Random forest is an ensemble method that combines many decision tree models
• Can be used for classification and regression
• Accuracy and variable importance information is provided with the result
• A random forest is a collection of unpruned CART-like trees that follow specific rules for
  • Tree growing
  • Tree combination
  • Self-testing
  • Post-processing
• Trees are grown using binary partitioning
• Similar to a decision tree, with a few differences
  • At each split point, the search covers only a random subset of the variables rather than all of them
  • No pruning is necessary; trees can be grown until each node contains very few observations
• Advantages over a single decision tree
  • Better prediction (in general)
  • Little parameter tuning is needed with RF
• Terminology
  • Training size (N)
  • Total number of attributes (M)
  • Number of attributes used (m)
  • Total number of trees (n)
• For each tree, a random sample is drawn from the training dataset, controlled by a random seed. In the standard algorithm this is a bootstrap sample drawn with replacement; keeping the class distribution intact requires stratified sampling
• From this sample, a random set of attributes is chosen; the number of attributes is user defined. Not all input variables are considered, because searching all of them is computationally expensive and increases the chance of overfitting
• If M is the total number of input attributes in the dataset, only m attributes are chosen at random, where m < M. In the standard algorithm this draw is repeated at each split rather than once per tree
• Among these m attributes, the one that gives the best split according to the Gini index is used to partition the node. The process repeats on each branch until the termination condition is met, that is, until the nodes are too small to split and become leaves
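The growing procedure described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`gini`, `best_split`, `grow_tree`, `grow_forest`, `predict_forest`) and the toy data are hypothetical, features are assumed numeric, the bootstrap is not stratified, the m candidate attributes are redrawn at every split, and class labels must not be tuples because tuples mark internal nodes.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(rows, labels, m, rng, min_size=2):
    """Grow an unpruned tree; m of the M attributes are sampled at each split."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) < min_size:
        return majority                      # leaf: node is pure or too small
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), m))
    if split is None:
        return majority                      # sampled attributes cannot split
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng, min_size),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng, min_size))

def predict_tree(tree, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def grow_forest(rows, labels, n_trees, m, seed=0):
    """Grow n_trees trees, each on its own bootstrap sample of the N rows."""
    rng = random.Random(seed)
    n_rows = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest

def predict_forest(forest, row):
    """Tree combination for classification: majority vote."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy data (made up): attributes 0 and 1 are informative, attribute 2 is noise.
rows = [[0, 9, 0], [0, 9, 1], [1, 8, 0], [1, 8, 1],
        [5, 2, 0], [5, 2, 1], [6, 1, 0], [6, 1, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = grow_forest(rows, labels, n_trees=25, m=2, seed=0)
print(predict_forest(forest, [0, 9, 1]), predict_forest(forest, [6, 1, 0]))
```

Here N = 8, M = 3, m = 2 and n = 25 in the terminology above. Thresholds are tried only at observed attribute values, which keeps the sketch short at the cost of speed.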
• Information from random forest
  • Classification accuracy
  • Variable importance
  • Outliers (classification)
  • Missing data estimation
  • Error rates for the random forest object
• Advantages
  • No need to prune trees
  • Accuracy and variable importance are generated automatically
  • Overfitting is rarely a problem: adding more trees does not make the forest overfit
  • Not very sensitive to outliers in the training data
  • Easy to set parameters
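Accuracy can be reported automatically because, in the standard algorithm, each tree is grown on a bootstrap sample and the rows left out of that sample (the out-of-bag rows) serve as a built-in test set. The sketch below only measures how large that left-out share is; `oob_fraction` is a hypothetical helper name, and the sampling is assumed to be a plain bootstrap of N rows drawn with replacement.

```python
import random

def oob_fraction(n_samples, n_trees, seed=0):
    """Average share of rows that a bootstrap sample of size n_samples leaves out."""
    rng = random.Random(seed)
    left_out = 0
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        left_out += n_samples - len(in_bag)
    return left_out / (n_samples * n_trees)

print(oob_fraction(1000, 50))
```

The share approaches (1 - 1/N)^N, roughly 0.37 for large N, so about a third of the training rows are available to test each tree without holding out a separate validation set.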
• Limitations
  • Regression cannot predict beyond the range of the target values in the training data
  • Extreme values are not predicted accurately
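The range limitation follows from how a regression forest predicts: every leaf returns the mean of some training targets, and the forest averages the leaves, so the output can never leave the interval spanned by the training targets. The sketch below illustrates this with a deliberately simple one-dimensional regression tree; the names `grow`, `predict` and `forest_predict` and the data y = 2x are assumptions made for the example.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)

def grow(points, min_size=2):
    """1-D regression tree: a leaf is the mean target, a node is (threshold, left, right)."""
    xs = sorted({x for x, _ in points})
    if len(points) < min_size or len(xs) == 1:
        return mean(y for _, y in points)
    best = None
    for t in xs[:-1]:                        # keep both sides non-empty
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    t = best[1]
    return (t,
            grow([p for p in points if p[0] <= t], min_size),
            grow([p for p in points if p[0] > t], min_size))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def forest_predict(train, x, n_trees=20, seed=0):
    """Average the predictions of trees grown on bootstrap samples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        boot = [rng.choice(train) for _ in train]
        preds.append(predict(grow(boot), x))
    return mean(preds)

train = [(x, 2 * x) for x in range(10)]      # targets range from 0 to 18
print(forest_predict(train, 100))            # true value would be 200
```

For x = 100 the underlying relation gives 200, yet the forest cannot return more than 18, the largest target it has seen.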
• Applications
  • Classification
    • Land cover classification
    • Cloud screening
  • Regression
    • Continuous field mapping
    • Biomass mapping
• Efficient use of multi-core technology
  • Multi-core behaviour is OS dependent, but running on Hadoop helps make efficient use of multiple cores
Winnow algorithm
• Winnow is a machine learning technique for learning a linear classifier from labelled examples
• Similar to the perceptron algorithm
• While the perceptron algorithm uses an additive weight-update scheme, Winnow uses a multiplicative weight-update scheme
• Performs well when many of the features given to the learner turn out to be irrelevant
• During training, it is shown a sequence of positive and negative examples. From these it learns a decision hyperplane that can be used to label novel examples as positive or negative
• Uses a linear threshold function (like the perceptron training algorithm) as its hypothesis and performs incremental updates to its current hypothesis
• Initialize the weights w1, ..., wn to 1
• Both the Winnow and perceptron algorithms use the same classification scheme
• The Winnow algorithm differs from the perceptron algorithm in its updating scheme
  • When misclassifying a positive training example x (i.e. the prediction was negative because w·x was too small), the weights of the active features are multiplied by a factor greater than 1
  • When misclassifying a negative training example x (i.e. the prediction was positive because w·x was too large), the weights of the active features are divided by that factor
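The difference between the two updating schemes can be made concrete with a short sketch. The function names, the learning rate `lr` and the factor `alpha` are illustrative assumptions; labels are taken to be +1 (positive) or -1 (negative), and features are Boolean.

```python
def perceptron_update(w, x, y, lr=1.0):
    """Additive update after a mistake: add or subtract the example itself."""
    return [wi + lr * y * xi for wi, xi in zip(w, x)]

def winnow_update(w, x, y, alpha=2.0):
    """Multiplicative update after a mistake: rescale only the active weights."""
    factor = alpha if y == 1 else 1.0 / alpha
    return [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]

w = [4.0, 1.0, 0.5]
x = [1, 0, 1]
print(perceptron_update(w, x, +1))  # [5.0, 1.0, 1.5]
print(winnow_update(w, x, +1))      # [8.0, 1.0, 1.0]
print(perceptron_update(w, x, -1))  # [3.0, 1.0, -0.5]
print(winnow_update(w, x, -1))      # [2.0, 1.0, 0.25]
```

In both schemes the weight of the inactive feature (index 1) is untouched. The multiplicative update changes a weight in proportion to its current size and keeps it positive, whereas the additive update shifts every active weight by the same amount and can drive it below zero.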
Spam example: each email is a Boolean vector indicating which phrases appear and which do not
An email is spam if at least one of the phrases in S is present
• Initialize the weights w1, ..., wn to 1 on the n variables
• Given an example x = (x1, ..., xn), output 1 if w1x1 + ... + wnxn ≥ θ, where θ is the threshold (θ = n in the standard formulation)
• Else output 0
• If the algorithm makes a mistake:
  • On a positive example: if it predicts 0 when f(x) = 1, then for each xi equal to 1, double the value of wi
  • On a negative example: if it predicts 1 when f(x) = 0, then for each xi equal to 1, cut the value of wi in half
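The steps above can be sketched as a short training loop. The names `winnow_predict` and `winnow_train`, the threshold θ = n, the `max_epochs` cap and the toy target (an email is spam if phrase 0 or phrase 2 is present) are assumptions made for illustration.

```python
from itertools import product

def winnow_predict(w, x, theta):
    """Output 1 if the weighted sum of active features reaches the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def winnow_train(examples, n, max_epochs=100):
    """Cycle through the examples until a full pass makes no mistake."""
    w = [1.0] * n            # initialize all weights to 1
    theta = n                # threshold assumed equal to the number of variables
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, label in examples:
            if winnow_predict(w, x, theta) == label:
                continue
            clean_pass = False
            mistakes += 1
            # Double active weights on a missed positive, halve them on a false positive.
            factor = 2.0 if label == 1 else 0.5
            w = [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
        if clean_pass:
            break
    return w, mistakes

n = 6                        # six candidate phrases
spam_phrases = {0, 2}        # hypothetical set S: spam if phrase 0 or phrase 2 appears
examples = [(x, int(any(x[i] for i in spam_phrases)))
            for x in product([0, 1], repeat=n)]
w, mistakes = winnow_train(examples, n)
print(w, mistakes)
```

Weights of the phrases in S are never halved, because a negative example contains none of them, so each can be doubled only until it reaches the threshold. This is why the number of mistakes grows with the number of relevant features and only logarithmically with n.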
Maximum entropy
• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution that best represents the current state of knowledge is the one with the largest entropy
• Commonly used in natural language processing, speech and information retrieval
• What is a maximum entropy classifier?
  • A probabilistic classifier that belongs to the class of exponential models
  • Does not assume that the features are conditionally independent of each other
  • Based on the principle of maximum entropy: from all the models that fit the training data, it selects the one with the largest entropy
• A piece of information is testable if it can be determined whether a given distribution is consistent with it
  • "The expectation of variable x is 2.87" and "p2 + p3 > 0.6" are both statements of testable information
• The maximum entropy procedure consists of seeking the probability distribution that maximizes information entropy, subject to the constraints of the information
• With no testable information, entropy maximization takes place under a single constraint: the sum of the probabilities must be one. The result is the uniform distribution
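A small numerical sketch can make the procedure concrete. It assumes a discrete variable with known values and a single piece of testable information, a fixed expectation; under that constraint the maximum entropy solution has the exponential form p_i ∝ exp(λ·v_i), and λ can be found by bisection. The function name `maxent_with_mean`, the six-valued variable and the target means are choices made for the example, not taken from the slides.

```python
import math

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution over `values` whose mean equals target_mean.

    target_mean must lie strictly between min(values) and max(values).
    """
    def dist(lam):
        exps = [lam * v for v in values]
        top = max(exps)                      # subtract the max for numerical stability
        weights = [math.exp(e - top) for e in exps]
        z = sum(weights)
        return [wt / z for wt in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):                   # the mean increases with lam, so bisect
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
skewed = maxent_with_mean(faces, 4.5)        # constraint: expectation is 4.5
flat = maxent_with_mean(faces, 3.5)          # constraint already met by the uniform
print([round(p, 4) for p in skewed])
print([round(p, 4) for p in flat])
```

When the required mean equals the mean of the uniform distribution, λ is zero and the result is uniform, which matches the case where normalization is the only constraint.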
• When to use maximum entropy?
  • Since it makes minimal assumptions, we use it when we know nothing about the prior distribution
  • Used when we cannot assume conditional independence of the features
• The principle of maximum entropy is commonly applied in two ways to inferential problems
  • Prior probabilities: it is often used to obtain prior probability distributions for Bayesian inference
  • Maximum entropy models: the principle is used in model specification, widely so in natural language processing. Example: logistic regression
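The link to logistic regression can be shown with a sketch of the prediction step of a maximum entropy classifier, which scores each class with a weighted sum of features and normalizes the exponentiated scores. The function names, class labels and weight values below are invented for illustration; training the weights is out of scope here.

```python
import math

def maxent_classify(weights, x):
    """Return p(y | x) proportional to exp(w_y · x) for every class y."""
    scores = {y: sum(wi * xi for wi, xi in zip(w, x)) for y, w in weights.items()}
    top = max(scores.values())               # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def sigmoid(t):
    """Logistic function used by binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical two-class model: the weights of one class are fixed at zero.
weights = {"spam": [1.5, -0.5], "ham": [0.0, 0.0]}
x = [1, 1]
probs = maxent_classify(weights, x)
print(probs["spam"], sigmoid(1.5 * 1 + (-0.5) * 1))
```

With two classes and one weight vector fixed at zero, the normalized exponential reduces to the logistic function, so the two printed values agree.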
