scikit-learn: _permutation_importance.py at 38fba057 (scikit-learn/scikit-learn, GitHub)

This Python file implements permutation importance, a technique for evaluating how much a fitted machine learning model relies on each feature. Its public function, `permutation_importance`, takes an already fitted estimator, data `X`, targets `y`, and optional parameters: the scoring metric, the number of repeats, the number of parallel jobs, a random state, and sample weights. It first computes a baseline score for the estimator on the data. Then, for each feature column, it repeatedly shuffles that column and re-scores the estimator; the drop in score relative to the baseline is the permutation importance of that feature. It returns a `Bunch` object containing the mean, the standard deviation, and the raw values of the importances over all repeats.
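The summary above can be illustrated with a short usage sketch. It assumes a scikit-learn installation that exposes `sklearn.inspection.permutation_importance`; the synthetic dataset, the choice of `LogisticRegression`, and the variable names are illustrative assumptions, not part of the source file.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 5 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The estimator must already be fitted before it is passed in.
clf = LogisticRegression().fit(X_train, y_train)

# Score on a hold-out set; each feature column is shuffled n_repeats times.
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0, n_jobs=1
)

# Rank features from most to least important by mean score drop.
for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {idx}: "
          f"{result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

Scoring on a hold-out set, as here, measures how much the model relies on a feature for generalization; passing the training data instead is also allowed by the function.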


scikit-learn / scikit-learn


2beed55847

scikit-learn / sklearn / inspection / _permutation_importance.py

thomasjpfan
FIX Passes global configuration when spawning joblib jobs (#176… …

"""Permutation importance for estimators."""
import numpy as np
from joblib import Parallel

from ..metrics import check_scoring
from ..utils import Bunch
from ..utils import check_random_state
from ..utils import check_array
from ..utils.validation import _deprecate_positional_args
from ..utils.fixes import delayed


def _weights_scorer(scorer, estimator, X, y, sample_weight):
    if sample_weight is not None:
        return scorer(estimator, X, y, sample_weight)
    return scorer(estimator, X, y)


def _calculate_permutation_scores(estimator, X, y, sample_weight, col_idx,
                                  random_state, n_repeats, scorer):
    """Calculate score when `col_idx` is permuted."""
    random_state = check_random_state(random_state)

    # Work on a copy of X to ensure thread-safety in case of threading based
    # parallelism. Furthermore, making a copy is also useful when the joblib
    # backend is 'loky' (default) or the old 'multiprocessing': in those cases,
    # if X is large it will automatically be backed by a readonly memory map
    # (memmap). X.copy() on the other hand is always guaranteed to return a
    # writable data-structure whose columns can be shuffled inplace.
    X_permuted = X.copy()
    scores = np.zeros(n_repeats)
    shuffling_idx = np.arange(X.shape[0])
    for n_round in range(n_repeats):
        random_state.shuffle(shuffling_idx)
        if hasattr(X_permuted, "iloc"):
            col = X_permuted.iloc[shuffling_idx, col_idx]
            col.index = X_permuted.index
            X_permuted.iloc[:, col_idx] = col
        else:
            X_permuted[:, col_idx] = X_permuted[shuffling_idx, col_idx]
        feature_score = _weights_scorer(
            scorer, estimator, X_permuted, y, sample_weight
        )
        scores[n_round] = feature_score

    return scores


@_deprecate_positional_args
def permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5,
                           n_jobs=None, random_state=None, sample_weight=None):
    """Permutation importance for feature evaluation [BRE]_.

    The :term:`estimator` is required to be a fitted estimator. `X` can be the
    data set used to train the estimator or a hold-out set. The permutation
    importance of a feature is calculated as follows. First, a baseline metric,
    defined by :term:`scoring`, is evaluated on a (potentially different)
    dataset defined by `X`. Next, a feature column from the validation set
    is permuted and the metric is evaluated again. The permutation importance
    is defined to be the difference between the baseline metric and the metric
    from permuting the feature column.

    Read more in the :ref:`User Guide <permutation_importance>`.

    Parameters
    ----------
    estimator : object
        An estimator that has already been :term:`fitted` and is compatible
        with :term:`scorer`.

    X : ndarray or DataFrame, shape (n_samples, n_features)
        Data on which permutation importance will be computed.

    y : array-like or None, shape (n_samples, ) or (n_samples, n_classes)
        Targets for supervised or `None` for unsupervised.

    scoring : string, callable or None, default=None
        Scorer to use. It can be a single
        string (see :ref:`scoring_parameter`) or a callable (see
        :ref:`scoring`). If None, the estimator's default scorer is used.

    n_repeats : int, default=5
        Number of times to permute a feature.

    n_jobs : int or None, default=None
        Number of jobs to run in parallel. The computation is done by computing
        the permutation score for each column and parallelized over the
        columns.
        `None` means 1 unless in a :obj:`joblib.parallel_backend` context.
        `-1` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

    random_state : int, RandomState instance, default=None
        Pseudo-random number generator to control the permutations of each
        feature.
        Pass an int to get reproducible results across function calls.
        See :term:`Glossary <random_state>`.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights used in scoring.

        .. versionadded:: 0.24

    Returns
    -------
    result : :class:`~sklearn.utils.Bunch`
        Dictionary-like object, with the following attributes.

        importances_mean : ndarray, shape (n_features, )
            Mean of feature importance over `n_repeats`.
        importances_std : ndarray, shape (n_features, )
            Standard deviation over `n_repeats`.
        importances : ndarray, shape (n_features, n_repeats)
            Raw permutation importance scores.

    References
    ----------
    .. [BRE] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
             2001. https://doi.org/10.1023/A:1010933404324

    Examples
    --------
    >>> from sklearn.linear_model import LogisticRegression
    >>> from sklearn.inspection import permutation_importance
    >>> X = [[1, 9, 9],[1, 9, 9],[1, 9, 9],
    ...      [0, 9, 9],[0, 9, 9],[0, 9, 9]]
    >>> y = [1, 1, 1, 0, 0, 0]
    >>> clf = LogisticRegression().fit(X, y)
    >>> result = permutation_importance(clf, X, y, n_repeats=10,
    ...                                 random_state=0)
    >>> result.importances_mean
    array([0.4666..., 0. , 0. ])
    >>> result.importances_std
    array([0.2211..., 0. , 0. ])
    """
    if not hasattr(X, "iloc"):
        X = check_array(X, force_all_finite='allow-nan', dtype=None)

    # Precompute random seed from the random state to be used
    # to get a fresh independent RandomState instance for each
    # parallel call to _calculate_permutation_scores, irrespective of
    # the fact that variables are shared or not depending on the active
    # joblib backend (sequential, thread-based or process-based).
    random_state = check_random_state(random_state)
    random_seed = random_state.randint(np.iinfo(np.int32).max + 1)

    scorer = check_scoring(estimator, scoring=scoring)
    baseline_score = _weights_scorer(scorer, estimator, X, y, sample_weight)

    scores = Parallel(n_jobs=n_jobs)(delayed(_calculate_permutation_scores)(
        estimator, X, y, sample_weight, col_idx, random_seed, n_repeats, scorer
    ) for col_idx in range(X.shape[1]))

    importances = baseline_score - np.array(scores)
    return Bunch(importances_mean=np.mean(importances, axis=1),
                 importances_std=np.std(importances, axis=1),
                 importances=importances)
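The core loop of the module can be sketched without scikit-learn, using only NumPy. This is a simplified illustration, not the library's implementation: `sketch_permutation_importance`, `accuracy_of_threshold_rule`, and the demo data are hypothetical names and values introduced here, the scoring callable takes `(X, y)` rather than a scikit-learn scorer, and parallelism, DataFrame support, and sample weights are left out.

```python
import numpy as np


def sketch_permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Return (mean, std, raw) importances; score_fn(X, y) -> float."""
    X = np.asarray(X, dtype=float)
    baseline = score_fn(X, y)
    importances = np.zeros((X.shape[1], n_repeats))
    for col_idx in range(X.shape[1]):
        # As in the module, every column starts from the same seed.
        rng = np.random.RandomState(seed)
        # Shuffle a copy so the caller's data is never modified.
        X_permuted = X.copy()
        shuffling_idx = np.arange(X.shape[0])
        for n_round in range(n_repeats):
            rng.shuffle(shuffling_idx)
            X_permuted[:, col_idx] = X_permuted[shuffling_idx, col_idx]
            # Importance is the drop in score relative to the baseline.
            importances[col_idx, n_round] = baseline - score_fn(X_permuted, y)
    return importances.mean(axis=1), importances.std(axis=1), importances


def accuracy_of_threshold_rule(X, y):
    """Hypothetical 'model': predict True when feature 0 is positive."""
    return float(np.mean((X[:, 0] > 0) == y))


# Demo data: only feature 0 determines the target.
demo_rng = np.random.RandomState(42)
X_demo = demo_rng.normal(size=(200, 3))
y_demo = X_demo[:, 0] > 0

mean_imp, std_imp, raw_imp = sketch_permutation_importance(
    accuracy_of_threshold_rule, X_demo, y_demo, n_repeats=5, seed=0
)
print(mean_imp)
```

Because the toy rule reads only feature 0, shuffling the other two columns leaves the score unchanged and their importances are exactly zero, while shuffling feature 0 pushes accuracy toward chance level.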
