
Python Library Functions

The document provides an overview of the main features of ten Python libraries for data science, machine learning, visualization, and natural language processing: Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, PyTorch, Matplotlib, Seaborn, Plotly, and NLTK. For each library it lists and describes commonly used functions and classes for data loading, manipulation, analysis, modeling, evaluation, visualization, and deployment.

Uploaded by Omar Akhlaq
Copyright © All Rights Reserved

Author: Muhammad Omar Akhlaq


1. PANDAS
Data Reading and Writing:
 read_csv(): Reads data from a CSV file into a DataFrame.
 read_excel(): Reads data from an Excel file into a DataFrame.
 to_csv(): Writes data from a DataFrame to a CSV file.
 to_excel(): Writes data from a DataFrame to an Excel file.
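A minimal CSV round trip, as a sketch: it writes a small hypothetical DataFrame with to_csv() and reads it back with read_csv(), using io.StringIO in place of a real file. read_excel() and to_excel() follow the same pattern but additionally need an Excel engine such as openpyxl.

```python
import io

import pandas as pd

# Small hypothetical DataFrame used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "score": [1.5, 2.0]})

# With no path argument, to_csv() returns the CSV text as a string;
# index=False leaves the row labels out of the output.
csv_text = df.to_csv(index=False)

# read_csv() accepts any file-like object, so StringIO stands in for a file on disk.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```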

DataFrame Basics:
 head(): Returns the first n rows of the DataFrame (5 by default).
 tail(): Returns the last n rows of the DataFrame (5 by default).
 info(): Prints a concise summary: index range, column data types, non-null counts, and memory usage.
 describe(): Generates descriptive statistics (count, mean, std, min, quartiles, max for numeric columns).
 shape: Returns the dimensions (rows, columns) of the DataFrame as a tuple.
 columns: Returns the column labels of the DataFrame.
 index: Returns the row labels of the DataFrame.

Data Selection and Manipulation:
 loc[]: Accesses a group of rows and columns by label.
 iloc[]: Accesses a group of rows and columns by integer position.
 drop(): Drops specified rows or columns from the DataFrame.
 fillna(): Fills NaN (missing) values in the DataFrame with specified values.
 groupby(): Groups rows by one or more keys so that aggregations can be computed per group.
 merge(): Merges DataFrames using a database-style join (inner, left, right, or outer).
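The selection and manipulation calls above can be combined in a short sketch. The sales and regions tables are hypothetical data invented for the example; it shows label versus position access, filling a missing value, a per-group sum, and an inner join.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value.
sales = pd.DataFrame(
    {"store": ["A", "A", "B", "B"], "units": [10, np.nan, 5, 7]},
    index=["r1", "r2", "r3", "r4"],
)
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

print(sales.loc["r3", "units"])   # label-based: row "r3", column "units" -> 5.0
print(sales.iloc[0, 1])           # position-based: first row, second column -> 10.0

# Replace the missing value with 0, then total the units per store.
filled = sales.fillna({"units": 0})
totals = filled.groupby("store", as_index=False)["units"].sum()

# Attach each store's region with a database-style inner join on "store".
report = totals.merge(regions, on="store", how="inner")
print(report)
```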

Data Aggregation and Calculation:
 sum(), mean(), median(): Compute sums, means, and medians (per column by default).
 max(), min(): Find maximum and minimum values.
 apply(): Applies a function along an axis of the DataFrame (axis=0 for each column, axis=1 for each row).
 pivot_table(): Creates a spreadsheet-style pivot table as a DataFrame.
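As a sketch of pivot_table() and apply() working together, the hypothetical long-format table below is pivoted into months by products, and apply() with axis=1 then adds a per-row total.

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, product) observation.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "product": ["x", "y", "x", "y"],
    "revenue": [100, 80, 120, 90],
})

# Spreadsheet-style pivot: months as rows, products as columns, summed revenue.
pivot = df.pivot_table(index="month", columns="product", values="revenue", aggfunc="sum")

# apply() with axis=1 passes each row to the function as a Series.
pivot["total"] = pivot.apply(lambda row: row.sum(), axis=1)
print(pivot)
```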

Data Cleaning and Handling:
 drop_duplicates(): Drops duplicate rows from the DataFrame.
 rename(): Renames columns or index labels.
 astype(): Converts a column (or the whole DataFrame) to another data type.
 set_index(), reset_index(): Sets a column as the index, or moves the index back into a regular column.

Time Series Analysis (for date-time data):
 to_datetime(): Converts a column to datetime format.
 resample(): Aggregates time-indexed data at a different frequency (for example, daily to monthly).
 date_range(): Generates a fixed-frequency DatetimeIndex.
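A small time-series sketch, assuming hypothetical daily readings stored as strings: to_datetime() parses them, resample() sums them per calendar month ("MS" labels each bucket by the month start), and date_range() builds a fixed-frequency index.

```python
import pandas as pd

# Hypothetical daily readings stored as date strings.
df = pd.DataFrame({
    "date": ["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02"],
    "value": [1, 2, 3, 4],
})
df["date"] = pd.to_datetime(df["date"])   # strings -> datetime64 values

# resample() needs a DatetimeIndex; "MS" groups by month and labels each bucket by its first day.
monthly = df.set_index("date").resample("MS")["value"].sum()
print(monthly)   # January total 3, February total 7

# date_range() builds a fixed-frequency index: three consecutive days here.
days = pd.date_range("2024-01-01", periods=3, freq="D")
print(days)
```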

2. NUMPY
Array Creation:
 np.array(): Creates an array from a Python list or tuple.
 np.zeros(): Generates an array of zeros with a specified shape.
 np.ones(): Generates an array of ones with a specified shape.
 np.arange(): Creates an array of evenly spaced values with a given step in the half-open interval [start, stop).
 np.linspace(): Generates a specified number of evenly spaced values over an interval (endpoint included by default).

Array Manipulation:
 np.reshape(): Changes the shape of an array without changing its data.
 ndarray.flatten(), np.ravel(): Flatten a multidimensional array into a 1D array (flatten() always returns a copy; ravel() returns a view when possible).
 np.concatenate(): Joins arrays along a specified axis.
 np.split(): Splits an array into multiple sub-arrays.
 np.transpose(), ndarray.T: Transposes array dimensions.
 np.vstack(), np.hstack(): Stacks arrays vertically (row-wise) or horizontally (column-wise).
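The creation and manipulation functions above can be sketched on one small array. Note the copy-versus-view difference between flatten() and ravel(), which np.shares_memory() makes visible.

```python
import numpy as np

a = np.arange(6)              # [0 1 2 3 4 5]; the stop value 6 is excluded
m = a.reshape(2, 3)           # the same six values viewed as 2 rows x 3 columns

flat_copy = m.flatten()       # ndarray method: always returns a new copy
flat_view = np.ravel(m)       # returns a view of m's data when possible (as here)

joined = np.concatenate([a, a])   # join along axis 0 -> 12 elements
v = np.vstack([m, m])             # stack vertically   -> shape (4, 3)
h = np.hstack([m, m])             # stack horizontally -> shape (2, 6)
parts = np.split(a, 3)            # three equal sub-arrays of length 2

print(m.T.shape)                  # transpose -> (3, 2)
print(np.linspace(0, 1, 5))       # 5 evenly spaced points, endpoint included
```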

Array Operations:
 np.sum(), np.mean(), np.median(): Compute sums, means, and medians, optionally along an axis.
 np.max(), np.min(): Find maximum and minimum values.
 np.dot(): Computes dot products; for 2D arrays this is matrix multiplication (np.matmul() or the @ operator is the usual choice for matrices).
 np.sort(): Returns a sorted copy of an array.
 np.unique(): Finds the sorted unique elements of an array.

Array Indexing and Slicing:
 Slicing: Accesses sub-arrays using start:stop:step indices; basic slices are views of the original data.
 Fancy Indexing: Uses arrays of indices to access specific elements; the result is a copy.

Mathematical Functions:
 np.sin(), np.cos(), np.tan(): Trigonometric functions.
 np.exp(), np.log(), np.sqrt(): Exponential, natural logarithm, and square root functions.
 np.absolute(), np.power(): Absolute value and exponentiation functions.
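A short sketch of slicing, fancy indexing, and element-wise math. It also uses a boolean mask, a related indexing form not listed above, and shows that writing through a slice changes the original array while writing to a fancy-indexed result does not.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

print(x[1:4])          # slice: positions 1, 2, 3 -> [20 30 40]
print(x[::-1])         # a negative step reverses the array
print(x[[0, 2, 4]])    # fancy indexing with an index list -> [10 30 50]
print(x[x > 25])       # boolean mask -> [30 40 50]

# Slices are views, fancy-indexed results are copies.
y = np.arange(5)
view = y[1:3]
view[0] = 99           # writes through to y
fancy = y[[0, 1]]
fancy[0] = -1          # does not affect y
print(y.tolist())      # [0, 99, 2, 3, 4]

# Mathematical functions apply element-wise to whole arrays.
print(np.sqrt(np.array([1.0, 4.0, 9.0])))   # [1. 2. 3.]
print(np.power(np.array([2, 3]), 2))        # [4 9]
```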

Random Sampling:
 np.random.rand(): Generates random numbers from a uniform distribution over [0, 1).
 np.random.randn(): Generates random numbers from the standard normal distribution (mean 0, variance 1).
 np.random.randint(): Generates random integers from low (inclusive) to high (exclusive).
 np.random.choice(): Picks random elements from an array, with or without replacement.

Linear Algebra:
 np.linalg.inv(): Computes the inverse of a matrix.
 np.linalg.det(): Computes the determinant of a matrix.
 np.linalg.solve(): Solves a system of linear equations.
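As a worked example, the system 2x + y = 5 and x + 3y = 10 can be written as A @ v = b and handed to np.linalg.solve(); the determinant is 2*3 - 1*1 = 5. The numbers are chosen only for illustration.

```python
import numpy as np

# The system 2x + y = 5 and x + 3y = 10, written as A @ v = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])

v = np.linalg.solve(A, b)      # generally preferred over inv(A) @ b: faster and more stable
print(v)                       # approximately [1. 3.], i.e. x = 1, y = 3

print(np.linalg.det(A))        # 2*3 - 1*1 = 5, up to floating-point rounding
print(np.linalg.inv(A) @ A)    # approximately the 2x2 identity matrix
```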

3. Scikit-Learn
Data Preprocessing:
 StandardScaler: Standardizes features by removing the mean and scaling to unit variance.
 MinMaxScaler: Scales features to a given range (0 to 1 by default).
 OneHotEncoder, LabelEncoder: Encode categorical values as numbers (OneHotEncoder for input features, LabelEncoder for target labels).
 train_test_split: Splits datasets into training and testing subsets for model evaluation.
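A minimal preprocessing sketch on hypothetical toy data. The scaler is fitted on the training rows only and then applied to the test rows, which is the usual way to avoid leaking test statistics into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical data: one numeric feature and a binary label.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training rows only, then reuse it on the test rows.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)   # mean 0, standard deviation 1
X_test_scaled = scaler.transform(X_test)

# MinMaxScaler maps the observed minimum and maximum to 0 and 1 by default.
print(MinMaxScaler().fit_transform(X).ravel())

# OneHotEncoder creates one 0/1 column per category, in sorted order ("green", "red").
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()   # sparse output -> dense
print(onehot)
```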

Supervised Learning:
Regression:
 LinearRegression, Ridge, Lasso, ElasticNet: Ordinary least squares regression and its regularized variants (L2, L1, and combined L1/L2 penalties).
Classification:
 LogisticRegression, SVC, RandomForestClassifier, KNeighborsClassifier: Classifiers based on different algorithms.
 DecisionTreeClassifier, GradientBoostingClassifier: Decision tree-based models.

Unsupervised Learning:
Clustering:
 KMeans, DBSCAN, AgglomerativeClustering: Algorithms for clustering data.
Dimensionality Reduction:
 PCA, TruncatedSVD, FactorAnalysis: Methods for reducing dimensions while preserving important information.

Model Evaluation and Metrics:
 accuracy_score, precision_score, recall_score: Evaluation metrics for classification models.
 mean_squared_error, mean_absolute_error, r2_score: Evaluation metrics for regression models.
 cross_val_score, GridSearchCV: Tools for cross-validation and hyperparameter tuning.
 confusion_matrix, classification_report: Tabulate true versus predicted classes, and summarize precision, recall, and F1-score per class.
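The metric functions can be checked by hand on tiny hypothetical label lists. Here there are 2 true positives, 1 true negative, 1 false positive, and 1 false negative, which gives the values in the comments.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_squared_error,
    precision_score,
    recall_score,
)

# Hypothetical binary labels and predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

print(accuracy_score(y_true, y_pred))    # 3 of 5 correct -> 0.6
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 2 / 3
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 2 / 3
print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
print(classification_report(y_true, y_pred))

# Regression metric: mean of squared errors, (0.5**2 + 0.5**2) / 2 = 0.25.
print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))
```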

Pipelines and Feature Selection:
 Pipeline: Chains preprocessing steps and a final estimator into a single estimator.
 FeatureUnion: Concatenates the outputs of several transformers applied in parallel.
 SelectKBest, SelectFromModel: Feature selection based on univariate statistics or model importance.
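A sketch of a Pipeline evaluated with cross_val_score, using the Iris dataset bundled with scikit-learn. Because each fold refits the whole pipeline, scaling and feature selection only ever see that fold's training split. The choice of k=2 and LogisticRegression is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # small dataset shipped with scikit-learn

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=2)),  # keep the 2 highest-scoring features
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each of the 5 folds refits scaler, selector, and classifier on its own training split.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

The same pipeline can be passed to GridSearchCV, where step parameters are addressed as "<step>__<param>" (for example, "clf__C").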

Ensemble Methods:
 VotingClassifier, VotingRegressor: Combines multiple models' predictions for improved performance.
 BaggingClassifier, RandomForestClassifier: Bootstrap aggregating methods for classification.
 GradientBoostingClassifier, AdaBoostClassifier: Boosting methods for classification.

Neural Network Support:
 MLPClassifier, MLPRegressor: Multi-layer Perceptron models for classification and regression.

Text Analysis:
 CountVectorizer, TfidfVectorizer: Convert text data into numerical feature vectors.
 TfidfTransformer: Applies Term Frequency-Inverse Document Frequency (TF-IDF) weighting to a matrix of token counts, such as CountVectorizer output.

Model Serialization:
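 pickle: Built-in Python module for serializing and deserializing fitted scikit-learn models (joblib is a common alternative). Load only trusted files, ideally with the same scikit-learn version that saved them.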
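A minimal serialization sketch: a decision tree is fitted on the bundled Iris data, pickled to bytes, and restored. pickle.dump() and pickle.load() work the same way with an open file instead of an in-memory bytes object.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

blob = pickle.dumps(model)      # serialize the fitted model to bytes
restored = pickle.loads(blob)   # only unpickle data from trusted sources

# The restored model makes the same predictions as the original.
print((restored.predict(X) == model.predict(X)).all())
```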

4. TENSORFLOW
Core Components:
 Tensors: Fundamental data structures in TensorFlow, similar to multi-dimensional arrays.
 Operations: Functions that manipulate tensors and perform computations; they form the nodes of the computational graph.
 Graph: The computational graph defines the operations and dependencies between tensors. TensorFlow 2 executes eagerly by default and builds graphs through tf.function.

Model Building:
 Keras API: High-level API within TensorFlow for building and training neural networks.
 tf.keras.layers: Module containing layers such as Dense, Conv2D, and LSTM for building neural network architectures.
 tf.keras.Sequential: Stacks layers in sequence to create a model.

Training and Optimization:
 tf.keras.Model.compile(): Compiles the model, defining loss functions, optimizers, and metrics.
 tf.keras.Model.fit(): Trains the model on training data.
 tf.keras.Model.evaluate(): Evaluates the model's performance on a test dataset.
 tf.optimizers: Module containing optimization algorithms like SGD, Adam, RMSprop, etc.

Customization and Extensions:
 tf.GradientTape: Records operations for automatic differentiation and custom gradient computation.
 tf.function: Converts Python functions into TensorFlow graphs for better performance.
 Custom Layers, Metrics, Losses, and Callbacks: Allow the creation of custom components for specific needs.
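A minimal sketch, assuming TensorFlow 2.x is installed. tf.GradientTape records y = x**2 + 2x and returns dy/dx = 2x + 2, which is 8 at x = 3; the small scaled_sum function (a hypothetical helper) is compiled into a graph with tf.function.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x        # operations on x are recorded by the tape

grad = tape.gradient(y, x)      # dy/dx = 2x + 2 = 8.0 at x = 3
print(grad.numpy())


@tf.function                    # traced into a TensorFlow graph on the first call
def scaled_sum(t):
    # Hypothetical helper: sum of all elements, doubled.
    return tf.reduce_sum(t) * 2.0


print(scaled_sum(tf.constant([1.0, 2.0, 3.0])).numpy())   # 12.0
```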

Deployment and Serialization:
 tf.saved_model: Tools for saving and loading models in the SavedModel format for deployment.
 tf.keras.Model.save(), tf.keras.models.load_model(): Saving and loading models using the Keras API.

Data Handling:
 tf.data.Dataset: A powerful API for creating input pipelines to handle large datasets efficiently.
 tf.data.experimental: Module containing experimental features for data pipeline handling.

GPU and Distributed Computing:
 tf.device(): Context manager to explicitly specify the device (CPU or GPU) for execution.
 tf.distribute: Module providing tools for distributed training across multiple devices or machines.

Miscellaneous:
 tf.math: Module containing mathematical operations on tensors.
 tf.image: Module for image processing operations in TensorFlow.
 tf.strings: Module for string manipulation operations.

5. KERAS
Model Building:
 Sequential: A linear stack of layers for building sequential models.
 Functional API (tf.keras.Model): Allows creating complex models with shared layers and multiple inputs or outputs.
 Dense, Conv2D, LSTM, Dropout, etc.: Various layer types for constructing neural network architectures.

Compilation and Configuration:
 compile(): Configures the model for training by specifying the optimizer, loss function, and metrics.
 Optimizers (SGD, Adam, RMSprop, etc.): Algorithms that update model weights during training.
 Loss Functions (mean_squared_error, categorical_crossentropy, etc.): Quantify the prediction error that the optimizer minimizes.

Training and Evaluation:
 fit(): Trains the model on training data.
 evaluate(): Evaluates the model's performance on a test dataset.
 predict(): Generates predictions for new data.
 Callbacks (EarlyStopping, ModelCheckpoint, etc.): Hooks that customize training behavior, such as stopping early or saving checkpoints.
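A minimal end-to-end sketch, assuming TensorFlow 2.x with tf.keras and hypothetical toy data (y = 2x + 1 plus noise): build a Sequential model, compile it, fit with an EarlyStopping callback, then evaluate and predict. Layer sizes and epoch counts are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1)).astype("float32")
y = (2 * x + 1 + 0.05 * rng.normal(size=(200, 1))).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Stop when validation loss has not improved for 5 epochs and keep the best weights.
early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
history = model.fit(
    x, y, validation_split=0.2, epochs=50, batch_size=32, callbacks=[early], verbose=0
)

loss, mae = model.evaluate(x, y, verbose=0)   # returns [loss, metric] when metrics are set
preds = model.predict(x[:3], verbose=0)       # shape (3, 1)
print(loss, mae, preds.ravel())
```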

Regularization and Optimization:
 Dropout, BatchNormalization: Dropout regularizes by randomly zeroing activations during training; BatchNormalization normalizes layer inputs to stabilize and speed up training.
 Regularizers: Apply penalties (such as L1 or L2) on layer weights during optimization.

Customization and Extension:
 Layer and Model classes: Allow building custom layers and models by subclassing.
 Custom Callbacks, Custom Losses, Custom Metrics: Enable customizing and extending Keras functionality.

Serialization and Deployment:
 save() and load_model(): Save and load a complete model (architecture, weights, and optimizer state).
 model.to_json() and model_from_json(): Serialize the model architecture (without weights) to and from JSON.

Preprocessing and Utilities:
 preprocessing: Module for data preprocessing techniques like normalization, text tokenization, etc.
 utils: Module containing utility functions for working with Keras models and layers.

GPU and Distributed Computing:
 Keras models run on TensorFlow's GPU and distributed-training support (for example, tf.distribute strategies) with little or no change to the model code.

6. PyTorch
Tensor Operations:
 torch.Tensor: The core data structure, supports various tensor operations similar to NumPy arrays.
 Math operations: Element-wise operations, matrix multiplications, and other mathematical functions.
 Indexing and Slicing: Accessing and manipulating tensor elements.

Autograd and Dynamic Computation Graph:
 torch.autograd: Automatic differentiation engine for computing gradients.
 torch.autograd.Function: Base class for defining custom autograd operations.
 backward(): Computes the gradients of a (usually scalar) tensor with respect to the graph's leaf tensors that have requires_grad=True, accumulating them in their .grad attributes.
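A minimal autograd sketch, assuming PyTorch is installed. The graph for y = x**2 + 2x is recorded as the code runs, and backward() fills x.grad with dy/dx = 2x + 2, which is 8 at x = 3.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)   # leaf tensor tracked by autograd
y = x ** 2 + 2 * x                           # graph is built dynamically as this line runs

y.backward()     # y is a scalar, so no gradient argument is needed
print(x.grad)    # dy/dx = 2x + 2 -> tensor(8.)
```

Gradients accumulate across calls to backward(), which is why training loops clear them (for example with optimizer.zero_grad()) before each step.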

Neural Network Building Blocks:
 torch.nn: Module for building neural network architectures.
 torch.nn.Module: Base class for creating custom neural network modules.
 Layers: Various layers like Linear, Conv2d, LSTM, Dropout, etc.
 Activation functions: ReLU, Sigmoid, Tanh, etc.
 Initialization methods (torch.nn.init): Weight initialization techniques such as Xavier/Glorot and Kaiming/He initialization.

Optimizers and Loss Functions:
 Optimizers (SGD, Adam, RMSprop, etc.): Algorithms for optimizing model weights.
 torch.optim: Module containing optimization algorithms.
 Loss functions (MSELoss, CrossEntropyLoss, etc.): Quantify the prediction error that training minimizes.

Training and Evaluation:
 torch.nn.functional: Stateless functions for neural network operations (activations, losses, convolutions, etc.).
 torch.utils.data: Tools for handling datasets and data loaders for efficient batching and loading.
 torch.utils.data.Dataset: Base class for creating custom datasets.
 torch.utils.data.DataLoader: Loads data in (optionally shuffled) batches for training and evaluation.
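The pieces above fit together in the standard training loop. This is a minimal sketch, assuming PyTorch and hypothetical noise-free data y = 3x - 1, so a single Linear layer trained with SGD and MSELoss should recover a weight near 3 and a bias near -1.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)   # deterministic initialization and shuffling
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data: y = 3x - 1.
x = torch.linspace(-1, 1, 64).unsqueeze(1)   # shape (64, 1)
y = 3 * x - 1

loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1).to(device)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()
for epoch in range(100):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)
        loss.backward()                  # gradients of the loss w.r.t. the parameters
        optimizer.step()                 # update the parameters

model.eval()
with torch.no_grad():
    print(model.weight.item(), model.bias.item())   # should approach 3 and -1
```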

GPU and Distributed Computing:
 torch.device(): Represents the device (CPU or GPU) on which tensors live; tensors and models are moved there with .to(device).
 torch.distributed: Module for distributed computing across multiple devices or machines.

Serialization and Deployment:
 torch.save() and torch.load(): Functions for saving and loading models (commonly the model's state_dict()).
 TorchScript (torch.jit) and ONNX export (torch.onnx): Mechanisms for deploying models for inference in production environments.

Miscellaneous:
 torch.cuda: Module for GPU-related functionalities and operations.
 torchvision: Companion library containing datasets, model architectures, and image transformation utilities for computer vision tasks.
 torchtext: Companion library for text-related utilities and datasets.

7. Matplotlib
Basic Plotting:
 plt.plot(): Creates line plots.
 plt.scatter(): Generates scatter plots.
 plt.bar(), plt.barh(): Creates vertical and horizontal bar plots.
 plt.hist(): Generates histograms.
 plt.boxplot(): Creates boxplots to visualize data distributions.

Customization and Styling:
 plt.xlabel(), plt.ylabel(): Sets labels for the x and y axes.
 plt.title(): Sets the title of the plot.
 plt.legend(): Adds legends to the plot.
 plt.grid(): Displays grid lines on the plot.
 plt.xlim(), plt.ylim(): Sets the limits of the x and y axes.

Subplots and Layouts:
 plt.subplots(): Creates a figure and a grid of subplots in one call, returning the figure and its axes.
 plt.subplot(): Adds a single subplot at a given position in a grid on the current figure.
 plt.tight_layout(): Automatically adjusts subplot parameters so labels and titles do not overlap.

Advanced Plot Types:
 plt.contour(), plt.contourf(): Generates contour plots.
 plt.pcolor(), plt.pcolormesh(): Creates pseudocolor plots.
 plt.quiver(): Displays vector fields.
 plt.imshow(): Displays images.

Annotations and Text:
 plt.text(): Adds text at specified coordinates on the plot.
 plt.annotate(): Annotates a specific point on the plot with optional arrow indicators.

Save and Show:
 plt.show(): Displays the plot.
 plt.savefig(): Saves the plot as an image file (PNG, JPG, PDF, etc.).
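A sketch that combines subplots, labels, a legend, a grid, a histogram, tight_layout(), and savefig(). It uses the object-oriented Axes methods (ax.set_xlabel() corresponds to plt.xlabel() on the current axes) and the non-interactive Agg backend, saving to an in-memory buffer instead of calling plt.show().

```python
import io

import matplotlib

matplotlib.use("Agg")   # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))   # one row, two subplots
ax1.plot(x, np.sin(x), label="sin")
ax1.plot(x, np.cos(x), label="cos")
ax1.set_xlabel("x")
ax1.set_title("Line plot")
ax1.legend()
ax1.grid(True)

ax2.hist(np.random.default_rng(0).normal(size=500), bins=20)
ax2.set_title("Histogram")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")   # fig.savefig("plot.png") would write to disk instead
plt.close(fig)
```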

Specialized Plots:
 plt.pie(): Generates pie charts.
 plt.stem(): Creates stem plots.
 plt.violinplot(): Displays violin plots.

3D Plotting (with mpl_toolkits.mplot3d):
 Axes3D: Provides 3D axes for plotting (typically created with fig.add_subplot(projection="3d")).
 plot_surface(): Plots 3D surfaces.
 scatter(): Creates 3D scatter plots.

Interactive and Animation:
 Interactive mode (plt.ion(), plt.ioff()): Updates figures automatically in supported environments.
 Animation (matplotlib.animation, e.g., FuncAnimation): Module for creating animated plots.

8. Seaborn
Data Visualization:
 sns.lineplot(): Generates line plots with optional estimation and confidence intervals.
 sns.scatterplot(): Creates scatter plots with optional hue and size mapping.
 sns.barplot(), sns.countplot(): Produces bar plots and count plots.
 sns.boxplot(), sns.violinplot(): Displays boxplots and violin plots to show data distributions.

Categorical Data:
 sns.catplot(): Figure-level interface for categorical plots (strip, swarm, box, violin, bar, etc.), selected with the kind parameter.
 sns.swarmplot(): Draws a categorical scatter plot with non-overlapping points to show the distribution of observations.

Distribution Visualization:
 sns.histplot(), sns.kdeplot(): Displays histograms and kernel density estimation plots.
 sns.rugplot(): Shows individual data points as dashes on a plot axis.

Relationship Plots:
 sns.pairplot(): Creates a matrix of scatterplots for examining pairwise relationships in a dataset.
 sns.heatmap(): Generates a heatmap to visualize matrix-like data.
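A sketch of sns.heatmap() on a correlation matrix, assuming hypothetical random data in which column "b" is built from "a" and "c" is independent noise. It renders with the non-interactive Agg backend.

```python
import matplotlib

matplotlib.use("Agg")   # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=100)})
df["b"] = df["a"] * 2 + rng.normal(scale=0.5, size=100)   # strongly correlated with "a"
df["c"] = rng.normal(size=100)                             # independent noise

corr = df.corr()   # 3x3 matrix of pairwise Pearson correlations
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation matrix")
plt.close(ax.figure)
```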

Regression and Model Visualization:
 sns.regplot(), sns.lmplot(): Displays linear regression models.
 sns.residplot(): Plots the residuals of a linear regression model.
 sns.jointplot(): Visualizes the relationship between two variables and their individual distributions.

Color Palettes and Styles:
 sns.set_palette(): Sets the color palette for the plot.
 sns.set_style(): Sets the visual aesthetic styles for the plot.
 sns.color_palette(): Creates color palettes for use in plots.

Axis and Figure-Level Functions:
 FacetGrid and PairGrid: Allow the creation of customized grids of subplots.
 sns.relplot(): Figure-level interface for relational plots (scatter or line) drawn on a FacetGrid; sns.scatterplot() and sns.lineplot() are its axes-level counterparts.

Themes and Contexts:
 sns.set_theme(): Sets the overall visual theme for the plots.
 sns.plotting_context(), sns.set_context(): Control the scaling of plot elements ("paper", "notebook", "talk", "poster").

Statistical Estimation:
 sns.pointplot(): Visualizes point estimates and confidence intervals.
 sns.barplot(): Displays the central tendency of a numeric variable.

9. Plotly
Basic Plotting:
 go.Scatter(): Generates scatter plots.
 go.Bar(): Creates bar charts.
 go.Histogram(): Displays histograms.
 go.Box(): Generates box plots.
 go.Surface(): Generates 3D surface plots.

Additional Plot Types:
 go.Pie(): Generates pie charts.
 go.Candlestick(): Displays financial candlestick charts.
 go.Heatmap(): Creates heatmaps.
 go.Contour(): Generates contour plots.

Specialized Visualizations:
 go.Scatter3d(): Generates 3D scatter plots.
 go.Choropleth(): Creates choropleth maps.
 go.FigureWidget(): Enables interactive figures with widgets for live updating.

Customization and Layout:
 plotly.graph_objs.Layout: Allows configuring layout settings for the plot.
 update_layout(): Updates layout settings of the plot.

Interactivity and Animation:
 plotly.express: High-level functions for creating interactive plots easily.
 plotly.graph_objs.Figure: Creates interactive figures.
 plotly.offline.plot(), plotly.offline.iplot(): Render figures offline, as a standalone HTML file or inline in Jupyter notebooks.

Dash Integration:
 dash.dcc.Graph() (formerly dash_core_components.Graph()): Integrates Plotly graphs into Dash web applications.
 dash.html.Div() (formerly dash_html_components.Div()): Creates HTML div elements for organizing the layout in Dash apps.

Export and Sharing:
 plotly.io.write_image(): Saves figures as static images (PNG, JPEG, SVG, etc.); requires the kaleido package.
 plotly.io.show(): Displays plots in Jupyter Notebooks or standalone HTML pages.
 plotly.io.write_html(): Saves figures as standalone HTML files.

Themes and Styling:
 plotly.io.templates: Provides built-in templates for different plot styles.
 update_traces(): Allows customization of individual traces within a plot.

Dashboards and Web Apps:
 dash: Integrates Plotly plots into web applications using Dash, a Python web framework.

10. NLTK
Corpus and Text Processing:
 nltk.corpus: Module for accessing built-in corpora and lexical resources.
 nltk.word_tokenize(): Tokenizes text into words and punctuation.
 nltk.sent_tokenize(): Tokenizes text into sentences (both tokenizers need the Punkt models, downloaded with nltk.download()).

Basic Text Processing and Analysis:
 nltk.FreqDist(): Generates frequency distributions of words.
 nltk.Text(): Wraps a sequence of tokens for exploratory operations like concordance and similar-context search.

Part-of-Speech Tagging:
 nltk.pos_tag(): Assigns parts of speech (POS) tags to words in a text.

Stemming and Lemmatization:
 nltk.PorterStemmer(), nltk.LancasterStemmer(): Implement stemming algorithms (Lancaster is the more aggressive of the two).
 nltk.WordNetLemmatizer(): Lemmatizes words based on WordNet.
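A sketch of stemming plus a frequency distribution, assuming NLTK is installed. The sentence is hypothetical, and it is split with str.split() instead of nltk.word_tokenize() so that no tokenizer data has to be downloaded; PorterStemmer itself needs no extra data.

```python
import nltk
from nltk.stem import PorterStemmer

# Hypothetical text; str.split() avoids the Punkt download that word_tokenize() needs.
tokens = "the runners were running and the runner ran".split()

porter = PorterStemmer()
stems = [porter.stem(t) for t in tokens]   # e.g. "running" -> "run"
print(stems)

# Count how often each stem occurs.
freq = nltk.FreqDist(stems)
print(freq.most_common(3))
```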

Named Entity Recognition (NER):
 nltk.ne_chunk(): Labels named entities such as persons, organizations, and locations in POS-tagged tokens.

Parsing and Syntax:
 nltk.ChartParser(), nltk.RecursiveDescentParser(): Parse sentences according to a context-free grammar.
 nltk.Tree, nltk.ParentedTree: Represent constituency-based parse trees (ParentedTree also tracks each subtree's parent).

WordNet Interface:
 nltk.corpus.wordnet: Interface to WordNet, a lexical database for English.
 wordnet.synsets(): Retrieves synsets (sets of synonyms) for a word from WordNet.

Text Classification:
 nltk.classify: Module containing various classifiers for text classification.
 nltk.NaiveBayesClassifier, nltk.DecisionTreeClassifier: Example classifiers, trained with their train() method on (features, label) pairs.

Machine Learning for NLP:
 nltk.classify.scikitlearn: Provides SklearnClassifier, which wraps scikit-learn estimators so they can be trained and used through NLTK's classifier interface.

Tokenization and Chunking:
 nltk.chunk: Module for chunking and extracting phrases from sentences.
 nltk.RegexpParser(): Creates chunk parsers using regular expressions.
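A chunking sketch with nltk.RegexpParser(). The sentence is hand-tagged (hypothetical input) because nltk.pos_tag() would require a downloaded tagger model; the grammar marks a noun phrase as an optional determiner, any number of adjectives, then a noun.

```python
import nltk

# Hand-tagged sentence; nltk.pos_tag() would produce similar tags but needs a tagger model.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# NP chunk: optional determiner, zero or more adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(tagged)   # returns an nltk.Tree rooted at "S"

noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)   # ['the little dog', 'the cat']
```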

Sentiment Analysis:
 nltk.sentiment: Module providing sentiment analysis tools and lexicons.

Collocations and Bigrams:
 nltk.collocations: Module for extracting collocations and bigrams from text.

Language Models and Probabilities:
 nltk.probability: Module for implementing probability distributions and frequency estimation.
