Numpy - Python Package For Data

This document provides an introduction and overview of the Python NumPy library for data science. It discusses NumPy's N-dimensional array data structure (ndarray) which supports efficient array operations and broadcasting. The document also introduces other key Python libraries for data science like pandas, matplotlib, SciPy, scikit-learn, and tools like IPython notebook and Jupyter. It describes how NumPy will be covered in depth in this course on using NumPy for data science applications.

Uploaded by

Daniel N Sherine Foo


Python NumPy - Course Introduction

Python has become one of the most popular tools in the field of data science. With
powerful data science libraries like NumPy, SciPy, pandas, matplotlib, and scikit-learn,
tools like the IPython notebook, and its ease of programming, Python is proving to be
a preferred language for many organizations.

This course introduces you to some of the libraries useful for data science.
You will then take a deep dive into NumPy.

Data Science
Data Science is an interdisciplinary field that extracts insights from data present
in multiple forms.

To master Data Science, one must have knowledge of all of the
following fields.

Computer Science

Artificial Intelligence and Machine Learning

Statistics and Mathematics

Domain Knowledge

Knowledge Discovery Process

The Knowledge Discovery Process extracts meaningful insights from raw data. It involves
the following series of steps.

Problem Definition

Data Collection

Data Preprocessing

Data Transformation

Data Mining

Data Analysis

Data Visualization

Python provides many powerful libraries that can be used to perform various tasks
described above.

Python libraries for Data Science.

NumPy
An essential library used for scientific computing in Python.
Holds data in N-dimensional array (ndarray) objects, which can store data in
multiple dimensions.
Supports efficient operations on arrays of different shapes through its broadcasting feature.

pandas
Provides functionality to deal with structured data.
Stores data in its primary data structures: Series and DataFrame (older versions also provided Panel, which has since been removed).

matplotlib
Widely used for Data Visualization.
Used to generate various types of plots.

SciPy
A collection of efficient numerical algorithms used in Numerical integration,
Signal processing and Optimization.

NLTK
Performs different tasks related to Natural Language Processing.

scikit-learn
A Python library used for machine learning.

Jupyter
Provides web based interactive computational environment.
Combines code, rich text, plots, media and mathematical equations together.

Bokeh
Offers interactive Web visualization features.

PyMongo
PyMongo distribution comprises tools for working with MongoDB.
MongoDB is a highly scalable and robust NoSQL database.
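
To make the broadcasting feature mentioned in the NumPy entry above more concrete, here is a minimal sketch. The array names and values are made up for illustration; the point is only that arrays of different shapes can be combined without writing explicit loops.

```python
import numpy as np

# A 2-D array of shape (2, 3) and a 1-D array of shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# Broadcasting applies 'row' to each row of 'matrix' element by element.
total = matrix + row
print(total)

# A scalar is broadcast to every element as well.
doubled = matrix * 2
print(doubled)
```

The shapes (2, 3) and (3,) are compatible because their trailing dimensions match, which is what allows the addition to succeed.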

Scientific Distributions
Without a scientific distribution, a data scientist has to manually install all the Python libraries required for
performing the various tasks involved in the Knowledge Discovery Process.

Drawbacks of Manual Installation

Installing some libraries may require installing other dependencies first.

It is a time-consuming task.

Installation of some libraries may fail.

It is prone to manual errors.

Scientific Distribution

All the drawbacks of manual installation can be overcome by using any one of the
available Scientific Distributions.

A Scientific distribution is a collection of Python libraries that provides a
ready-to-use Python environment.

A Scientific distribution is easy to download, install and use.

A few popular distributions include Anaconda, Enthought Python, PythonXY, and WinPython.

In this course, you will learn about Anaconda.

Anaconda
Anaconda is a popular high-performance platform used for data science.

The base version is open source and contains over 100 packages from Python, R, and
Scala.

Additionally, it provides access to over 700 packages that can be installed and
managed using conda.

Anaconda is available for 32-bit and 64-bit operating systems: Windows, Linux, and
Mac OS X.

Installing Anaconda
Steps for installing Anaconda

Identify your system's OS and its architecture, i.e., 32-bit or 64-bit.

Go to Anaconda's downloads page.

Select the download section of your OS.

Choose the Python version, i.e., 3.x or 2.x, based on your requirements.

Download the installer based on your system architecture.

Optional: Verify data integrity with MD5 or SHA-256.

Install the downloaded file.

Anaconda Navigator
Provides access to the various components of the Anaconda Distribution.

The following windows are listed on the left side of Anaconda Navigator.

Home

Environment

Projects

Learning

Community

Home and Environment Windows

Home Window

Opened by default with the root environment.

Enables launching a working environment through various modes like Jupyter Notebook,
Jupyter QtConsole, and the Spyder IDE.

Environment Window

Shows information about various available environments.

Details of the packages installed in each available environment are viewable.

Projects, Learning, and Community Windows

Project Window

Provides tools for managing Anaconda projects.

Learning Window

Provides access to popular Data Science Resources.

Community Window

Provides links to popular Data Science Events, Forums, Blogs, etc.

Anaconda Prompt is the command-line tool provided by the Anaconda Distribution.

You can access Anaconda's default Python interactive interpreter using the command
'python'.

You can also work with conda, Anaconda's package manager.

Command for checking conda's version:

conda --version

Command for viewing available environments:

conda info --envs

Creating a new environment

By default, Anaconda comes with a root environment (named base in newer conda versions).

A new environment testenv, with Python 2.7, can be created using the command below.

conda create --name testenv python=2.7

Command for activating testenv (newer conda versions use conda activate testenv):

activate testenv

Command for viewing available packages in testenv.

conda list

Importing the numpy package in the newly created testenv results in an ImportError, because the package is not installed yet.

You can install the package using conda install.

conda install numpy

Now you can verify that numpy is available with the conda list command.

After successful installation, you can import numpy from testenv without any
errors.
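
The ImportError check described above can also be done from inside Python. The sketch below is one possible way to do it, not part of the course material; check_package is a hypothetical helper name introduced only for this illustration.

```python
import importlib

def check_package(name):
    """Return the installed version of a package, or None if it cannot be imported.

    Hypothetical helper, written only for this illustration.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")

version = check_package("numpy")
if version is None:
    print("numpy is not installed in this environment; run: conda install numpy")
else:
    print("numpy", version, "is available")
```

Run it once before and once after conda install numpy to see both branches.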

IPython
IPython provides an interactive working environment that is highly convenient and
efficient.

Its major components are:

An interactive Python shell.

A Jupyter kernel that allows working with Python code in various interactive front
ends.

Features of IPython
Python statements and System commands can be executed in IPython.

IPython supports tab completion.

With Magic Methods, IPython makes many tasks easy to perform.

IPython caches Input and Output history.

IPython supports Parallel Computing.

Launching Jupyter qt-console

The GIF illustrates the following:

How to open IPython in Jupyter QtConsole from Anaconda Navigator.

How to execute Python statements in IPython.

How to run system commands in IPython.

How to get information about an object or a method.

How to use the tab completion feature.

Understanding Magic Methods

Magic Methods (called magic commands in IPython's own documentation) begin with a single % or a double %% symbol.

Line Magic Method: A magic method starting with one % symbol.

A Line Magic Method applies only to a single line of code.

Cell Magic Method: A magic method starting with two %% symbols.

A Cell Magic Method applies to multiple lines of code written in a single cell.
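
As a rough illustration of the line versus cell distinction, the sketch below uses timing as the example task. Magic methods only work inside IPython or Jupyter, so they appear here as comments; the runnable part uses the standard-library timeit module as an approximate plain-Python counterpart. The statements being timed are arbitrary choices for this example.

```python
import timeit

# Inside IPython or Jupyter, a line magic times a single statement:
#     %timeit sum(range(1000))
# and a cell magic, placed on the first line of a cell, times the whole cell:
#     %%timeit
#     total = 0
#     for i in range(1000):
#         total += i

# Rough plain-Python counterpart of the line magic.
line_seconds = timeit.timeit("sum(range(1000))", number=1000)

# Rough plain-Python counterpart of the cell magic: a multi-line block.
cell_code = """
total = 0
for i in range(1000):
    total += i
"""
cell_seconds = timeit.timeit(cell_code, number=1000)

print("single statement:", line_seconds, "seconds for 1000 runs")
print("multi-line block:", cell_seconds, "seconds for 1000 runs")
```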

Starting Jupyter Notebook Server

Jupyter Notebook server can be launched from the Anaconda Navigator Home Window. The
Notebook server opens in a browser and displays the contents of the starting folder.

The displayed page contains the following three tabs.

Files displays the folders and files present in the starting folder.

Running holds information about the notebooks that are running.

Clusters contains information about notebooks running in parallel mode.

Creating a Folder
A folder can be created using the Folder option under the New section.

The GIF illustrates the following.

Creating an Untitled folder.

Renaming it to MyJupyterNoteBooks, and

Changing working directory to MyJupyterNoteBooks folder.

Starting a Jupyter Notebook

A Jupyter Notebook can be created by choosing an available Kernel.

The Kernel provides the environment required for executing the code snippets.

The GIF illustrates the following.

Creating an Untitled Notebook.

Renaming it to MyFirstNoteBook.

Checking its running status in the Files / Running tabs.

Shutting down the notebook MyFirstNoteBook.

About a Notebook Cell

The basic element of a Notebook is a Cell.

A user can write either code snippets or Markdown text inside a cell.

Markdown text can be used to embed normal text, headers, unordered and ordered
lists, hyperlinks, tables, images, videos, HTML content, and other useful elements
inside the Notebook.

Markdown Basics
In this section, you will be writing the following elements in Markdown.

Headers : One to six consecutive hash symbols (#) are used to create headers.

Emphasizing Text : Asterisks * or underscores _ are used to emphasize text in
italic (single) or bold (double).

Unordered Lists : Any of the symbols asterisk *, hyphen -, or plus + is used.

Ordered Lists : Numbers followed by a dot . and a space are used.

Nested Unordered Lists : The nested lists are indented with a minimum of four spaces
and followed by the list symbols.

Multi-line Text in a list element : Two spaces at the end of each line are used
to break the text onto a new line within the same list element.

Code snippets : A pair of triple backquotes is used.

Hyperlinks : Text, written in a pair of square brackets, is linked to a hyperlink
specified in a pair of parentheses.

Reference Links : The text and the reference are both written in two different pairs of
square brackets.

HTML Content : HTML tags can be used directly in Markdown.
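
The Markdown rules above are easier to follow with a sample in front of you. The sketch below builds one as a Python string and prints it, so the output can be pasted into a Markdown cell. The sample text and the example.com URLs are placeholders, not content from the course.

```python
# Build the code-fence marker from single backquotes so this example
# can itself be shown inside a fenced block.
fence = "`" * 3

markdown_sample = "\n".join([
    "# Level 1 Header",
    "###### Level 6 Header",
    "",
    "*italic text*, _also italic_, **bold text**, __also bold__",
    "",
    "* unordered item",
    "    - nested item (indented four spaces)",
    "+ another unordered item",
    "",
    "1. first ordered item",
    "2. second ordered item",
    "",
    fence + "python",
    "print('a code snippet')",
    fence,
    "",
    "[inline link text](https://example.com)",
    "",
    "[reference link text][ref]",
    "",
    "[ref]: https://example.com",
    "",
    "<b>HTML tags also work</b>",
])

print(markdown_sample)
```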

Writing Your First Notebook

The GIF performs the following tasks in the notebook MyFirstNoteBook.

Defines the string 's' with the value Welcome to Jupyter Notebooks!!!.

Displays the string 's'.

Provides the required description.

The second GIF illustrates the following additional tasks in
MyFirstNoteBook.

Determines the length of 's'.

Obtains the slice Jupyter Notebooks from 's'.

Finds the number of vowels in 's'.

Filters the words starting with either 'J' or 'N'.

Provides titles as required.
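
Since the GIFs are not reproduced here, the following is a minimal sketch of how the listed string tasks could be written in notebook code cells. It is one possible solution, not necessarily the one shown in the original recording; all variable names other than s are assumptions.

```python
s = "Welcome to Jupyter Notebooks!!!"
print(s)

# Length of the string
length = len(s)

# Slice out 'Jupyter Notebooks' by locating where it starts
start = s.find("Jupyter")
piece = s[start:start + len("Jupyter Notebooks")]

# Count vowels, ignoring case
vowel_count = sum(1 for ch in s.lower() if ch in "aeiou")

# Keep only the words starting with 'J' or 'N'
jn_words = [word for word in s.split() if word.startswith(("J", "N"))]

print(length, piece, vowel_count, jn_words)
```

Note that splitting on whitespace keeps the trailing punctuation attached, so the last word is kept as Notebooks!!!.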

Numpy
NumPy is a Python library that supports efficient handling of various numerical
operations on arrays holding numeric data.

These arrays are known as N-dimensional arrays or ndarrays.

Ndarrays are capable of holding data elements in multiple dimensions.

Each data element of an ndarray is of a fixed size.

All elements of an ndarray are of the same data type.
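
A short sketch can show what the single-data-type rule means in practice. The values below are arbitrary; the behavior shown is NumPy converting inputs to one common type.

```python
import numpy as np

# Mixing integers and a float: every element is stored as a float.
mixed = np.array([1, 2, 3.5])
print(mixed, mixed.dtype)

# Assigning a float into an integer array truncates it to fit the
# array's existing integer type instead of changing the array's type.
ints = np.array([1, 2, 3])
ints[0] = 7.9
print(ints, ints.dtype)
```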

N-dimensional array (ndarray)

An N-dimensional array is an object capable of holding data elements of the same type and
of a fixed size in multiple dimensions.

Creation of a 1-D array of five elements from a list is shown in Example 1.

Example 1

import numpy as np
x = np.array([5, 8, 9, 10, 11])  # using the 'array' function

type(x)  # Displays the type of array 'x'

Output

numpy.ndarray

N-dimensional array (ndarray)...

Creation of a 2-D array from a list of lists is shown in Example 2.

Example 2

y = np.array([[6, 9, 5],
              [10, 82, 34]])
print(y)

Output

[[ 6  9  5]
 [10 82 34]]

ndarray Attributes
Some of the important attributes of an ndarray are

ndim : Returns the number of dimensions.

shape : Returns the shape as a tuple.

size : Total number of elements.

dtype : Type of each element.

itemsize : Size of each element in Bytes.

nbytes : Total bytes consumed by all elements.

Example 3

print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)

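Output

2 (2, 3) 6 int32 4 24

The dtype, itemsize, and nbytes values depend on the platform's default integer type; where it is 64-bit, the output is 2 (2, 3) 6 int64 8 48.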
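
The attributes are related to each other, which the sketch below checks directly. It fixes the dtype to int32, an assumption made here so that the byte counts are the same on every platform.

```python
import numpy as np

# Fix the dtype explicitly so the byte counts do not depend on the platform.
y = np.array([[6, 9, 5],
              [10, 82, 34]], dtype=np.int32)

# size is the product of the entries in shape.
assert y.size == y.shape[0] * y.shape[1]

# nbytes is the element count times the bytes per element.
assert y.nbytes == y.size * y.itemsize

print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)
```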

Numpy dtypes
NumPy supports various data types, which differ in the number of bytes required by the data
elements.

The data type can be specified explicitly with the dtype argument.

An ndarray holding float values is defined in Example 4.

Example 4

y = np.array([[6, 9, 5],
              [10, 82, 34]],
             dtype='float64')
print(y)
print(y.dtype)

Output

[[ 6.  9.  5.]
 [10. 82. 34.]]
float64
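
To see how the chosen data type affects the bytes needed per element, the following sketch stores the same values under several dtypes. The list of dtype names is a small selection picked for illustration, not a complete catalogue.

```python
import numpy as np

values = [6, 9, 5]

# The same values stored with different dtypes need different amounts of memory.
for name in ("int8", "int16", "int32", "int64", "float32", "float64"):
    arr = np.array(values, dtype=name)
    print(name, arr.itemsize, arr.nbytes)

# astype returns a converted copy; the original array keeps its dtype.
as_float = np.array(values, dtype="int32").astype("float64")
print(as_float, as_float.dtype)
```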

++++
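import numpy as np

def array_operations(l):
    # Write your code below
    x = np.array(l)                 # convert the input list to an ndarray
    print(type(x))                  # type of the resulting object
    print(x.ndim, x.shape, x.size)  # dimensions, shape, and element count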

Numpy Array creation

N-dimensional arrays (ndarrays) can be created in multiple ways in NumPy.

Let us now focus on creating ndarrays:

From Python built-in data types: lists or tuples.

Using NumPy array creation functions like ones, ones_like, zeros, zeros_like.

Using NumPy numeric sequence generators.

Using the NumPy random module.

By reading data from a file.
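
The creation routes listed above can be sketched in a few lines each. The specific functions chosen for the sequence, random, and file cases (arange, linspace, default_rng, loadtxt) are assumptions about what the course goes on to cover, and an in-memory text buffer stands in for a real data file.

```python
import io
import numpy as np

# From built-in types: a list and a tuple
from_list = np.array([1, 2, 3])
from_tuple = np.array((4.0, 5.0, 6.0))

# Creation functions
ones = np.ones((2, 3))            # 2x3 array filled with 1.0
zeros = np.zeros_like(from_list)  # same shape and dtype as from_list, filled with 0

# Numeric sequence generators
steps = np.arange(0, 10, 2)        # 0, 2, 4, 6, 8
points = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values from 0 to 1

# Random module (seeded so the run is repeatable)
rng = np.random.default_rng(0)
samples = rng.random((2, 2))       # uniform values in [0, 1)

# Reading from a file; an in-memory text buffer stands in for a real file here
text = io.StringIO("1 2 3\n4 5 6")
from_file = np.loadtxt(text)

print(from_list, from_tuple, ones, zeros, steps, points, samples, from_file, sep="\n")
```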
