Numpy - Python Package For Data

This document provides an introduction and overview of the Python NumPy library for data science. It discusses NumPy's N-dimensional array data structure (ndarray) which supports efficient array operations and broadcasting. The document also introduces other key Python libraries for data science like pandas, matplotlib, SciPy, scikit-learn, and tools like IPython notebook and Jupyter. It describes how NumPy will be covered in depth in this course on using NumPy for data science applications.
Copyright © All Rights Reserved

Python NumPy - Course Introduction

Python is emerging as one of the favorite tools in the field of data science. With
powerful data science libraries like NumPy, SciPy, pandas, matplotlib, and
scikit-learn, tools like IPython notebook, and its ease of programming, Python is
proving to be a preferred language for many organizations.

This course will introduce you to some of these libraries useful for data science.
You will then take a deep dive into working with NumPy.

Data Science
Data Science is an interdisciplinary field that extracts insights from data present
in multiple forms.

To master the field of Data Science, one must possess knowledge of all of the
following fields.

Computer Science

Artificial Intelligence and Machine Learning

Statistics and Mathematics

Domain Knowledge

Knowledge Discovery Process


The Knowledge Discovery Process extracts meaningful insights from raw data. It
involves the following series of steps.

Problem Definition

Data Collection

Data Preprocessing

Data Transformation

Data Mining

Data Analysis

Data Visualization

Python provides many powerful libraries that can be used to perform various tasks
described above.

Python libraries for Data Science.


NumPy
An essential library used for scientific computing in Python.
Holds data in N-dimensional array (ndarray) objects, which can store data in
multiple dimensions.
Supports efficient element-wise array operations, and its Broadcasting feature lets these operations work on arrays of different shapes.

pandas
Provides functionality to deal with structured data.
Stores data in its primary data structures: Series and DataFrame (older versions also provided Panel, which has since been removed from pandas).

matplotlib
Widely used for Data Visualization.
Used to generate various types of plots.

SciPy
A collection of efficient numerical algorithms used in numerical integration,
signal processing, and optimization.

NLTK
Performs different tasks related to Natural Language Processing.

scikit-learn
Python library used for Machine learning

Jupyter
Provides a web-based interactive computational environment.
Combines code, rich text, plots, media and mathematical equations together.

Bokeh
Offers interactive Web visualization features.

PyMongo
PyMongo distribution comprises tools for working with MongoDB.
MongoDB is a highly scalable and robust NoSQL database.
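The NumPy entry above mentions broadcasting. As a minimal sketch (the array values are arbitrary and chosen only for illustration), broadcasting lets NumPy combine arrays of different shapes without explicit Python loops:

```python
import numpy as np

# A 2-D array of shape (2, 3) and a 1-D array of shape (3,)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# The 1-D array is stretched (broadcast) across each row of the 2-D array
result = matrix + row
print(result)

# A scalar is broadcast to every element
doubled = matrix * 2
print(doubled)
```

No copies of `row` are written by hand; NumPy applies the smaller array to each row of the larger one.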

Scientific Distributions
Without a distribution, a data scientist has to manually install all the Python
libraries required for performing the various tasks involved in the Knowledge
Discovery Process.

Drawbacks of Manual Installation

Installing some libraries may require installing other dependencies.

Time-consuming task.

Installation of some libraries may fail.

Prone to manual errors.

Scientific Distribution

All drawbacks of manual installation can be overcome using any one of the
available Scientific Distributions.

A Scientific distribution is a collection of Python libraries that provides a
ready-to-use Python environment.

A Scientific distribution is easy to download, install and use.

Some popular distributions include Anaconda, Enthought Python, Python(x,y), and WinPython.

In this course, you will learn about Anaconda.

Anaconda
Anaconda is a popular high-performance platform used for data science.

The base version is open source and contains over 100 packages from Python, R, and
Scala.

Additionally, it provides access to over 700 packages that can be installed and
managed using conda.
Anaconda is available for 32-bit and 64-bit operating systems: Windows, Linux, and
Mac OS X.

Installing Anaconda
Steps for installing Anaconda

Identify your system's OS and its architecture, i.e., 32-bit or 64-bit.

Go to Anaconda's downloads page.

Select the download section of your OS.

Choose the Python version, i.e., 3.x or 2.x, based on your interest.

Download the installer based on your system architecture.

Optional: Verify data integrity with MD5 or SHA-256.

Install the downloaded file.
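The optional integrity check in the steps above means comparing a hash of the downloaded installer with the value published on the download page. A minimal sketch using Python's standard hashlib module; the helper name file_sha256 and the installer file name in the usage comment are hypothetical, and hashlib.md5 can be swapped in if only an MD5 value is published:

```python
import hashlib

def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large installer does not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage (hypothetical file name; paste the hash shown on the download page):
# expected = "<published SHA-256 value>"
# print(file_sha256("Anaconda-installer.exe") == expected)
```

A match means the file was not corrupted or altered during download.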

Anaconda Navigator
Anaconda Navigator is a desktop graphical interface that provides access to various components of Anaconda Distribution.

The following windows appear on the left side of Anaconda Navigator.

Home

Environment

Projects

Learning

Community

Home and Environment Windows


Home Window

Opens by default with the root environment.

Enables launching a working environment in various modes, such as Jupyter Notebook,
Jupyter qt-console, and Spyder IDE.

Environment Window

Shows information about various available environments.

The packages installed in each available environment can be viewed.

Projects, Learning, and Community Windows


Project Window

Provides tools for managing Anaconda projects.


Learning Window

Provides access to popular Data Science Resources.


Community Window

Provides links to popular Data Science Events, Forums, Blogs, etc.


Anaconda Prompt is the command line tool provided by Anaconda Distribution.

You can access Anaconda's default interactive Python interpreter using the command
'python'.

You can also work with Conda, Anaconda's package manager.

Command for checking Conda's version.

conda --version
Command for viewing available environments.
conda info --envs

Creating a new environment


By default, Anaconda comes with the root environment (named base in newer conda versions).

A new environment testenv, with Python 2.7, can be created using the below command.

conda create --name testenv python=2.7

Command for activating testenv

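activate testenv

(This form is for the Windows Anaconda Prompt; on Linux and macOS use source activate testenv, and with conda 4.4 or later conda activate testenv works on all platforms.)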

Command for viewing available packages in testenv.

conda list

Importing numpy in testenv at this point raises an ImportError, because the package is not yet installed in that environment.

You can install the package using conda install.

conda install numpy

Now you can verify that numpy is available with the conda list command.

After successful installation, you can access numpy from testenv, without any
errors.
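A minimal sketch for checking numpy's availability from Python itself, assuming a Python 3 interpreter (importlib.util is not available in Python 2.7, so in a 2.7 testenv only the try/except part applies). The helper name is_installed is hypothetical:

```python
import importlib.util

def is_installed(package_name):
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None

try:
    import numpy as np
    print("numpy", np.__version__, "is available")
except ImportError:
    # This branch runs in an environment such as testenv before `conda install numpy`
    print("numpy is not installed in this environment")

print(is_installed("numpy"))
```

find_spec only looks the package up without importing it, which makes it a cheap check before the actual import.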

IPython
IPython provides an interactive working environment that is highly convenient and
efficient.

Its major components are:

An interactive Python shell.

A Jupyter kernel that allows working with Python code in various interactive front
ends.

Features of IPython
Python statements and system commands can be executed in IPython.

IPython supports Tab completion feature.

Magic Methods make many common tasks easy to perform in IPython.


IPython caches Input and Output history.

IPython supports Parallel Computing.

Launching Jupyter qt-console

The GIF illustrates the following:

Opening IPython in Jupyter qt-console from Anaconda Navigator.

Executing Python statements in IPython.

Running system commands in IPython.

Getting information about an object or a method.

Using the Tab completion feature.

Understanding Magic Methods


Magic Methods begin with either a single % or a double %% symbol.

Line Magic Method: A magic method starting with one % symbol.

A Line Magic Method applies only to a single line of code.

Cell Magic Method: A magic method starting with two %% symbols.

A Cell Magic Method applies to multiple lines of code written in a single cell.

Starting Jupyter Notebook Server


The Jupyter Notebook server can be launched from the Anaconda Navigator Home Window.
The Notebook server opens in a browser and displays the contents of the starting folder.

The displayed page contains the following three tabs.

Files displays the folders and files present in the starting folder.

Running holds information about the notebooks that are running.

Clusters contains information about notebooks running in parallel mode.

Creating a Folder
A folder can be created using the Folder option under the New menu.

The GIF illustrates the following.

Creating an Untitled folder.

Renaming it to MyJupyterNoteBooks, and

Changing working directory to MyJupyterNoteBooks folder.

Starting a Jupyter Notebook


A Jupyter Notebook can be created by choosing an available kernel.

The kernel provides the environment required for executing the code snippets.

The GIF illustrates the following.


Creation of Untitled Notebook.

Renaming it to MyFirstNoteBook.

Checking its running status in the Files / Running tabs.

Shutting down the notebook MyFirstNoteBook.

About a Notebook Cell


The basic element of a Notebook is a Cell.

A user can write either code snippets or Markdown text inside a cell.

Markdown text can be used to embed normal text, header text, unordered and ordered
lists, hyperlinks, tables, images, videos, HTML content, and other useful elements
inside the Notebook.

Markdown Basics
In this section, you will be writing the following elements in Markdown.

Headers: One to six consecutive hash symbols (#) are used to create headers.

Emphasizing Text: Asterisks * or underscores _ are used to emphasize text. A single
pair, as in *text*, produces italic text, and a double pair, as in **text**, produces
bold text.

Unordered Lists: Any of the symbols asterisk *, hyphen -, or plus + can be used.

Ordered Lists: Numbers followed by a dot (.) and a space are used.

Nested Unordered Lists: Nested lists are indented with a minimum of four spaces,
followed by the list symbol.

Multiple Lines in a List Element: Two spaces at the end of a line force a line
break, so multiple lines of text can be written within one list element.

Code snippets: A pair of triple back quotes is used to enclose the code.

Hyperlinks: Text written in a pair of square brackets is linked to the URL
specified in the pair of parentheses that immediately follows it.

Reference Links: The text and the reference label are written in two separate pairs
of square brackets.

HTML Content : HTML tags can be directly used in Markdown.

Writing Your First Notebook

The GIF for this step performs the following tasks in the notebook MyFirstNoteBook.

Defines the string 's' with the value 'Welcome to Jupyter Notebooks!!!'.

Displays the string 's'.

Provides the required description.


The next GIF illustrates performing the following additional tasks in
MyFirstNoteBook.

Determines the length of 's'.

Obtains the slice 'Jupyter Notebooks' from 's'.

Finds the number of vowels in 's'.

Filters the words starting with either 'J' or 'N'.

Provides titles as required.
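The GIFs are not reproduced here. One possible way to perform the same tasks in notebook cells is sketched below; only the string 's' comes from the text, and the other variable names are illustrative:

```python
s = 'Welcome to Jupyter Notebooks!!!'
print(s)

# Length of the string
length = len(s)

# Slice 'Jupyter Notebooks': characters from index 11 up to (not including) 28
sub = s[11:28]

# Count the vowels, ignoring case
vowel_count = sum(1 for ch in s.lower() if ch in 'aeiou')

# Keep the words that start with 'J' or 'N'
words = [w for w in s.split() if w.startswith(('J', 'N'))]

print(length)       # 31
print(sub)          # Jupyter Notebooks
print(vowel_count)  # 10
print(words)        # ['Jupyter', 'Notebooks!!!']
```

Note that split() keeps punctuation attached, so the last word is 'Notebooks!!!'.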

Numpy
NumPy is a Python library that supports efficient handling of various numerical
operations on arrays holding numeric data.

These arrays are known as N-dimensional arrays or ndarrays.

Ndarrays are capable of holding data elements in multiple dimensions.

Each data element of an ndarray is of a fixed size.

All elements of an ndarray are of the same data type.
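Because all elements must share one data type, NumPy converts mixed inputs to a common type when the array is created. A minimal sketch with arbitrary values:

```python
import numpy as np

# Mixed int and float inputs are upcast to a single float type
a = np.array([1, 2.5, 3])
print(a.dtype)          # float64

# Mixing numbers and strings yields a single string (Unicode) type
b = np.array([1, 'a'])
print(b.dtype.kind)     # U

# Every element occupies the same number of bytes
print(a.itemsize, a.nbytes == a.size * a.itemsize)
```

The integer 1 in `b` is stored as the string '1', not as a number.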

N-dimensional array (ndarray)


An N-dimensional array is an object capable of holding data elements of the same
type and of a fixed size in multiple dimensions.

The creation of a 1-D array of five elements from a list is shown in Example 1.

Example 1

import numpy as np
x = np.array([5, 8, 9, 10, 11])  # using the 'array' function

type(x)  # displays the type of array 'x'


Output

numpy.ndarray



Creation of a 2-D array from a list of lists is shown in Example 2.
Example 2

y = np.array([[6, 9, 5],
              [10, 82, 34]])
print(y)
Output

[[ 6  9  5]
 [10 82 34]]

ndarray Attributes
Some of the important attributes of an ndarray are:

ndim : Returns the number of dimensions.

shape : Returns the shape as a tuple.

size : Total number of elements.

dtype : Data type of each element.

itemsize : Size of each element in bytes.

nbytes : Total bytes consumed by all elements.

Example 3

print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)


Output

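2 (2, 3) 6 int32 4 24

The default integer type is platform-dependent. On most 64-bit Linux and macOS systems (and on Windows with NumPy 2.x), dtype is int64, itemsize is 8, and nbytes is 48.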
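A minimal sketch that pins the dtype explicitly, so the byte-related attributes no longer depend on the platform's default integer type:

```python
import numpy as np

# Fixing dtype=np.int32 makes itemsize and nbytes the same on every platform
y = np.array([[6, 9, 5],
              [10, 82, 34]], dtype=np.int32)

print(y.ndim, y.shape, y.size, y.dtype, y.itemsize, y.nbytes)
# 2 (2, 3) 6 int32 4 24

# nbytes is always size * itemsize
print(y.nbytes == y.size * y.itemsize)  # True
```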

Numpy dtypes
NumPy supports various data types, which differ in the number of bytes required by
each data element.

The data type can be specified explicitly with the dtype argument.

An ndarray holding float values is defined in Example 4.

Example 4

y = np.array([[6, 9, 5],
[10, 82, 34]],
dtype='float64')
print(y)
print(y.dtype)
Output

[[ 6.  9.  5.]
 [10. 82. 34.]]
float64
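A minimal sketch showing how the chosen dtype determines the bytes per element, and how an existing array can be converted with astype (which returns a new array rather than modifying the original):

```python
import numpy as np

# Bytes per element for a few common dtypes
for name in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']:
    print(name, np.dtype(name).itemsize)

# Convert the float64 array from Example 4 to float32
y = np.array([[6, 9, 5],
              [10, 82, 34]], dtype='float64')
y32 = y.astype('float32')

print(y.dtype, y.itemsize, y.nbytes)        # float64 8 48
print(y32.dtype, y32.itemsize, y32.nbytes)  # float32 4 24
```

Halving the element size halves the memory used, at the cost of precision.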

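import numpy as np

def array_operations(l):
    # Write your code below
    x = np.array(l)                 # convert the input list to an ndarray
    print(type(x))                  # <class 'numpy.ndarray'>
    print(x.ndim, x.shape, x.size)  # dimensions, shape and element count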

Numpy Array creation


N-dimensional arrays (ndarrays) can be created in multiple ways in NumPy.

Now let us focus on creating an ndarray:

From Python built-in data types: lists or tuples.

Using NumPy array creation functions like ones, ones_like, zeros, and zeros_like.

Using NumPy numeric sequence generators.

Using the NumPy random module.

By reading data from a file.
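A minimal sketch with one example per creation method listed above. The specific sequence generators (arange, linspace) and random function (default_rng, NumPy 1.17 or later) are common choices picked for illustration, and an in-memory text buffer stands in for a real data file:

```python
import io
import numpy as np

# 1. From Python built-in types: a list and a tuple
from_list = np.array([1, 2, 3])
from_tuple = np.array((4.0, 5.0, 6.0))

# 2. Array creation functions
zeros_arr = np.zeros((2, 3))               # 2x3 array of 0.0
ones_arr = np.ones((2, 3), dtype=int)      # 2x3 array of 1
ones_like_arr = np.ones_like(from_list)    # same shape and dtype as from_list
zeros_like_arr = np.zeros_like(from_tuple)

# 3. Numeric sequence generators
evens = np.arange(0, 10, 2)                # values 0, 2, 4, 6, 8
points = np.linspace(0, 1, 5)              # 5 evenly spaced values from 0 to 1

# 4. Random module (seeded generator for reproducible results)
rng = np.random.default_rng(seed=42)
rand = rng.random((2, 2))                  # floats in [0.0, 1.0)

# 5. Reading from a file-like object
buffer = io.StringIO("1 2 3\n4 5 6\n")
from_file = np.loadtxt(buffer)             # parsed as float64 by default
print(from_file)
```

In practice, np.loadtxt would be given a file path instead of the StringIO buffer.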
