NumPy - Python Package For Data
Python is emerging as one of the favorite tools in the field of data science.
Powerful data science libraries like NumPy, SciPy, pandas, matplotlib, and
scikit-learn, tools like the IPython notebook, and ease of programming are making
Python a preferred language for many organizations.
This course introduces some of the libraries useful for data science and then takes
a deeper look at working with NumPy.
Data Science
Data Science is an interdisciplinary area that extracts insights from data present
in multiple forms.
To master the field of Data Science, one must possess knowledge of all of the
following fields.
Computer Science
Domain Knowledge
Problem Definition
Data Collection
Data Preprocessing
Data Transformation
Data Mining
Data Analysis
Data Visualization
Python provides many powerful libraries that can be used to perform various tasks
described above.
pandas
Provides functionality to deal with structured data.
Stores data in its primary data structures: Series and DataFrame. Older versions
also provided Panel, which was removed in pandas 0.25.
matplotlib
Widely used for Data Visualization.
Used to generate various types of plots.
SciPy
A collection of efficient numerical algorithms used in Numerical integration,
Signal processing and Optimization.
NLTK
Performs different tasks related to Natural Language Processing.
scikit-learn
A Python library used for machine learning.
Jupyter
Provides a web-based interactive computational environment.
Combines code, rich text, plots, media and mathematical equations together.
Bokeh
Offers interactive Web visualization features.
PyMongo
The PyMongo distribution contains tools for working with MongoDB.
MongoDB is a highly scalable and robust NoSQL database.
Scientific Distributions
Without a scientific distribution, a data scientist has to manually install all the
Python libraries required for performing the various tasks involved in the Knowledge
Discovery Process, which is time-consuming.
The drawbacks of manual installation can be overcome by using any one of the
available scientific distributions.
Anaconda
Anaconda is a popular high-performance platform used for data science.
The base version is open source and contains more than 100 packages from Python, R,
and Scala.
Additionally, it provides access to more than 700 packages that can be installed and
managed using conda.
Anaconda is available for 32-bit and 64-bit operating systems: Windows, Linux, and
Mac OS X.
Installing Anaconda
Steps for installing Anaconda
Choose the Python version, i.e., 3.x or 2.x, based on your requirements.
Anaconda Navigator
Provides access to various components of the Anaconda Distribution.
Home
Environment
Projects
Learning
Community
Enables launching a working environment through various modes like Jupyter Notebook,
Jupyter QtConsole, and the Spyder IDE.
Environment Window
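You can access Anaconda's default Python interactive interpreter using the command
'python'.
Command for viewing the installed conda version.
conda --version
Command for viewing available environments.
conda info --envs
A new environment testenv, with Python 2.7, can be created using a command of the
following form.
conda create --name testenv python=2.7
The new environment can then be activated and its installed packages listed.
activate testenv
conda list
If numpy does not appear in the list, it can be installed into the active
environment, for example with the command conda install numpy.
Now you can verify the numpy availability with the conda list command.
After successful installation, you can access numpy from testenv without any
errors.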
IPython
IPython provides an interactive working environment, which is highly convenient and
efficient.
It also serves as a Jupyter kernel that allows working with Python code in various
interactive front ends.
Features of IPython
Python statements and System commands can be executed in IPython.
Creating a Folder
A folder can be created using the Folder option present under the New section.
The kernel provides the environment required for executing the code snippets.
A newly created notebook can be renamed, for example to MyFirstNoteBook.
A user is allowed to write either code snippets or Markdown text inside a cell.
Markdown text can be used to embed normal text, header text, unordered and ordered
lists, hyperlinks, tables, images, videos, HTML content, and other useful elements
inside the notebook.
Markdown Basics
In this section, you will be writing the following elements in Markdown.
Unordered Lists : Any one of the symbols asterisk *, hyphen -, or plus + is used.
Nested Unordered Lists : The nested lists are indented with a minimum of four spaces
and followed by one of the symbols.
Multiple lines of text in a list element : Two spaces, at the end of each line, are
used to force a line break between the lines of text.
Reference Links : The link text and the reference label are written in two separate
pairs of square brackets, for example [text][label].
The above-shown GIF performs the following tasks in the notebook - MyFirstNoteBook.
NumPy
NumPy is a Python library that supports efficient numerical operations on arrays
holding numeric data.
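As a minimal sketch of what numerical operations on arrays look like in practice, the snippet below multiplies two arrays element by element and then sums the result. The names prices, quantities, and totals and their values are made up for illustration and do not come from the course material.

```python
import numpy as np

# Illustrative, made-up data (an assumption, not from the course).
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Arithmetic is applied element by element, with no explicit Python loop.
totals = prices * quantities

print(totals)        # element-wise products
print(totals.sum())  # sum over all elements
```

The same calculation with plain Python lists would need a loop or a list comprehension; with arrays the operation is expressed once for the whole collection.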
Example 1
import numpy as np
x = np.array([5, 8, 9, 10, 11])  # using the 'array' function
print(type(x))
Output
<class 'numpy.ndarray'>
Example 2
y = np.array([[6, 9, 5],
              [10, 82, 34]])
print(y)
Output
[[ 6  9  5]
 [10 82 34]]
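ndarray Attributes
Some of the important attributes of an ndarray are
ndim : number of dimensions
shape : size of the array along each dimension
size : total number of elements
dtype : data type of the elements
itemsize : size of each element in bytes
nbytes : total number of bytes occupied by the elements
Example 3
Printing these attributes, in the order listed, for the array y defined above gives
the following output. The default integer dtype, and hence itemsize and nbytes,
depends on the platform and NumPy version.
2 (2, 3) 6 int32 4 24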
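The six values in the output of Example 3 appear to correspond to the attributes ndim, shape, size, dtype, itemsize, and nbytes. The sketch below prints them one per line for the same 2 x 3 array. Fixing the dtype to int32 is an assumption made here so that the byte counts match on every platform.

```python
import numpy as np

# dtype is fixed to int32 (an assumption) so the byte counts
# do not depend on the platform's default integer type.
y = np.array([[6, 9, 5],
              [10, 82, 34]], dtype=np.int32)

print(y.ndim)      # number of dimensions
print(y.shape)     # size along each dimension
print(y.size)      # total number of elements
print(y.dtype)     # data type of the elements
print(y.itemsize)  # bytes per element
print(y.nbytes)    # total bytes = size * itemsize
```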
NumPy dtypes
NumPy supports various data types, distinguished by the kind of value stored and the
number of bytes required by each data element.
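To make the link between a dtype and its size in bytes concrete, the following sketch prints the itemsize of a few common dtypes and compares the memory used by the same values stored as int8 and as int64. The selection of dtypes and the names small and large are chosen here for illustration only.

```python
import numpy as np

# A few common dtypes and the number of bytes each element occupies.
for name in ["int8", "int16", "int32", "int64", "float32", "float64"]:
    dt = np.dtype(name)
    print(name, dt.itemsize)

# The same values stored with different dtypes use different amounts of memory.
small = np.array([1, 2, 3], dtype="int8")
large = np.array([1, 2, 3], dtype="int64")
print(small.nbytes, large.nbytes)
```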
Example 4
y = np.array([[6, 9, 5],
              [10, 82, 34]],
             dtype='float64')
print(y)
print(y.dtype)
Output
[[ 6.  9.  5.]
 [10. 82. 34.]]
float64
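import numpy as np

def array_operations(l):
    # Write your code below
    x = np.array(l)                 # convert the input list to an ndarray
    print(type(x))                  # class of the created object
    print(x.ndim, x.shape, x.size)  # dimensions, shape, and element count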
Using NumPy array creation methods like ones, ones_like, zeros, and zeros_like
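A short sketch of these four creation functions is given below. ones and zeros take a shape, while ones_like and zeros_like copy the shape and dtype of an existing array. The array called template and the chosen shapes are illustrative assumptions.

```python
import numpy as np

a = np.ones((2, 3))          # 2 x 3 array filled with 1.0 (float64 by default)
b = np.zeros(4, dtype=int)   # 1-D array of four integer zeros

# An existing array used as a template (illustrative values).
template = np.array([[6, 9, 5],
                     [10, 82, 34]])
c = np.ones_like(template)   # same shape and dtype as template, filled with 1
d = np.zeros_like(template)  # same shape and dtype as template, filled with 0

print(a.shape, a.dtype)
print(b)
print(c.shape, c.dtype == template.dtype)
print(d)
```

The _like variants are convenient when a result array must match an input array, since the shape and dtype do not have to be repeated by hand.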