Python For Data Science
By Kevin Clark
About this ebook
The phenomenon known as Big Data, also called the fourth industrial revolution, is bringing profound changes to the world we live in. It is still difficult to predict accurately how it will affect our lives and our world, but Big Data will change your personal life, your home, your car, your job, your health, your friendships, your diet, your sleep and your leisure.
Data at this scale, arriving with a speed and variety never before imagined, is difficult to store and process with today's technology. But what good is a mountain of data if we can't extract value from it? Behind this phenomenon is electronic data. A few decades ago it was produced by a few types of equipment and was expensive to store; today it is produced everywhere, and the cost of storage is very low and getting cheaper by the day.
This book is an introduction to the world of data science with Python. It explains analysis techniques through code and includes over 60 programs that walk you, step by step, through how to use Python for data science. This hands-on approach makes the presentation much more concrete.
Finally, readers will get a better grip on the complexity of Python, as we make extensive use of Python libraries, which hide details behind powerful functions. It is a powerful mystery that we can teach you to solve. Now is your chance to take a hands-on approach to data science with Python.
So, what are you waiting for? Buy this book now and start taking steps to learn Python Data Science.
Book preview
© Copyright 2019 by Kevin Clark All rights reserved.
This document is geared towards providing exact and reliable information in regards to the topic and issue covered. The publication is sold with the idea that the publisher is not required to render accounting, officially permitted or otherwise qualified services. If advice is necessary, legal or professional, a practiced individual in the profession should be ordered.
From a Declaration of Principles which was accepted and approved equally by a Committee of the American Bar Association and a Committee of Publishers and Associations.
In no way is it legal to reproduce, duplicate, or transmit any part of this document in either electronic means or in printed format. Recording of this publication is strictly prohibited, and any storage of this document is not allowed unless with written permission from the publisher. All rights reserved.
The information provided herein is stated to be truthful and consistent, in that any liability, in terms of inattention or otherwise, by any usage or abuse of any policies, processes, or directions contained within is the solitary and utter responsibility of the recipient reader. Under no circumstances will any legal responsibility or blame be held against the publisher for any reparation, damages, or monetary loss due to the information herein, either directly or indirectly.
Respective authors own all copyrights not held by the publisher.
The information herein is offered for informational purposes solely and is universal as so. The presentation of the information is without a contract or any type of guarantee assurance.
The trademarks that are used are without any consent, and the publication of the trademark is without permission or backing by the trademark owner. All trademarks and brands within this book are for clarifying purposes only and are owned by the owners themselves, not affiliated with this document.
Contents
Data Science With Python
Introduction
Chapter One: Pleased to Meet You, Python Language
Pleased to Meet You, Python Language
Python Distributions
Python Versions - Python 2 versus Python 3
WinPython Installation
Using WinPython in Interactive Mode
Using WinPython in Normal Mode
Python Programming: Getting Started
Variable Names
Expressions
Keyboard Data Entry
Conditional Deviation Instructions: if, else, elif
Code Blocks
Relational and Logical Operators
Elif instruction
Nested Conditionals Instructions
Repeat Instructions (1): while
Vectorization
Repeat Instructions (2): for - range()
Knowing the Range() Function
Python or R?
Chapter Two: Creating and Using Functions
Creating Functions
Modules
Optional Parameters and the None Value
Procedures
Parameter Values
Predefined Functions
Math Module
Statistics Module
‘Random’ Module
print() function
help() function
Chapter Three: Native Data Structures
Lists
Operations on Lists
Methods
Methods Available for Lists
Predefined Functions and Lists
List Comparison
Cloning a List
Lists as Function Arguments
Reference Parameters versus Value Parameters
Tuples
Tuples vs. Lists
Sets
Dictionaries
Basic Operations
Methods and Techniques of Iteration
Chapter Four: Strings and Databases in Text Format
Strings
Methods
String Comparison
String module
Text Files: File Handle
Text Files: Processing Column Separated File
Text Files: Importing an Entire File into a String
Text files: Processing CSV files
Text Files: Recording Files
Text Files: Knowing the Unicode Standard
Text files: UTF-8 versus ANSI
Text Files: Processing JSON Files
Regular Expressions
Search() function
Metacharacters
Findall() Function
Chapter Five: SQL Database and Language
Database
Database Management Systems (DBMS)
SQL
SQL: Basic SELECT
SQL: Joining Tables
SQL: Producing Aggregate Results
Conclusion
Introduction
Data science is the discipline that combines ideas from Statistics and Computer Science to solve the problem of knowledge discovery in databases. In this partnership, Statistics has the role of providing the tools to describe, analyze, summarize, interpret, and make inferences about the data. In turn, Computer Science is concerned with providing efficient technologies for the storage, access, integration, and transformation of data.
That is, the role of Computer Science is to make feasible the analysis of databases, often complex and voluminous, through statistical processes. Among the different technologies used for scientific computing, Python is undoubtedly one of the most prominent. It is a free programming language, extremely versatile and powerful, which has been widely adopted in projects related to data science, both by industry and the academic community.
This book presents the fundamental concepts and techniques for those who wish to start working with Python for data science. The book covers the computational aspects of data science, which means that its main focus is to teach the reader how to develop programs capable of processing databases of different sizes, formats, and degrees of complexity.
The work is intended for all types of professionals involved with data science: biologists, mathematicians, engineers, chemists, administrators, physicists, statisticians, economists, etc., in short, anyone who wants to learn how to develop their own Python scripts to explore databases related to problems in their area of expertise.
To reiterate: the book is intended not only for people with a background in computing but for anyone interested in Python for data science. It is also important to make clear that the book does not focus on teaching statistics, machine learning, or data mining. What we intend is to teach the reader to program in Python, enabling them, in the future, to develop any type of script in this language, including programs that analyze large databases through statistical methods or with machine learning and data mining algorithms. In short, what we want is to make the reader a top-notch pythonist!
No prerequisites are required for reading the book, although knowledge of some programming language (such as R, MATLAB, C, or even Excel's macro programming) certainly helps speed up the learning process. The book is divided into five very broad chapters. The first three cover the language's "rice and beans", that is, the least you need to know to start developing any kind of Python application. The following chapters deal with themes that are more directly related to data science.
Chapter 1 - Pleased to Meet You, Python Language. This chapter presents the Python environment and teaches the reader how to create their first programs.
Chapter 2 - Creating and Using Functions. Data analysis is always facilitated with the use of functions. In this chapter, you will discover how to use the basic mathematical and statistical functions of Python and also learn how to create your own reusable functions and modules.
Chapter 3 - Native Data Structures. Data structures are used by programming languages to organize related data sets in memory to make their manipulation simpler and more efficient. This chapter presents the four native data structures of Python: lists, tuples, sets, and dictionaries.
Chapter 4 - Strings and Databases in Text Format. Before data can be analyzed with statistical techniques, it needs to be loaded into the Python environment. This chapter presents the basic techniques for importing text databases structured in different ways: CSV, JSON, column-separated files, etc. In addition, the chapter presents the numerous text processing tools offered by Python, from simple string functions to regular expressions.
Chapter 5 - SQL Database and Language. The main purpose of this chapter is to teach you how to query, combine, and explore tables stored in relational databases using the SQL language. Although SQL is more than 35 years old, it remains very relevant and is currently considered one of the key technologies in the area of data science.
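As a rough preview of the kind of code these chapters build toward, the sketch below touches each topic using only the Python standard library. The sample data, the variable names, the `summarize` function, and the `readings` table are all invented for illustration and do not come from the book. SQLite is used here only because it ships with Python; the book itself may work with a different database.

```python
import csv
import io
import json
import sqlite3
import statistics


# Chapter 2: a small reusable function built on the statistics module.
def summarize(values):
    """Return the mean and median of a sequence of numbers."""
    return statistics.mean(values), statistics.median(values)


# Chapter 3: the four native data structures (hypothetical sample values).
temps_list = [21.5, 19.0, 23.5, 19.0]   # list: ordered, mutable
point = (3, 4)                          # tuple: ordered, immutable
unique_temps = set(temps_list)          # set: duplicates are dropped
capital_of = {"France": "Paris"}        # dictionary: maps keys to values

# Chapter 4: parsing CSV and JSON text held in strings.
csv_text = "city,temp\nLisbon,21.5\nOslo,19.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
record = json.loads('{"city": "Lisbon", "temp": 21.5}')

# Chapter 5: querying a relational table with SQL (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(r["city"], float(r["temp"])) for r in rows],
)
avg_temp = conn.execute("SELECT AVG(temp) FROM readings").fetchone()[0]
conn.close()

mean_temp, median_temp = summarize(temps_list)
print(mean_temp, median_temp, len(unique_temps), avg_temp)
```

Each piece here is deliberately tiny; the point is only to show that functions, data structures, text parsing, and SQL queries fit together in a few lines of ordinary Python.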
Much of the text presented in this book refers to Python code (small programs that demonstrate the concepts presented) and the results produced by the code (usually in the form of printed output on the screen). The following typographical conventions have been adopted:
Constant-width font: Used in listings of Python programs and when presenting the contents of structured databases in text files.
Bold constant-width font: Used for the names of reserved words, functions, and some operators of the Python language. This convention is adopted both in program listings and in the text explaining how they work.
"Word in double quotation marks": Used in explanatory text to highlight the names of variables, files, and other objects (DataFrames, arrays, database tables, etc.).
Chapter One: Pleased to Meet You, Python Language
This chapter introduces the Python environment and teaches you how to write your first programs in the language. We begin by explaining what Python is and discussing why this technology has become fundamental to data science. We then present a step-by-step roadmap for installing, configuring, and using WinPython, one of the simplest environments for scientific programming in Python.
From there, the chapter addresses programming itself through a sequence of lessons that introduce the basic features of the Python language: variables, arithmetic operators, input and output, conditional (branching) structures, repetition structures, and code blocks. Closing the chapter, we present a brief comparison between Python and R, the two technologies that currently compete for the position of most important programming language for data science.
What is Python?
Python is a general-purpose programming language, which means that it can be used in many different types of projects, ranging from web applications to artificial intelligence systems. The language was first released in 1991, with the main philosophy of prioritizing the construction of simple and readable programs (what pythonists call "beautiful" programs) over the construction of programs that, although fast, are complicated and difficult to read (the so-called "ugly" programs).
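To make the distinction concrete, here is a small illustration that is not taken from the book: two ways of computing the average of the even numbers in a list. The function name `mean_of_evens` and the one-line variant `f` are hypothetical. The sketch is only about readability and makes no claim about which version runs faster.

```python
def mean_of_evens(numbers):
    """Readable version: each step has a name and a clear purpose."""
    evens = [n for n in numbers if n % 2 == 0]
    if not evens:
        return None  # no even numbers, so there is no average
    return sum(evens) / len(evens)


# Compressed version: same result, but the intent is much harder to see.
f = lambda x: (lambda e: sum(e) / len(e) if e else None)([i for i in x if not i % 2])

print(mean_of_evens([1, 2, 3, 4]))  # 3.0
print(f([1, 2, 3, 4]))              # 3.0
```

Both versions return the same values, but only the first can be understood at a glance, which is the quality the language's philosophy favors.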
Over the next ten years, the language achieved great popularity in both academic and corporate environments. This motivated the creation in 2001 of the Python Software Foundation, an independent, non-profit institution that became responsible for developing new versions of