Date String Manipulations With Python

This document discusses manipulating and visualizing date string data in Python. It shows how to convert date strings to datetime objects to allow grouping and ordering of data by time segments like weeks and quarters. It demonstrates converting a date column in a lightning strike dataset to datetimes, then creating new columns for week, month, quarter and year. Finally, it creates bar charts plotting lightning strikes by week for 2018 and by quarter over three years to understand patterns in the data.

Uploaded by

Mostafa Fathi

As a data professional, you can expect to work with datetime objects and date strings. In this video, we'll continue coding in Python and practice converting, manipulating, and grouping data. By the end of this video, we'll create a widely used data visualization: a bar graph that tells a story with your data.

Working with date strings often requires breaking them down into smaller pieces. Breaking date strings into days, months, and years allows you to group and order the other data in different ways so that you can analyze it. Manipulating date and time strings is a foundational skill in exploratory data analysis (EDA). In this video, you will learn to convert the date strings in the NOAA lightning strike dataset into datetime objects. We will discuss how to combine these datetime objects into groups by segments of time, such as quarters and weeks. Let's open a Python notebook,
and I'll show you what I mean.

Let's begin by importing Python libraries and packages. To start, import Matplotlib and pandas, which you've used before. To review, Matplotlib's pyplot module is a very helpful tool for creating visualizations like bar, line, and pie charts. Pandas is one of the more popular packages in data science because its specific focus is a set of functions and commands that help you work with datasets. The last package, Seaborn, may be new to you. Seaborn is a visualization library built on top of Matplotlib that is often easier to use and produces nicer-looking charts.

For this video, we'll use the NOAA
lightning strike data for the years 2016, 2017, and 2018 to group lightning strikes by different timeframes. This will help us understand total lightning strikes by week and by quarter.

As I mentioned at the beginning of this video, when manipulating date strings, the best thing to do is to break the date information, like day, month, and year, into parts. This allows us to group the data into any time-series grouping we want. Luckily, there's an easy way to do that: create a datetime object.

As you'll recall from a previous video, this NOAA dataset has three columns giving us the date, the number of lightning strikes, and the latitude and longitude of the strike. To manipulate the date column and its data, we'll first need to convert it into a datetime data type. We do that by simply coding df['date'] = pd.to_datetime(df['date']). Doing this conversion gives us the quickest and most direct path to manipulating the date strings in the date column, which are currently in the format of a four-digit year, a dash, the two-digit month, a dash, and lastly the two-digit day (YYYY-MM-DD).

Okay, this is the exciting part. Because our dates are converted
into pandas datetime objects, we can create any sort of date grouping we want. For example, let's say we want to group the lightning strike data by both week and quarter. All we need to do is create some new columns. You'll see here that we're creating four new columns: week, month, quarter, and year.

With the first line of code, we create a week column by taking the data in the date column and formatting it with the strftime function, accessed through .dt (df['date'].dt.strftime). This function formats our datetime data into new strings. In this case, we want the year, a dash, a W, and the week number. To get that string, we code the format as '%Y-W%V'. Each percent sign introduces a format code: %Y inserts the four-digit year, the W is a literal character marking the value as a week, and %V inserts the ISO 8601 week number, a zero-padded number from 01 to 53 (most years have 52 ISO weeks, but some have 53). The final string output for the column will be in this format: 2016-W27.

The next line of code gives us the new month column. The format argument is '%Y-%m', which outputs the four-digit year, a dash, and then the two-digit month. Essentially, we're removing the two-digit day from the end of the original date string. Next, we will create a column for
quarters. In this case, a quarter is three months. Many corporations divide their financial year into quarters, so knowing how to divide data into quarters is a very useful skill. Here, it only takes one line of code. We'll call the new column quarter, and we'll use the to_period function on our date column to create it. Pandas has a built-in frequency code for dividing datetimes into quarters: in the to_period argument field, we only need to place the letter Q. After that, we can use the strftime function to build the string. For the argument, we put '%Y-Q%q'. The literal Q is placed into the string to indicate that we are talking about quarters, and the percent sign followed by a lowercase q tells pandas to insert the quarter number (1 through 4). The result looks like 2016-Q3.

Our final column will be the easiest to code of them all. The year column is created by taking our original date column data and formatting it with only the '%Y' argument. This creates a column containing only the year. Now that we have formatted some strings,
let's quickly review our work by using the head function we learned in the previous video. When we run this code, our four new columns are there: week, month, quarter, and year. They are all formatted just as we discussed.

We can use these new strings to learn more about the data. For example, let's say we want to group the number of lightning strikes by week. An organization whose employees primarily work outdoors might be interested in knowing the week-to-week likelihood of dealing with lightning strikes. To show that, we'll want to plot a chart. We've reviewed a couple of charts coded in Python by now, so next, let's code a chart with the lightning strike data.

For plotting the number of lightning strikes per week, let's use a bar chart. Our graph would be a bit confusing if it used all three years of data, so let's use only the 2018 data and limit our chart to 52 weeks rather than 156. We can do this by creating a new dataframe that keeps only the 2018 rows and then groups them by week, which also orders them by week. We will learn more about this structuring function in another video. For now,
let's focus on plotting this bar chart. We'll use the plt.bar function to plot. Within its argument field, we set the x-axis to our week column and the y-axis, or height, to the number_of_strikes column. Next, we'll fill in some of the details of our chart. Using the plt.xlabel, plt.ylabel, and plt.title functions, we pass in the arguments 'Week number', 'Number of lightning strikes', and 'Number of lightning strikes per week (2018)', respectively.

This renders a graph, but the x-axis labels all run together, so the x-axis is difficult to read. Let's fix that with the plt.xticks function: for the rotation, we can put 45, and for the fontsize, let's scale it down to 8. After we call plt.show, the x-axis labels are much cleaner.

Given our bar chart illustrating lightning strikes per week in 2018, you could conclude that a group planning outdoor activities for weeks 32 to 34 might want a backup plan to move indoors. Of course, this is a broad generalization to make on behalf of every North American location in the dataset. But for our purposes, and in general, it is a good understanding of our dataset to have.

For our last visualization,
let's plot lightning strikes by quarter. For this visualization, it will be far easier to work with numbers in millions, such as 25.2 million rather than 25,154,365. Let's create a column that divides the total number of strikes by one million. We do this by assigning to df_by_quarter the relevant column, which in this case is number_of_strikes. Next, we add on .div to get the division function, and lastly, in its argument field, we enter 1000000. When we run this cell, we have a column that provides the number of lightning strikes in millions.

Next, we'll group the number of strikes by quarter using the groupby and reset_index functions. This code totals the number of strikes for each quarter of all three years. Each total is rounded to one decimal place, and the letter M is appended to represent millions. As you'll soon discover, this calculation will help with the visualization. You'll learn more about these functions in another video.

We will plot our chart using
the same format as before. We use plt.bar with x set to the quarter column of our df_by_quarter dataframe and the height set to its number_of_strikes column.

It would be helpful if each quarter had its total lightning strike count at the top of its bar. To do that, we need to define our own function, which we will call addlabels. We call addlabels and pass in our two columns, quarter and number_of_strikes, each selected with square brackets and separated by commas. At the end, we pass the formatted values we created earlier, number_of_strikes_formatted, as the labels for the number of strikes by quarter. To finish the bar chart, we label
the x- and y-axes and add the title. Before we show the data visualization, there are a few small things we want to add to make it friendlier to read. Let's set the figure's width and height to 15 by 5. Next, let's make the bar labels cleaner by adjusting how those numbers are placed and centering the text over each bar. Our bar chart now gives us the number of strikes by quarter from 2016 to 2018.

To make the information easier to digest, let's do one more visualization. Here is the code for plotting a bar chart that groups the total number of strikes year over year by quarter. Review the code carefully and consider what each function and argument does in order to create this final, polished bar chart. Each year is assigned its own color to highlight the differences between quarters. And now we have our chart.

Coming up, you'll learn more about the different methods for structuring data. I'll see you there.
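A minimal sketch of the date conversion and the four derived columns described in the transcript, assuming a dataframe named df with a date column of YYYY-MM-DD strings and a number_of_strikes column. The three rows are made-up sample values, not NOAA data, and this is an illustration rather than the course notebook itself.

```python
import pandas as pd

# Made-up sample rows standing in for the NOAA lightning data (illustrative only).
df = pd.DataFrame({
    "date": ["2016-07-06", "2017-11-20", "2018-08-15"],
    "number_of_strikes": [120, 45, 300],
})

# Convert the YYYY-MM-DD strings into pandas datetime values.
df["date"] = pd.to_datetime(df["date"])

# Derive string columns for grouping; .dt exposes datetime methods on a Series.
df["week"] = df["date"].dt.strftime("%Y-W%V")   # year + ISO week number, e.g. 2016-W27
df["month"] = df["date"].dt.strftime("%Y-%m")   # year + two-digit month
# to_period("Q") turns each date into a quarterly Period; %q is the quarter number.
df["quarter"] = df["date"].dt.to_period("Q").dt.strftime("%Y-Q%q")
df["year"] = df["date"].dt.strftime("%Y")       # year only

print(df.head())
```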
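One caveat worth checking with the '%Y-W%V' week format: %V is the ISO 8601 week number, and an ISO week can belong to the neighboring calendar year, so pairing %V with the calendar year %Y can mislabel a few days at the very start or end of a year. The sketch below uses only Python's standard datetime module and one example date to show the effect; %G, the ISO year that matches %V, is the usual remedy.

```python
from datetime import date

# January 1, 2016 fell on a Friday, so under ISO 8601 rules it belongs to
# week 53 of ISO year 2015 rather than week 1 of 2016.
d = date(2016, 1, 1)

calendar_year_label = d.strftime("%Y-W%V")  # calendar year + ISO week: misleading here
iso_year_label = d.strftime("%G-W%V")       # ISO year + ISO week: consistent

# isocalendar() returns (ISO year, ISO week, ISO weekday).
iso_year, iso_week, iso_weekday = d.isocalendar()

print(calendar_year_label)              # 2016-W53
print(iso_year_label)                   # 2015-W53
print(iso_year, iso_week, iso_weekday)  # 2015 53 5
```

Assuming a pandas datetime column like the one above, the same fix would be df['date'].dt.strftime('%G-W%V'); pandas also provides df['date'].dt.isocalendar(), which returns the ISO year, week, and day as separate columns.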
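The 2018 weekly bar chart described in the transcript can be sketched like this, assuming df already has week and year string columns built as shown earlier. The sample rows are invented. Selecting only the number_of_strikes column before .sum() is a choice made here so that non-numeric columns are not summed, and matplotlib.use("Agg") only lets the script run without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data (illustrative only, not NOAA results).
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-09", "2018-08-06", "2018-08-08", "2018-08-15"]),
    "number_of_strikes": [10, 200, 50, 75],
})
df["week"] = df["date"].dt.strftime("%Y-W%V")
df["year"] = df["date"].dt.strftime("%Y")

# Keep only 2018, then total strikes per week; groupby sorts the week keys.
df_by_week_2018 = (
    df[df["year"] == "2018"]
    .groupby("week", as_index=False)["number_of_strikes"]
    .sum()
)

plt.figure(figsize=(15, 5))
bars = plt.bar(x=df_by_week_2018["week"], height=df_by_week_2018["number_of_strikes"])
plt.xlabel("Week number")
plt.ylabel("Number of lightning strikes")
plt.title("Number of lightning strikes per week (2018)")
plt.xticks(rotation=45, fontsize=8)  # tilt and shrink crowded week labels
plt.show()
```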
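A hedged sketch of the quarterly chart with labels in millions. The addlabels helper follows the transcript's description (x values, bar heights, and label strings, centered over each bar), but its body here is an assumption: it relies on Matplotlib placing categorical bars at positions 0, 1, 2, and so on. The quarterly values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data (illustrative only, not NOAA results).
df = pd.DataFrame({
    "quarter": ["2016-Q1", "2016-Q1", "2016-Q2", "2017-Q1"],
    "number_of_strikes": [1_200_000, 3_954_365, 25_154_365, 800_000],
})

# Total strikes per quarter; reset_index turns the group keys back into a column.
df_by_quarter = df.groupby("quarter")[["number_of_strikes"]].sum().reset_index()

# Label text in millions, rounded to one decimal place, with an "M" suffix.
df_by_quarter["number_of_strikes_formatted"] = (
    df_by_quarter["number_of_strikes"].div(1_000_000).round(1).astype(str) + "M"
)

def addlabels(x, y, labels):
    """Write each label centered just above its bar (bar i sits at x position i)."""
    for i in range(len(x)):
        plt.text(i, y[i], labels[i], ha="center", va="bottom")

fig = plt.figure(figsize=(15, 5))
plt.bar(x=df_by_quarter["quarter"], height=df_by_quarter["number_of_strikes"])
addlabels(
    df_by_quarter["quarter"],
    df_by_quarter["number_of_strikes"],
    df_by_quarter["number_of_strikes_formatted"],
)
plt.xlabel("Quarter")
plt.ylabel("Number of lightning strikes")
plt.title("Number of lightning strikes per quarter (2016-2018)")
plt.show()
```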
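The code for the final year-over-year chart is not reproduced in the transcript. As one possible approach, assuming df_by_quarter holds YYYY-Qn labels and strike totals, the sketch below splits each label into a year and a quarter number, pivots so that each year becomes its own column, and lets pandas draw grouped bars with Matplotlib, one color per year. This swaps in DataFrame.plot for whatever the video used (it may rely on Seaborn, which was imported earlier), and the numbers are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quarterly totals in millions (illustrative only).
df_by_quarter = pd.DataFrame({
    "quarter": ["2016-Q1", "2016-Q2", "2017-Q1", "2017-Q2", "2018-Q1", "2018-Q2"],
    "number_of_strikes": [5, 25, 4, 20, 6, 22],
})

# Split "YYYY-Qn" into a year part and a quarter-number part.
df_by_quarter["year"] = df_by_quarter["quarter"].str[:4]
df_by_quarter["quarter_number"] = df_by_quarter["quarter"].str[-2:]

# Rows = quarter number, columns = year, so each year becomes its own bar series.
pivoted = df_by_quarter.pivot(
    index="quarter_number", columns="year", values="number_of_strikes"
)

ax = pivoted.plot(kind="bar", figsize=(15, 5), rot=0)  # grouped bars, one color per year
ax.set_xlabel("Quarter")
ax.set_ylabel("Number of lightning strikes (millions)")
ax.set_title("Number of lightning strikes per quarter (2016-2018)")
ax.legend(title="Year")
plt.show()
```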
