Date String Manipulations With Python
to work with datetime objects and date strings. In this video,
manipulating, and grouping data. By the end of this video, we'll create
a story with your data. Working with date strings will often
require breaking them down into smaller pieces. Breaking date strings into days, months,
and years allows you to group and order the other data in different ways so
that you can analyze it. Manipulating date and time strings
you will learn to convert date strings in the NOAA lightning strike
line, and pie charts. Pandas is one of the more popular packages
commands that help you work with datasets. The last package, Seaborn,
library that is easier to use and produces nicer looking charts. For this video, we'll use the NOAA
lightning strike data for the years 2016, 2017, and 2018 to group lightning
down the date information like day, month, and year into parts. This allows us to group the data into
any time series grouping we want. Luckily, there's an easy way to do that
latitude and longitude of the strike. For us to manipulate the date column and
coding df['date'] and setting it equal to pd.to_datetime(df['date']). Doing this conversion
the date string in the date column which currently is in the format of
a four digit year followed by a dash, then the two digit month, a dash,
and lastly the two digit day. Okay, this is the exciting part. Because our dates are converted
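
As a rough sketch of that conversion step: the small DataFrame below is a made-up stand-in for the NOAA file (the values are illustrative only), and the column names date and number_of_strikes are assumed to match the dataset described in this video.

```python
import pandas as pd

# Made-up sample rows standing in for the NOAA data; values are illustrative only.
df = pd.DataFrame({
    "date": ["2016-08-05", "2017-01-15", "2018-12-31"],  # YYYY-MM-DD strings
    "number_of_strikes": [16, 3, 42],
})

# Convert the date strings into pandas datetime values.
df["date"] = pd.to_datetime(df["date"])

# The column now exposes datetime accessors such as .dt.year and .dt.month.
print(df["date"].dt.year.tolist())
print(df.dtypes)
```

Once the column holds datetime values rather than plain strings, the .dt accessor becomes available, which is what the later formatting steps rely on.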
the lightning strike data by both week and quarter. All we would need to do is
create some new columns. You'll see here we're creating four new
columns, week, month, quarter, and year. With the first line of code,
the data in the date column and using the function strftime. This function from
the datetime package formats our datetime data into new strings. In this case, we want the year followed by a dash, and the week number out of 52. If we want that string, we need to code it as %Y-W%V. The percent sign introduces a format code, and %Y tells datetime to use the four-digit year in the string. The W implies this is a week,
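
Here is a minimal sketch of the week column, assuming the date column has already been converted with pd.to_datetime. The two sample dates are made up for illustration.

```python
import pandas as pd

# Two illustrative dates, already converted to datetime.
df = pd.DataFrame({"date": pd.to_datetime(["2016-07-04", "2018-01-01"])})

# %Y = four-digit year, literal "-W", %V = ISO 8601 week number (zero-padded).
df["week"] = df["date"].dt.strftime("%Y-W%V")

print(df)
```

One caveat worth keeping in mind: %V is the ISO 8601 week number, so the first few days of January can belong to week 52 or 53 of the previous ISO year. If that matters for your grouping, pairing %V with %G (the ISO year) instead of %Y is one way to keep the labels consistent.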
the column data will be in this format, 2016-W27. The next line of code gives
us the new month column. The argument is then written as %Y-%m. This will output the four-digit year followed by a dash, then the two-digit month. Essentially, we're dropping the two-digit day from the original date string. Next, we will create a column for
quarters. In this case, a quarter is three months. Many corporations divide their
it only takes one line of code. We'll call the new column quarter,
period to create the quarter column. Pandas has a pre-made code for dividing datetime data into quarters. In the to_period argument field, we only need to place the letter Q. After that, we can use the function
strftime to complete the string. For the argument, we put %Y-Q%q. The first Q is placed into the string
to
the lowercase q indicates to Pandas that we want the date
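
A minimal sketch of the quarter column, again on made-up sample dates. It assumes the date column is already a datetime column.

```python
import pandas as pd

# Two illustrative dates, already converted to datetime.
df = pd.DataFrame({"date": pd.to_datetime(["2016-08-05", "2018-01-15"])})

# to_period("Q") converts each date to a quarterly period;
# in the format string, the literal "Q" is text and %q is the quarter number.
df["quarter"] = df["date"].dt.to_period("Q").dt.strftime("%Y-Q%q")

print(df)
```

Note that %q is a pandas period format code, which is why the dates are converted with to_period before strftime is called.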
the easiest to code of them all. The year column is created by taking
our original date column data and creating a string that includes
with it with only the year in it. Now that we have formatted some strings,
let's quickly review our work by using the head function we learned
our four new columns are there, week, month, quarter, and year. They are all formatted
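
The remaining two columns and the quick review with head can be sketched the same way. The sample rows are made up, and the week and quarter columns are repeated here only so the example runs on its own.

```python
import pandas as pd

# Illustrative rows standing in for the NOAA data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-08-05", "2018-01-15"]),
    "number_of_strikes": [16, 3],
})

df["week"] = df["date"].dt.strftime("%Y-W%V")
df["month"] = df["date"].dt.strftime("%Y-%m")   # year and two-digit month
df["quarter"] = df["date"].dt.to_period("Q").dt.strftime("%Y-Q%q")
df["year"] = df["date"].dt.strftime("%Y")       # year only, as a string

# Review the first rows to confirm the four new columns exist.
print(df.head())
```

Because strftime returns strings, the year column holds values like "2018" rather than the integer 2018, which matters when you filter on it later.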
to learn more about the data. For example, let's say we want to group
using all three years of data. So, let's just use the 2018 data and limit our chart to 52 weeks
that groups the data by year and then orders it by week. We will then learn more about
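
One way to sketch the 2018 weekly grouping is shown below. The DataFrame name df_by_week_2018 and the sample rows are assumptions made for illustration; the approach filters to the year first, then sums strikes per week.

```python
import pandas as pd

# Illustrative rows; the real data has one row per day and location.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-06-01", "2018-01-01", "2018-01-03", "2018-01-08"]),
    "number_of_strikes": [5, 10, 7, 4],
})
df["week"] = df["date"].dt.strftime("%Y-W%V")
df["year"] = df["date"].dt.strftime("%Y")

# Keep only 2018, then total the strikes for each week.
# groupby sorts its keys by default, so the result is ordered by week.
df_by_week_2018 = (
    df[df["year"] == "2018"]
    .groupby("week")["number_of_strikes"]
    .sum()
    .reset_index()
)

print(df_by_week_2018)
```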
let's focus on plotting this bar chart. We'll use the plt.bar function to plot. Within our argument field, we select the x-axis, which is our week column, then the y-axis or height,
we will place arguments in the plt.xlabel, plt.ylabel, and plt.title functions. The arguments are week number,
the x-axis labels are all joined together. So, we have a chart, but
the x-axis is difficult to read. So let's fix that. We can do that with
the plt.xticks function. For the rotation, we can put 45, and for
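
Putting the plotting steps together, a sketch might look like this. The weekly totals are made up, and the y-axis label, title text, figure size, and font size are assumptions rather than the exact values used in the video.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative weekly totals standing in for the grouped 2018 data.
df_by_week_2018 = pd.DataFrame({
    "week": ["2018-W01", "2018-W02", "2018-W03"],
    "number_of_strikes": [17, 4, 9],
})

plt.figure(figsize=(10, 4))  # assumed size
plt.bar(x=df_by_week_2018["week"], height=df_by_week_2018["number_of_strikes"])

plt.xlabel("Week number")
plt.ylabel("Number of lightning strikes")                   # assumed label text
plt.title("Number of lightning strikes per week (2018)")    # assumed title text

# Rotate the tick labels so they no longer run together; the font size is an assumption.
plt.xticks(rotation=45, fontsize=8)
plt.tight_layout()
```

In a notebook the chart displays when the cell runs; in a script you would finish with plt.show().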
the x-axis labels are much cleaner. Given our bar chart illustrating
lightning strikes per week in 2018, you could conclude that a group
location in the dataset. But for our purposes and in general, it is a good understanding
such as 25.2 million rather than 25,154,365, for example. Let's create a column that divides
the total number of strikes by one million. We do this by typing df_by_quarter and entering the relevant column name in the brackets. In this case, we want number_of_strikes. Next, we add on .div to
we enter 1000000. When we run this cell, we have a column that provides the number
to the first decimal. The letter m represents one million. As you'll soon discover, this calculation
will help with the visualization. You'll learn more about these
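
A sketch of that calculation follows. The new column name number_of_strikes_formatted is an assumption, and apart from the 25,154,365 figure mentioned above, the totals are made up.

```python
import pandas as pd

# Illustrative quarterly totals; only 25,154,365 comes from the example above.
df_by_quarter = pd.DataFrame({
    "quarter": ["2016-Q1", "2016-Q2"],
    "number_of_strikes": [2_636_427, 25_154_365],
})

# Divide by one million, round to one decimal, convert to text, and append "M".
df_by_quarter["number_of_strikes_formatted"] = (
    df_by_quarter["number_of_strikes"].div(1_000_000).round(1).astype(str) + "M"
)

print(df_by_quarter)
```

Keeping the original numeric column alongside the text column matters, because the bar heights still need numbers while the labels use the shorter text.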
with quarter in the argument field. For the height, we put the number_of_strikes
quarter had the total lightning strike count at the top of each bar. To do that, we need to define our own
then input our two column axes, quarter, and number of strikes
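
One possible version of such a labeling helper is sketched below. The function name addlabels, the formatted column name, the figure size, and the sample totals are all assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative quarterly totals with a pre-formatted text column.
df_by_quarter = pd.DataFrame({
    "quarter": ["2016-Q1", "2016-Q2", "2016-Q3"],
    "number_of_strikes": [2_636_427, 25_154_365, 18_000_000],
    "number_of_strikes_formatted": ["2.6M", "25.2M", "18.0M"],
})

def addlabels(x, y, labels):
    """Write each label just above its bar, centered horizontally."""
    for i in range(len(x)):
        # Bars on a categorical axis sit at positions 0, 1, 2, ...
        plt.text(i, y.iloc[i], labels.iloc[i], ha="center", va="bottom")

plt.figure(figsize=(15, 5))  # assumed size
plt.bar(x=df_by_quarter["quarter"], height=df_by_quarter["number_of_strikes"])
addlabels(
    df_by_quarter["quarter"],
    df_by_quarter["number_of_strikes"],
    df_by_quarter["number_of_strikes_formatted"],
)
plt.xlabel("Quarter")
plt.ylabel("Number of lightning strikes")
plt.tight_layout()
```

Setting va to "bottom" anchors the bottom of the text at the bar height, so the label sits on top of the bar rather than overlapping it.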
the x and y-axis and add the title. Before we show the data visualization, there are a few small things we want to add just to make it easier to read. Let's set our length and
labels cleaner by defining those numbers and centering the text. Our bar chart now gives us the number of strikes by quarter from 2016 to 2018. To make the information easier to digest,
final polished bar chart. Each year is assigned its own color to
highlight the differences in quarters. And now we have our chart. Coming up, you'll learn more about
the