1-Python Matplotlib
A First Example
We will start with a simple graph. A graph in matplotlib is a two- or three-
dimensional drawing that shows a relationship by means of points, a curve, or,
among other things, a series of bars. We have two axes: the horizontal X-axis
represents the independent values and the vertical Y-axis corresponds to the
dependent values.
The command %matplotlib inline only makes sense if you work with an
IPython (Jupyter) Notebook. It makes sure that the graphs are shown inside the
document and not in separate windows:
%matplotlib inline
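The plot that the next paragraph describes can be sketched as follows. The Y
values are an assumption, borrowed from the format string example further
down. When only one list is passed, plot treats it as the Y values and uses the
list indices 0, 1, 2, ... as X values.

```python
import matplotlib.pyplot as plt

# Only Y values are given; matplotlib uses the indices 0..3 as X values.
lines = plt.plot([-1, -4.5, 16, 23])
```

Outside of a notebook, a call to plt.show() is needed afterwards to display
the figure.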
What we see is a continuous graph, even though we provided discrete data for
the Y values. By adding a format string to the call of plot, we can create a
graph with discrete values, in our case blue circle markers. The format string
defines how the discrete points are rendered.
import matplotlib.pyplot as plt
plt.plot([-1, -4.5, 16, 23], "ob")
plt.show()
The following format string characters are accepted to control the line style or
marker:
=============================================
character description
=============================================
'-' solid line style
'--' dashed line style
'-.' dash-dot line style
':' dotted line style
'.' point marker
',' pixel marker
'o' circle marker
'v' triangle_down marker
'^' triangle_up marker
'<' triangle_left marker
'>' triangle_right marker
'1' tri_down marker
'2' tri_up marker
'3' tri_left marker
'4' tri_right marker
's' square marker
'p' pentagon marker
'*' star marker
'h' hexagon1 marker
'H' hexagon2 marker
'+' plus marker
'x' x marker
'D' diamond marker
'd' thin_diamond marker
'|' vline marker
'_' hline marker
===============================================
==================
character color
==================
'b' blue
'g' green
'r' red
'c' cyan
'm' magenta
'y' yellow
'k' black
'w' white
==================
As you may have guessed already, you can also pass X values to the plot
function. We will pass the multiples of 3 up to 21, including 0, to plot in the
following example:
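A minimal sketch of such a call. The X values follow the description above;
the Y values (celsius_values) are made up for illustration and are not taken
from the original example.

```python
import matplotlib.pyplot as plt

# X values: the multiples of 3 from 0 up to and including 21.
days = list(range(0, 22, 3))
# Y values: invented temperatures, one per X value (assumption).
celsius_values = [25.6, 24.1, 26.7, 28.3, 27.5, 30.5, 32.8, 33.1]

# With two sequences, the first one is used for X and the second one for Y.
lines = plt.plot(days, celsius_values)
```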
Labels on Axes
We can improve the appearance of our graph by adding labels to the axes. This
can be done with the ylabel and xlabel functions of pyplot.
days = list(range(1,9))
celsius_min = [19.6, 24.1, 26.7, 28.3, 27.5,
30.5, 32.8, 33.1]
celsius_max = [24.8, 28.9, 31.3, 33.0, 34.9,
35.6, 38.4, 39.2]
plt.xlabel('Day')
plt.ylabel('Degrees Celsius')
plt.plot(days, celsius_min,
days, celsius_min, "oy",
days, celsius_max,
days, celsius_max, "or")
print("The current limits for the axes are:")
print(plt.axis())
print("We set the axes to the following values:")
xmin, xmax, ymin, ymax = 0, 10, 14, 45
print(xmin, xmax, ymin, ymax)
plt.axis([xmin, xmax, ymin, ymax])
plt.show()
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(0, 2 * np.pi, 50,
endpoint=True)
F = np.sin(X)
plt.plot(X,F)
startx, endx = -0.1, 2*np.pi + 0.1
starty, endy = -1.1, 1.1
plt.axis([startx, endx, starty, endy])
plt.show()
We can use linewidth to set the width of a line as the name implies.
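A short sketch of the linewidth parameter, assuming two arbitrary curves
(sine and cosine) chosen only to make the difference visible. The width is
given in points.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(0, 2 * np.pi, 50, endpoint=True)

# Two curves drawn with different line widths (in points).
thin, = plt.plot(X, np.sin(X), linewidth=1)
thick, = plt.plot(X, np.cos(X), linewidth=4, linestyle="--")
```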
import numpy as np
import matplotlib.pyplot as plt
n = 256
X = np.linspace(-np.pi,np.pi,n,endpoint=True)
Y = np.sin(2*X)
plt.plot (X, Y, color='blue', alpha=1.00)
plt.fill_between(X, 0, Y, color='blue',
alpha=.1)
plt.show()
interpolate: If True, interpolate between the two lines to find the precise
point of intersection. Otherwise, the start and end points of the filled region
will only occur on explicit values in the x array.
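The effect of interpolate can be tried with a sketch like the following. It
additionally uses the "where" parameter of fill_between, which is not
discussed in the text above: it restricts the filling to the X values for which
the condition holds. The condition Y > 0 is an arbitrary choice for this
illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

n = 256
X = np.linspace(-np.pi, np.pi, n, endpoint=True)
Y = np.sin(2 * X)

plt.plot(X, Y, color='blue', alpha=1.00)
# Fill only where the curve is above zero. With interpolate=True the filled
# regions are extended to the zero crossings between two sample points.
region = plt.fill_between(X, 0, Y, where=Y > 0, interpolate=True,
                          color='blue', alpha=.1)
```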
import numpy as np
import matplotlib.pyplot as plt
n = 256
X = np.linspace(-np.pi,np.pi,n,endpoint=True)
Y = np.sin(2*X)
plt.plot (X, Y, color='blue', alpha=1.00)
plt.fill_between(X, Y, 1, color='blue',
alpha=.1)
plt.show()
All we have to do to create a legend for lines, which already exist on the axes,
is to simply call the function "legend" with an iterable of strings, one for each
legend item. For example:
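A minimal sketch, assuming two unlabelled lines (sine and cosine) as
stand-ins for whatever is already on the axes. The strings are matched to the
lines in the order in which the lines were plotted.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 25, 1000)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))
# One string per line, in plotting order.
legend = ax.legend(['sine', 'cosine'])
```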
If we add a label to each call of the plot function, these values will be used
as the labels in the legend. The only argument the legend function still needs
is the location argument "loc":
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 25, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, '-b', label='sine')
plt.plot(x, y2, '-r', label='cosine')
plt.legend(loc='upper left')
plt.ylim(-1.5, 2.0)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(0, 25, 1000)
F1 = np.sin(0.5 * X)
F2 = 3 * np.cos(0.8*X)
plt.plot(X, F1, label="$sin(0.5 * x)$")
plt.plot(X, F2, label="$3 cos(0.8 * x)$")
plt.legend(loc='upper right')
In many cases we don't know what the result will look like before we plot it.
It could be, for example, that the legend covers an important part of the
lines. If you don't know what the data may look like, it may be best to use
'best' as the argument for loc. Matplotlib will automatically try to find the
best possible location for the legend:
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(0, 25, 1000)
F1 = np.sin(0.5 * X)
F2 = 3 * np.cos(0.8*X)
plt.plot(X, F1, label="$sin(0.5 * x)$")
plt.plot(X, F2, label="$3 cos(0.8 * x)$")
plt.legend(loc='best')
We can see in the following two examples how loc='best' places the legend:
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-2 * np.pi, 2 * np.pi, 70,
endpoint=True)
F1 = np.sin(0.5*X)
F2 = -3 * np.cos(0.8*X)
plt.xticks( [-6.28, -3.14, 3.14, 6.28],
[r'$-2\pi$', r'$-\pi$', r'$+\pi$', r'$+2\pi$'])
plt.yticks([-3, -1, 0, +1, 3])
plt.plot(X, F1, label="$sin(0.5x)$")
plt.plot(X, F2, label="$-3 cos(0.8x)$")
plt.legend(loc='best')
plt.show()
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-2 * np.pi, 2 * np.pi, 70,
endpoint=True)
F1 = np.sin(0.5*X)
F2 = 3 * np.cos(0.8*X)
plt.xticks( [-6.28, -3.14, 3.14, 6.28],
[r'$-2\pi$', r'$-\pi$', r'$+\pi$', r'$+2\pi$'])
plt.yticks([-3, -1, 0, +1, 3])
plt.plot(X, F1, label="$sin(0.5x)$")
plt.plot(X, F2, label="$3 cos(0.8x)$")
plt.legend(loc='best')
plt.show()
Annotations
Let's assume that you are especially interested in the value of
3∗sin(3∗pi/4).
import numpy as np
print(3 * np.sin(3 * np.pi/4))
2.12132034356
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-2 * np.pi, 3 * np.pi, 70,
endpoint=True)
F1 = np.sin(X)
F2 = 3 * np.sin(X)
ax = plt.gca()
plt.xticks( [-6.28, -3.14, 3.14, 6.28],
[r'$-2\pi$', r'$-\pi$', r'$+\pi$', r'$+2\pi$'])
plt.yticks([-3, -1, 0, +1, 3])
x = 3 * np.pi / 4
plt.scatter([x,], [3 * np.sin(x),], 50, color='blue')
plt.annotate(r'$(3\sin(\frac{3\pi}{4}),\frac{3}{\sqrt{2}})$',
xy=(x, 3 * np.sin(x)),
xycoords='data',
xytext=(+20, +20),
textcoords='offset points',
fontsize=16,
arrowprops=dict(facecolor='blue'))
plt.plot(X, F1, label="$sin(x)$")
plt.plot(X, F2, label="$3 sin(x)$")
plt.legend(loc='lower left')
plt.show()
Matplotlib, Histograms
This section deals with histograms. It's hard to imagine that you open a
newspaper or magazine without seeing some histograms telling you about the
number of smokers in certain age groups, the number of births per year and so
on. It's a great way to depict facts without having to use too many words, but
on the downside they can be used to manipulate or "lie with statistics" as well.
What is a histogram? A formal definition can be: It's a graphical
representation of a frequency distribution of some numerical data. Rectangles
of equal width in the horizontal direction have heights that correspond to the
frequencies.
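A first histogram can be sketched like this. The name gaussian_numbers is
the one used by the code below; how it was created is not shown here, so
drawing 10,000 values from a standard normal distribution is an assumption.
The title and axis labels are illustrative as well.

```python
import numpy as np
import matplotlib.pyplot as plt

# 10,000 normally distributed random values (mean 0, standard deviation 1).
gaussian_numbers = np.random.normal(size=10000)

# hist returns the counts, the bin edges and the drawn rectangles.
# Without the bins parameter, 10 bins are used by default.
n, bins, patches = plt.hist(gaussian_numbers)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
```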
Let's increase the number of bins. Ten bins are not many if you consider that
we have 10,000 random values. To do so, we set the keyword parameter bins to
100:
plt.hist(gaussian_numbers, bins=100)
plt.show()
Another important keyword parameter of hist is "density". It was called
"normed" in older Matplotlib versions, which newer versions no longer accept.
"density" is optional and the default value is False. If it is set to True, the
first element of the return tuple will be the counts normalized to form a
probability density, i.e., n / (len(x) * dbin), so that the integral of the
histogram is 1.
If both the parameters "density" and "stacked" are set to True, the sum of the
histograms is normalized to 1.
plt.hist(gaussian_numbers,
         bins=100,
         density=True,   # "normed=True" in older Matplotlib versions
         stacked=True,
         edgecolor="#6A9662",
         color="#DDFFDD")
plt.show()
plt.hist(gaussian_numbers,
         bins=100,
         density=True,
         stacked=True,
         cumulative=True)
plt.show()
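The claim that a normalized histogram integrates to 1 can be checked
numerically. The sketch below uses np.histogram, which plt.hist relies on for
the binning, so no figure is needed. The random data is again an assumption.

```python
import numpy as np

gaussian_numbers = np.random.normal(size=10000)

# Same binning as plt.hist(..., bins=100, density=True), without drawing.
n, bins = np.histogram(gaussian_numbers, bins=100, density=True)

# Area of the histogram: sum of height * width over all bins.
area = np.sum(n * np.diff(bins))
print(area)
```

The printed area equals 1 up to floating point rounding.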
Bar Plots
bars = plt.bar([1,2,3,4], [1,4,9,16])
bars[0].set_color('green')
plt.show()
f=plt.figure()
ax=f.add_subplot(1,1,1)
ax.bar([1,2,3,4], [1,4,9,16])
children = ax.get_children()
children[3].set_color('g')
import matplotlib.pyplot as plt
import numpy as np
years = ('2010', '2011', '2012', '2013', '2014')
visitors = (1241, 50927, 162242, 222093, 296665 / 8 * 12)
index = np.arange(len(visitors))
bar_width = 1.0
plt.bar(index, visitors, bar_width, color="green")
# Since Matplotlib 2.0 bars are centered on their x position by default,
# so the labels get centered without an offset.
plt.xticks(index, years)
plt.show()
Contour Plot
A contour line or isoline of a function of two variables is a curve along which
the function has a constant value.
We can also say in a more general way that a contour line of a function with
two variables is a curve which connects points with the same values.
import numpy as np
xlist = np.linspace(-3.0, 3.0, 3)
ylist = np.linspace(-3.0, 3.0, 4)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
print(Z)
import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
plt.figure()
cp = plt.contourf(X, Y, Z)
plt.colorbar(cp)
plt.title('Filled Contours Plot')
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.show()
Individual Colours
import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
plt.figure()
# contour lines (not filled), so that they can be labelled with clabel
contour = plt.contour(X, Y, Z)
plt.clabel(contour, colors='k', fmt='%2.1f', fontsize=12)
# one colour per filled region
c = ('#ff0000', '#ffff00', '#0000FF', '0.6', 'c', 'm')
contour_filled = plt.contourf(X, Y, Z, colors=c)
plt.colorbar(contour_filled)
plt.title('Filled Contours Plot')
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.show()
Levels
The levels were decided automatically by contour and contourf so far. They can
be defined manually, by providing a list of levels as a fourth parameter.
Contour lines will be drawn for each value in the list, if we use contour. For
contourf, there will be filled colored regions between the values in the list.
import numpy as np
import matplotlib.pyplot as plt
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X ** 2 + Y ** 2 )
plt.figure()
levels = [0.0, 0.2, 0.5, 0.9, 1.5, 2.5, 3.5]
contour = plt.contour(X, Y, Z, levels, colors='k')
plt.clabel(contour, colors = 'k', fmt = '%2.1f',
fontsize=12)
contour_filled = plt.contourf(X, Y, Z, levels)
plt.colorbar(contour_filled)
plt.title('Plot from level list')
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.show()
Series
import pandas as pd
data = [100, 120, 140, 180, 200, 210, 214]
s = pd.Series(data, index=range(len(data)))
s.plot()
We will now experiment with a Series which has an index consisting of
alphabetical values.
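A minimal sketch of such a Series. The fruit names and quantities are
invented for illustration and are not part of the original data.

```python
import pandas as pd

# Index labels are strings; the values are invented for illustration.
fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
s = pd.Series(quantities, index=fruits)

# use_index=True tells pandas to use the index labels for the x axis.
ax = s.plot(use_index=True)
```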
import pandas as pd
cities = {"name": ["London", "Berlin", "Madrid",
"Rome",
"Paris", "Vienna", "Bucharest", "Hamburg",
"Budapest", "Warsaw", "Barcelona",
"Munich", "Milan"],
"population": [8615246, 3562166, 3165235,
2874038,
2273305, 1805681, 1803425, 1760433,
1754000, 1740119, 1602386, 1493900,
1350680],
"area" : [1572, 891.85, 605.77, 1285,
105.4, 414.6, 228, 755,
525.2, 517, 101.9, 310.4,
181.8]
}
city_frame = pd.DataFrame(cities,
columns=["population", "area"],
index=cities["name"])
print(city_frame)
The following code plots our DataFrame city_frame. We will multiply the area
column by 1000, because otherwise the "area" line would not be visible or in
other words would be overlapping with the x axis:
city_frame["area"] *= 1000
city_frame.plot()
This plot does not meet our expectations, because not all the city names
appear on the x axis. We can change this by defining the xticks explicitly with
"range(len(city_frame.index))". Furthermore, we have to set use_index to
True, so that we get city names and not numbers from 0 to
len(city_frame.index) - 1:
city_frame.plot(xticks=range(len(city_frame.index)),
                use_index=True)
# rot=90 rotates the tick labels by 90 degrees
city_frame.plot(xticks=range(len(city_frame.index)),
                use_index=True,
                rot=90)
We multiplied the area column by 1000 to get a proper output. Instead of this,
we could have used twin axes. We will demonstrate this in the following
example. We will recreate the city_frame DataFrame to get the original area
column:
city_frame = pd.DataFrame(cities,
columns=["population", "area"],
index=cities["name"])
print(city_frame)
To get a twin axes representation of our diagram, we need the function
subplots from matplotlib.pyplot and the method "twinx":
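A possible sketch with subplots and twinx. It uses only the first five cities
from the data above to stay short. The colours, axis labels and title are
illustrative choices, not the original program.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A shortened copy of the city data from above, to keep the sketch small.
names = ["London", "Berlin", "Madrid", "Rome", "Paris"]
city_frame = pd.DataFrame(
    {"population": [8615246, 3562166, 3165235, 2874038, 2273305],
     "area": [1572, 891.85, 605.77, 1285, 105.4]},
    index=names)

fig, ax = plt.subplots()
ax2 = ax.twinx()   # second y axis which shares the x axis with ax

positions = list(range(len(city_frame.index)))
ax.plot(positions, city_frame["population"], color="blue")
ax2.plot(positions, city_frame["area"], color="green")

# City names as tick labels, rotated so that they do not overlap.
ax.set_xticks(positions)
ax.set_xticklabels(list(city_frame.index), rotation=90)
ax.set_ylabel("Population")
ax2.set_ylabel("Area")
fig.suptitle("City Statistics")
```

Each y axis gets its own scale, so the area column no longer has to be
multiplied by 1000 to become visible.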
We can also create twin axes directly in Pandas without calling Matplotlib
ourselves. We demonstrate this in the code of the following program:
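One way to do this is the secondary_y parameter of the plot method of a
DataFrame, sketched below. The original program is not reproduced here, so
this is an assumption about the approach. The shortened city data and the axis
labels are illustrative.

```python
import pandas as pd

# A shortened copy of the city data from above, to keep the sketch small.
names = ["London", "Berlin", "Madrid", "Rome", "Paris"]
city_frame = pd.DataFrame(
    {"population": [8615246, 3562166, 3165235, 2874038, 2273305],
     "area": [1572, 891.85, 605.77, 1285, 105.4]},
    index=names)

# The columns listed in secondary_y are drawn against a second y axis.
ax = city_frame.plot(secondary_y=["area"], use_index=True, rot=90)
ax.set_ylabel("Population")
# pandas makes the second axis available as ax.right_ax.
ax.right_ax.set_ylabel("Area")
```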
Let's add another axis to our city_frame. We will add a column with the
population density, i.e. the number of people per square kilometre:
city_frame["density"] = city_frame["population"] / city_frame["area"]
city_frame
%matplotlib inline
import pandas as pd
data_path = "data1/"
data = pd.read_csv(data_path +
"python_course_monthly_history.txt",
quotechar='"',
thousands=",",
delimiter=r"\s+")
def unit_convert(x):
value, unit = x
if unit == "MB":
value *= 1024
elif unit == "GB":
value *= 1048576 # i.e. 1024 **2
return value
b_and_u= data[["Bandwidth", "Unit"]]
bandwidth = b_and_u.apply(unit_convert, axis=1)
del data["Unit"]
data["Bandwidth"] = bandwidth
month_year = data[["Month", "Year"]]
# combine month and year into one label; columns are accessed by name
month_year = month_year.apply(lambda x: x["Month"] + " " + str(x["Year"]),
                              axis=1)
data["Month"] = month_year
del data["Year"]
data.set_index("Month", inplace=True)
del data["Bandwidth"]
data[["Unique visitors", "Number of visits"]].plot(
    use_index=True,
    rot=90,
    xticks=range(1, len(data.index), 4))
Creating bar plots with Pandas is as easy as creating line plots. All we have
to do is add the keyword parameter "kind" to the plot method and set it to "bar".
A Simple Example
import pandas as pd
data = [100, 120, 140, 180, 200, 210, 214]
s = pd.Series(data, index=range(len(data)))
s.plot(kind="bar")
progs[:6].plot(kind="bar")
progs.plot(kind="bar")
import pandas as pd
fruits = ['apples', 'pears', 'cherries', 'bananas']
series = pd.Series([20, 30, 40, 10],
index=fruits,
name='series')
series.plot.pie(figsize=(6, 6))
It looks ugly that we see the y label "series" (the name of our Series) inside
our pie plot. We can remove it by calling "plt.ylabel('')".
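Put together, the pie plot without the y label can be sketched as follows,
reusing the fruits data from above:

```python
import matplotlib.pyplot as plt
import pandas as pd

fruits = ['apples', 'pears', 'cherries', 'bananas']
series = pd.Series([20, 30, 40, 10],
                   index=fruits,
                   name='series')
ax = series.plot.pie(figsize=(6, 6))
# pandas uses the name of the Series as y label; replace it by an empty string.
plt.ylabel('')
```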