Applied Charting with Matplotlib (Subplots, Histogram, Box and Whisker Plot, Heatmap, Animation)
Original Source: https://www.coursera.org/specializations/data-science-python
Subplots
plt.subplot
returns a subplot axes at the given grid position.
Call signature: subplot(nrows, ncols, index, **kwargs)
In the current figure, create and return an .Axes
, at position index of a (virtual) grid of nrows by ncols axes. Indexes go from 1 to nrows * ncols
, incrementing in row-major order.
If nrows, ncols and index are all less than 10, they can also be given as a single, concatenated, three-digit number.
For example, subplot(2, 3, 3)
and subplot(233)
both create an .Axes
at the top right corner of the current figure, occupying half of the figure height and a third of the figure width.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# set default figure size to (14, 8)
plt.rcParams['figure.figsize'] = (14.0, 8.0)
linear_data = np.array([1,2,3,4,5,6,7,8])
exponential_data = linear_data**2
plt.figure()
# subplot with 1 row, 2 columns, and current axis is 1st subplot axes
ax1 = plt.subplot(1, 2, 1)
plt.plot(linear_data, '-o')
# subplot with 1 row, 2 columns, and current axis is 2nd subplot axes
# pass sharey=ax1 to ensure the two subplots share the same y axis
ax2 = plt.subplot(1, 2, 2, sharey=ax1)
plt.plot(exponential_data, '-o')
plt.show()
plt.subplots
creates a figure and a set of subplots.
plt.subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)
This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.
# create a 3x3 grid of subplots
fig, ((ax1,ax2,ax3), (ax4,ax5,ax6), (ax7,ax8,ax9)) = plt.subplots(3, 3, sharex=True, sharey=True)
# plot the linear_data on the 5th subplot axes
ax5.plot(linear_data, '-')
plt.show()
Histograms
# create 2x2 grid of axis subplots
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
axs = [ax1,ax2,ax3,ax4]
# draw n = 10, 100, 1000, and 10000 samples from the normal distribution and plot corresponding histograms
for n in range(0,len(axs)):
sample_size = 10**(n+1)
sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
axs[n].hist(sample, bins=100)
axs[n].set_title('n={}'.format(sample_size))
plt.show()
matplotlib.gridspec.Gridspec
is a class that specifies the geometry of the grid that a subplot will be placed. The location of grid is determined by similar way as the SubplotParams.
# use gridspec to partition the figure into subplots
import matplotlib.gridspec as gridspec
plt.figure()
gspec = gridspec.GridSpec(3, 3)
top_histogram = plt.subplot(gspec[0, 1:])
side_histogram = plt.subplot(gspec[1:, 0])
lower_right = plt.subplot(gspec[1:, 1:])
Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
lower_right.scatter(X, Y)
# by setting density to True, historgram would be normalized to form a probability density
top_histogram.hist(X, bins=100, density=True)
side_histogram.hist(Y, bins=100, orientation='horizontal', density=True)
# flip the side histogram's x axis
side_histogram.invert_xaxis()
plt.show()
Box and Whisker Plots
plt.boxplot(x)
makes a box and whisker plot for each column of x or each vector in sequence x. The box extends from the lower to upper quartile values of the data, with a line at the median. The whiskers extend from the box to show the range of the data. Flier points are those past the end of the whiskers.
import pandas as pd
normal_sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
random_sample = np.random.random(size=10000)
gamma_sample = np.random.gamma(2, size=10000)
df = pd.DataFrame({'normal': normal_sample,
'random': random_sample,
'gamma': gamma_sample})
(fig, (ax1, ax2, ax3)) = plt.subplots(3, 1, sharex=True)
ax1.hist(df['normal'], bins=100)
ax1.set_title('Normal Distribution')
ax2.hist(df['random'], bins=100)
ax2.set_title('Random Distribution')
ax3.hist(df['gamma'], bins=100)
ax3.set_title('Gamma Distribution')
plt.show()
plt.figure()
# create a boxplot of the normal data, assign the output to a variable to supress output
plt.boxplot([ df['normal'], df['random'], df['gamma'] ], whis='range')
plt.show()
# if `whis` argument isn't passed, boxplot defaults to showing 1.5*interquartile (IQR) whiskers with outliers
plt.figure()
plt.boxplot([ df['normal'], df['random'], df['gamma'] ] )
plt.show()
Heatmaps
A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors.
plt.figure()
Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
plt.hist2d(X, Y, bins=25)
# add a colorbar legend
plt.colorbar()
plt.show()
Animations
matplotlib.animation.FuncAnimation(fig, func)
makes an animation by repeatedly calling a function func.
Input of the function would be the current step.
import matplotlib.animation as animation
n = 100
x = np.random.randn(n)
%matplotlib notebook
%matplotlib notebook
# create the function that will do the plotting, where curr is the current frame
def update(curr):
# check if animation is at the last frame, and if so, stop the animation
if curr == n:
a.event_source.stop()
plt.cla()
bins = np.arange(-4, 4, 0.5)
plt.hist(x[:curr], bins=bins)
plt.axis([-4,4,0,30])
plt.title('Sampling the Normal Distribution')
plt.ylabel('Frequency')
plt.xlabel('Value')
plt.annotate('n = {}'.format(curr), [3,27])
fig = plt.figure()
a = animation.FuncAnimation(fig, update, interval=100)
Leave a Comment