Applied Charting with Matplotlib (Subplots, Histogram, Box and Whisker Plot, Heatmap, Animation)

Original Source: https://www.coursera.org/specializations/data-science-python

Subplots

plt.subplot returns a subplot axes at the given grid position.

Call signature: subplot(nrows, ncols, index, **kwargs)

In the current figure, create and return an Axes at position index of a (virtual) grid of nrows by ncols axes. Indexes go from 1 to nrows * ncols, incrementing in row-major order.

If nrows, ncols and index are all less than 10, they can also be given as a single, concatenated, three-digit number.

For example, subplot(2, 3, 3) and subplot(233) both create an Axes at the top right corner of the current figure, occupying half of the figure height and a third of the figure width.
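
As a quick check of the shorthand described above, the following sketch requests the same grid cell with both call styles and compares where the axes land. The Agg backend and the variable names are assumptions made so the snippet runs as a plain script; in a notebook the backend line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# request the same grid cell twice, once per call style, on separate figures
plt.figure()
ax_args = plt.subplot(2, 3, 3)   # nrows=2, ncols=3, index=3

plt.figure()
ax_digits = plt.subplot(233)     # the same request as one three-digit number

# bounds are (left, bottom, width, height) in figure coordinates
bounds_args = ax_args.get_position().bounds
bounds_digits = ax_digits.get_position().bounds

same_position = bool(np.allclose(bounds_args, bounds_digits))
print(same_position)
```

Both calls should place the axes in the same cell, so the shorthand is only a notational convenience.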

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

# set default figure size to (14, 8)
plt.rcParams['figure.figsize'] = (14.0, 8.0)
linear_data = np.array([1,2,3,4,5,6,7,8])
exponential_data = linear_data**2  # squared values (quadratic growth, despite the variable name)

plt.figure()

# subplot with 1 row, 2 columns, and current axis is 1st subplot axes
ax1 = plt.subplot(1, 2, 1)
plt.plot(linear_data, '-o')

# subplot with 1 row, 2 columns, and current axis is 2nd subplot axes
# pass sharey=ax1 to ensure the two subplots share the same y axis
ax2 = plt.subplot(1, 2, 2, sharey=ax1)
plt.plot(exponential_data, '-o')

plt.show()

plt.subplots creates a figure and a set of subplots.

plt.subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)

This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.

# create a 3x3 grid of subplots
fig, ((ax1,ax2,ax3), (ax4,ax5,ax6), (ax7,ax8,ax9)) = plt.subplots(3, 3, sharex=True, sharey=True)

# plot the linear_data on the 5th subplot axes
ax5.plot(linear_data, '-')

plt.show()
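
Unpacking nine names works for a 3x3 grid but gets unwieldy for larger layouts. A minimal sketch of the alternative, assuming the same linear_data as above: keep the returned NumPy array of axes and index into it. The names axes, center and axes_single are illustrative, and the Agg backend is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# plt.subplots returns the figure and a 2-D array of axes for a 3x3 grid
fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)
grid_shape = axes.shape

# row 1, column 1 (zero-based) is the 5th subplot in row-major order
center = axes[1, 1]
center.plot(linear_data, '-')

# with sharey=True every subplot follows the same y limits
shared_ylim = axes[0, 0].get_ylim() == center.get_ylim()

# squeeze=False keeps a 2-D array even when there is only one subplot
fig_single, axes_single = plt.subplots(1, 1, squeeze=False)
single_shape = axes_single.shape
```

Indexing the array also makes it easy to loop over all subplots with axes.flat.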

Histograms

# create 2x2 grid of axis subplots
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
axs = [ax1,ax2,ax3,ax4]

# draw n = 10, 100, 1000, and 10000 samples from the normal distribution and plot corresponding histograms
for n in range(0,len(axs)):
    sample_size = 10**(n+1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axs[n].hist(sample, bins=100)
    axs[n].set_title('n={}'.format(sample_size))

plt.show()

matplotlib.gridspec.GridSpec is a class that specifies the geometry of the grid in which subplots are placed. The location of the grid within the figure is determined in a similar way to SubplotParams.

# use gridspec to partition the figure into subplots
import matplotlib.gridspec as gridspec

plt.figure()
gspec = gridspec.GridSpec(3, 3)

top_histogram = plt.subplot(gspec[0, 1:])
side_histogram = plt.subplot(gspec[1:, 0])
lower_right = plt.subplot(gspec[1:, 1:])

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)

lower_right.scatter(X, Y)
# with density=True, the histogram is normalized to form a probability density
top_histogram.hist(X, bins=100, density=True)
side_histogram.hist(Y, bins=100, orientation='horizontal', density=True)

# flip the side histogram's x axis
side_histogram.invert_xaxis()

plt.show()
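
To make the effect of density=True concrete, here is a small sketch, under the assumption of a fixed random seed and a standalone figure, that checks the normalization: the bar heights times the bin widths should sum to 1. The variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
sample = np.random.normal(loc=0.0, scale=1.0, size=10000)

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the drawn patches
heights, edges, _ = ax.hist(sample, bins=100, density=True)

# area under the histogram = sum of height * bin width
area = float(np.sum(heights * np.diff(edges)))
print(area)
```

Without density=True the heights are raw counts, so the same sum would scale with the sample size and the bin width.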

Box and Whisker Plots

plt.boxplot(x) makes a box and whisker plot for each column of x or each vector in sequence x. The box extends from the lower to upper quartile values of the data, with a line at the median. The whiskers extend from the box to show the range of the data. Flier points are those past the end of the whiskers.

import pandas as pd
normal_sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
random_sample = np.random.random(size=10000)
gamma_sample = np.random.gamma(2, size=10000)

df = pd.DataFrame({'normal': normal_sample,
                   'random': random_sample,
                   'gamma': gamma_sample})
(fig, (ax1, ax2, ax3)) = plt.subplots(3, 1, sharex=True)
ax1.hist(df['normal'], bins=100)
ax1.set_title('Normal Distribution')
ax2.hist(df['random'], bins=100)
ax2.set_title('Uniform Distribution')
ax3.hist(df['gamma'], bins=100)
ax3.set_title('Gamma Distribution')
plt.show()

plt.figure()
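# create a boxplot of each distribution; whis=(0, 100) extends the whiskers over the full data range
# (whis='range' meant the same thing in older Matplotlib releases but was deprecated in 3.2)
# assign the result to a variable to suppress the printed output in the notebook
_ = plt.boxplot([ df['normal'], df['random'], df['gamma'] ], whis=(0, 100))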
plt.show()

# if the `whis` argument isn't passed, the whiskers reach at most 1.5 * interquartile range (IQR) beyond the box, and points past them are drawn as outliers
plt.figure()
plt.boxplot([ df['normal'], df['random'], df['gamma'] ] )
plt.show()
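
The default 1.5 * IQR rule can be worked through by hand. The sketch below, assuming a fixed seed and a gamma sample like the one above, computes the quartiles, whisker ends, and flier points with NumPy and then compares them with matplotlib.cbook.boxplot_stats, the helper that computes box plot statistics. The variable names are illustrative.

```python
import numpy as np
from matplotlib import cbook

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
gamma_sample = np.random.gamma(2, size=10000)

# box: lower quartile, median, upper quartile
q1, median, q3 = np.percentile(gamma_sample, [25, 50, 75])
iqr = q3 - q1

# whiskers may reach at most 1.5 * IQR beyond the box
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# each whisker ends at the most extreme data point inside its fence
inside = gamma_sample[(gamma_sample >= low_fence) & (gamma_sample <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# fliers are the points beyond the fences
fliers = gamma_sample[(gamma_sample < low_fence) | (gamma_sample > high_fence)]

# the same statistics as computed by Matplotlib's helper
stats = cbook.boxplot_stats(gamma_sample, whis=1.5)[0]
print(stats['whislo'], stats['whishi'], len(stats['fliers']))
```

Because the gamma distribution is skewed to the right, its fliers should all lie above the upper whisker.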

Heatmaps

A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors.

plt.figure()

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)

plt.hist2d(X, Y, bins=25)

# add a colorbar legend
plt.colorbar()

plt.show()
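
The matrix behind this heatmap is available from the return value of hist2d. As a rough sketch, assuming a fixed seed and illustrative variable names, the following captures the 25x25 matrix of bin counts and compares it with the result of np.histogram2d:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)

fig, ax = plt.subplots()
# hist2d returns the count matrix, both sets of bin edges, and the drawn image
counts, xedges, yedges, image = ax.hist2d(X, Y, bins=25)
fig.colorbar(image, ax=ax)

# the same binning done directly with NumPy, without any drawing
ref_counts, _, _ = np.histogram2d(X, Y, bins=25)

print(counts.shape, counts.sum())
```

Each cell of counts holds the number of points that fell into one (x, y) bin, and the color of the corresponding square encodes that number.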

Animations

matplotlib.animation.FuncAnimation(fig, func) makes an animation by repeatedly calling the function func.
On each call, func receives the current frame as its first argument; by default this is a counter that starts at 0.

import matplotlib.animation as animation

n = 100
x = np.random.randn(n)
%matplotlib notebook
# create the function that will do the plotting, where curr is the current frame
def update(curr):
    # check if animation is at the last frame, and if so, stop the animation
    if curr == n:
        a.event_source.stop()
    plt.cla()
    bins = np.arange(-4, 4, 0.5)
    plt.hist(x[:curr], bins=bins)
    plt.axis([-4,4,0,30])
    plt.title('Sampling the Normal Distribution')
    plt.ylabel('Frequency')
    plt.xlabel('Value')
    plt.annotate('n = {}'.format(curr), [3,27])
fig = plt.figure()
a = animation.FuncAnimation(fig, update, interval=100)
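
An alternative to stopping the event source by hand is to tell FuncAnimation which frames to produce. The sketch below is a variation on the code above, not the original author's version: it assumes an explicit Axes object, bins that reach up to 4, and the frames and repeat parameters of FuncAnimation. The Agg backend is an assumption for running as a plain script; in a notebook, keep the interactive backend so the animation is displayed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np

n = 100
np.random.seed(0)  # assumption: fixed seed so the run is repeatable
x = np.random.randn(n)
bins = np.arange(-4, 4.5, 0.5)  # 17 edges from -4 to 4, i.e. 16 bins

fig, ax = plt.subplots()

def update(curr):
    # redraw the histogram using only the first `curr` samples
    ax.clear()
    ax.hist(x[:curr], bins=bins)
    ax.set_xlim(-4, 4)
    ax.set_ylim(0, 30)
    ax.set_title('n = {}'.format(curr))

# frames=range(n + 1) yields 0..n and repeat=False ends the animation after
# the last frame, so update() needs no explicit stop condition
anim = animation.FuncAnimation(fig, update, frames=range(n + 1),
                               interval=100, repeat=False)
```

Keeping a reference to the returned object (anim here, a in the code above) matters, because the animation stops if the object is garbage collected.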
