4.1. Plotting Data with Pandas

4.1.1. Importing the DataFrame

This notebook begins Part 3 of this textbook. Here, we will build upon our skills from Parts 1 and 2 and begin exploring how to visualize data in Pandas. The plotting tools in Pandas sit on top of Matplotlib, one of the standard libraries used by data scientists for plotting data. As we will see in the next notebooks, you can also leverage other, more robust graphing libraries through Pandas. For now, though, let’s start with the basics. In this notebook, we will explore how to create three types of graphs: bar (and barh), pie, and scatter. I will also introduce you to some of the more recent features of Pandas 1.3.0 that allow you to control the graph a bit more.

Before we do any of that, however, let’s import pandas and our data.

import pandas as pd
df = pd.read_csv("../data/titanic.csv")
df
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns

4.1.2. Bar and Barh Charts with Pandas

With our data imported successfully, let’s jump right in with bar charts. Bar charts are a great way to visualize qualitative data quantitatively. To demonstrate what I mean by this, let’s consider how many male passengers were on the Titanic relative to female passengers. I could grab the counts and look at the numbers by calling .value_counts(), as in the example below.

df['Sex'].value_counts()
male      577
female    314
Name: Sex, dtype: int64

This kind of raw numerical data is useful, but it is often difficult to present to audiences. For this reason, it is quite common to keep the raw numbers available, but to give the audience a quick visual sense of them. We can take the code we see above, append .plot.bar() to it, and get the following result.

df['Sex'].value_counts().plot.bar()
<matplotlib.axes._subplots.AxesSubplot at 0x1b2877751c0>
../_images/130308ae20033e17dd42d5c7c794f7d8080f2bbc032087f2b9e5f6673e717cf7.png

Not bad, but this chart is quite staid. For one thing, we don’t even have a title! Let’s fix that. We can pass a keyword argument of title. This will take a string.

df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic")
<matplotlib.axes._subplots.AxesSubplot at 0x1b2e64cdf10>
../_images/f482a75ae43218b2ff9b95db1c512e7b1299158a438a0cd7acddd3ae943e5f6a.png

We have another issue, though. Both bars are drawn in the same color. This can make the chart harder for audiences to read in some instances, so let’s change that. We can pass the keyword argument color, which will take a list of colors, one per bar.

df['Sex'].value_counts().plot.bar(title="Passengers on the Titanic", color=["blue", "red"])
<matplotlib.axes._subplots.AxesSubplot at 0x1b289804040>
../_images/534d2f23279c268fe48bf2f06cacb35be3cf966700a6f36eecf3d5975ba125c0.png

We can do the same thing with a barh graph, or a bar-horizontal graph.

df['Sex'].value_counts().plot.barh(title="Passengers on the Titanic", color=["blue", "red"])
<matplotlib.axes._subplots.AxesSubplot at 0x1b28a93d460>
../_images/2ba5fc707b5b2428e6bd458770f596303baa74c28cec4fcbadf76d778573b682.png
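
The address printed under each cell above shows that .plot.bar() hands back a Matplotlib Axes object, so the chart can be adjusted further after it is drawn. The following is a minimal sketch of that idea, not part of the original notebook. It assumes a small stand-in Series (counts) built from the value counts printed earlier instead of reading the CSV, and it selects the non-interactive "Agg" backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import pandas as pd

# Stand-in for df['Sex'].value_counts(); the numbers match the output shown earlier.
counts = pd.Series({"male": 577, "female": 314}, name="Sex")

# .plot.bar() returns a Matplotlib Axes that we can keep customizing.
ax = counts.plot.bar(
    title="Passengers on the Titanic",
    color=["blue", "red"],
    rot=0,  # keep the category labels horizontal
)
ax.set_xlabel("Sex")
ax.set_ylabel("Number of passengers")

# Each bar is a rectangle patch whose height is the plotted count.
heights = [int(patch.get_height()) for patch in ax.patches]
print(heights)
```

Holding on to the returned Axes (here called ax) is a design choice that keeps the one-line Pandas call while still allowing axis labels and other Matplotlib tweaks.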

4.1.3. Pie Charts with Pandas

df['Sex'].value_counts().plot.pie()
<matplotlib.axes._subplots.AxesSubplot at 0x1b28a990250>
../_images/bf86a27aad9651442d0358d3bb846b9f22f479cb7282c2fc2384c3daa76d4112.png
df['Sex'].value_counts().plot.pie(figsize=(6, 6))
<matplotlib.axes._subplots.AxesSubplot at 0x1b28a9db490>
../_images/b80d8da181411d387c3cf7106d7a3e24551a94eba68110bfbd9b4ac291940871.png

Let’s say I wanted the labels for each gender to be capitalized rather than lowercase. I can add custom labels with the keyword argument labels, which takes a list. The labels must be given in the same order as the values, here male first and then female.

df['Sex'].value_counts().plot.pie(labels=["Male", "Female"])
<matplotlib.axes._subplots.AxesSubplot at 0x1b28aa1b100>
../_images/79f4bec96f225d42d109d08a5355558bc9cd5a368ef23057b65e829142351434.png

Now that we have our labels as we want them, let’s give the audience a bit of a better experience. Let’s allow them to easily see the percentage of each gender, not just visually, but quantitatively. To do this, we can pass the keyword argument autopct, which will take a format string. In this case, we can pass in “%.2f”, which labels each wedge with its share of the whole as a percentage rounded to two decimal places.

df['Sex'].value_counts().plot.pie(labels=["Male", "Female"], autopct="%.2f")
<matplotlib.axes._subplots.AxesSubplot at 0x1b28aa0ecd0>
../_images/85a7057f81ceda5bf4fc8c458251f563feed0cae620d8809dd4573f3e19c4748.png
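
As a small variation, the sketch below adds a literal percent sign to each wedge label and removes the default y-axis label. It is an illustration under assumptions rather than the notebook's own code: counts is a stand-in Series built from the value counts shown earlier, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import pandas as pd

# Stand-in for df['Sex'].value_counts(); the numbers match the output shown earlier.
counts = pd.Series({"male": 577, "female": 314}, name="Sex")

ax = counts.plot.pie(
    labels=["Male", "Female"],
    autopct="%.2f%%",  # "%%" renders a literal percent sign after the number
    figsize=(6, 6),
)
ax.set_ylabel("")  # drop the Series name that Pandas places on the y axis

# Collect the percentage labels that autopct drew on the wedges.
pct_labels = [t.get_text() for t in ax.texts if t.get_text().endswith("%")]
print(pct_labels)
```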

4.1.4. Scatter Plots with Pandas

Scatter plots allow us to plot qualitative data quantitatively in relation to two numerical attributes. Let’s imagine that we are interested in exploring all passengers, something qualitative. Now, we want to know how each passenger relates to other passengers on two numerical, or quantitative attributes, e.g. the age of the passenger and the fare that they paid. Both of these are quantitative. We can therefore represent each person as a point on the scatter plot and plot them in relation to their fare (vertical, or y axis) and age (horizontal, or x axis) on the graph.

In Pandas we can do this by passing two keyword arguments, x and y, and setting each to the name of the DataFrame column we want, e.g. “Age” and “Fare”.

df.plot.scatter(x="Age", y="Fare")
<matplotlib.axes._subplots.AxesSubplot at 0x1b28a99f9a0>
../_images/00b4dbefadf9e9e1322afc57e857c1148471c57b69970d263aa775c944d7a47a.png

That looks good, but we can do better. Let’s try to color coordinate this data. Let’s say we are interested in seeing not only the passenger’s age and fare, but also in color-coordinating the graph so that each passenger’s Pclass affects the color of their point. We can do this by passing a few new keyword arguments.

  1. c="Pclass" => c will be the column that affects the color

  2. cmap="viridis" => will be the color map we want to use (these are built into Matplotlib, which Pandas uses for plotting)

df.plot.scatter(x="Age", y="Fare", c="Pclass",cmap="viridis")
<matplotlib.axes._subplots.AxesSubplot at 0x1b28aaf2310>
../_images/0beb3ff248c1253631d7541a32fd9e9042f20bf103be49525e0cbe60a5731bdf.png

This is starting to look a lot better now. But let’s say we didn’t want to represent our data as a series of marginally changing numbers. When we pass a DataFrame column to c as a set of numbers, Pandas presumes that each number corresponds to a position on a color gradient. But passenger class is not a gradient, it is a discrete value, meaning no one will be Pclass 1.2. They will be 1, 2, or 3. In order to fix this graph, we can make a few changes. First, we can use df.loc, which we met in a previous notebook, to select the rows for each class. We know there are three classes. We can convert these from numerical representations of the class into string representations, e.g. First, Second, and Third.

Next, we can convert that entire column from a string column into a Pandas Categorical Class.

df.loc[(df.Pclass == 1),'Pclass']="First"
df.loc[(df.Pclass == 2),'Pclass']="Second"
df.loc[(df.Pclass == 3),'Pclass']="Third"

We can now see that our data has now been altered in the Pclass column.

df
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 Third Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 First Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 Third Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 First Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 Third Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 Second Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 First Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 Third Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 First Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 Third Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns
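
As an aside, the same relabeling can be sketched with Series.map and a dictionary, which does the work in one step. This is an alternative illustration, not the approach used in this notebook. It assumes a small made-up sample DataFrame rather than the full Titanic data, and the column name PclassName is hypothetical.

```python
import pandas as pd

# Small made-up sample standing in for the Titanic DataFrame.
sample = pd.DataFrame({"Pclass": [3, 1, 3, 2, 1]})

# Lookup table from the numeric class to its string name.
class_names = {1: "First", 2: "Second", 3: "Third"}

# Writing to a new (hypothetical) column leaves the original integers intact.
sample["PclassName"] = sample["Pclass"].map(class_names)
print(sample)
```

Keeping the integers in place means there is nothing to undo later if the numeric values are needed again.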

Now that our data is successfully converted into strings, you might be thinking that we can run the same code as before and see the data divided between strings, rather than a gradient shift between floats. If we execute the cell below, however, we get a rather large and scary looking error. (Scroll down to see the solution).

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:4239, in Axes._parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4238 try:  # Is 'c' acceptable as PathCollection facecolors?
-> 4239     colors = mcolors.to_rgba_array(c)
   4240 except ValueError:

File ~\anaconda3\lib\site-packages\matplotlib\colors.py:340, in to_rgba_array(c, alpha)
    339 else:
--> 340     return np.array([to_rgba(cc, alpha) for cc in c])

File ~\anaconda3\lib\site-packages\matplotlib\colors.py:340, in <listcomp>(.0)
    339 else:
--> 340     return np.array([to_rgba(cc, alpha) for cc in c])

File ~\anaconda3\lib\site-packages\matplotlib\colors.py:185, in to_rgba(c, alpha)
    184 if rgba is None:  # Suppress exception chaining of cache lookup failure.
--> 185     rgba = _to_rgba_no_colorcycle(c, alpha)
    186     try:

File ~\anaconda3\lib\site-packages\matplotlib\colors.py:261, in _to_rgba_no_colorcycle(c, alpha)
    260         return c, c, c, alpha if alpha is not None else 1.
--> 261     raise ValueError(f"Invalid RGBA argument: {orig_c!r}")
    262 # tuple color.

ValueError: Invalid RGBA argument: 'Third'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=50)

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\plotting\_core.py:1636, in PlotAccessor.scatter(self, x, y, s, c, **kwargs)
   1553 def scatter(self, x, y, s=None, c=None, **kwargs):
   1554     """
   1555     Create a scatter plot with varying marker point size and color.
   1556 
   (...)
   1634         ...                       colormap='viridis')
   1635     """
-> 1636     return self(kind="scatter", x=x, y=y, s=s, c=c, **kwargs)

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\plotting\_core.py:917, in PlotAccessor.__call__(self, *args, **kwargs)
    915 if kind in self._dataframe_kinds:
    916     if isinstance(data, ABCDataFrame):
--> 917         return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    918     else:
    919         raise ValueError(f"plot kind {kind} can only be used for data frames")

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\plotting\_matplotlib\__init__.py:71, in plot(data, kind, **kwargs)
     69         kwargs["ax"] = getattr(ax, "left_ax", ax)
     70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
     72 plot_obj.draw()
     73 return plot_obj.result

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\plotting\_matplotlib\core.py:288, in MPLPlot.generate(self)
    286 self._compute_plot_data()
    287 self._setup_subplots()
--> 288 self._make_plot()
    289 self._add_table()
    290 self._make_legend()

File ~\AppData\Roaming\Python\Python38\site-packages\pandas\plotting\_matplotlib\core.py:1070, in ScatterPlot._make_plot(self)
   1068 else:
   1069     label = None
-> 1070 scatter = ax.scatter(
   1071     data[x].values,
   1072     data[y].values,
   1073     c=c_values,
   1074     label=label,
   1075     cmap=cmap,
   1076     norm=norm,
   1077     **self.kwds,
   1078 )
   1079 if cb:
   1080     cbar_label = c if c_is_column else ""

File ~\anaconda3\lib\site-packages\matplotlib\__init__.py:1565, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1562 @functools.wraps(func)
   1563 def inner(ax, *args, data=None, **kwargs):
   1564     if data is None:
-> 1565         return func(ax, *map(sanitize_sequence, args), **kwargs)
   1567     bound = new_sig.bind(ax, *args, **kwargs)
   1568     auto_label = (bound.arguments.get(label_namer)
   1569                   or bound.kwargs.get(label_namer))

File ~\anaconda3\lib\site-packages\matplotlib\cbook\deprecation.py:358, in _delete_parameter.<locals>.wrapper(*args, **kwargs)
    352 if name in arguments and arguments[name] != _deprecated_parameter:
    353     warn_deprecated(
    354         since, message=f"The {name!r} parameter of {func.__name__}() "
    355         f"is deprecated since Matplotlib {since} and will be removed "
    356         f"%(removal)s.  If any parameter follows {name!r}, they "
    357         f"should be pass as keyword, not positionally.")
--> 358 return func(*args, **kwargs)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:4401, in Axes.scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, plotnonfinite, **kwargs)
   4397 if len(s) not in (1, x.size):
   4398     raise ValueError("s must be a scalar, or the same size as x and y")
   4400 c, colors, edgecolors = \
-> 4401     self._parse_scatter_color_args(
   4402         c, edgecolors, kwargs, x.size,
   4403         get_next_color_func=self._get_patches_for_fill.get_next_color)
   4405 if plotnonfinite and colors is None:
   4406     c = np.ma.masked_invalid(c)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:4245, in Axes._parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4242         raise invalid_shape_exception(c.size, xsize)
   4243     # Both the mapping *and* the RGBA conversion failed: pretty
   4244     # severe failure => one may appreciate a verbose feedback.
-> 4245     raise ValueError(
   4246         f"'c' argument must be a color, a sequence of colors, or "
   4247         f"a sequence of numbers, not {c}")
   4248 else:
   4249     if len(colors) not in (0, 1, xsize):
   4250         # NB: remember that a single color is also acceptable.
   4251         # Besides *colors* will be an empty array if c == 'none'.

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['Third' 'First' 'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third'
 'Second' 'Third' 'First' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third' 'Second'
 'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'First' 'Second' 'First' 'First' 'Second' 'Third' 'Second' 'Third'
 'Third' 'First' 'First' 'Third' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third' 'Third'
 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
 'First' 'First' 'Second' 'Second' 'Third' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Second' 'Third'
 'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Second' 'Second' 'Second' 'First' 'First' 'Third'
 'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third'
 'Second' 'Second' 'Second' 'First' 'Third' 'Third' 'Third' 'First'
 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Third'
 'First' 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third'
 'Second' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
 'Second' 'Third' 'Second' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
 'Second' 'First' 'Second' 'Third' 'Second' 'Third' 'Third' 'First'
 'Third' 'Second' 'Third' 'Second' 'Third' 'First' 'Third' 'Second'
 'Third' 'Second' 'Third' 'Second' 'Second' 'Second' 'Second' 'Third'
 'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'Second' 'First'
 'Second' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'First'
 'First' 'Second' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third'
 'Third' 'First' 'First' 'First' 'Third' 'Second' 'First' 'Third' 'First'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'First'
 'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'First' 'Second' 'Third'
 'Third' 'First' 'Third' 'First' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'First' 'First' 'First' 'Second' 'First' 'First' 'First'
 'Second' 'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'First'
 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third' 'Second'
 'Third' 'First' 'Third' 'First' 'First' 'Third' 'First' 'Third' 'First'
 'First' 'Third' 'First' 'Second' 'First' 'Second' 'Second' 'Second'
 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'Third'
 'Third' 'First' 'Third' 'First' 'Third' 'First' 'Third' 'Third' 'First'
 'Third' 'Third' 'First' 'Third' 'Second' 'Third' 'Second' 'Third'
 'Second' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'Second'
 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Second' 'Third'
 'Third' 'Second' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Third' 'Second' 'Second' 'Third' 'Third' 'First' 'Third'
 'Second' 'Third' 'First' 'First' 'Third' 'Second' 'First' 'Second'
 'Second' 'Third' 'Third' 'Second' 'Third' 'First' 'Second' 'First'
 'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third' 'First'
 'First' 'Second' 'Third' 'First' 'Third' 'First' 'Second' 'Third' 'Third'
 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Second' 'Second'
 'Third' 'First' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second' 'Third'
 'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third'
 'First' 'First' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
 'Third' 'Third' 'Third' 'First' 'First' 'Second' 'First' 'Third' 'Third'
 'Third' 'Third' 'First' 'First' 'Third' 'First' 'Second' 'Third' 'Second'
 'Third' 'First' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'First'
 'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'Third' 'Second'
 'First' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Second' 'First'
 'First' 'Second' 'Second' 'Third' 'Second' 'First' 'Second' 'Third'
 'Third' 'Third' 'First' 'First' 'First' 'First' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second'
 'First' 'First' 'Third' 'Third' 'Third' 'Second' 'First' 'Third' 'Third'
 'Second' 'First' 'Second' 'First' 'Third' 'First' 'Second' 'First'
 'Third' 'Third' 'Third' 'First' 'Third' 'Third' 'Second' 'Third' 'Second'
 'Third' 'Third' 'First' 'Second' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'First' 'Second' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Second' 'Third' 'Third' 'Second' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'First' 'Third' 'Third' 'First' 'Third'
 'First' 'First' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third'
 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third' 'Third'
 'Third' 'Second' 'Third' 'Third' 'Third' 'Second' 'Third' 'Third'
 'Second' 'First' 'First' 'Third' 'First' 'Third' 'Third' 'Second'
 'Second' 'Third' 'Third' 'First' 'Second' 'First' 'Second' 'Second'
 'Second' 'Third' 'Third' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'Second' 'Second' 'Third' 'Third' 'Third' 'First' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'First' 'Third' 'First'
 'First' 'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'First' 'Third'
 'First' 'First' 'First' 'Third' 'Second' 'Third' 'First' 'Second' 'Third'
 'Third' 'Second' 'Third' 'Second' 'Second' 'First' 'Third' 'Second'
 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Second' 'Second'
 'Third' 'Third' 'First' 'Third' 'Third' 'First' 'First' 'First' 'Third'
 'Third' 'First' 'Third' 'Second' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Second' 'Second' 'Third' 'Second' 'Third' 'First' 'Third'
 'Third' 'Third' 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third'
 'Third' 'Third' 'Second' 'Third' 'Second' 'Third' 'Third' 'Third' 'Third'
 'First' 'Third' 'First' 'First' 'Third' 'Third' 'Third' 'Third' 'Third'
 'Third' 'First' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'First'
 'Third' 'Third' 'Third' 'Second' 'Second' 'First' 'Third' 'Third' 'Third'
 'First' 'Third' 'Second' 'First' 'Third' 'Third' 'Second' 'Third' 'Third'
 'First' 'Third' 'Second' 'Third' 'Third' 'First' 'Third' 'First' 'Third'
 'Third' 'Third' 'Third' 'Second' 'Third' 'First' 'Third' 'Second' 'Third'
 'Third' 'Third' 'First' 'Third' 'Third' 'Third' 'First' 'Third' 'Second'
 'First' 'Third' 'Third' 'Third' 'Third' 'Third' 'Second' 'First' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'First' 'First' 'Third' 'Third'
 'Third' 'Second' 'First' 'Third' 'Second' 'Second' 'Second' 'First'
 'Third' 'Third' 'Third' 'First' 'First' 'Third' 'Second' 'Third' 'Third'
 'Third' 'Third' 'First' 'Second' 'Third' 'Third' 'Second' 'Third' 'Third'
 'Second' 'First' 'Third' 'First' 'Third']
../_images/00f7e748ed2a754e2a5fff734ee83bb7bfaa67c07b7a3dcbbbecb4a610e6d405.png
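
The first exception in the traceback, Invalid RGBA argument: 'Third', is the heart of the matter: Matplotlib tried to read each string in the column as a color and failed. A quick way to check this for yourself is Matplotlib's is_color_like helper. The snippet below is a small sketch of that check, and the list of test values is my own choice for illustration.

```python
from matplotlib.colors import is_color_like

# Values to test: a class name, two named colors, and a hex color code.
candidates = ["Third", "red", "green", "#1f77b4"]

# is_color_like returns True when Matplotlib can interpret the value as a color.
results = {value: is_color_like(value) for value in candidates}
for value, ok in results.items():
    print(value, ok)
```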

Keeping this massive error in the textbook is essential, despite its size being rather annoying. It tells us a lot of information about the problem. When we pass a keyword argument of c, Pandas is expecting a series of numbers (which will correspond to gradient shifts in the cmap), a list of colors, or a Pandas Categorical column. Our strings “First”, “Second”, and “Third” are none of these. To change our data to a list of colors, let’s convert our data into three different colors.

df.loc[(df.Pclass == "First"),'Pclass']="red"
df.loc[(df.Pclass == "Second"),'Pclass']="blue"
df.loc[(df.Pclass == "Third"),'Pclass']="green"
df.plot.scatter(x="Age", y="Fare", c="Pclass")
<matplotlib.axes._subplots.AxesSubplot at 0x1b286ac0c10>
../_images/a54ab3f7bca52178a20f3b2dda551b85cf166e09ead9a91bcbe6b7ad54fe78a9.png

Now, our points are all color coordinated. But I don’t like this. It doesn’t have a nice legend to read, so the audience cannot tell which color belongs to which class. Instead, we should convert this data into a Categorical column. To do this, let’s first get our data back into First, Second, and Third class format.

df.loc[(df.Pclass == "red"),'Pclass']="First"
df.loc[(df.Pclass == "blue"),'Pclass']="Second"
df.loc[(df.Pclass == "green"),'Pclass']="Third"

Now, let’s try this again by first converting Pclass into a Categorical type.

df['Pclass'] = df.Pclass.astype('category')
df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis")
<matplotlib.axes._subplots.AxesSubplot at 0x1b28e20b4f0>
../_images/d8dc08f1a13e473cef2e7790042e61e31dcc81400994eeedb223d39a52b7d6f6.png
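
For readers curious about what the conversion changes, a Categorical column stores each distinct label once and represents every row by a small integer code. My understanding is that those codes are what allow the values to be placed on a colormap, but treat that as an assumption rather than a description of Pandas internals. The sketch below uses a small made-up Series in place of the real column.

```python
import pandas as pd

# Small made-up sample standing in for df['Pclass'] after the string conversion.
sample = pd.Series(["Third", "First", "Third", "Second", "First"], name="Pclass")

as_category = sample.astype("category")

# The distinct labels, stored once and sorted alphabetically by default.
categories = as_category.cat.categories.tolist()
# The integer code for each row, i.e. its position in the categories list.
codes = as_category.cat.codes.tolist()

print(categories)  # ['First', 'Second', 'Third']
print(codes)       # [2, 0, 2, 1, 0]
```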

Now, like magic, we have precisely what we want to see. But we can do even better! Let’s say we don’t like the size of the nodes (points) on the graph. We want to see smaller nodes to distinguish better between the points. We can pass another keyword argument, s, which stands for size. This expects a number, which Matplotlib reads as the marker area in points squared.

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5)
<matplotlib.axes._subplots.AxesSubplot at 0x1b28f28f100>
../_images/e3edb8d128952bb0f52ea90056969db536881a0202383ebdd2c7eca008d1114f.png

To make it a bit easier to read, let’s also adjust the size of the figure a bit. We can do this by passing the keyword argument figsize, which we saw above with pie charts.

df.plot.scatter(x="Age", y="Fare", c="Pclass", cmap="viridis", s=5, figsize=(15,5))
<matplotlib.axes._subplots.AxesSubplot at 0x1b28f308340>
../_images/039af6e5ae7b1f12c38587f97b535372fe19b3438f68109e5d5e8a20325441bb.png
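
If you would rather have a conventional legend than a color bar, one common alternative is to draw each class separately on the same Axes and give each call a label. The sketch below illustrates this under a few assumptions: it uses a handful of rows copied from the table above instead of the full dataset, the color choices are arbitrary, and the "Agg" backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A handful of rows copied from the table above, with Pclass already as strings.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0, 19.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 13.0, 30.0],
    "Pclass": ["Third", "First", "Third", "First", "Second", "First"],
})
colors = {"First": "red", "Second": "blue", "Third": "green"}  # arbitrary choice

fig, ax = plt.subplots(figsize=(15, 5))

# Plot one group at a time on the shared Axes so each class gets a legend entry.
for name, group in sample.groupby("Pclass"):
    group.plot.scatter(x="Age", y="Fare", ax=ax, color=colors[name], label=name, s=5)

ax.legend(title="Pclass")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

The trade-off is a few more lines of code in exchange for a legend that names each class directly.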

By now, you should have a good sense of how to create simple bar, pie, and scatter charts. In the next few notebooks, we will be looking at other ways of leveraging Pandas to produce visualizations, such as using plotly and social networks with networkx.