# How to create an Age Distribution Graph Using Python, Pandas and Seaborn

Have you ever wondered how to create an age distribution graph using Python, Pandas and Seaborn? If so, keep reading to find out how!

Figure 1: The graph we'll learn to build in this tutorial

### Setup

First, here is the GitHub repo for this tutorial: Kaggle Titanic Project

We'll be working with the contents in the file `age-distribution-graph.ipynb` for this tutorial.

Note: We'll be working with Jupyter Notebook for this tutorial, so if you don't have it installed, you can get it from the official Jupyter website.

### Development

After opening up `age-distribution-graph.ipynb` you'll notice that the code is divided up into blocks that can be run individually.

Let's go through each code block one by one:

``````python
# Data handling
import numpy as np
import pandas as pd
# Plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Utilities
import warnings
import os
# Suppress all warnings for the rest of the session
warnings.filterwarnings("ignore")
``````

Here we import all the necessary libraries for the histogram that we're about to build. We'll be using `Seaborn` to create the histogram with its `histplot()` function (more on that function in the Seaborn docs). The `warnings.filterwarnings("ignore")` line adds a filter that suppresses warnings; because no message or category is given, it matches every warning, so none are printed (more on `warnings.filterwarnings()` in the official Python docs).
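To make the effect of the warnings filter concrete, here is a minimal sketch using only the standard library. The `noisy()` helper is hypothetical and simply stands in for a library call that emits a warning; the `catch_warnings` context manager is used so the demo does not change the filters for the rest of your session.

```python
import warnings


def noisy():
    """Hypothetical helper that emits a warning, standing in for a library call."""
    warnings.warn("this API is deprecated", UserWarning)
    return "done"


# catch_warnings restores the previous filters on exit and, with record=True,
# collects emitted warnings in a list instead of printing them.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")    # baseline: let every warning through
    noisy()
    count_before = len(shown)          # one warning recorded

    warnings.filterwarnings("ignore")  # same call as in the notebook
    noisy()
    count_after = len(shown)           # unchanged: the second warning was ignored

print(count_before, count_after)
```

Suppressing everything keeps notebook output tidy, but it also hides deprecation notices, so you may prefer to pass a `category` argument if you only want to silence one kind of warning.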

Next, we add the following code block:

``````python
def read_data():
    return train_data, test_data
``````

Here we define the `read_data()` function, which loads the data contained in the `.csv` files into Pandas `DataFrame` objects (more on `DataFrame` in the official Pandas docs). After calling it, the `train_data` variable contains the training data and the `test_data` variable contains the testing data.
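The listing above shows only the function signature and its return statement. As a rough sketch of what the loading step might look like, the version below uses `pd.read_csv()`. The path parameters and their default file names (`train.csv`, `test.csv`) are assumptions for illustration and are not taken from the notebook; the demo rows are made up so the example runs without the Kaggle files.

```python
import io

import pandas as pd


def read_data(train_path="train.csv", test_path="test.csv"):
    """Load the train and test CSVs into DataFrames.

    The default file names are assumptions; point them at your own copies.
    """
    train_data = pd.read_csv(train_path)
    test_data = pd.read_csv(test_path)
    return train_data, test_data


# pd.read_csv also accepts file-like objects, so in-memory text works for a demo.
train_csv = io.StringIO("PassengerId,Survived,Age\n1,0,22\n2,1,38\n")
test_csv = io.StringIO("PassengerId,Age\n3,26\n")

train_data, test_data = read_data(train_csv, test_csv)
print(train_data.shape, test_data.shape)
```

In the notebook you would call `train_data, test_data = read_data()` once, so that later cells can use both DataFrames.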

Next we can add the following code:

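``````python
def survived_age_table(feature):
    # Overlaid histograms of Age, one per value of Survived, drawn from the
    # module-level train_data. Note: feature is used only in the title.
    sns.histplot(data=train_data, x='Age', hue='Survived', palette=['yellow', 'green']).set_title(f"{feature} Vs Survived")
    # Label the legend entries with readable names
    plt.legend(labels=['Died', 'Survived'])
    plt.show()
``````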

This function is responsible for creating the age distribution graph. Here are some more details about it:

• First we create the histogram by calling the function `sns.histplot()` (more on this function can be found in the official Seaborn docs).
• The `data` parameter takes an input data structure, which is a `pandas.DataFrame` in our case.
• The `x` parameter specifies the variable to be counted, which in this case is the `Age` column.
• Assigning a variable to the `hue` parameter, `Survived` in our case, is an instance of conditional subsetting: a separate histogram is drawn for each value of that variable, each in its own color, on the same graph.
• The `palette` parameter chooses the colors to use when mapping the `hue` variable.
• We set the title of the histogram via `set_title()`, called on the object that `sns.histplot()` returns.
• The `plt.legend()` function customizes the entries displayed in the legend box located in the top right of the histogram.
• Lastly, `plt.show()` displays our histogram.

And here is our finished histogram:

Figure 2: Our Finished Histogram