Hello everyone! In this blog post I'll cover some of the summary functions that the Pandas Python Data Analysis Library offers. Let's get started!
List of Summary Functions
Describe Function
The describe() function returns an informative statistical summary of a given Pandas DataFrame. By default only the numerical columns are summarized, and the following statistics are shown for each of them:
- count - Number of non-null values
- mean - Average value of the column values
- std - Standard deviation of the column values
- min - Minimum value contained in the column values
- 25% - Shows the value of the 25th percentile
- 50% - Shows the value of the 50th percentile (the median)
- 75% - Shows the value of the 75th percentile
- max - Maximum value contained in the column values
Note: A percentile is a value on a scale of 100 that indicates the percent of a dataset that is equal to or below it. For example, the 25th percentile of a column is the value that 25% of the column's values are less than or equal to.
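To make this concrete, here is a minimal sketch of describe() on a small DataFrame. The column names and values are made up for illustration, so treat it as a quick demo rather than a real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
# The "age" column contains one missing value on purpose.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, None],
    "salary": [30000, 52000, 87000, 61000, 45000],
})

# describe() returns a new DataFrame holding the summary statistics
summary = df.describe()
print(summary)

# The missing value is left out of the count for "age"
print(summary.loc["count", "age"])   # 4.0

# The 50% row is the median of each column
print(summary.loc["50%", "salary"])  # 52000.0
```

Since the result is itself a DataFrame, individual statistics can be picked out with .loc, as in the last two lines.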
Info Function
The info() function prints a concise summary of the DataFrame that it is called on. The information it provides includes:
- Number of columns in the DataFrame
- Column labels
- Column data types
- Memory usage
- Range index
- Number of non-null values in each column
The info() function does not have a return value: it prints the summary and returns None.
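Here is a short sketch of info() in action. The DataFrame is again made-up sample data. Because nothing is returned, the example also shows one way to capture the printed text, by passing a writable buffer through the buf parameter.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "score": [9.5, 7.0, 8.25],
})

# info() prints the summary to standard output and returns None
result = df.info()
print(result)  # None

# To keep the text, write it into an in-memory buffer instead
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)
```

Capturing the output like this can be handy when you want to log the summary or save it to a file instead of only printing it.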
Value Counts Function
The value_counts() function returns a Series containing the counts of unique values.
The resulting Series is sorted in descending order, so its first element is the most frequently occurring value.
Note: This function excludes NA values by default.
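As a quick illustration, the sketch below calls value_counts() on a small Series of made-up color names with one missing value in it. It also shows the dropna=False option, which includes missing values in the counts.

```python
import pandas as pd

# Hypothetical sample data, invented for this example (one value is missing)
colors = pd.Series(["red", "blue", "red", None, "green", "red", "blue"])

# Counts of each unique value, most frequent first; the missing value is skipped
counts = colors.value_counts()
print(counts)
print(counts.index[0])  # red
print(counts["red"])    # 3

# Pass dropna=False to count the missing value as well
with_na = colors.value_counts(dropna=False)
print(with_na)
```

With the default settings the counts add up to 6, one less than the 7 entries in the Series, because the missing value is excluded.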
Well, that's it for this post! Thanks for following along, and if you have any questions or concerns, please feel free to leave a comment and I will get back to you when I find the time.