Quickest way to find correlation in a pandas dataframe

Suresh Sarda
2 min read · Apr 9, 2019

Well, pandas provides a method for exactly that. If you have a dataframe, calling df.corr() returns a correlation matrix that you can then inspect and plot. Here’s an example using the UCI Heart Disease dataset:

import pandas as pd

# Load the data and compute the pairwise correlation of all columns
df = pd.read_csv('../input/heart.csv')
corr = df.corr(method='pearson')
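
One thing to watch for: the heart disease file is all numeric, but if your own dataframe has text columns, recent pandas versions will complain when you call corr() on it directly. Here is a minimal sketch of one way around that, assuming a small made-up dataframe (the column names and values below are hypothetical, not from the dataset):

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
mixed = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233, 250, 204, 236],
    "city": ["A", "B", "A", "C"],
})

# Keep only the numeric columns, then compute the correlation matrix
numeric_corr = mixed.select_dtypes(include="number").corr()
print(numeric_corr)
```

The text column is dropped before the correlation is computed, so the result only contains age and chol.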

You can choose how the correlation is computed: pearson (the default), kendall or spearman, or you can pass a custom callable that takes two arrays and returns a number.
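
To see how the choice of method changes the result, here is a small sketch on toy data where y grows with x but not linearly. The data and the abs_pearson helper are made up for illustration. kendall is left out because, as far as I know, pandas needs SciPy installed for it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y is a monotonic but non-linear function of x
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["y"] = toy["x"] ** 3

# Pearson measures linear association, Spearman works on ranks
pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

def abs_pearson(a, b):
    """Hypothetical custom metric: absolute value of Pearson's r."""
    return abs(np.corrcoef(a, b)[0, 1])

# A callable receives two 1-D arrays and must return a single number
custom = toy.corr(method=abs_pearson).loc["x", "y"]

print(pearson, spearman, custom)
```

Pearson comes out below 1 because the relationship is not a straight line, while Spearman only looks at the ordering of the values.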

Next, you will want to visualize these correlations:

import seaborn as sns
sns.heatmap(corr)

This gives a default heat-map which looks something like this:

Correlation between variables

A correlation matrix is symmetric, so the heat map is mirrored across the diagonal and you can hide one of the two halves.

import numpy as np

# np.zeros_like gives an all-False array with the same shape as corr
# np.triu_indices_from gives the upper triangle indices (read triangle-upper-indices)
mask = np.zeros_like(corr, dtype=bool)
# True marks the cells to hide: the upper triangle, including the diagonal
mask[np.triu_indices_from(mask)] = True

# Colors
cmap = sns.diverging_palette(240, 10, as_cmap=True)
# Plotting the heatmap
sns.heatmap(corr, mask=mask, linewidths=.5, cmap=cmap, center=0)
Correlation between variables (I have renamed the data frame so the labels look good)
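
If you want to convince yourself that hiding one half loses nothing, here is a quick sketch that checks the symmetry and counts the cells that stay visible. It uses random stand-in data (the demo frame and its column names are hypothetical) and builds the same kind of mask with np.triu, which is an alternative to the indexing approach above.

```python
import numpy as np
import pandas as pd

# Hypothetical random data standing in for the real columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
demo_corr = demo.corr()

# A correlation matrix equals its own transpose
is_symmetric = np.allclose(demo_corr.values, demo_corr.values.T)

# True marks the cells to hide: upper triangle plus the diagonal
demo_mask = np.triu(np.ones(demo_corr.shape, dtype=bool))

# Blank out the hidden cells and count what is left below the diagonal
visible = demo_corr.mask(demo_mask)
n_visible = int(visible.notna().sum().sum())

print(is_symmetric, n_visible)
```

With 4 columns there are 4 * 3 / 2 = 6 distinct pairs, which is exactly the number of cells left below the diagonal.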

That’s it! This will get you started with correlation.

