Friday, January 1, 2016

Generating a Correlation Coefficient with Python

Introduction:
This is a simple exercise to find the correlation between "internetuserate" and "polityscore", two variables in the "gapminder" data set.

Python output 1:
association between polityscore and internet use rate
(0.36438422712027008, 3.14535959202636e-06)

R-square
0.132775864974

---
From the scatter plot and the Pearson correlation coefficient of 0.36438422712027008, we can conclude that the two variables have a weak-to-moderate positive linear relationship. The p-value of 3.14535959202636e-06 is far below 0.05, so the relationship is statistically significant. The R-square value is 0.132775864974, which means that only about 13.3% of the variability in internet use rate can be predicted and explained by polityscore.
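The arithmetic behind these numbers can be checked with a few lines of Python. This is only a minimal sketch: the two values are copied from the output above, and the 0.05 threshold (named alpha here) is an assumed conventional cut-off, not something the original analysis states.

```python
# Values copied from "Python output 1"
r = 0.36438422712027008          # Pearson correlation coefficient
p_value = 3.14535959202636e-06   # two-sided p-value reported with it

# Assumption: the conventional 0.05 significance level
alpha = 0.05

# R-square is simply the correlation coefficient squared
r_squared = r ** 2

# A p-value below alpha means the correlation is statistically significant
is_significant = p_value < alpha

# The sign of r gives the direction of the linear relationship
direction = "positive" if r > 0 else "negative"

print("R-square:", r_squared)
print("significant at alpha =", alpha, ":", is_significant)
print("direction:", direction)
```

Squaring r reproduces the R-square value printed by the script, which is why R-square can never be larger than the absolute value of r.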

Python Code:
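import pandas
import numpy
import seaborn
# import the stats submodule explicitly; plain "import scipy" does not load it in older SciPy versions
import scipy.stats
import matplotlib.pyplot as plt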

gapminder=pandas.read_csv('gapminder.csv',low_memory=False)
print('number of rows and columns of gapminder')
print(len(gapminder))
print(len(gapminder.columns))


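# convert_objects() has been removed from pandas; to_numeric with errors='coerce' turns non-numeric entries into NaN
gapminder['internetuserate']=pandas.to_numeric(gapminder['internetuserate'],errors='coerce')
gapminder['polityscore']=pandas.to_numeric(gapminder['polityscore'],errors='coerce')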

# keep only the rows where both variables are present
data=gapminder[['polityscore','internetuserate']].dropna()



scat=seaborn.regplot(x="polityscore",y="internetuserate",fit_reg=False,data=data)
plt.xlabel("polityscore")
plt.ylabel("internetuserate")
plt.title('Scatterplot for the Association Between Internet Use Rate and PolityScore')

print('association between polityscore and internet use rate')
print(scipy.stats.pearsonr(data['internetuserate'],data['polityscore']))

print('R-square')
# R-square is the squared Pearson correlation coefficient
r=scipy.stats.pearsonr(data['internetuserate'],data['polityscore'])[0]
r_sq=r**2
print(r_sq)
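To make it clearer what the first number returned by scipy.stats.pearsonr represents, here is a small sketch that computes the Pearson correlation coefficient by hand using only the standard library. The function name pearson_r and the toy values are made up for illustration; they are not taken from gapminder.csv, and this is not how SciPy implements it internally.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT real gapminder data
toy_polity = [-10, -5, 0, 5, 10]
toy_internet = [5.0, 20.0, 10.0, 40.0, 35.0]

r = pearson_r(toy_polity, toy_internet)
print("r:", r)
print("R-square:", r * r)
```

On the real data, the value computed this way should agree with the first element of the tuple printed by the script, up to floating-point rounding.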





