Dataset: gapminder.csv
Response Variable: Internet Use Rate
Explanatory Variable: Polity Score
Both variables are quantitative. The explanatory variable "polityscore" is centered (its mean is subtracted from every value) so that it has a mean of 0.
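As a minimal sketch of what centering does, the snippet below subtracts the mean from a small series that contains a missing value. The scores are made-up illustration values, not rows from gapminder.csv, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical polity scores (not taken from gapminder.csv); NaN marks a missing value.
scores = pd.Series([-7.0, 0.0, 4.0, np.nan, 10.0, 9.0])

# Series.mean() skips NaN by default, so the mean uses the five observed values.
mean_score = scores.mean()

# Centering: subtract the mean from every value; the missing value stays NaN.
centered = scores - mean_score

print(mean_score)       # mean of the observed scores
print(centered.mean())  # zero up to floating-point rounding
```

Because of floating-point rounding, the mean of the centered values can come out as a tiny nonzero number rather than exactly 0, which is what the output below shows.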
Python Output
Mean of centered polityscore (should be approximately zero)
-3.91681166451608e-16
-----------
Conclusion:
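From the output, the slope b1 is 1.6043 and its p-value is less than 0.05, so it is statistically significant and the null hypothesis that b1 is zero is rejected. The intercept is 32.2811 (% internet use rate), which is the predicted internet use rate at the mean polity score, because the explanatory variable is centered. Therefore, according to the simple linear regression analysis, the polity score (the democracy level) is significantly and positively associated with the internet use rate. This is an association, not proof of a causal effect.
The model is: Y = 32.2811 + 1.6043X + c, where X is the centered polity score and c is the error term.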
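To make the equation concrete, here is a small sketch that plugs centered polity scores into the fitted line. The coefficients are the ones reported above; the function name and the three input values are hypothetical and chosen only for illustration.

```python
# Coefficients reported in the regression output of this post.
INTERCEPT = 32.2811  # predicted internet use rate (%) at the mean polity score
SLOPE = 1.6043       # change in internet use rate (%) per one-point increase in polity score


def predict_internet_use(centered_polity):
    """Return the predicted internet use rate (%) for a centered polity score."""
    return INTERCEPT + SLOPE * centered_polity


# Hypothetical inputs: 5 points below the mean, at the mean, and 5 points above it.
for x in (-5.0, 0.0, 5.0):
    print(x, round(predict_internet_use(x), 4))
```

A centered score of 0 corresponds to a country with the average polity score, so the prediction there is just the intercept.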
Python Code:
import numpy
import pandas
import statsmodels.formula.api as smf
import seaborn
import matplotlib.pyplot as plt
data = pandas.read_csv('gapminder.csv')
data['internetuserate'] = pandas.to_numeric(data['internetuserate'], errors='coerce')
data['polityscore'] = pandas.to_numeric(data['polityscore'], errors='coerce')
# Drop rows with a missing polity score (assigning dropna() back to the column has no effect)
data = data.dropna(subset=['polityscore'])
# Center the explanatory variable by subtracting its mean
m = numpy.mean(data['polityscore'])
data['polityscore2'] = data['polityscore'] - m
a = numpy.mean(data['polityscore2'])
print('Mean of centered polityscore')
print(a)
scat1 = seaborn.regplot(x="polityscore2", y="internetuserate", scatter=True, data=data)
plt.xlabel('polityscore')
plt.ylabel('Internet Use Rate')
plt.title('Scatterplot for the Association Between Polityscore and Internet Use Rate')
plt.show()
print("OLS regression model for the association between polity score and internet use rate")
reg1 = smf.ols('internetuserate ~ polityscore2', data=data).fit()
print (reg1.summary())
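As a side note on why centering is harmless, the sketch below fits a least-squares line to made-up data with and without centering. It uses numpy.polyfit instead of statsmodels to stay short, and the numbers are invented for illustration only. In this sketch the slope is unchanged and only the intercept moves.

```python
import numpy as np

# Made-up data for illustration only (not from gapminder.csv).
x = np.array([-7.0, -3.0, 0.0, 4.0, 8.0, 10.0])
y = np.array([5.0, 12.0, 20.0, 31.0, 45.0, 52.0])

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
slope_raw, intercept_raw = np.polyfit(x, y, 1)
slope_centered, intercept_centered = np.polyfit(x - x.mean(), y, 1)

# Same slope in both fits.
print(slope_raw, slope_centered)
# With centered x, the intercept equals the mean of y.
print(intercept_raw, intercept_centered, y.mean())
```

Here the centered intercept equals the mean of y because the same rows are used for centering and for fitting. In the post's code the mean is taken over all rows with a polity score, so the reported intercept is the prediction at that mean.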