Introduction:
Dataset: gapminder.csv
Predictors: 'internetuserate','urbanrate','employrate','lifeexpectancy'
Targets: polityscore
"polityscore" reflects the democracy level of a country. The score ranges from -10 to 10, where 10 means the country is the most democratic. I divided it into 2 levels, matching the code below: scores from -10 to 0 are coded as 0, and scores above 0 up to 10 are coded as 1.
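A minimal sketch of the recoding rule in plain Python. It mirrors the `politysco` function in the code listing, where a score of exactly 0 falls in class 0. The name `binarize_polity` and the range check are my own additions for illustration, not part of the original code.

```python
def binarize_polity(score):
    """Map a polityscore to 0 (score <= 0) or 1 (0 < score <= 10)."""
    # Assumption: scores outside [-10, 10] are treated as invalid input.
    if not -10 <= score <= 10:
        raise ValueError("polityscore must lie between -10 and 10")
    return 0 if score <= 0 else 1


# A few boundary values to show where the cut falls.
labels = [binarize_polity(s) for s in (-10, -3, 0, 1, 10)]
print(labels)  # [0, 0, 0, 1, 1]
```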
Results:
Data Partitioning:
-predictors in training dataset: 4 variables and 91 observations
-predictors in test dataset: 4 variables and 61 observations
-target in training dataset: 1 variable and 91 observations
-target in test dataset: 1 variable and 61 observations
Training fraction: 0.6 (test_size=0.4)
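The 91/61 partition can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the test count up and gives the remainder to the training set, which is how I understand scikit-learn to behave; the variable names are illustrative.

```python
import math

n_samples = 91 + 61          # observations left after dropping missing values
test_size = 0.4              # fraction passed to train_test_split

# Assumed rounding rule: test count is rounded up, training set gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
print(round(n_train / n_samples, 3))  # actual training fraction
```

So the training fraction is only approximately 0.6, because 152 observations cannot be divided exactly 60/40.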
Confusion matrix for the test sample (tar_test); rows are the true classes and columns are the predicted classes:
[[ 6, 18],
[ 9, 28]]
True Negative = 6
True Positive = 28
False Negative = 9
False Positive = 18
Accuracy = 0.5901639344262295
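The four counts and the accuracy can be recomputed directly from the matrix. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes); `sensitivity` and `specificity` are extra quantities introduced here for illustration. Note that the matrix as printed gives (6 + 28) / 61, about 0.557, whereas the accuracy listed above equals 36 / 61. A possible explanation is that the two outputs come from different runs, since the split is not seeded, but that is only a guess.

```python
# Confusion matrix as reported: rows = true class, columns = predicted class.
cm = [[6, 18],
      [9, 28]]

tn, fp = cm[0]   # true class 0: correctly / incorrectly predicted
fn, tp = cm[1]   # true class 1: incorrectly / correctly predicted
total = tn + fp + fn + tp

accuracy = (tn + tp) / total      # share of correct predictions
sensitivity = tp / (tp + fn)      # share of class 1 countries found
specificity = tn / (tn + fp)      # share of class 0 countries found

print(total, round(accuracy, 3), round(sensitivity, 3), round(specificity, 3))
```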
Binary Decision Tree:
Python Code:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
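# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split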
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
import sklearn.metrics
data = pd.read_csv("gapminder.csv")
# Coerce the columns to numeric; entries that cannot be parsed become NaN
# (pd.to_numeric replaces the removed convert_objects method)
data['polityscore'] = pd.to_numeric(data['polityscore'], errors='coerce')
data['internetuserate'] = pd.to_numeric(data['internetuserate'], errors='coerce')
data['urbanrate'] = pd.to_numeric(data['urbanrate'], errors='coerce')
data['employrate'] = pd.to_numeric(data['employrate'], errors='coerce')
data['lifeexpectancy'] = pd.to_numeric(data['lifeexpectancy'], errors='coerce')
# Drop rows with missing values; copy so the later column assignment is safe
data_clean = data.dropna().copy()
data_clean.dtypes
data_clean.describe()
# Recode polityscore: 0 or below -> class 0, above 0 up to 10 -> class 1
def politysco(row):
    if row['polityscore'] <= 0:
        return 0
    elif row['polityscore'] <= 10:
        return 1
data_clean['polityscore'] = data_clean.apply(politysco, axis=1)
# Select predictors and target, then hold out 40% of the rows for testing
predictors = data_clean[['internetuserate', 'urbanrate', 'employrate', 'lifeexpectancy']]
targets = data_clean['polityscore']
pred_train, pred_test, tar_train, tar_test = train_test_split(predictors, targets, test_size=.4)
print(pred_train.shape)
print(pred_test.shape)
print(tar_train.shape)
print(tar_test.shape)
# Build the model on the training data
classifier = DecisionTreeClassifier()
classifier = classifier.fit(pred_train, tar_train)
# Predict on the held-out test data and evaluate
predictions = classifier.predict(pred_test)
print(sklearn.metrics.confusion_matrix(tar_test, predictions))
print(sklearn.metrics.accuracy_score(tar_test, predictions))
# Display the decision tree (requires Graphviz and the pydotplus package)
from sklearn import tree
from io import StringIO
from IPython.display import Image
import pydotplus
out = StringIO()
tree.export_graphviz(classifier, out_file=out)
graph = pydotplus.graph_from_dot_data(out.getvalue())
Image(graph.create_png())
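If Graphviz is not installed, the tree can also be printed as text. The sketch below is an alternative, not the method used in this post: it runs on made-up stand-in data (not the gapminder values), uses a toy labelling rule, and fixes `random_state` so that the split and the tree are the same on every run. It assumes a scikit-learn version that provides `export_text`.

```python
import random

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up stand-in data, NOT the gapminder values: 50 rows, 4 features.
rng = random.Random(0)
feature_names = ['internetuserate', 'urbanrate', 'employrate', 'lifeexpectancy']
X = [[rng.uniform(0, 100) for _ in feature_names] for _ in range(50)]
# Toy labelling rule so that the tree has something to learn.
y = [1 if row[0] > 50 else 0 for row in X]

# A fixed random_state makes the split and the fitted tree repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# export_text returns the split rules as indented plain text.
rules = export_text(clf, feature_names=feature_names)
print(rules)
```

Seeding the split this way would also keep the confusion matrix and the accuracy score consistent between runs.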