Information Gain

How to calculate information gain

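Information_Gain = Entropy(parent) - [weighted_average] * Entropy(children)

Here [weighted_average] * Entropy(children) is the average of the child nodes' entropies, each weighted by the fraction of the parent's rows that falls into that child.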
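The formula can be sketched directly in Python. This is a minimal illustration, not the author's code: the helper names `label_entropy` and `information_gain` are hypothetical, and entropy is assumed to be Shannon entropy in base 2 computed from raw class labels.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (base 2) of a list of class labels (hypothetical helper)."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted average of Entropy(children).

    `children` is a list of label lists, one per branch of the split.
    """
    total = len(parent)
    weighted = sum(len(child) / total * label_entropy(child) for child in children)
    return label_entropy(parent) - weighted


# Speed labels from the table below, split on Grade (steep rows vs. flat rows).
speed = ["slow", "slow", "fast", "fast"]
steep = ["slow", "slow", "fast"]
flat = ["fast"]
print(round(information_gain(speed, [steep, flat]), 4))  # 0.3113
```

Passing the children as plain label lists keeps the function independent of how the split was produced; any feature that partitions the parent rows can be evaluated the same way.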

Grade	Bumpiness	Speed_Limit?	Speed
steep	bumpy	yes	slow
steep	smooth	yes	slow
flat	bumpy	no	fast
steep	smooth	no	fast

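In this dataset the parent is Speed, the label we want to predict, so we first need Entropy(parent). Speed has two slow rows and two fast rows, so P(slow) = P(fast) = 1/2 and Entropy(parent) = 1.0.
For how entropy is calculated, see the linked post Entropy.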
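The linked Entropy post is not reproduced here, so the following is a hypothetical sketch of what the `Entropy()` function in `entropy.py` might look like. It assumes Shannon entropy in base 2, which reproduces (up to floating-point rounding) the outputs printed in this section; the lowercase name `entropy` is an assumption.

```python
import math


def entropy(plist):
    """Shannon entropy in bits of a list of probabilities that sum to 1.

    Zero probabilities are skipped, since p * log2(p) -> 0 as p -> 0.
    """
    return sum(-p * math.log2(p) for p in plist if p > 0)


# Parent node: Speed has 2 "slow" and 2 "fast" rows out of 4.
print(entropy([2 / 4, 2 / 4]))  # 1.0
```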


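Next, split on Grade. The steep branch holds three rows (slow, slow, fast), so counting by hand gives P(slow) = 2/3 and P(fast) = 1/3.

In Python, store these probabilities in a list to make the calculation easy: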

plist = [2 / 3, 1 / 3]
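Instead of counting by hand, the probability lists can be derived from the table. This sketch keeps only the Grade and Speed columns; the helper name `class_probabilities` is hypothetical. `[2 / 3, 1 / 3]` corresponds to the steep rows (slow, slow, fast) and `[1]` to the single flat row.

```python
from collections import Counter

# The four rows of the table, reduced to (Grade, Speed) pairs.
rows = [
    ("steep", "slow"),
    ("steep", "slow"),
    ("flat", "fast"),
    ("steep", "fast"),
]


def class_probabilities(labels):
    """Relative frequency of each label, most common label first."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for _, count in counts.most_common()]


steep_speeds = [speed for grade, speed in rows if grade == "steep"]
flat_speeds = [speed for grade, speed in rows if grade == "flat"]

print(class_probabilities(steep_speeds))  # [0.6666666666666666, 0.3333333333333333]
print(class_probabilities(flat_speeds))   # [1.0]
```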

Plugging the list into Entropy() gives:

➜ test ✗ python3 entropy.py
0.9182958340544896

In the calculations that follow, this value is rounded to four decimal places: 0.9183.

The flat branch holds a single row (fast), so counting by hand gives P(fast) = 1:

plist = [1]

Plugging the list into Entropy() gives:

➜ test ✗ python3 entropy.py
0.0

Information_Gain = Entropy(parent) - [weighted_average] * Entropy(children)
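Substituting the values worked out from the table: the parent (Speed) has entropy 1, and on the Grade split the steep branch receives 3 of the 4 rows while the flat branch receives 1 of 4. Using the rounded child entropies 0.9183 and 0:

```latex
\begin{aligned}
\text{Entropy(parent)} &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \\
[\text{weighted\_average}] \cdot \text{Entropy(children)} &= \tfrac{3}{4}\,\text{Entropy(steep)} + \tfrac{1}{4}\,\text{Entropy(flat)} \\
&= \tfrac{3}{4}(0.9183) + \tfrac{1}{4}(0) \approx 0.6887 \\
\text{Information\_Gain} &= 1 - 0.6887 = 0.3113
\end{aligned}
```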