<meter id="pryje"><nav id="pryje"><delect id="pryje"></delect></nav></meter>

<label id="pryje"></label>

新聞中心

EEPW首頁 > 消費(fèi)電子 > 設(shè)計(jì)應(yīng)用 > 機(jī)器學(xué)習(xí)：決策樹--python

機(jī)器學(xué)習(xí)：決策樹--python

作者：時(shí)間：2018-07-24 來源：網(wǎng)絡(luò)

加入技術(shù)交流群
- 掃碼加入
  和技術(shù)大咖面對(duì)面交流
  海量資料庫查詢

收藏

今天，我們介紹機(jī)器學(xué)習(xí)里比較常用的一種分類算法，決策樹。決策樹是對(duì)人類認(rèn)知識(shí)別的一種模擬，給你一堆看似雜亂無章的數(shù)據(jù)，如何用盡可能少的特征，對(duì)這些數(shù)據(jù)進(jìn)行有效的分類。

本文引用地址：http://www.ex-cimer.com/article/201807/383543.htm

決策樹借助了一種層級(jí)分類的概念，每一次都選擇一個(gè)區(qū)分性最好的特征進(jìn)行分類，對(duì)于可以直接給出標(biāo)簽 label 的數(shù)據(jù)，可能最初選擇的幾個(gè)特征就能很好地進(jìn)行區(qū)分，有些數(shù)據(jù)可能需要更多的特征，所以決策樹的深度也就表示了你需要選擇的幾種特征。

在進(jìn)行特征選擇的時(shí)候，常常需要借助信息論的概念，利用最大熵原則。

決策樹一般是用來對(duì)離散數(shù)據(jù)進(jìn)行分類的，對(duì)于連續(xù)數(shù)據(jù)，可以事先對(duì)其離散化。

在介紹決策樹之前，我們先簡單的介紹一下信息熵，我們知道，熵的定義為：

我們先構(gòu)造一些簡單的數(shù)據(jù)：

from sklearn import datasets

import numpy as np

import matplotlib.pyplot as plt

import math

import operator

def Create_data():

dataset = [[1, 1, 'yes'],

[1, 1, 'yes'],

[1, 0, 'no'],

[0, 1, 'no'],

[0, 1, 'no'],

[3, 0, 'maybe']]

feat_name = ['no surfacing', 'flippers']

return dataset, feat_name

然后定義一個(gè)計(jì)算熵的函數(shù)：

def Cal_entrpy(dataset):

n_sample = len(dataset)

n_label = {}

for featvec in dataset:

current_label = featvec[-1]

if current_label not in n_label.keys():

n_label[current_label] = 0

n_label[current_label] += 1

shannonEnt = 0.0

for key in n_label:

prob = float(n_label[key]) / n_sample

shannonEnt -= prob * math.log(prob, 2)

return shannonEnt

要注意的是，熵越大，說明數(shù)據(jù)的類別越分散，越呈現(xiàn)某種無序的狀態(tài)。

下面再定義一個(gè)拆分?jǐn)?shù)據(jù)集的函數(shù)：

def Split_dataset(dataset, axis, value):

retDataSet = []

for featVec in dataset:

if featVec[axis] == value:

reducedFeatVec = featVec[:axis]

reducedFeatVec.extend(featVec[axis+1 :])

retDataSet.append(reducedFeatVec)

return retDataSet

結(jié)合前面的幾個(gè)函數(shù)，我們可以構(gòu)造一個(gè)特征選擇的函數(shù)：

def Choose_feature(dataset):

num_sample = len(dataset)

num_feature = len(dataset[0]) - 1

baseEntrpy = Cal_entrpy(dataset)

best_Infogain = 0.0

bestFeat = -1

for i in range (num_feature):

featlist = [example[i] for example in dataset]

uniquValus = set(featlist)

newEntrpy = 0.0

for value in uniquValus:

subData = Split_dataset(dataset, i, value)

prob = len(subData) / float(num_sample)

newEntrpy += prob * Cal_entrpy(subData)

info_gain = baseEntrpy - newEntrpy

if (info_gain > best_Infogain):

best_Infogain = info_gain

bestFeat = i

return bestFeat

然后再構(gòu)造一個(gè)投票及計(jì)票的函數(shù)

def Major_cnt(classlist):

class_num = {}

for vote in classlist:

if vote not in class_num.keys():

class_num[vote] = 0

class_num[vote] += 1

Sort_K = sorted(class_num.iteritems(),

key = operator.itemgetter(1), reverse=True)

return Sort_K[0][0]

有了這些，就可以構(gòu)造我們需要的決策樹了：

def Create_tree(dataset, featName):

classlist = [example[-1] for example in dataset]

if classlist.count(classlist[0]) == len(classlist):

return classlist[0]

if len(dataset[0]) == 1:

return Major_cnt(classlist)

bestFeat = Choose_feature(dataset)

bestFeatName = featName[bestFeat]

myTree = {bestFeatName: {}}

del(featName[bestFeat])

featValues = [example[bestFeat] for example in dataset]

uniqueVals = set(featValues)

for value in uniqueVals:

subLabels = featName[:]

myTree[bestFeatName][value] = Create_tree(Split_dataset

(dataset, bestFeat, value), subLabels)

return myTree

def Get_numleafs(myTree):

numLeafs = 0

firstStr = myTree.keys()[0]

secondDict = myTree[firstStr]

for key in secondDict.keys():

if type(secondDict[key]).__name__ == 'dict' :

numLeafs += Get_numleafs(secondDict[key])

else:

numLeafs += 1

return numLeafs

def Get_treedepth(myTree):

max_depth = 0

firstStr = myTree.keys()[0]

secondDict = myTree[firstStr]

for key in secondDict.keys():

if type(secondDict[key]).__name__ == 'dict' :

this_depth = 1 + Get_treedepth(secondDict[key])

else:

this_depth = 1

if this_depth > max_depth:

max_depth = this_depth

return max_depth

我們也可以把決策樹繪制出來:

def Plot_node(nodeTxt, centerPt, parentPt, nodeType):

Create_plot.ax1.annotate(nodeTxt, xy=parentPt,

xycoords='axes fraction',

xytext=centerPt, textcoords='axes fraction',

va=center, ha=center, bbox=nodeType, arrowprops=arrow_args)

def Plot_tree(myTree, parentPt, nodeTxt):

numLeafs = Get_numleafs(myTree)

Get_treedepth(myTree)

firstStr = myTree.keys()[0]

cntrPt = (Plot_tree.xOff + (1.0 + float(numLeafs))/2.0/Plot_tree.totalW,

Plot_tree.yOff)

Plot_midtext(cntrPt, parentPt, nodeTxt)

Plot_node(firstStr, cntrPt, parentPt, decisionNode)

secondDict = myTree[firstStr]

Plot_tree.yOff = Plot_tree.yOff - 1.0/Plot_tree.totalD

for key in secondDict.keys():

if type(secondDict[key]).__name__=='dict':

Plot_tree(secondDict[key],cntrPt,str(key))

else:

Plot_tree.xOff = Plot_tree.xOff + 1.0/Plot_tree.totalW

Plot_node(secondDict[key], (Plot_tree.xOff, Plot_tree.yOff),

cntrPt, leafNode)

Plot_midtext((Plot_tree.xOff, Plot_tree.yOff), cntrPt, str(key))

Plot_tree.yOff = Plot_tree.yOff + 1.0/Plot_tree.totalD

def Create_plot (myTree):

fig = plt.figure(1, facecolor = 'white')

fig.clf()

axprops = dict(xticks=[], yticks=[])

Create_plot.ax1 = plt.subplot(111, frameon=False, **axprops)

Plot_tree.totalW = float(Get_numleafs(myTree))

Plot_tree.totalD = float(Get_treedepth(myTree))

Plot_tree.xOff = -0.5/Plot_tree.totalW; Plot_tree.yOff = 1.0;

Plot_tree(myTree, (0.5,1.0), '')

plt.show()

def Plot_midtext(cntrPt, parentPt, txtString):

xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]

yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]

Create_plot.ax1.text(xMid, yMid, txtString)

def Classify(myTree, featLabels, testVec):

firstStr = myTree.keys()[0]

secondDict = myTree[firstStr]

featIndex = featLabels.index(firstStr)

for key in secondDict.keys():

if testVec[featIndex] == key:

if type(secondDict[key]).__name__ == 'dict' :

classLabel = Classify(secondDict[key],featLabels,testVec)

else:

classLabel = secondDict[key]

return classLabel

最后，可以測(cè)試我們的構(gòu)造的決策樹分類器：

decisionNode = dict(boxstyle=sawtooth, fc=0.8)

leafNode = dict(boxstyle=round4, fc=0.8)

arrow_args = dict(arrowstyle=-)

myData, featName = Create_data()

S_entrpy = Cal_entrpy(myData)

new_data = Split_dataset(myData, 0, 1)

best_feat = Choose_feature(myData)

myTree = Create_tree(myData, featName[:])

num_leafs = Get_numleafs(myTree)

depth = Get_treedepth(myTree)

Create_plot(myTree)

predict_label = Classify(myTree, featName, [1, 0])

print(the predict label is: , predict_label)

print(the decision tree is: , myTree)

print(the best feature index is: , best_feat)

print(the new dataset: , new_data)

print(the original dataset: , myData)

print(the feature names are: , featName)

print(the entrpy is:, S_entrpy)

print(the number of leafs is: , num_leafs)

print(the dpeth is: , depth)

print(All is well.)

構(gòu)造的決策樹最后如下所示：

<a target='_blank'><img src='https://ad.eepw.com.cn/www/delivery/avw.php?zoneid=114&cb=INSERT_RANDOM_NUMBER_HERE&n=a7a83b30' border='0' alt='' /></a>
<a target='_blank'><img src='https://ad.eepw.com.cn/www/delivery/avw.php?zoneid=115&cb=INSERT_RANDOM_NUMBER_HERE&n=a3d98779' border='0' alt='' /></a>
<a target='_blank'><img src='https://ad.eepw.com.cn/www/delivery/avw.php?zoneid=116&cb=INSERT_RANDOM_NUMBER_HERE&n=abca108c' border='0' alt='' /></a>
<a target='_blank'><img src='https://ad.eepw.com.cn/www/delivery/avw.php?zoneid=117&cb=INSERT_RANDOM_NUMBER_HERE&n=a1775170' border='0' alt='' /></a>
<a target='_blank'><img src='https://ad.eepw.com.cn/www/delivery/avw.php?zoneid=118&cb=INSERT_RANDOM_NUMBER_HERE&n=a449048b' border='0' alt='' /></a>

關(guān)鍵詞：

評(píng)論

相關(guān)推薦

DAC811右對(duì)齊數(shù)據(jù)格式的尋址電路連接

設(shè)計(jì)方案 DAC811 對(duì)齊數(shù)據(jù) 格式尋址連接 | 2009-07-06

足球芯片在歐洲杯上“開掛了”！

物聯(lián)網(wǎng)與傳感器足球芯片傳感器 | 2024-06-26

意法半導(dǎo)體發(fā)布高能效智能慣性測(cè)量單元，提供工業(yè)產(chǎn)品壽命保證

物聯(lián)網(wǎng)與傳感器意法半導(dǎo)體慣性測(cè)量單元 IMU | 2024-06-25

同步數(shù)字體制(SDH)數(shù)字交叉連接(SDXC)矩陣的設(shè)計(jì)原理

資源下載 | 2007-02-14

應(yīng)用的RF4CE協(xié)議的射頻遙控器系統(tǒng)

視頻 Renesas RF4CE 射頻遙控 | 2012-07-03

討論一下ULIP下的開發(fā)

leon3867 | 2004-12-24

智能家居解決方案

視頻 Renesas 智能家居 | 2012-07-03

180度E-Bike解決方案（基于78K0R MCU）

視頻 Renesas E-Bike | 2012-07-03

哪位朋友分析一下linux/arch/arm下的代碼

NewMoreOne | 2004-12-25

圖像匹配最大互相關(guān)算法的專用ASIC硬件實(shí)現(xiàn)方法研究

資源下載 | 2007-02-14

BOE（京東方）亮相青島國際顯示大會(huì) “屏之物聯(lián)”引領(lǐng)產(chǎn)業(yè)創(chuàng)新升維發(fā)展

光電顯示 BOE 京東方青島國際顯示大會(huì) 屏之物聯(lián) | 2024-06-25

在vxworks下有sniffer工具嗎

leon3867 | 2004-12-25

U盤電路原理圖

設(shè)計(jì)方案原理 | 2009-07-06

圖象處理技術(shù)在車牌識(shí)別中的應(yīng)用

資源下載 | 2007-02-14

微機(jī)數(shù)字觸發(fā)器的研制

資源下載 | 2007-02-14

OpenAI終止對(duì)中國提供API服務(wù)，國產(chǎn)大模型發(fā)布應(yīng)對(duì)方案

智能計(jì)算 openAI API 大模型 | 2024-06-26

網(wǎng)上GIF格式中的圖像和圖形圖片篩選

資源下載 | 2007-02-14

DAC0832與8031的雙緩沖器同步方式接口

設(shè)計(jì)方案 DAC0832 緩沖器同步方式接口 | 2009-07-06

串口驅(qū)動(dòng)疑問？？？

herry01 | 2004-12-25

英特爾中國開源技術(shù)委員會(huì)一周年：多元驅(qū)動(dòng)，共筑繁榮生態(tài)

智能計(jì)算英特爾開源英特爾中國開源技術(shù)委員會(huì) | 2024-06-25

有幾個(gè)人讀懂了linux/arch/arm下的代碼？

NewMoreOne | 2004-12-25

臺(tái)達(dá)電子與TI成立創(chuàng)新聯(lián)合實(shí)驗(yàn)室，瞄準(zhǔn)GaN

電源與新能源臺(tái)達(dá)電子 TI GaN | 2024-06-26

奕斯偉計(jì)算公司在最新的RISC-V邊緣計(jì)算SoC中將SiFive CPU、Imagination GPU和自有NPU結(jié)合集成

智能計(jì)算奕斯偉 RISC-V SiFive CPU Imagination GPU NPU | 2024-06-25

DAC811和4位微機(jī)的接口圖

設(shè)計(jì)方案 DAC811 微機(jī) 接口 | 2009-07-06

中國商用車品牌福田計(jì)劃在巴西引進(jìn)五款車型并在當(dāng)?shù)厣a(chǎn)

國際視野福田巴西 | 2024-06-25

周鴻祎：華為鴻蒙OS一定會(huì)成功因?yàn)檫@三點(diǎn)原因

嵌入式系統(tǒng) 鴻蒙os 華為操作系統(tǒng) 周鴻祎 | 2024-06-26

180度E-Bike解決方案（基于R8C MCU）

視頻 Renesas E-Bike MCU | 2012-07-03

78K0微波爐解決方案

視頻 Renesas 微波爐 78K0 | 2012-07-03

AD7520可克服“毛刺”現(xiàn)象的接口電路

設(shè)計(jì)方案 AD7520 克服毛刺現(xiàn)象接口 | 2009-07-06

實(shí)例分享！BMS采樣板針對(duì)低頻磁場(chǎng)抗擾測(cè)試解決方案

測(cè)試測(cè)量 BMS 抗擾測(cè)試 | 2024-06-26

焦點(diǎn)

推薦視頻

技術(shù)專區(qū)

看屁屁www成人影院,亚洲人妻成人图片,亚洲精品成人午夜在线,日韩在线欧美成人 (function(){ var bp = document.createElement('script'); var curProtocol = window.location.protocol.split(':')[0]; if (curProtocol === 'https') { bp.src = 'https://zz.bdstatic.com/linksubmit/push.js'; } else { bp.src = 'http://push.zhanzhang.baidu.com/push.js'; } var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(bp, s); })();