資源描述:
《決策樹生成原理》由會員上傳分享,免費在線閱讀,更多相關(guān)內(nèi)容在工程資料-天天文庫。
1、決策樹生成原理AbstractThispaperdetailstheID3classificationalgorithm.Verysimply,ID3buildsadecisiontreefromafixedsetofexamples.Theresultingtreeisusedtoclassifyfuturesamples.Theexamplehasseveralattributesandbelongstoaclass(likeyesorno).Theleafnodesofthedecisiont
2、reecontaintheclassnamewhereasanon-leafnodeisadecisionnode?Thedecisionnodeisanattributetestwitheachbranch(toanotherdecisiontree)beingapossiblevalueoftheattribute.ID3usesinformationgaintohelpitdecidewhichattributegoesintoadecisionnode?Theadvantageoflearn
3、ingadecisiontreeisthataprogram,ratherthanaknowledgeengineer,elicitsknowledgefromanexpert?IntroductionJ.RossQuinlanoriginallydevelopedID3attheUniversityofSydney?HefirstpresentedID3in1975inabook,MachineLearning,vol.1,no.1.ID3isbasedofftheConceptLearningS
4、ystem(CLS)algorithm.ThebasicCLSalgorithmoverasetoftraininginstancesC:Step1:IfallinstancesinCarcpositive,thencreateYESnodeandhalt.IfallinstancesinCarenegative,createaNOnodeandhalt.Otherwiseselectafeature,Fwithvaluesvl,vnandcreateadecisionnode?Step2:Part
5、itionthetraininginstancesinCintosubsetsCl,C2,…,CnaccordingtothevaluesofV.Step3:applythealgorithmrecursivelytoeachofthesetsCi.Note,thetrainer(theexpert)decideswhichfeaturetoselect.ID3improvesonCLSbyaddingafeatureselectionheuristic?ID3searchesthroughthea
6、ttributesofthetraininginstancesandextractstheattributethatbestseparatesthegivenexamples?IftheattributeperfectlyclassifiesthetrainingsetsthenID3stops;otherwiseitrecursivelyoperatesonthen(wheren=numberofpossiblevaluesofanattribute)partitionedsubsetstoget
7、their”best”attribute.Thealgorithmusesagreedysearch,thatis,itpicksthebestattributeandneverlooksbacktoreconsiderearlierchoices?DiscussionID3isanonincrementalalgorithm,meaningitderivesitsclassesfromafixedsetoftraininginstances-Anincrementalalgorithmrevise
8、sthecurrentconceptdefinition,ifnecessary,withanewsample?Theclassescreatedby1D3areinductive,thatis,givenasmallsetoftraininginstances,thespecificclassescreatedbyID3areexpectedtoworkforallfutureinstances?Thedistributionoftheunknownsmustbet