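Improvement Research on the Naive Bayesian Classification Algorithm (Research on Naive Bayesian Classifier Algorithm)

Abstract

The naive Bayesian classifier (NBC) model is computationally simple and classifies well, so it is favored by researchers in many fields and has become one of the most widely used classifiers; its application and study have also become an active topic. In practice, however, the conditional independence assumption is hard to satisfy, which weakens the classification performance of the NBC model. For different data types, this paper proposes two improved naive Bayesian classifier models, one from the perspective of feature extraction and one from the perspective of feature selection: a naive Bayesian classification model based on Fisher discriminant analysis, and a naive Bayesian classification model based on R-type clustering.

The naive Bayesian classification model based on Fisher discriminant analysis, FI-NBC, exploits the property that Fisher discriminant analysis extracts independent features. It applies Fisher discriminant analysis to the original attribute set, extracts the discriminant functions, builds from them a new attribute set that approximately satisfies the independence assumption, and classifies on the new attribute set with the NBC model. Controlled experiments on UCI datasets show that FI-NBC classifies better than the NBC model.

The naive Bayesian classification model based on a correlation measure and R-type clustering, RC-NBC, first improves R-type clustering by using the correlation measure defined in this paper as the similarity coefficient between attributes. The improved R-type clustering then partitions the original attribute set into several subsets, a typical attribute is chosen from each subset to build a new attribute set, and the NBC model classifies the resulting dataset. Experimental results show that classification accuracy improves.

Keywords: data mining; naive Bayesian classification; Fisher discriminant; R-type clustering; mutual information

[Abstract]

The naïve Bayesian classifier, with the assumption that condition attributes are independent of each other, a simple structure, high classification accuracy, little consumption of running time and storage space, and a solid theoretical foundation in mathematics, is one of the efficient classifiers. The research and application of the naïve Bayesian classifier are therefore popular now. In many practical cases, however, the performance of the naïve Bayesian classifier suffers because the assumption of conditional independence is violated. Two improved classifiers are proposed from the perspective of feature selection for datasets of different types: a naïve Bayesian classifier based on Fisher discriminant analysis, and a naïve Bayesian classifier based on mutual information and R-type clustering analysis.

The naïve Bayesian classifier based on Fisher discriminant analysis, FI-NBC, constructs a new attribute set from the original property set using Fisher discriminant analysis. A naïve Bayesian classifier is built on the new attribute set, which meets the assumption of conditional independence approximately. The experimental results on UCI datasets show that the performance of FI-NBC is better than that of the naïve Bayesian classifier on the feasible dataset.

The naïve Bayesian classifier based on mutual information and R-type clustering analysis, RC-NBC, changes the R-type clustering by measuring the correlation of properties through mutual information. The original attribute set is partitioned into independent attribute subsets by the changed R-type clustering. One typical attribute is selected from each subset to form a new attribute set. Experiments on UCI datasets show that the performance of RC-NBC improves signif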
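
The RC-NBC pipeline outlined in the abstract (cluster the attributes, keep one typical attribute per cluster, then run naive Bayes) can be sketched roughly as follows. The abstract does not give the paper's correlation measure, its clustering procedure, or its rule for picking the typical attribute, so everything specific here is an assumption made for illustration: similarity is mutual information normalised by the smaller entropy, attributes are grouped by single linkage at a hypothetical threshold of 0.5, the representative is the attribute with the highest mutual information with the class label, and the classifier is a categorical naive Bayes with Laplace smoothing. All function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def similarity(xs, ys):
    """Assumed similarity coefficient: MI normalised by the smaller entropy."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0

def cluster_attributes(columns, threshold):
    """Single-linkage grouping: merge clusters whenever two attributes
    have similarity >= threshold (assumed stand-in for R-type clustering)."""
    clusters = [{j} for j in range(len(columns))]
    for a in range(len(columns)):
        for b in range(a + 1, len(columns)):
            if similarity(columns[a], columns[b]) >= threshold:
                ca = next(c for c in clusters if a in c)
                cb = next(c for c in clusters if b in c)
                if ca is not cb:
                    ca |= cb
                    clusters = [c for c in clusters if c is not cb]
    return clusters

def select_representatives(columns, labels, threshold=0.5):
    """Keep, per cluster, the attribute most informative about the class."""
    return sorted(max(c, key=lambda j: mutual_information(columns[j], labels))
                  for c in cluster_attributes(columns, threshold))

def train_nb(rows, labels, attrs):
    """Count class and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values = defaultdict(set)            # attribute -> observed values
    for row, y in zip(rows, labels):
        for j in attrs:
            value_counts[(j, y)][row[j]] += 1
            values[j].add(row[j])
    return class_counts, value_counts, values, attrs

def predict_nb(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, values, attrs = model
    n = sum(class_counts.values())
    def log_posterior(y):
        score = math.log(class_counts[y] / n)
        for j in attrs:
            score += math.log((value_counts[(j, y)][row[j]] + 1)
                              / (class_counts[y] + len(values[j])))
        return score
    return max(class_counts, key=log_posterior)

# Toy data: attribute 1 duplicates attribute 0; attribute 2 is unrelated.
rows = [("a", "a", "x"), ("a", "a", "y"), ("b", "b", "x"), ("b", "b", "y")]
labels = ["p", "p", "n", "n"]
columns = list(zip(*rows))

attrs = select_representatives(columns, labels)
model = train_nb(rows, labels, attrs)
print(attrs, predict_nb(model, ("a", "a", "x")))
```

In the toy data the two duplicated attributes fall into one cluster, so only one of them reaches the classifier. This is the effect the abstract attributes to RC-NBC: redundant attributes no longer count twice in the naive Bayes product.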