Data Mining: Association Rules
Chapter 4: Mining Frequent Patterns, Association and Correlations
- Basic concepts and a roadmap
- Scalable frequent itemset mining methods
- Mining various kinds of association rules
- Constraint-based association mining
- From association to correlation analysis
- Mining colossal patterns
- Summary

2021/7/19  Data Mining: Concepts and Techniques

What Is Frequent Pattern Analysis?
- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
- First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
- Motivation: finding inherent regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?
- Applications: basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
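
The motivating question "What products were often purchased together?" can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the basket data is invented, and the names `baskets`, `frequent_pairs`, and `min_count` are hypothetical, not taken from the slides. It counts, by brute force, how many baskets contain each pair of items and keeps the pairs that reach a minimum count.

```python
from collections import Counter
from itertools import combinations

# Toy basket data, invented purely for illustration.
baskets = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"beer", "bread", "diapers"},
    {"bread", "milk", "diapers"},
]


def frequent_pairs(transactions, min_count):
    """Return item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort the items so every pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


result = frequent_pairs(baskets, min_count=3)
print(result)  # {('beer', 'diapers'): 3}
```

Enumerating every pair in every basket is fine for a toy data set but does not scale; it is meant only to show what "occurs frequently" means in practice.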
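
Association Rule Mining
A typical case of association rule mining: the market basket problem
- A store carries a large number of products (items), such as milk and bread, and customers place the products they buy into their shopping baskets.
- By discovering relationships among the different products that customers place in their baskets, we can analyze their purchasing habits:
  - Which items do customers buy frequently?
  - Which products are frequently bought together in the same purchase?
  - Do typical customers' purchases follow a particular time sequence?
- Concrete applications: profit maximization
  - Shelf design: layouts better suited to customers' shopping paths
  - Inventory arrangement: achieving zero-inventory management for the supermarket
  - Customer classification: providing personalized services

Association Rule Mining
- Simply put, association rule mining discovers interesting associations among itemsets in large amounts of data.
- In transaction data, relational data, or other information repositories, it searches for frequent patterns, associations, correlations, or causal structures among sets of items or objects.
- Applications: market basket analysis, cross-selling, product catalog design, clustering, classification, etc.
- Two strategies:
  1. Place associated products close together to increase their sales.
  2. Place associated products far apart to increase sales of other products.

Why Is Frequent Pattern Mining Important?
- Frequent pattern: an intrinsic and important property of data sets
- Foundation for many essential data mining tasks
  - Association, correlation, and causality analysis
  - Sequential, structural (e.g., sub-graph) patterns
  - Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  - Classification: discriminative, frequent pattern analysis
  - Cluster analysis: frequent pattern-based clustering
  - Data warehousing: iceberg cube and cube-gradient
  - Semantic data compression: fascicles
- Broad applications

Association Rule Mining: Formal Definition
Given:
- Let I = {i_1, i_2, …, i_m} be a set of items. A set of several items is called an itemset.
- Let D be a set of transactions T, where each transaction T is a set of items and T ⊆ I. Corresponding to each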