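Huazhong University of Science and Technology
Doctoral Dissertation

Research on Mining Algorithms for Web Logs and Subspace Clustering

Name: Hu Rong (胡蓉)
Degree applied for: Doctor
Major: Computer Software and Theory
Supervisor: Lu Yansheng (盧炎生)
2008-06-01

... query results, increasing the controllability of the result set and improving users' decision-making efficiency. Building on an analysis of the difficulties of subspace Skyline queries in high-dimensional data spaces, a novel and compact structure, the subspace Skyline cluster, is designed. By introducing a clustering algorithm into Skyline queries, it combines the respective advantages of subspace Skyline queries and clustering techniques. A general Skyline query algorithm is expected to meet several requirements: progressiveness, correctness, efficiency, fairness, user-friendliness, and scalability. The sorting-based subspace Skyline clustering algorithm SSSCM and the threshold-based subspace Skyline clustering algorithm TSSCM exploit the role that nearest neighbors and sorting play in Skyline queries and, inspired by top-k query algorithms, satisfy these requirements. Experiments on two real datasets and two synthetic datasets show that both algorithms return results efficiently, with TSSCM performing better.

Keywords: Web log mining, query term translation, subspace clustering, pattern similarity, Skyline query, subspace Skyline cluster

Abstract

Data mining aims to identify valid, novel, potentially useful, and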
ultimately understandable patterns in data. With the rapid development of information technologies, data gathered from many fields are growing exponentially every day. In particular, large-scale and complex data are generated in many applications, such as web applications, natural science, and electronic business. How to help users extract knowledge from these data effectively is an urgent problem. It is therefore of great theoretical and practical significance to take the needs of applications and the data characteristics of different fields into account when designing effective mining algorithms for such large-scale, high-dimensional data.

For the problem of mining translations of web queries from web click-through data, the framework MTQC leverages web logs as an effective corpus for mining web query translations. Based on an analysis of web logs collected from the interactions between web users and search engines, MTQC fully leverages bilingual URL pairs and the queries related to these URLs. It is a two-step mining process: first it identifies bilingual URL pairs, then it matches query translation pairs. Two algorithms, MTQC-1 and MTQC-2, are based on this framework. They therefore share several useful properties: they require no crawling or word segmentation, they capture popular translations, and they extract semantically relevant translations to improve Cross-Lingual Information Retrieval. Experiments conducted on large-scale, real click-through data show that, compared with state-of-the-art translation algorithms, the proposed algorithms are effective in translating out-of-vocabulary queries and popular queries. For the pro