Uncertain Data Mining: An Example in Clustering Location Data

Michael Chau¹, Reynold Cheng², Ben Kao³, and Jackey Ng¹

¹ School of Business, The University of Hong Kong, Pokfulam, Hong Kong
mchau@business.hku.hk, jackeyng@hkusua.hku.hk
² Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
csckcheng@comp.polyu.edu.hk
³ Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
kao@cs.hku.hk

Abstract. Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data, their uncertainty has to be considered to obtain high-quality results. We present UK-means clustering, an algorithm that enhances the K-means algorithm to handle data uncertainty. We apply UK-means to the particular pattern of moving-object uncertainty. Experimental results show that by considering uncertainty, a clustering algorithm can produce more accurate results.

1 Introduction

In applications that require interaction with the physical world, such as location-based services [6] and sensor monitoring [3], data uncertainty is an inherent property due to measurement inaccuracy, sampling discrepancy, outdated data sources, or other errors. Although much research effort has been directed towards the management of uncertain data in databases, few researchers have addressed the issue of mining uncertain data.

We note that with uncertainty, data values are no longer atomic. To apply traditional data mining techniques, uncertain data has to be summarized into atomic values. Unfortunately, discrepancy in the summarized recorded values and the actual values could seriously affect the quality of the mining results. Figure 1 illustrates this problem when a clustering algorithm is applied to moving objects with location uncertainty. If we solely rely on the recorded values, many objects could possibly be put into wrong clusters. Even worse, each member of a cluster would change the cluster centroids, thus resulting in more errors.

Fig. 1. (a) The real-world data are partitioned into three clusters (a, b, c). (b) The recorded locations of some objects (shaded) are not the same as their true location, thus creating clusters a’, b’, c’ and c’’. (c) When line uncertainty is considere