Data Mining: Introduction
Lecture Notes for Chapter 1
Introduction to Data Mining
by Tan, Steinbach, Kumar

© Tan, Steinbach, Kumar    Introduction to Data Mining    4/18/2004

Why Mine Data? Commercial Viewpoint
• Lots of data is being collected and warehoused
  – Web data, e-commerce
  – purchases at department/grocery stores
  – Bank/Credit Card transactions
• Computers have become cheaper and more powerful
• Competitive Pressure is Strong
  – Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
• Data collected and stored at enormous speeds (GB/hour)
  – remote sensors on a satellite
  – telescopes scanning the skies
  – microarrays generating gene expression data
  – scientific simulations generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists
  – in classifying and segmenting data
  – in Hypothesis Formation

Mining Large Data Sets - Motivation
• There is often information “hidden” in the data that is not readily evident
• Human analysts may take weeks to discover useful information
• Much of the data is never analyzed at all

[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts, 1995 to 1999; vertical axis from 0 to 4,000,000.]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

What is Data Mining?
• Many Definitions
  – Non-trivial extraction of implicit, previously unknown and potentially useful information from data
  – Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is (not) Data Mining?
• What is not Data Mining?
  – Look up phone number in phone directory
  – Query a Web search engine for information about “Amazon”
• What is Data Mining?
  – Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
  – Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
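
The contrast between a directory lookup and data mining can be made concrete with a small sketch. The Python below is illustrative only and not taken from the lecture notes: the directory entries are invented toy data, and the function names and the `min_lift` threshold are hypothetical choices. `lookup_phone` retrieves a fact that is already stored, while `overrepresented` summarizes the whole directory to surface surnames that are unusually common in one city.

```python
from collections import Counter, defaultdict

# Hypothetical toy phone directory: (surname, city, phone).
# Every entry is invented for illustration.
DIRECTORY = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Brien", "Boston", "555-0103"),
    ("Nguyen", "Houston", "555-0104"),
    ("O'Rourke", "Boston", "555-0105"),
    ("Garcia", "Houston", "555-0106"),
    ("Garcia", "Houston", "555-0107"),
    ("O'Brien", "Houston", "555-0108"),
]


def lookup_phone(surname, city):
    """Not data mining: return numbers already stored under a known key."""
    return [p for s, c, p in DIRECTORY if s == surname and c == city]


def overrepresented(min_lift=1.5):
    """Closer to data mining: find surnames whose share within a city
    is at least min_lift times their share in the whole directory."""
    overall = Counter(s for s, _, _ in DIRECTORY)
    per_city = defaultdict(Counter)
    for surname, city, _ in DIRECTORY:
        per_city[city][surname] += 1

    total = len(DIRECTORY)
    found = []
    for city, counts in per_city.items():
        city_total = sum(counts.values())
        for surname, n in counts.items():
            city_share = n / city_total
            overall_share = overall[surname] / total
            lift = city_share / overall_share
            if lift >= min_lift:
                found.append((surname, city, round(lift, 2)))
    return sorted(found)


if __name__ == "__main__":
    print(lookup_phone("O'Brien", "Boston"))
    for row in overrepresented():
        print(row)
```

On eight invented rows the ratios carry no statistical weight; the point is only the difference in kind between answering a query for a known key and summarizing all records to expose a pattern nobody asked for by name.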