Hadoop: Introduction, Installation and Configuration
Data Mining Research Group (Data Mining Group @ Xiamen University)

A Distributed, Data-Intensive Programming Framework
Hadoop = HDFS (distributed storage) + MapReduce (parallel computing)

Introduction to HDFS
Hadoop Distributed File System (HDFS): an open-source implementation of GFS.
It has many similarities with other distributed file systems. However, it also differs from them.
HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.

How does it work? Its features
An important feature of the design: data is never moved through the NameNode. Instead, all data transfer occurs directly between clients and DataNodes.

MapReduce?
Let's talk about it next time.

What does "running Hadoop" mean?
"Running Hadoop" means running a set of daemons:
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker

Who works for whom?
HDFS: NameNode, Secondary NameNode (SNN), DataNode
MapReduce: JobTracker, TaskTracker

NameNode
Hadoop employs a master/slave architecture for both distributed storage and distributed computation.
The NameNode is the master of HDFS. It directs the slave DataNode daemons to perform the low-level I/O tasks.
The NameNode is the bookkeeper of HDFS:
it keeps track of how your files are broken down into file blocks;
it keeps track of the overall health of the distributed file system.

DataNode
Reads and writes HDFS blocks for clients.
Communicates with other DataNodes to replicate its data blocks for redundancy.

Figure: NameNode and DataNode

Secondary NameNode
The SNN is an assistant daemon for monitoring the state of the cluster's HDFS.
It differs from the NameNode in that this process doesn't receive or record any real-time changes to HDFS.
It communicates with the NameNode to take snapshots of the HDFS metadata.
Recovery: what if the NameNode fails? We reconfigure the cluster to use the SNN as the primary NameNode.

JobTracker
The liaison between your application and Hadoop.
Once you submit your code to your cluster, the JobTracker determines the execution plan.
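
The division of labor between the NameNode and the DataNodes described earlier can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not Hadoop's actual API: the names ToyNameNode, ToyDataNode, write_file and read_file, the 4-byte block size, the replication factor of 2 and the round-robin placement are all hypothetical. In particular, the client here copies each block to every replica itself, whereas real DataNodes replicate blocks among themselves.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4     # bytes per block; tiny on purpose (assumption for the demo)
REPLICATION = 2    # copies kept of each block (assumption for the demo)


@dataclass
class ToyDataNode:
    """Stores block contents; knows nothing about file names."""
    name: str
    blocks: dict = field(default_factory=dict)      # block_id -> bytes


@dataclass
class ToyNameNode:
    """Bookkeeper: records metadata only and never holds file data."""
    block_map: dict = field(default_factory=dict)   # path -> [(block_id, [node names])]
    next_id: int = 0

    def allocate(self, path, node_names):
        """Reserve a new block id for `path` and pick the replica holders."""
        block_id = self.next_id
        self.next_id += 1
        # Round-robin placement: a simplification, not HDFS's real policy.
        start = block_id % len(node_names)
        targets = [node_names[(start + i) % len(node_names)]
                   for i in range(REPLICATION)]
        self.block_map.setdefault(path, []).append((block_id, targets))
        return block_id, targets

    def locate(self, path):
        """Return the block list for `path`: ids and holders, no data."""
        return self.block_map[path]


def write_file(namenode, datanodes, path, data):
    """Client write: ask the NameNode where, then send bytes to DataNodes."""
    names = sorted(datanodes)
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]
        block_id, targets = namenode.allocate(path, names)
        for target in targets:
            # Data goes straight to the DataNodes, never through the NameNode.
            datanodes[target].blocks[block_id] = chunk


def read_file(namenode, datanodes, path):
    """Client read: fetch locations from the NameNode, bytes from DataNodes."""
    parts = []
    for block_id, holders in namenode.locate(path):
        parts.append(datanodes[holders[0]].blocks[block_id])
    return b"".join(parts)


if __name__ == "__main__":
    nodes = {n: ToyDataNode(n) for n in ("dn1", "dn2", "dn3")}
    nn = ToyNameNode()
    write_file(nn, nodes, "/demo.txt", b"hello hadoop")
    print(nn.locate("/demo.txt"))
    print(read_file(nn, nodes, "/demo.txt"))
```

Running the script prints the block locations held by the toy NameNode, followed by the reassembled file contents. Note that ToyNameNode stores only the mapping from file to blocks to holders, which mirrors the point that data is never moved through the NameNode.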