Hadoop: A Framework for Data-Intensive Distributed Computing
CS561 - Spring 2012
WPI, Mohamed Y. Eltabakh

What is Hadoop?
- Hadoop is a software framework for distributed processing of large datasets across large clusters of computers
- Hadoop is an open-source implementation of Google MapReduce
- Hadoop is based on a simple programming model called MapReduce
- Hadoop is based on a simple data model; any data will fit
- The Hadoop framework consists of two main layers
  - Distributed file system (HDFS)
  - Execution engine (MapReduce)

Hadoop Infrastructure
- Hadoop is a distributed system, like distributed databases
- However, there are several key differences between the two infrastructures
  - Data model
  - Computing model
  - Cost model
  - Design objectives

How Data Model is Different?
Distributed Databases:
- Deal with tables and relations
- Must have a schema for data
- Data fragmentation & partitioning
Hadoop:
- Deals with flat files in any format
- No schema for data
- Files are divided automatically into blocks

How Computing Model is Different?
Distributed Databases:
- Notion of a transaction
- Transaction properties: ACID
- Distributed transaction
Hadoop:
- Notion of a job divided into tasks
- Map-Reduce computing model
- Every task is either a map or a reduce

Hadoop: Big Picture
- High-level languages
- Execution engine
- Distributed light-weight DB
- Centralized tool for coordination
- Distributed file system
HDFS + MapReduce are enough to have things working

What is Next?
- Hadoop Distributed File System (HDFS)
- MapReduce Layer
- Examples
  - Word Count
  - Join
- Fault Tolerance in Hadoop

HDFS: Hadoop Distributed File System
- Single namenode and many datanodes
- Namenode maintains the file system metadata
- Files are split into fixed-size blocks and stored on datanodes (default 64 MB)
- Data blocks are replicated for fault tolerance and fast access (default is 3)
- Datanodes periodically send heartbeats to the namenode
- HDFS is a master-slave architecture
  - Master: namenode
  - Slaves: datanodes (100s or 1000s of nodes)

HDFS: Data Placement and Replication
Datanodes can be organized into racks
- Default placement policy: where to put a given block?
  - First copy is written to the node creating the file (write affinity)
  - Second copy is written to a datanode within the same rack
  - Third copy is written to a datanode in a different rack
- Objectives: load balancing, fa