Automatic CPU-GPU Communication Management and Optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin†, Nick P. Johnson, Stephen R. Beard, David I. August
Princeton University, Princeton, NJ
†Brown University, Providence, RI
{tjablin, pprabhu, npjohnso, sbeard, august}@cs.princeton.edu, jjablin@cs.brown.edu

Abstract

The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is limited by the complexities of CPU-GPU communication. To address these communications problems, this paper presents the first fully automatic system for managing and optimizing CPU-GPU communication. This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without depending on the strength of static compile-time analyses or on programmer-supplied annotations. CGCM eases manual GPU parallelizations and improves the applicability and performance of automatic GPU parallelizations. For 24 programs, CGCM-enabled automatic GPU parallelization yields a whole program geomean speedup of 5.36x over the best sequential CPU-only execution.

... memories. Unfortunately, not all communication management is efficient; cyclic communication patterns are frequently orders of magnitude slower than acyclic patterns [15]. Transforming cyclic communication patterns to acyclic patterns is called Optimizing Communication. Naïvely copying data to GPU memory, spawning a GPU function, and copying the results back to CPU memory yields cyclic communication patterns. Copying data to the GPU in the preheader, spawning many GPU functions, and copying the result back to CPU memory in the loop exit yields an acyclic communication pattern. Incorrect communication optimization causes programs to access stale or inconsistent data.

This paper presents CPU-GPU Communication Manager (CGCM), the first fully automatic system for managing and optimizing CPU-GPU communication. Automatically managing and optimizing communication increases programmer efficiency and program correctness. It also improves the applicability and performance of automatic GPU parallelization.

CGCM manage
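The difference between the cyclic and acyclic communication patterns described in the text can be illustrated with a small sketch. This is a toy model in plain Python, not CGCM or any GPU API: `FakeGPU`, `launch_increment`, `run_cyclic`, and `run_acyclic` are hypothetical names invented for illustration, and the "device" is just a dictionary that counts how many copies cross the CPU-GPU boundary.

```python
# Toy model only: no real GPU is involved and every name here is hypothetical.

class FakeGPU:
    """Stands in for a device with separate memory; counts CPU-GPU copies."""

    def __init__(self):
        self.memory = {}
        self.transfers = 0

    def copy_to_device(self, name, host_data):
        self.memory[name] = list(host_data)
        self.transfers += 1

    def copy_to_host(self, name):
        self.transfers += 1
        return list(self.memory[name])

    def launch_increment(self, name):
        # Models a GPU function that touches only device-resident data.
        self.memory[name] = [x + 1 for x in self.memory[name]]


def run_cyclic(data, iterations):
    """Naive pattern: copy in, launch, copy out on every loop iteration."""
    gpu = FakeGPU()
    for _ in range(iterations):
        gpu.copy_to_device("a", data)
        gpu.launch_increment("a")
        data = gpu.copy_to_host("a")
    return data, gpu.transfers


def run_acyclic(data, iterations):
    """Optimized pattern: copy in before the loop, copy out after it."""
    gpu = FakeGPU()
    gpu.copy_to_device("a", data)      # loop preheader
    for _ in range(iterations):
        gpu.launch_increment("a")      # many launches, no copies
    data = gpu.copy_to_host("a")       # loop exit
    return data, gpu.transfers
```

Under these assumptions, both functions compute the same result for a given input, but the cyclic version performs two copies per iteration while the acyclic version performs two copies in total. The sketch says nothing about how CGCM decides that hoisting the copies is safe; it only shows what the two patterns look like.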