Technical News from The Portland Group
PGI Home Page
February 2010

Tuning a Monte Carlo Algorithm on GPUs
by Mathew Colgrove, PGI Customer Support Engineering

Since the introduction of PGI CUDA Fortran late last year, we've seen a dramatic rise in the number of customers using this new extension to the Fortran language. As the moderator of the PGI User Forum, I have been very busy answering questions about the language and noting those questions that seem to be asked often or that may be of interest to the wider community. For this installment of the PGInsider, I have implemented the Monte Carlo Integration algorithm to highlight some of the tips, tricks, and traps of programming for the
GPU.

Monte Carlo Integration

For my sample code I chose a simple Monte Carlo Integration algorithm to compute the approximate value of pi. The code first creates a list of random points within a square. Each point is evaluated using the function:

    f(x,y) = (x^2 + y^2 < 1) ? 1 : 0

The resulting values are then summed. The approximate value of pi can then be calculated by multiplying four times the volume of the square (its area) by the mean value of f(x,y). The code itself is very simple to understand, and the algorithm is highly parallel because each f(x,y) calculation can be performed independently. Beyond this, the algorithm uses a sum reduction and requires a random number generator (RNG), both of which present interesting problems.

Source code for the following examples is available for download from the PGI website. Here is a basic Fortran implementation of a Monte Carlo Integration algorithm:

```fortran
! Perform the function f(x,y) = (x^2+y^2<1) ? 1 : 0
do i = 1, N
   tempVal = X(i)*X(i) + Y(i)*Y(i)
   if (tempVal < 1) then
      temp(i) = 1
   else
      temp(i) = 0
   endif
end do

! Sum the results
sumA = 0
sumSq = 0
do i = 1, N
   sumA = sumA + temp(i)
   sumSq = sumSq + (temp(i)*temp(i))
end do

! Calculate the mean
meanA = sumA/real(N)
meanSq = sumSq/real(N)

! Approximate pi
results%estimate = meanA*volume*4
results%variance = (meanSq - meanA*meanA)/(N-1)
```

Baseline Performance

First, let's start with the host version of the code to get our baseline performance. We compiled the code with auto-parallelization enabled and ran it with four threads. The system we're using is an Intel Core i7 (single-socket, four-core Nehalem) with an attached NVIDIA S1070 (four Tesla C1060 cards). The compiler version used is PGI 10.2. (For a listing of the flags used i
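The quantities computed by the Fortran loops above can be written out as a short derivation. This assumes the points are drawn uniformly from the unit square [0,1)^2. The factor of 4 in the code implies this setup: the volume V is 1 and the integral of f over the square is pi/4. Because f only takes the values 0 and 1, f^2 = f, so meanSq equals meanA and the variance has a closed form. Note that results%variance is the variance of the mean of f. The variance of the pi estimate itself is (4V)^2 times larger.

```latex
\begin{aligned}
\bar{f} &= \frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i) && \text{(meanA)}\\
\overline{f^2} &= \frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i)^2 && \text{(meanSq)}\\
\pi &\approx 4V\bar{f} && \text{(results\%estimate)}\\
\widehat{\operatorname{Var}}[\bar{f}] &= \frac{\overline{f^2} - \bar{f}^{\,2}}{N-1} && \text{(results\%variance)}\\
f \in \{0,1\} \;\Rightarrow\; \overline{f^2} = \bar{f}
  \;\Rightarrow\; \widehat{\operatorname{Var}}[\bar{f}] &= \frac{\bar{f}\,(1 - \bar{f})}{N-1}
\end{aligned}
```

For this particular integrand, the second accumulation (sumSq) is therefore redundant in principle. It only becomes necessary when f can take values other than 0 and 1.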
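Below is a minimal self-contained sketch of a host version in standard Fortran. It is not the article's downloadable source. The module and procedure names (mc_pi, mc_result, mc_pi_from_points, mc_pi_random) are hypothetical, and mc_result only mirrors the estimate/variance pair stored in results% above. The RANDOM_NUMBER intrinsic stands in for whatever RNG the PGI example actually uses. The sketch assumes points in the unit square [0,1)^2, so the volume is 1. Unlike the two loops above, it evaluates f(x,y) and accumulates the sums in one pass, so no temp array is stored.

```fortran
! Hypothetical host-side sketch of the Monte Carlo pi estimator.
! Names are illustrative only; RANDOM_NUMBER is an assumed stand-in RNG.
module mc_pi
  implicit none
  private
  public :: dp, mc_result, mc_pi_from_points, mc_pi_random

  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical result type mirroring results%estimate / results%variance
  type :: mc_result
     real(dp) :: estimate = 0.0_dp
     real(dp) :: variance = 0.0_dp
  end type mc_result

contains

  ! Evaluate f(x,y) = (x^2+y^2 < 1) ? 1 : 0 and reduce in a single pass.
  ! The factor of 4 assumes points in [0,1)^2 (volume = 1), where the
  ! integral of f over the square is pi/4.
  function mc_pi_from_points(x, y, volume) result(res)
    real(dp), intent(in) :: x(:), y(:)
    real(dp), intent(in) :: volume
    type(mc_result) :: res
    integer :: i, n
    real(dp) :: f, sumA, sumSq, meanA, meanSq

    n = size(x)
    sumA = 0.0_dp
    sumSq = 0.0_dp
    do i = 1, n
       if (x(i)*x(i) + y(i)*y(i) < 1.0_dp) then
          f = 1.0_dp
       else
          f = 0.0_dp
       end if
       sumA = sumA + f
       sumSq = sumSq + f*f
    end do

    meanA = sumA / real(n, dp)
    meanSq = sumSq / real(n, dp)
    res%estimate = meanA * volume * 4.0_dp
    ! Variance of meanA (not of the pi estimate), matching the article's code
    res%variance = (meanSq - meanA*meanA) / real(n - 1, dp)
  end function mc_pi_from_points

  ! Draw n uniform points in [0,1)^2 with RANDOM_NUMBER, then estimate pi
  function mc_pi_random(n) result(res)
    integer, intent(in) :: n
    type(mc_result) :: res
    real(dp), allocatable :: x(:), y(:)

    allocate(x(n), y(n))
    call random_number(x)
    call random_number(y)
    res = mc_pi_from_points(x, y, 1.0_dp)
  end function mc_pi_random

end module mc_pi
```

For example, `r = mc_pi_random(1000000)` should typically return an estimate within about 0.005 of pi. For N = 10^6, the standard error of 4 times the mean is roughly 4*sqrt(0.785*0.215/10^6), or about 0.0016. On the host, the sum reduction is just a scalar accumulation. On the GPU, as the article notes, the reduction and the RNG are where the interesting problems lie.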