Student ID: M201676111
University code: 10487

Master's Thesis
Design and Implementation of Data Collection and Storage System Based on Big Data Architecture

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Engineering

Candidate: Tang Ru
Major: Software Engineering
Supervisor: Assoc. Prof. Fang Shaohong
Defense date: 2018.12.17

Huazhong University of Science & Technology
Wuhan 430074, P. R. China
December, 2018

Huazhong University of Science and Technology Master's Thesis

Abstract (Chinese)

In the Web 3.0 era, Internet users are tightly connected to the Internet at every level of daily life. Information of all kinds is growing exponentially, data now affects every industry, and artificial intelligence is developing rapidly. Data is the foundation of AI, yet it usually arrives in a disorganized form. This makes data analysis and AI model training considerably harder. Collecting and storing data in a normalized, well-structured form has therefore become an urgent need for the development of data analysis and artificial intelligence.

First, this thesis analyzes the state of research in related fields in China and abroad. It identifies the shortcomings of existing storage middleware architectures and sets out the collection and storage requirements of real-world production applications. To meet these requirements, it proposes a data collection and storage system based on a big data architecture.

Second, it studies the key technologies of data collection and selects among existing frameworks and middleware. Two data collection approaches are proposed: data exchange between services, and active crawling with web crawlers. Data cleaning and data storage run at different speeds, so message queue middleware is introduced to shave peaks in request volume during high-traffic periods.

Third, the thesis addresses the high-availability requirements of a production environment. It builds on existing storage middleware and designs a failure-handling strategy using distributed deployment and data write-back hooks. The goal is to keep the system highly available, with uninterrupted 24/7 service. Data must also be kept for at least three days without loss, which ensures both data reliability and service stability.

Finally, a web system for data collection and storage based on this big data architecture is built with the Spring Boot framework.

In summary, this research on big-data-based data collection and storage techniques implements a data collection and storage system on top of existing big data storage middleware. The system supports high traffic, high concurrency and high availability.

Keywords: data collection; data storage; high availability; stability

Abstract (English)

In the era of Internet 3.0, netizens and the web are closely connected at every level of life. Information of all kinds is growing explosively, data has permeated all walks of life, and artificial intelligence is developing rapidly. The data on which artificial intelligence is built is often disorganized, which greatly increases the difficulty of data analysis and AI model training. Therefore, collecting and storing data in a regularized form is urgent for the development of data analysis and artificial intelligence. First, this thesis analyzes the research status in related fields in China and abroad. In view of the shortcomings of existing storage middleware architectures, it puts forward the acquisition and storage requirements of real-world production applications. Based on this series of requirements, it proposes a data collection and storage system based on a big data architecture. Second, the key technologies of data acquisition are studied, and existing frameworks and middleware are selected. Two data collection schemes are proposed: data docking between services and active crawling. In view of the inconsistency between dat