machine translation introduction

ID:39716889

Size: 2.30 MB

Pages: 64

Date: 2019-07-10


Spell Correction & Machine Translation

Content
• Spell Correction
• Machine Translation Introduction
• Statistical Machine Translation: IBM Models
• Phrase-Based Translation Models

Spell Correction Introduction
• Given a word, we are trying to choose the most likely spelling correction for that word (the "correction" may be the original word itself). There is no way to know for sure (for example, should "lates" be corrected to "late" or "latest"?), which suggests we use probabilities. We will say that we are trying to find the correction c, out of all possible corrections, that maximizes the probability of c given the original word w:
    argmax_c P(c | w)
• By Bayes' Theorem this is equivalent to:
    argmax_c P(w | c) P(c) / P(w)
• Since P(w) is the same for every possible c, we can ignore it, giving:
    argmax_c P(w | c) P(c)

Spell Correction Introduction
• There are three parts of this expression.
• P(c), the probability that a proposed correction c stands on its own. This is called the language model.
• So P("the") would have a relatively high probability, while P("zxzxzxzyyy") would be near zero.
• P(w | c), the probability that w would be typed in a text when the author meant c. This is the error model.
• argmax_c, the control mechanism, which says to enumerate all feasible values of c, and then choose the one that gives the best combined probability score.

How to work? P(c)
1. We will read a big text file, big.txt, which consists of about a million words
2. extract the individual words from the file
3. train a probability model, which is a fancy way of saying we count how many times each word occurs
• enumerating the possible corrections c of a given word w
• edit distance: the number of edits it would take to turn one into the other
• The literature on spelling correction claims that 80 to 95% of spelling errors are an edit distance of 1
• For a word of length n, there will be n deletions, n-1 transpositions, 26n alterations, and 26(n+1) insertions, for a total of 54n+25 (of which a few are typically duplicates).

How to work? P(w | c)
• mistaking one vowel for another is more probable than mistaking two consonants; making an error on the first letter of a word is less probable, etc.
• defined a trivial model that says all known words of edit distance 1 are infinitely more probable than known words of edit distance 2, and infinitely less probable than a known word of edit distance 0.
• The function ...
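The language-model and candidate-enumeration steps can be sketched in Python roughly as follows. This is a minimal illustration, not the deck's own code: the names `words`, `train`, `P`, and `edits1` are chosen here for the example, and a one-sentence toy string stands in for big.txt, which the slides read from disk.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def words(text):
    """Lowercase the text and extract runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())

def train(tokens):
    """'Training' the model is just counting how often each word occurs."""
    return Counter(tokens)

def P(word, counts):
    """Language model P(c): relative frequency of the word in the corpus."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

def edits1(word):
    """All strings one edit away from word, as a list that keeps duplicates."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]                      # n
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]  # n-1
    alterations = [a + ch + b[1:] for a, b in splits if b for ch in ALPHABET]  # 26n
    inserts = [a + ch + b for a, b in splits for ch in ALPHABET]      # 26(n+1)
    return deletes + transposes + alterations + inserts

# Toy stand-in for big.txt (an assumption made for this example only).
SAMPLE = "the late train was the latest train of the day"
COUNTS = train(words(SAMPLE))
```

With this sketch, `len(edits1("lates"))` equals 54 * 5 + 25 = 295 before de-duplication, and both "late" and "latest" appear among the candidates, which is why the choice between them has to be made on probability.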

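The trivial error model can be sketched as a strict priority order: a known word at edit distance 0 wins outright, then known words at distance 1, then known words at distance 2, with ties inside a tier broken by the language-model count. The function name `correct`, the helpers `known` and `known_edits2`, and the word counts below are assumptions made for illustration; the preview cuts off before the deck's own function is shown.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical counts standing in for a model trained on big.txt.
COUNTS = Counter({"the": 50, "late": 6, "latest": 4, "spelling": 3})

def edits1(word):
    """Set of all strings one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    alterations = [a + ch + b[1:] for a, b in splits if b for ch in ALPHABET]
    inserts = [a + ch + b for a, b in splits for ch in ALPHABET]
    return set(deletes + transposes + alterations + inserts)

def known(candidates):
    """Keep only the candidates that occur in the training counts."""
    return {w for w in candidates if w in COUNTS}

def known_edits2(word):
    """Known words reachable by applying two single edits in sequence."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in COUNTS}

def correct(word):
    """Pick the most frequent known word from the nearest non-empty tier."""
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or {word})
    return max(candidates, key=lambda w: COUNTS[w])
```

Under these toy counts, `correct("lates")` returns "late" because both "late" and "latest" are one edit away and "late" has the higher count. A word with no known neighbour within two edits is returned unchanged.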