Resource description: "machine translation introduction", uploaded and shared by a member on 天天文庫 (Academic Papers).
Spell Correction & Machine Translation

Content
• Spell Correction
• Machine Translation Introduction
• Statistical Machine Translation: IBM Models
• Phrase-Based Translation Models

Spell Correction Introduction
• Given a word, we are trying to choose the most likely spelling correction for that word (the "correction" may be the original word itself). There is no way to know for sure (for example, should "lates" be corrected to "late" or "latest"?), which suggests we use probabilities. We will say that we are trying to find the correction c, out of all possible corrections, that maximizes the probability of c given the original word w:
  $\arg\max_c P(c \mid w)$
• By Bayes' Theorem this is equivalent to:
  $\arg\max_c P(w \mid c)\,P(c) / P(w)$
• Since P(w) is the same for every possible c, we can ignore it, giving:
  $\arg\max_c P(w \mid c)\,P(c)$
• There are three parts of this expression:
• P(c), the probability that a proposed correction c stands on its own. This is called the language model. So P("the") would have a relatively high probability, while P("zxzxzxzyyy") would be near zero.
• P(w|c), the probability that w would be typed in a text when the author meant c. This is the error model.
• argmax_c, the control mechanism, which says to enumerate all feasible values of c, and then choose the one that gives the best combined probability score.

How it works: P(c)
1. We will read a big text file, big.txt, which consists of about a million words.
2. Extract the individual words from the file.
3. Train a probability model, which is a fancy way of saying we count how many times each word occurs.

Enumerating the possible corrections c of a given word w
• Edit distance: the number of edits it would take to turn one word into the other.
• The literature on spelling correction claims that 80 to 95% of spelling errors are within an edit distance of 1 of the intended word.
• For a word of length n, there will be n deletions, n-1 transpositions, 26n alterations, and 26(n+1) insertions, for a total of n + (n-1) + 26n + 26(n+1) = 54n+25 (of which a few are typically duplicates).

How it works: P(w|c)
• Mistaking one vowel for another is more probable than mistaking two consonants; making an error on the first letter of a word is less probable, etc.
• We define a trivial model that says all known words of edit distance 1 are infinitely more probable than known words of edit distance 2, and infinitely less probable than a known word of edit distance 0.
• The functio