Lecture 5 - Text Classification
Deep Learning for Natural Language Processing
Text Classification
Karl Moritz Hermann
kmh@google.com
7 Feb 2017

Overview

This lecture discusses text classification. As part of that we'll discuss the following:
- Generative and discriminative models
- Naïve Bayes
- Logistic regression
- Text representation (BOW, Features, RNNs, Convolutions)
- Softmax Classifiers
- Practical Aspects of Classifier Training

Important Message

Good Day,
My name is Dr William Monroe, a staff in the Private Clients Section of a well-known bank, here in London, England. One of our accounts, with holding balance of $15,000,000 has been dormant and last operated three years ago. From my investigations, the owner of the said account, John Shumejda died on the 4th of January 2002 in a plane crash. I have decided to find a reliable foreign partner to deal with. I therefore propose to do business with you, standing in as the next of kin of these funds from the deceased. This transaction is totally free of risk and troubles as the fund is legitimate and does not originate from drug, money laundry or terrorism. On your interest, let me hear from you URGENTLY.
Best Regards,
Dr William Monroe
Financial Analysis and Remittance Manager

Why Classify?

Classification Tasks

- Is this e-mail spam?
- Positive or negative review?
- What is the topic of this article?
- Predict hashtags for a tweet
- Age/gender identification
- Language identification
- Sentiment analysis
- ...

Types of Classification Tasks

- Binary classification (true, false)
- Multi-class classification (politics, sports, gossip)
- Multi-label classification (#party #FRIDAY #fail)
- Clustering (labels unknown)

Classification Methods

1. By hand
2. Rule-based
3. Statistical

Classification Methods

1. By hand
   E.g. Yahoo in the old days
   ✓ Very accurate and consistent assuming experts
   ✗ Super slow, expensive, does not scale
2. Rule-based
   E.g. Advanced search criteria ("site:ox.ac.uk")
   ✓ Accuracy high if rule is suitable
   ✗ Need to manually build and maintain rule-based system.
3. Statistical
   This lecture
   ✓ Scales well, can be very accurate, automatic
   ✗ Requires classified training data. Sometimes a lot!

Statistical Text Classification

Assume some text represented by d and some class c. We want to learn the probability of d being of class c:

P(c|d)

Key questions:
- How to represent d.
- How to calculate P(c|d).

Text Classification in Two Parts

Th