NAACL HLT 2009
Active Learning for Natural Language Processing (ALNLP-09)
Proceedings of the Workshop
June 5, 2009
Boulder, Colorado

Production and Manufacturing by Omnipress Inc.
2600 Anderson Street
Madison, WI 53707
USA

Endorsed by the following ACL Special Interest Groups:
• SIGNLL, Special Interest Group for Natural Language Learning
• SIGANN, Special Interest Group for Annotation

© 2009 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:
Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
acl@aclweb.org

ISBN 978-1-932432-40-4

Introduction

Welcome to the workshop on Active Learning for Natural Language Processing! We started organizing this workshop in mid-2008 after strong encouragement in response to some of our own work in the area. As we gathered members of the program committee, the timeliness of the topic resonated with several of them: the growing body of knowledge on active learning and on active learning for NLP in particular makes this topic one worth exploring in a focused workshop rather than in isolated papers in occasional, far-flung conferences.

Labeled data is a prerequisite for many popular algorithms in natural language processing and machine learning. While it is possible to obtain large amounts of annotated data for well-studied languages in well-studied domains and well-studied problems, labeled data are rarely available for less common languages, domains, or problems. Unfortunately, obtaining human annotations for linguistic data is labor-intensive and typically the costliest part of the acquisition of an annotated corpus.

It has been shown before that active learning can be employed to reduce annotation costs but not at the expense of quality. While diverse work over the past decade has demonstrated the possible advantages of active learning for corpus annotation and NLP applications, active learning is not widely used in many ongoing data annotation tasks. Much of the machine learning literature on the topic has focused on active learning for classification problems with less attention devoted to the kinds of problems encountered in NLP. Related topics such as distributed “human computation”, cost-sensitive machine learning, and semi-supervise