Crowdsourcing a News Query Classification Dataset

Richard M. C. McCreadie, Craig Macdonald, Iadh Ounis
Department of Computing Science
University of Glasgow
Glasgow, G12 8QQ
richardm@dcs.gla.ac.uk, craigm@dcs.gla.ac.uk, ounis@dcs.gla.ac.uk

ABSTRACT

Web search engines are well known for aggregating news vertical content into their result rankings in response to queries classified as news-related. However, no dataset currently exists upon which approaches to news query classification can be evaluated and compared. This paper studies the generation and validation of a news query classification dataset comprised of labels crowdsourced from Amazon's Mechanical Turk and details insights gained. Notably, our study focuses around two challenges when crowdsourcing news query classification labels: 1) how to overcome our workers' lack of information about the news stories from the time of each query and 2) how to ensure the resulting labels are of high enough quality to make the dataset useful. We empirically show that a worker's lack of information about news stories can be addressed through the integration of news-related content into the labelling interface and that this improves the qu

dataset. We propose multiple interfaces for crowdsourced query labelling and evaluate these interfaces empirically in terms of the quality of the resulting labels on a small representative sample of user queries from a Web search engine query log. Later, we use the best performing of these interfaces to generate our final news query classification dataset comprised of a larger query sample from the same log. We report the quality of our resulting news query classification dataset in terms of inter-worker labelling agreement and accuracy with regard to labels created separately by the authors. Moreover, we further investigate its quality in the form of an additional agreement study, in which crowdsourcing is leveraged for quality assurance.

Notably, one of the most interesting aspects of news query classification labelling is the temporal nature of news-related queries [16]. In particular, a query should only be labelled as news-related if there was a relevant noteworthy story in the news around the time