Teaching AI about human knowledge
Ines Montani
Explosion AI

Explosion AI is a digital studio specialising in Artificial Intelligence and Natural Language Processing.

Open-source library for industrial-strength Natural Language Processing
spaCy's next-generation Machine Learning library for deep learning with text
coming soon: Data Store, pre-trained, customisable models for a variety of languages and domains

Machine learning is programming by example.
Examples are your source code, training is compilation.

[Diagram: examples, labels, training, input, prediction]

draw examples from the same distribution as the runtime inputs
goal: the system's prediction for a given input matches the label a human would have assigned

How machines "learn"
Example: training a simple part-of-speech tagger with the perceptron algorithm (teach the model to recognise verbs, nouns, etc.)

    from collections import defaultdict
    from numpy import zeros

    def train_tagger(examples, n_tags):
        # examples = words, tags, contexts; tags are integer indices below n_tags
        W = defaultdict(lambda: zeros(n_tags))  # the weights we'll train
        for (word, prev, nxt), human_tag in examples:
            # score each tag given weights & context
            scores = W[word] + W[prev] + W[nxt]
            guess = scores.argmax()  # get the best-scoring tag
            if guess != human_tag:  # if the guess was wrong, adjust weights
                for feat in (word, prev, nxt):
                    W[feat][guess] -= 1  # decrease score for bad tag in this context
                    W[feat][human_tag] += 1  # increase score for good tag in this context
        return W  # hand the trained weights back to the caller

The bottleneck in AI is data, not algorithms.
Algorithms are general, training data is specific.

data quality, data quantity and accuracy problems are still the biggest problems in AI (Source: The State of AI survey)
you can extract knowledge from all kinds of sources, e.g. sentiment from emoji on Reddit
you usually need at least some data specific to your problem, annotated by humans

Where human knowledge in AI really comes from

Mechanical Turk
human annotators
~$5 per hour
boring tasks
low incentives
Images: Amazon Mechanical Turk, depressing.org

Don't expect great data if you're boring the shit out of underpaid people.

Why are we "designing around" this?
"Taking a HIT: Designing around Rejection, Mistrust, Risk, and Workers' Experiences in Amazon Mechanical Turk" (McInnis et al., 2016)

data collection needs the same treatment as all other human-facing processes
good UX + purpose + incentives = better quality

SOLUTION #1
UX-driven data collection with active learning
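
The slides name active learning without spelling out a selection strategy. As a rough illustration only, here is a minimal sketch of one common variant, uncertainty sampling: the model scores each unlabeled example, and the examples with the smallest gap between the best and second-best tag are sent to the human annotator first. The function names (margin, pick_for_annotation), the budget parameter and the toy scores are all hypothetical and not taken from the talk.

```python
from typing import Dict, List, Sequence


def margin(scores: Sequence[float]) -> float:
    """Gap between the best and second-best score; a small gap means the model is unsure."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def pick_for_annotation(scored: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the `budget` unlabeled examples the model is least sure about."""
    ranked = sorted(scored, key=lambda example: margin(scored[example]))
    return ranked[:budget]


# Hypothetical per-tag model scores for three unlabeled examples (assumed data).
unlabeled_scores = {
    "book a flight": [2.0, 1.9, 0.1],    # margin of about 0.1: very uncertain
    "the red book": [5.0, 0.5, 0.2],     # margin of 4.5: confident
    "they book rooms": [1.2, 0.9, 0.0],  # margin of about 0.3: uncertain
}

to_annotate = pick_for_annotation(unlabeled_scores, budget=2)
print(to_annotate)
```

With a budget of two, the annotator only sees the two ambiguous uses of "book", and the example the model already handles confidently is skipped. Fewer, more informative questions is one way to make the task less boring for the person answering them.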