CHAPTER 15
The Dark Side of Data Science
Marck Vaisman

More often than not, data scientists hit roadblocks that do not necessarily arise from problems with data itself, but from organizational and technical issues. This chapter focuses on some of these issues and provides practical advice on dealing with them, both from human and technical perspectives. The anecdotes and examples in this chapter are drawn from real-world experiences working with many clients over the last five years and helping them overcome many of these challenges. Although the ideas that are presented in this chapter are not new, the main purpose is to highlight common pitfalls that can derail analytical efforts. When put into context, these guidelines will help both data scientists and organizations be successful.

Avoid These Pitfalls

The subject of running a successful analytics organization has been explored in the past. There are many books, articles, and opinions written about it and this will not be addressed here. However, if you would like to be successful in executing and/or managing analytical efforts within your organization, you should not heed the “commandments” listed below.

I. Know nothing about thy data
II. Thou shalt provide your data scientists with a single tool for all tasks
III. Thou shalt analyze for analysis’s sake only
IV. Thou shalt compartmentalize learnings
V. Thou shalt expect omnipotence from data scientists

These commandments attempt to cluster related ideas, which I will explore in the following sections. If you do choose to obey one or more of these commandments, which we’ve explicitly warned you not to, you will most likely head down the path of not achieving your goals.

Know Nothing About Thy Data

You have to know your data, period. This cannot be stressed enough. Real-world data is messy and dirty; that is a fact. Regardless of how messy or dirty your data is, you need to understand all of its nuances. You need to understand the metadata about the data. If your data is dirty, know that. If there are missing values, know that, and know why they are missing. If you have multiple sources with different formatting, know that. Knowing thy data is a crucial step in a successful analysis effort. Time spent up-front understanding all of the nuances and intricacies of the data is time well spent. The rule of thumb says that 80% of time spent in analytics