Discover Feature Engineering, How to Engineer Features and How to Get Good at It

Importance of Feature Engineering
● Better features mean flexibility.
● Better features mean simpler models.
● Better features mean better results.

What is Feature Engineering?
● Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Sub-Problems of Feature Engineering
● Feature Importance (correlation, random forest)
– An estimate of the usefulness of a feature
● Feature Extraction (PCA)
– The automatic construction of new features from raw data
● Feature Selection (ranking score, wrapper, LASSO)
– From many features to a few that are useful
● Feature Construction
– The manual construction of new features from raw data
● Feature Learning
– The automatic identification and use of features in raw data

Iterative Process of Feature Engineering
● Brainstorm features
● Devise features
● Select features
● Evaluate models

General Examples of Feature Engineering
● Decompose Categorical Attributes
– "Item_Color" that can be Red, Blue or Unknown
● Decompose a Date-Time
– 2014-09-20T20:45:40Z
● Reframe Numerical Quantities
– Num_Customer_Purchases → Purchases_Summer, Purchases_Fall

Feature selection in sklearn
● Removing features with low variance
– VarianceThreshold
● Univariate feature selection
– Regression: p-values
– Classification: ANOVA F-value

Variable Ranking
● Correlation Criteria
– Pearson correlation coefficient
● Single Variable Classifiers
– ROC curve (x-axis: FPR, y-axis: TPR) and its AUC
● Information Theoretic Ranking Criteria

● Noisy (non-informative) features
● Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features

Limitations of variable ranking
● Can Presumably Redundant Variables Help Each Other?
● How Does Correlation Impact Variable Redundancy?
● Can a Variable that is Useless by Itself be Useful with Others?
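The "Feature Importance (random forest)" sub-problem can be sketched with scikit-learn's RandomForestClassifier, whose fitted feature_importances_ attribute holds one impurity-based score per input column (the scores sum to 1). This is a minimal sketch, not a prescribed recipe: the bundled iris dataset is only a stand-in, and n_estimators=200 and random_state=0 are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Iris is only a stand-in dataset; any (X, y) classification data would do.
data = load_iris()
X, y = data.data, data.target

# Fit a forest and read its impurity-based importances (one per feature, summing to 1).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_

# Rank features from most to least important.
order = np.argsort(importances)[::-1]
for idx in order:
    print(f"{data.feature_names[idx]:<20} {importances[idx]:.3f}")
```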
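Feature Extraction (PCA) builds new features automatically as linear combinations of the raw ones. A minimal sketch with scikit-learn's PCA, assuming iris as example data, standardisation beforehand (a common but optional choice) and two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 samples x 4 raw features

# Standardise first so that no feature dominates merely because of its units.
X_std = StandardScaler().fit_transform(X)

# Extract 2 new features (principal components) from the 4 original ones.
pca = PCA(n_components=2)
X_new = pca.fit_transform(X_std)

print(X_new.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # share of total variance kept by each component
```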
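The Item_Color example under "Decompose Categorical Attributes" can be turned into binary indicator features. The sketch below is plain Python; the feature names Is_Red, Is_Blue, Is_Unknown and the extra Has_Color flag (which exposes "Unknown" as missing information instead of just another colour) are hypothetical illustrations, not names taken from the slides.

```python
# The three values the slide lists for Item_Color.
ITEM_COLORS = ("Red", "Blue", "Unknown")


def decompose_item_color(color):
    """Turn one Item_Color value into binary features (hypothetical feature names)."""
    features = {f"Is_{c}": int(color == c) for c in ITEM_COLORS}
    # Extra flag exposing "Unknown" as missing information rather than a colour.
    features["Has_Color"] = int(color != "Unknown")
    return features


for color in ITEM_COLORS:
    print(color, decompose_item_color(color))
```

With pandas, pd.get_dummies applied to the Item_Color column gives the same one-indicator-per-value layout.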
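The date-time 2014-09-20T20:45:40Z from "Decompose a Date-Time" can likewise be split into parts a model can use more easily. A minimal standard-library sketch; which parts to keep and the part-of-day boundaries are assumptions made for illustration:

```python
from datetime import datetime, timezone


def decompose_datetime(stamp):
    """Split a UTC timestamp such as 2014-09-20T20:45:40Z into simpler features."""
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    hour = dt.hour
    # Hypothetical part-of-day buckets; the boundaries are an arbitrary choice.
    if 5 <= hour < 12:
        part_of_day = "morning"
    elif 12 <= hour < 17:
        part_of_day = "afternoon"
    elif 17 <= hour < 22:
        part_of_day = "evening"
    else:
        part_of_day = "night"
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": dt.weekday() >= 5,
        "hour_of_day": hour,
        "part_of_day": part_of_day,
    }


# 2014-09-20 was a Saturday, so day_of_week is 5 and part_of_day is "evening".
print(decompose_datetime("2014-09-20T20:45:40Z"))
```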
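Reframing Num_Customer_Purchases as Purchases_Summer, Purchases_Fall and so on means counting purchases per season instead of in total. A sketch assuming the raw data is a list of purchase dates and a Northern Hemisphere month-to-season mapping (both hypothetical):

```python
from collections import Counter
from datetime import date

# Hypothetical season mapping (Northern Hemisphere, by calendar month).
SEASON_OF_MONTH = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}
SEASONS = ("Winter", "Spring", "Summer", "Fall")


def purchases_by_season(purchase_dates):
    """Reframe one total (Num_Customer_Purchases) as per-season purchase counts."""
    counts = Counter(SEASON_OF_MONTH[d.month] for d in purchase_dates)
    return {f"Purchases_{s}": counts[s] for s in SEASONS}  # Counter yields 0 if absent


dates = [date(2014, 7, 3), date(2014, 8, 15), date(2014, 9, 20),
         date(2014, 10, 1), date(2014, 12, 24)]
features = purchases_by_season(dates)
print("Num_Customer_Purchases =", len(dates), features)
```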
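VarianceThreshold removes every feature whose training-set variance falls below a threshold. A sketch on a toy boolean dataset (assumed for illustration), where a column should be dropped if one value appears in more than 80% of the samples; since a Bernoulli variable has variance p(1 - p), that corresponds to threshold = 0.8 * (1 - 0.8) = 0.16:

```python
from sklearn.feature_selection import VarianceThreshold

# Toy boolean data: column 0 is 0 in 5 of 6 rows, i.e. nearly constant.
X = [[0, 0, 1],
     [0, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 1, 0],
     [0, 1, 1]]

selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-column variances: 5/36, 2/9, 1/4
print(selector.get_support())  # [False  True  True]: column 0 is removed
print(X_reduced.shape)         # (6, 2)
```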
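Univariate feature selection scores each feature on its own against the target and keeps the best ones; for classification, f_classif computes the ANOVA F-value (f_regression is the regression counterpart and also returns p-values). The sketch below assumes iris plus 20 appended uniform-noise columns as the "noisy (non-informative) features" and shows SelectKBest keeping the four real features:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Append 20 non-informative (pure noise) columns after the 4 real iris features.
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.uniform(size=(X.shape[0], 20))])

# Score every column independently with the ANOVA F-test and keep the 4 best.
selector = SelectKBest(score_func=f_classif, k=4)
X_selected = selector.fit_transform(X_noisy, y)

print(np.round(selector.scores_[:4], 1))   # F-values of the real features
print(selector.get_support(indices=True))  # indices of the kept columns
```

A model such as an SVM trained on X_selected instead of X_noisy no longer has any noise columns to spread its weights over.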
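The correlation criterion and single-variable classifiers listed under "Variable Ranking" can both be computed per feature. In the sketch below, on synthetic data with one strong, one noise and one weak feature (an assumption for illustration), each feature is ranked by |Pearson r| with a binary target and by the ROC AUC obtained when the raw feature value is used directly as a score; max(AUC, 1 - AUC) is taken so that a feature separating the classes in reverse still ranks high.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)            # binary target
X = np.column_stack([
    y + rng.normal(0, 0.5, n),            # feature 0: strongly informative
    rng.normal(0, 1, n),                  # feature 1: pure noise
    0.3 * y + rng.normal(0, 1, n),        # feature 2: weakly informative
])

# Correlation criterion: |Pearson r| between each feature and the target.
pearson = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# Single-variable classifier: the raw feature value is the score; compute ROC AUC.
auc = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
auc = np.maximum(auc, 1 - auc)

print("Pearson ranking:", np.argsort(-pearson))
print("AUC ranking:    ", np.argsort(-auc))
```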
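The first and third limitation questions can be checked numerically. In this sketch (synthetic data, assumed for illustration), two noisy copies of the same signal look redundant, yet their average correlates with the target more strongly than either copy. In an XOR-style target t = a * b, each of a and b has zero correlation with t on its own, although together they determine t exactly, so a ranking that scores variables one at a time would discard both.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# 1) "Presumably redundant" variables helping each other:
#    two measurements of the same signal y with independent noise.
rng = np.random.default_rng(0)
n = 10_000
y = rng.choice([-1.0, 1.0], size=n)
x1 = y + rng.normal(0, 1, n)
x2 = y + rng.normal(0, 1, n)
r_x1, r_x2, r_avg = pearson(x1, y), pearson(x2, y), pearson((x1 + x2) / 2, y)
print(f"r(x1,y)={r_x1:.3f}  r(x2,y)={r_x2:.3f}  r(mean,y)={r_avg:.3f}")

# 2) A variable useless by itself but useful together with another (XOR-style).
a = np.tile([-1.0, -1.0, 1.0, 1.0], 25)
b = np.tile([-1.0, 1.0, -1.0, 1.0], 25)
t = a * b  # t is fully determined by the pair (a, b)
r_a, r_b = pearson(a, t), pearson(b, t)
print(f"r(a,t)={r_a:.3f}  r(b,t)={r_b:.3f}")
```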

Feature selection in sklearn
● Recursive feature elimination
– Start with all features → repeatedly prune those whose absolute weights are the smallest (e.g. from an SVC)
● L1-based feature selection
– Lasso (the higher alpha, the fewer features)
– SVMs and logistic regression (the smaller C, the fewer features)
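Recursive feature elimination repeatedly fits a model and prunes the features with the smallest absolute weights. A minimal sketch with scikit-learn's RFE wrapped around a linear-kernel SVC, assuming hypothetical synthetic data in which only features 0 to 2 drive the label:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Hypothetical data: the label depends linearly on features 0-2; features 3-5 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
score = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.5 * rng.normal(size=400)
y = (score > 0).astype(int)

# Start from all 6 features; each round refits the linear SVC and drops the single
# feature (step=1) with the smallest absolute weight, until 3 features remain.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0), n_features_to_select=3, step=1)
rfe.fit(X, y)

print(rfe.support_)  # True for the kept features
print(rfe.ranking_)  # 1 = kept; higher numbers were eliminated earlier
```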
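For L1-based selection, SelectFromModel can keep only the features whose L1-penalised coefficients are non-zero. The sketch assumes hypothetical synthetic regression data in which feature 0 matters strongly and feature 1 weakly, and compares two alpha values. The same pattern applies to LinearSVC(penalty="l1", dual=False) and LogisticRegression(penalty="l1", solver="liblinear"), where the knob is C and a smaller C gives fewer features.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Hypothetical data: feature 0 matters strongly, feature 1 weakly, features 2-7 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 4.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=200)

selected = {}
for alpha in (0.2, 2.0):
    # For L1-penalised estimators such as Lasso, SelectFromModel's default threshold
    # is 1e-5, so it keeps the features whose coefficient is (numerically) non-zero.
    selector = SelectFromModel(Lasso(alpha=alpha)).fit(X, y)
    selected[alpha] = selector.get_support(indices=True)
    print(f"alpha={alpha}: kept {selected[alpha]}, coef={np.round(selector.estimator_.coef_, 2)}")
```

On this data the weak feature 1 is expected to survive at alpha=0.2 and be shrunk to zero at alpha=2.0, matching "the higher alpha, the fewer features".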