Python Data Analysis: Analyzing Textual Data

Pages: 22

Date: 2018-02-10

Analyzing Textual Data and Social Media

In the previous chapters, we focused on the analysis of structured data, mostly in tabular format. In reality, plain text is the most predominant form of data available today. Text analysis applies analysis of word frequency distributions, pattern recognition, tagging, link and association analysis, sentiment analysis, and visualization. We will analyze text with the Python Natural Language Toolkit (NLTK) library. NLTK comes with a collection of sample texts called corpora. A small example of network analysis will also be covered. The following topics will be discussed in this chapter:

• Installing NLTK
• Filtering out stopwords, names, and numbers
• The bag-of-words model
• Analyzing word frequencies
• Naive Bayes classification
• Sentiment analysis
• Creating word clouds
• Social network analysis

Installing NLTK

NLTK is a Python API for the analysis of texts written in natural languages, such as English. NLTK was created in 2001 and was originally intended as a teaching tool. Install NLTK with the following command:

    $ sudo pip install nltk
    $ pip freeze | grep nltk
    nltk==2.0.4

As usual, we will check the installation with a new version of the pkg_check.py file. The following import statement is required:

    import nltk

If everything works, we should get a result similar to the following:

    nltk version 2.0.4
    nltk.app DESCRIPTION chartparser: Chart Parser chunkparser: Regular-Expression Chunk Parser collocations: Find collocations in text concordance: Part
    nltk.ccg DESCRIPTION For more information see nltk/doc/contrib/ccg/ccg.pdf PACKAGE CONTENTS api chart combinator lexicon DATA BackwardApplication
    [...] These perform simple pattern matching on sentences typed by users, and respond with automatically g
    nltk.chunk DESCRIPTION Classes and interfaces for identifying non-overlapping linguistic groups (such as base noun phrases) in unrestricted text. This
    nltk.classify DESCRIPTION Classes and interfaces for labeling tokens with category labels (or "class labels"). Typically, labels are represented with stri
    nltk.cluster DESCRIPTION This module contains a number of basic clustering algorithms. Clustering describes the task of discovering groups of similar ite
    nltk.corpus
    nltk.draw
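The pkg_check.py script itself is not shown in this excerpt. As a rough illustration of the kind of check described above, the following minimal sketch lists the direct submodules of a package together with a truncated first docstring line. The function name describe_subpackages and the max_chars parameter are hypothetical, and the sketch does not attempt to reproduce the exact output format shown above. The standard-library json package is used as a stand-in so that the code runs without NLTK installed.

```python
import importlib
import json
import pkgutil


def describe_subpackages(package, max_chars=60):
    """Map each direct submodule name to a truncated first docstring line.

    Hypothetical helper for illustration; not the book's pkg_check.py.
    """
    summaries = {}
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix):
        try:
            module = importlib.import_module(info.name)
        except Exception:
            # Some submodules depend on optional libraries; skip those.
            continue
        doc = (module.__doc__ or "").strip()
        first_line = doc.splitlines()[0] if doc else ""
        summaries[info.name] = first_line[:max_chars]
    return summaries


# Stand-in demo on a standard-library package.
for name, summary in sorted(describe_subpackages(json).items()):
    print(name, summary)
```

With NLTK installed, the same function could be called as describe_subpackages(nltk) after import nltk, and the installed version can be read from nltk.__version__. Importing every NLTK submodule may be slow, and submodules that need optional dependencies are skipped by the except clause.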
