Extracting Statistical Data Frames from Text

Jisheng Liang, Krzysztof Koperski, Thien Nguyen, and Giovanni Marchisio
Insightful Corporation
1700 Westlake Ave N, Suite 500
Seattle, WA 98109
[jliang, krisk, thien, giovanni]@insightful.com

ABSTRACT
We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to this is a new approach to text parameterization that captures many interesting attributes of text usually ignored by standard indices, like the term-document matrix. By storing NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.

Keywords
Text mining, natural language processing, NLP, statistical data frames, data analysis, visualization

a time and be rather computationally expensive. It includes techniques like lexical analysis, multiword phrase grouping, sense disambiguation, part-of-speech tagging, anaphora resolution, and role determination. The arguments against NLP are that it is error-prone, and NLP output (i.e., parse trees) contains too much linguistic detail, noise and uncertainty to provide a working knowledge base for data analysis or mining. Failure to account for semantic and syntactic variations across a document collection has led to disappointing results when trying to use fine indexing structures derived from a linguistic parser.

Information Extraction (IE) is a methodology employed as a precursor to text mining especially in bioinformatics [4][5]. IE applies NLP techniques to extract predefined sets of entities, relationships, and patterns of interest from documents. IE systems, like those developed in the MUC [6] and ACE [7] programs, are limited in their power of information discovery. First, they employ pre-determined templates or rule sets. Second, they do not index everything in a corpus, but only what they are preprogrammed to find. The aim of text mining should be to find

1. INTRODU