An Introduction to Information Retrieval
Draft of April 1, 2009
Online edition (c) 2009 Cambridge UP

Christopher D. Manning
Prabhakar Raghavan
Hinrich Schütze

Cambridge University Press
Cambridge, England

DRAFT! DO NOT DISTRIBUTE WITHOUT PRIOR PERMISSION
© 2009 Cambridge University Press
By Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze
Printed on April 1, 2009
Website: http://www.informationretrieval.org/
Comments, corrections, and other feedback most welcome at: informationretrieval@yahoogroups.com

DRAFT! © April 1, 2009 Cambridge University Press. Feedback welcome.

Brief Contents

1 Boolean retrieval
2 The term vocabulary and postings lists
3 Dictionaries and tolerant retrieval
4 Index construction
5 Index compression
6 Scoring, term weighting and the vector space model
7 Computing scores in a complete search system
8 Evaluation in information retrieval
9 Relevance feedback and query expansion
10 XML retrieval
11 Probabilistic information retrieval
12 Language models for information retrieval
13 Text classification and Naive Bayes
14 Vector space classification
15 Support vector machines and machine learning on documents
16 Flat clustering
17 Hierarchical clustering
18 Matrix decompositions and latent semantic indexing
19 Web search basics
20 Web crawling and indexes
21 Link analysis

Contents

List of Tables
List of Figures
Table of Notation
Preface

1 Boolean retrieval
1.1 An example information retrieval problem
1.2 A first take at building an inverted index
1.3 Processing Boolean queries
1.4 The extended Boolean model versus ranked retrieval
1.5 References and further reading

2 The term vocabulary and postings lists
2.1 Document delineation and character sequence decoding
2.1.1 Obtaining the character sequence in a document
2.1.2 Choosing a document unit
2.2 Determining the vocabulary of terms
2.2.1 Tokenization
2.2.2 Dropping common terms: stop words
2.2.3 Normalization (equiv