

Probabilistic topic models for information retrieval and concept modeling.


Abstract

Statistical topic models are a class of probabilistic latent variable models for textual data that represent text documents as distributions over topics. These models have been shown to produce interpretable summaries of documents in the form of topics. In this dissertation, we investigate how the statistical topic modeling framework can be used for information retrieval tasks and for the integration of background knowledge in the form of semantic concepts. We first describe the special-words topic models, in which a document is represented as a combination of (i) a mixture of shared topics, (ii) a special-words distribution specific to the document, and (iii) a corpus-level background distribution. We describe the utility of the special-words topic models for information retrieval tasks and illustrate a variation of the model for metadata enhancement of digital libraries with multiple corpora. We next investigate the problem of integrating background knowledge in the form of semantic concepts into the topic modeling framework. To combine data-driven topics and semantic concepts, we propose the concept-topic model, which represents a document as a distribution over data-driven topics and semantic concepts. We extend this model to the hierarchical concept-topic model to incorporate concept hierarchies into the modeling framework. For all these models, we develop learning algorithms and demonstrate their utility with experiments conducted on real-world data sets.
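The three-part document representation described for the special-words topic models can be illustrated with a small sketch. The abstract does not give the model's equations, so the following is only an assumed reading: the probability of a word in a document is taken to be a weighted sum of the shared-topic mixture, the document's special-words distribution, and the corpus background distribution. The function name `word_probability`, the weight vector `lam`, and all toy numbers are hypothetical and do not come from the dissertation.

```python
def word_probability(word, topic_word, doc_topic, special, background, lam):
    """Assumed three-way mixture: p(word | doc) =
    lam[0] * sum_t p(t | doc) * p(word | t)   (shared topics)
    + lam[1] * p_special(word)                (document-specific special words)
    + lam[2] * p_background(word)             (corpus-level background)
    """
    topic_part = sum(
        doc_topic[t] * topic_word[t].get(word, 0.0)
        for t in range(len(doc_topic))
    )
    return (
        lam[0] * topic_part
        + lam[1] * special.get(word, 0.0)
        + lam[2] * background.get(word, 0.0)
    )


# Toy, hand-picked distributions (hypothetical values for illustration only).
topic_word = [
    {"retrieval": 0.5, "query": 0.5},   # topic 0: word distribution
    {"topic": 0.6, "model": 0.4},       # topic 1: word distribution
]
doc_topic = [0.25, 0.75]                # this document's mixture over the two topics
special = {"zebra": 1.0}                # words peculiar to this one document
background = {"the": 0.5, "of": 0.5}    # corpus-wide common words
lam = [0.5, 0.25, 0.25]                 # assumed weights of the three components

vocab = ["retrieval", "query", "topic", "model", "zebra", "the", "of"]
probs = {
    w: word_probability(w, topic_word, doc_topic, special, background, lam)
    for w in vocab
}

for w in vocab:
    print(f"{w:10s} {probs[w]:.4f}")
```

In this reading, a word that is rare in the corpus but frequent in one document (here `zebra`) is explained by the special-words component rather than being forced into a shared topic, which is one plausible reason such a model would help with retrieval. The learning algorithms mentioned in the abstract, which would estimate these distributions from data, are not sketched here.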