

New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases.


Abstract

This dissertation presents new algorithms and concepts for sequential pattern mining and itemset mining in both certain and uncertain databases. The contributions are:

1) An improved algorithm for mining frequent patterns, which uses a pattern-growth enumeration technique and a new data structure called a first-occurrence forest (FOF). The FOF is a simple linked list of pointers, in which each pointer points to the first occurrences of an item within an aggregate tree (a lossless, compressed version of the pattern database).

2) A new concept, relaxed closed subspace clusters (RCSCs), and an algorithm for mining them. RCSCs are mined by transforming the problem of mining subspace clusters into the problem of mining frequent itemsets. After the transformation, a new quality measure is defined, the diameter of a subspace cluster, and the diameter is used to define the concept of a closed subspace cluster. Lastly, to combat the problem of a possible low number of unique diameters (and thus a high number of unique subspace clusters), a relaxation factor is introduced to partition the user-defined minimum closeness threshold into a finite number of intervals, to one of which each subspace cluster must belong.

3) A new concept, probabilistic frequent closed itemsets (PFCIs) in uncertain databases, and an algorithm for mining them called PFCIM (probabilistic frequent closed itemset mining). To this end, a definition of probabilistic support is introduced to facilitate the mining of a lossless and compact representation of all probabilistic frequent itemsets (PFIs) in an uncertain database.

4) A new algorithm for approximating PFCIs, named A-PFCIM (approximate probabilistic frequent closed itemset mining). Because the probability distribution of an itemset's support in an uncertain database can be modeled with the Poisson binomial distribution, and because the Poisson binomial distribution can be accurately approximated by the Poisson distribution, an algorithm that approximates PFCIs using the Poisson distribution is possible and is formulated.

5) An algorithm, called IA-PFIM, for mining approximate probabilistic frequent itemsets (A-PFIs) from incremental or evolving uncertain databases. The properties of the expected support of an itemset in an uncertain database, along with the properties of the Poisson distribution, are exploited to define lemmas that allow one to maintain the set of PFIs in an evolving database.

6) A new concept, probabilistic generalized frequent itemsets (PGFIs) in uncertain databases, and an algorithm for mining them called PGFIM (probabilistic generalized frequent itemset mining). A taxonomy, in the form of a directed acyclic graph, is introduced that relates the items within the uncertain database to generalized (abstract) items that do not appear in the database. Given this taxonomy, there exist generalized items that do not appear in the uncertain database but can nevertheless be probabilistically frequent within it. A new method is therefore introduced that calculates the probability of a generalized item appearing within a transaction, so that one can mine for PGFIs in an uncertain database.
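The Poisson approximation mentioned for A-PFCIM can be illustrated with a small Python sketch. It is not the dissertation's algorithm. It assumes that each transaction contains the itemset independently with a known probability, and it uses the convention, common in work on uncertain data, that an itemset is probabilistically frequent when P(support >= minsup) reaches a user-chosen threshold. The function names and the sample probabilities are hypothetical.

```python
import math


def exact_frequentness(probs, minsup):
    """P(support >= minsup) under the Poisson binomial distribution.

    probs[i] is the (assumed independent) probability that transaction i
    contains the itemset. Computed exactly by dynamic programming.
    """
    # dist[k] = probability that exactly k of the transactions
    # processed so far contain the itemset.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += mass * p       # itemset present in this transaction
        dist = nxt
    return sum(dist[minsup:])


def poisson_frequentness(probs, minsup):
    """Approximate P(support >= minsup) with a Poisson distribution
    whose mean equals the expected support of the itemset."""
    lam = sum(probs)                     # expected support
    term = math.exp(-lam)                # Poisson pmf at k = 0
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= lam / (k + 1)            # pmf at k + 1 from pmf at k
    return 1.0 - cdf


# Hypothetical data: 200 transactions, each containing the itemset
# with probability 0.05 (expected support 10).
probs = [0.05] * 200
exact = exact_frequentness(probs, 12)
approx = poisson_frequentness(probs, 12)
```

The exact computation costs time quadratic in the number of transactions, while the Poisson estimate needs only the expected support, which is one plausible reason an approximation is attractive for large or evolving databases. The approximation is tightest when the individual probabilities are small.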

Bibliographic record

  • Author: Peterson, Erich Allen
  • Author affiliation: University of Arkansas at Little Rock
  • Degree-granting institution: University of Arkansas at Little Rock
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 150 p.
  • Total pages: 150
  • Original format: PDF
  • Language of text: English
