
International Journal of Soft Computing

Genetic Algorithm Based Dimensionality Reduction for Improving Performance of K-Means Clustering: A Case Study for Categorization of Medical Dataset


Abstract

Medical data mining is the process of extracting hidden patterns from medical data. Among clustering algorithms, k-means is one of the most widely used. Its performance depends on the initial cluster centers, and it may converge to a local optimum. Because the initial centers are chosen at random, different runs of k-means can produce different results, so a unique clustering is not guaranteed. In addition, the performance of any data mining technique depends on feature subset selection. This study attempts to improve the performance of k-means clustering in two stages. In the first stage, it investigates a wrapper approach to feature selection for clustering, in which a Genetic Algorithm (GA) serves as a random search technique for subset generation, wrapped with k-means clustering. In the second stage, GA and Entropy-based Fuzzy Clustering (EFC) are used to find the initial centroids for k-means clustering. Experiments were conducted on two standard medical datasets, the Pima Indians Diabetes Dataset (PIDD) and Heart statlog. Compared with k-means run on all features with randomly initialized centroids, k-means using the features identified by the proposed wrapper approach and the initial centroids identified by GA shows a marked reduction in classification error: 8.42% for PIDD and 18.89% for Heart statlog.
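The first stage described in the abstract, a GA searching over feature subsets with k-means as the wrapped evaluator, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the chromosome encoding, the GA operators, or the fitness function, so the bitmask encoding, tournament selection, one-point crossover, bit-flip mutation, elitism, and the majority-vote error measure below are all assumptions, as are the function names `kmeans`, `cluster_error`, `fitness`, and `ga_select`.

```python
import random
from collections import Counter

def kmeans(rows, k, rng, iters=20):
    """Plain Lloyd's k-means with random initial centers; returns one label per row."""
    centers = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
                  for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_error(labels, truth):
    """Assumed error measure: map each cluster to its majority class,
    then return the fraction of rows that disagree with that mapping."""
    correct = 0
    for c in set(labels):
        classes = Counter(t for l, t in zip(labels, truth) if l == c)
        correct += classes.most_common(1)[0][1]
    return 1 - correct / len(truth)

def fitness(mask, X, y, k, seed):
    """Wrapper fitness: k-means error on the features where mask == 1."""
    if not any(mask):
        return 1.0  # an empty subset gets the worst possible error
    rows = [[v for v, keep in zip(r, mask) if keep] for r in X]
    return cluster_error(kmeans(rows, k, random.Random(seed)), y)

def ga_select(X, y, k, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """GA over feature bitmasks (needs at least two features).
    Returns (best_mask, best_error)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X, y, k, seed)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = min(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=score)
    return best, score(best)
```

Calling `ga_select(X, y, k=2)` on a list of numeric feature rows `X` and class labels `y` returns a 0/1 mask over the columns together with the error that mask achieves. Because k-means inside `fitness` is seeded identically for every evaluation, a given mask always scores the same, which keeps the elitism step meaningful. The second stage from the abstract (GA and EFC for choosing initial centroids) is not covered by this sketch.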
