

Classification and variable selection for high dimensional multivariate binary data: Adaboost based new methods and a theory for the plug-in rule.



Abstract

We consider theoretically a classification problem in which all the covariates are independent Bernoulli random variables $X_{ji}$, $1 \le i \le n$ and $j = 0, 1$; that is, each variable takes the value 0 or 1, recording the presence or absence of an event. The parameters of the Bernoulli random variables are estimated by maximum likelihood and plugged into the optimal Bayes rule; the resulting classifier is called the plug-in rule. This rule was applied to real DNA fingerprint data as well as to simulations in Wilbur et al. [2002] and was shown to classify well even when the independence assumption does not hold. The asymptotic performance of the plug-in rule is the primary object of this study.

Because the number of variables, and hence the number of Bernoulli parameters, depends on the sample size n, indicating the need for increasingly complex models as n increases, the usual notion of consistency (convergence of estimates to fixed parameter values) is not applicable. We introduce triangular arrays and a suitably modified definition of consistency, called persistence, based on how close the performance of the plug-in rule is to that of the classifier with known parameters $p_{ji}$, $1 \le i \le n$ and $j = 0, 1$. We present various cases in which the plug-in rule is or is not persistent. Under a sparsity condition, we show that the plug-in rule with well-chosen variables may overcome non-persistence. This shows that variable selection can be effective for high dimensional data under a sparsity condition.

We also discuss the convergence rate of the plug-in rule with Sobolev ball type parameter spaces. We show that the plug-in rule with selected variables can improve the convergence rate, which shows that a simpler model may achieve better performance than the full model. Just as Bickel and Levina [2004] showed that a naive Bayes model performs better than the full model, our results underpin the well-known practical finding that a model with well-chosen variables may achieve a better prediction rate than the full model, especially for high dimensional data.

In addition to the theoretical study of the plug-in rule, we propose and study a new methodology for classification and variable selection based on AdaBoost. Our application to real and simulated data suggests that the new methods perform considerably better than the plug-in rule. A theoretical study of the new methods is yet to be done.
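
For readers unfamiliar with the plug-in rule named in the abstract, the following is a minimal sketch of the general idea, not the thesis's own code. It estimates one Bernoulli parameter per variable and class by maximum likelihood (the class-wise sample mean) and plugs the estimates into the Bayes log-likelihood ratio for independent binary covariates. The function names `fit_plugin` and `predict_plugin`, the clipping constant `eps`, and the default equal class prior are assumptions introduced here for illustration.

```python
import numpy as np


def fit_plugin(X, y, eps=1e-6):
    """Estimate per-class Bernoulli parameters by maximum likelihood.

    X: (samples, variables) array of 0/1 values; y: 0/1 class labels.
    The MLE is the class-wise sample mean. Clipping to [eps, 1 - eps]
    is an assumption added here so the logarithms below stay finite.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p0 = np.clip(X[y == 0].mean(axis=0), eps, 1 - eps)
    p1 = np.clip(X[y == 1].mean(axis=0), eps, 1 - eps)
    return p0, p1


def predict_plugin(X, p0, p1, prior1=0.5):
    """Classify with the Bayes rule, using estimates in place of true parameters.

    Under independence, the log-likelihood ratio is a sum over variables.
    prior1 is the assumed prior probability of class 1 (hypothetical default).
    """
    X = np.asarray(X, dtype=float)
    llr = X @ np.log(p1 / p0) + (1 - X) @ np.log((1 - p1) / (1 - p0))
    return (llr + np.log(prior1 / (1 - prior1)) > 0).astype(int)
```

A call such as `p0, p1 = fit_plugin(X_train, y_train)` followed by `predict_plugin(X_new, p0, p1)` returns an array of 0/1 labels. Restricting the columns of `X` before fitting would correspond to the variable selection step the abstract discusses, though how the thesis chooses those variables is not stated here.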

Bibliographic record

  • Author: Park, Junyong
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 77 p.
  • Total pages: 77
  • Original format: PDF
  • Language: English
