Feature selection is the process of choosing a significant subset of features from a given feature set for pattern recognition. It can be treated as a preprocessing step before constructing a machine learning model and can improve prediction results. Selecting the most significant features reduces training time, reduces the complexity of the machine learning model, helps avoid overfitting, and helps researchers understand the source data. Most features are numeric or string-valued, and most of their distributions are either continuous or categorical. There is, however, a type of feature called a binary feature, whose value is either 1 or 0. Unfortunately, little research addresses the situation where a large portion of the features are binary. Inspired by existing feature selection methods, we present a new framework called FMC_SELECTOR that specifically targets the selection of significant binary features from highly imbalanced datasets. By combining the Fisher linear discriminant analysis technique with the concept of cross-entropy, FMC_SELECTOR selects the most significant features from a given binary feature set. We assess the performance and prediction results of FMC_SELECTOR by comparing it with two popular feature selection methods, Univariate Importance (UI) and Recursive Feature Elimination (RFE); the proposed framework outperforms both benchmarks. A new formula called Mapping Based Cross-Entropy Evaluation (MCE) is derived from cross-entropy and integrates a mapping function to address the specific concerns of binary features. The introduced evaluation method, Positive Case Prediction Score (PPS), can extract additional information from imbalanced datasets where existing methods are inadequate or not applicable.
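The abstract names the two ingredients of FMC_SELECTOR (a Fisher discriminant criterion and a cross-entropy term) but does not give the MCE formula or the mapping function. The sketch below is only a minimal illustration of how those two ingredients could score and rank binary features against imbalanced labels; the product combination, the conditioning on the feature firing, and all function names are assumptions for illustration, not the authors' FMC_SELECTOR.

```python
import numpy as np

def fisher_score(x, y):
    """Fisher criterion for one 0/1 feature x against 0/1 labels y:
    squared class-mean difference over the sum of within-class variances."""
    x0, x1 = x[y == 0], x[y == 1]
    denom = x0.var() + x1.var() + 1e-12  # guard against zero variance
    return (x0.mean() - x1.mean()) ** 2 / denom

def cross_entropy_score(x, y):
    """Cross-entropy between the overall positive rate P(y=1) and the
    positive rate conditioned on the feature firing, P(y=1 | x=1).
    (Hypothetical stand-in for the paper's MCE mapping, which is not
    specified in the abstract.)"""
    p = np.clip(y.mean(), 1e-12, 1 - 1e-12)
    q = y[x == 1].mean() if (x == 1).any() else p
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(p * np.log(q) + (1 - p) * np.log(1 - q))

def rank_binary_features(X, y, top_k=10):
    """Rank columns of a 0/1 matrix X by a combined score.
    Multiplying the two terms is an arbitrary illustrative choice."""
    scores = np.array([fisher_score(X[:, j], y) * cross_entropy_score(X[:, j], y)
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k]

# Toy usage on synthetic, highly imbalanced data:
rng = np.random.default_rng(0)
X = (rng.random((500, 20)) < 0.1).astype(int)   # sparse binary features
y = (rng.random(500) < 0.05).astype(int)        # ~5% positive cases
print(rank_binary_features(X, y, top_k=5))
```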