
IEEE Transactions on Parallel and Distributed Systems

Exploiting GPUs for Efficient Gradient Boosting Decision Tree Training



Abstract

In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to their excellent results on classification/regression tasks and to open-source libraries such as XGBoost, GBDTs have become very popular in recent years and have won many awards in machine learning and data mining competitions. Although GPUs have demonstrated their success in accelerating many machine learning applications, it is challenging to develop an efficient GPU-based GBDT algorithm. The key challenges include irregular memory accesses, many sorting operations with small inputs, and varying data-parallel granularities in tree construction. To tackle these challenges on GPUs, we propose several novel techniques, including (i) Run-Length Encoding compression and dynamic thread/block workload allocation, (ii) data partitioning based on stable sort, together with fast, memory-efficient attribute ID lookup in node splitting, (iii) finding approximate split points using two-stage histogram building, (iv) sparsity-aware histogram building and histogram subtraction to reduce the histogram-building workload, (v) reusing intermediate training results for efficient gradient computation, and (vi) exploiting multiple GPUs to handle larger data sets efficiently. Our experimental results show that our algorithm, named ThunderGBM, can be 10 times faster than the state-of-the-art libraries (i.e., XGBoost, LightGBM and CatBoost) running on a relatively high-end workstation with 20 CPU cores. Compared with these libraries on GPUs, ThunderGBM can handle higher-dimensional problems on which they become extremely slow or simply fail. For the data sets the existing GPU libraries can handle, ThunderGBM achieves up to 10 times speedup on the same hardware, which demonstrates the significance of our GPU optimizations. Moreover, the models trained by ThunderGBM are identical to those trained by XGBoost, and have similar quality to those trained by LightGBM and CatBoost.
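
The abstract only names the techniques; for concreteness, below is a minimal CUDA sketch of two of the histogram ideas it mentions: sparsity-aware histogram building (only non-zero feature values, stored in compressed form, are accumulated) and histogram subtraction (one child node's histogram is derived as parent minus sibling instead of being rebuilt from the data). This is an illustrative reconstruction under assumed data layouts, not ThunderGBM's actual kernels; all names (buildHistogram, subtractHistogram, NUM_BINS) are hypothetical.

// Illustrative sketch, not ThunderGBM's implementation.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 64;  // assumed number of histogram bins per feature

// Each non-zero feature value carries a precomputed bin id and the gradient
// of the instance it belongs to; zeros are skipped entirely (sparsity-aware).
__global__ void buildHistogram(const int *binIds, const float *gradients,
                               int numNonZeros, float *histogram) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numNonZeros) {
        atomicAdd(&histogram[binIds[i]], gradients[i]);
    }
}

// The smaller child's histogram is built from data; the larger child's is
// obtained by subtraction, reducing the histogram-building workload.
__global__ void subtractHistogram(const float *parent, const float *smallChild,
                                  float *largeChild, int numBins) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < numBins) {
        largeChild[b] = parent[b] - smallChild[b];
    }
}

int main() {
    // Toy data: 8 non-zero values of one feature, each with a bin id
    // and the gradient of its instance.
    const int n = 8;
    int hBins[n] = {0, 1, 1, 2, 5, 5, 5, 63};
    float hGrads[n] = {0.5f, -0.2f, 0.1f, 0.3f, 0.4f, 0.2f, 0.2f, 1.0f};

    int *dBins; float *dGrads, *dParent, *dSmall, *dLarge;
    cudaMalloc(&dBins, n * sizeof(int));
    cudaMalloc(&dGrads, n * sizeof(float));
    cudaMalloc(&dParent, NUM_BINS * sizeof(float));
    cudaMalloc(&dSmall, NUM_BINS * sizeof(float));
    cudaMalloc(&dLarge, NUM_BINS * sizeof(float));
    cudaMemcpy(dBins, hBins, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dGrads, hGrads, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dParent, 0, NUM_BINS * sizeof(float));
    cudaMemset(dSmall, 0, NUM_BINS * sizeof(float));

    // Parent histogram over all non-zeros; "small child" over the first half.
    buildHistogram<<<(n + 255) / 256, 256>>>(dBins, dGrads, n, dParent);
    buildHistogram<<<(n + 255) / 256, 256>>>(dBins, dGrads, n / 2, dSmall);
    subtractHistogram<<<1, NUM_BINS>>>(dParent, dSmall, dLarge, NUM_BINS);
    cudaDeviceSynchronize();

    float hLarge[NUM_BINS];
    cudaMemcpy(hLarge, dLarge, NUM_BINS * sizeof(float), cudaMemcpyDeviceToHost);
    // Rows 4..6 fall into bin 5, so the expected sum is 0.4 + 0.2 + 0.2 = 0.80.
    printf("bin 5 of larger child: %.2f\n", hLarge[5]);

    cudaFree(dBins); cudaFree(dGrads);
    cudaFree(dParent); cudaFree(dSmall); cudaFree(dLarge);
    return 0;
}

A common design choice in histogram-based GBDT systems is to build the histogram of the child with fewer instances from the data and obtain the sibling by subtraction, roughly halving the accumulation work at each tree level.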

