Failure-Aware Reconfigurable Distributed Virtual Machine for dependable and high productivity computing.

Abstract

Modern networked computing systems continue to grow in scale and in the complexity of their components and interactions. Component failures have become the norm rather than the exception in these environments. Therefore, it is important to ensure the availability and adaptivity of computing services. To this end, we present the Failure-Aware Reconfigurable Distributed Virtual Machine (FAR-DVM) framework for building failure-resilient and dependable high-productivity computing systems.

The framework monitors and analyzes node-, cluster-, and system-wide failure behaviors and forecasts prospective failure occurrences based on quantified failure dynamics. The prediction results are used to manage system resources in a failure-aware manner. The system management components autonomically construct resilient and dependable services and integrate geographically distributed resources into a seamless environment.

Within the FAR-DVM framework, we propose hPREFECTS for proactive failure management. It collects failure events from compute nodes at runtime and constructs a failure signature for each event. It then analyzes the temporal and spatial correlations among failure signatures in different system scopes. A failure predictor uses the quantified correlation data to forecast the occurrence time of failures in the near future.

To manage system resources in a failure-aware manner, we also propose a construction and reconfiguration strategy for distributed virtual machines (DVMs). It leverages the failure prediction results in resource management. We consider both the performance and the reliability status of compute nodes, and define a capacity-reliability metric that combines the effects of both factors in node selection. We propose Best-fit algorithms with optimistic and pessimistic selection strategies to find the best-qualified nodes on which to construct and reconfigure DVMs.

We have designed and implemented a prototype of FAR-DVM and evaluated it in production environments. hPREFECTS achieves more than 76% accuracy in offline failure prediction on the Los Alamos HPC traces. For online prediction, its accuracy is more than 70% in the Wayne State Computational Grid. Our failure-aware resource management strategy improves system productivity at practically achievable levels of failure prediction accuracy. With the Best-fit strategies, the job completion rate increases by 17.6% compared with that achieved in the current LANL HPC cluster. The task completion rate reaches 91.7% with 83.6% utilization of relatively unreliable nodes.

To complement the work on failure-aware resource management, we have also proposed a service migration mechanism that moves runtime computing services from one compute node to another in the face of system anomalies. To evaluate the quality of migration policies, we have investigated the migration decision problem for load balancing. We derive the optimal time for service migration with the objective of minimizing migration frequency, and obtain the lower bound of the destination server capacity.


Bibliographic record

Author: Fu, Song
Author affiliation: Wayne State University
Degree-granting institution: Wayne State University
Subject: Computer science
Degree: Ph.D.
Year: 2008
Pages: 143 p.
Original format: PDF
Language: English

Similar literature

1. Failure-aware resource management for high-availability computing clusters with distributed virtual machines [J]. Song Fu. Journal of Parallel and Distributed Computing. 2010, No. 4.
2. Stochastic modeling and analysis of hybrid mobility in reconfigurable distributed virtual machines [J]. Song Fu, Cheng-Zhong Xu. Journal of Parallel and Distributed Computing. 2006, No. 11.
3. Information dependability in distributed systems: The dependable distributed storage system [J]. Salvatore Distefano, Antonio Puliafito. Integrated Computer-Aided Engineering. 2014, No. 1.
4. Failure-Aware Construction and Reconfiguration of Distributed Virtual Machines for High Availability Computing [C]. Song Fu. Cluster Computing and the Grid, 2009. CCGRID '09. 2009.
5. Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids [D]. Kim, Dohan. 2005.
6. A Distributed Parallel Genetic Algorithm of Placement Strategy for Virtual Machines Deployment on Cloud Platform [O]. Yu-Shuang Dong, Gao-Chao Xu, Xiao-Dong Fu.
7. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing [O]. Samuel V Angiuoli, James R White, Malcolm Matalka. 2011.
8. PVM (Parallel Virtual Machine): A Framework for Parallel Distributed Computing [R]. Sunderam, V. S. 1989.
