
RelSim is a framework for modeling and evaluating the lifetime reliability of heterogeneous computing systems. It can be flexibly configured to model various heterogeneous processor designs with a mix of failure models (e.g., electromigration, gate oxide breakdown) and statistical distributions (e.g., lognormal, Weibull) under user-defined execution scenarios.

The framework takes three sets of inputs: i) definitions of failure models, ii) system specifications, and iii) use case conditions. The definitions of failure models specify the model parameters and statistical distributions used to simulate failure mechanisms. System specifications include the number of PUs and the sizes of heterogeneous components, as well as the operating voltage, temperature, and stress time at which the targeted lifetime of a system is defined (i.e., product specifications). Lastly, use case conditions describe the operation scenarios under which system reliability is to be evaluated.

RelSim runs a sizable set of Monte Carlo simulations to estimate the lifetime reliability of the heterogeneous system. It uses multi-threaded acceleration on CPUs or GPUs to speed up the compute-intensive statistical calculations. The framework can also be configured to simulate a variety of dynamic reliability management (DRM) schemes such as replacement (e.g., spare components), rotation, and k-out-of-n (e.g., graceful degradation) models.
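To make the Monte Carlo idea concrete, the following is a minimal C++ sketch of how the reliability of a k-out-of-n system could be estimated by sampling. It is not RelSim's code or API: the function names, parameters, and the assumption of independent, identically distributed Weibull unit lifetimes are all hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of RelSim): lifetime of a k-out-of-n system
// for one Monte Carlo trial. The system works while at least k of its n units
// are alive, so it fails at the (n - k + 1)-th unit failure.
double k_out_of_n_lifetime(std::vector<double> unit_lifetimes, std::size_t k) {
    const std::size_t n = unit_lifetimes.size();
    assert(k >= 1 && k <= n);
    // The (n - k + 1)-th smallest failure time sits at zero-based index n - k.
    auto nth = unit_lifetimes.begin() + static_cast<std::ptrdiff_t>(n - k);
    std::nth_element(unit_lifetimes.begin(), nth, unit_lifetimes.end());
    return *nth;
}

// Hypothetical estimator (not part of RelSim): fraction of trials in which a
// k-out-of-n system survives past target_time. Assumes independent units with
// identical Weibull(shape, scale) lifetimes.
double estimate_reliability(std::size_t n, std::size_t k,
                            double shape, double scale,
                            double target_time,
                            std::size_t trials, unsigned seed) {
    assert(trials > 0);
    std::mt19937_64 rng(seed);
    std::weibull_distribution<double> weibull(shape, scale);

    std::vector<double> lifetimes(n);
    std::size_t survived = 0;
    for (std::size_t trial = 0; trial < trials; ++trial) {
        // Draw one failure time per unit for this trial.
        for (double& t : lifetimes) {
            t = weibull(rng);
        }
        if (k_out_of_n_lifetime(lifetimes, k) > target_time) {
            ++survived;
        }
    }
    return static_cast<double>(survived) / static_cast<double>(trials);
}
```

For instance, estimate_reliability(8, 6, 2.0, 10.0, 5.0, 100000, 1) would return the fraction of 100000 sampled systems in which at least 6 of 8 units are still alive at time 5, in the same time unit as the scale parameter. The trials are independent of one another, which is what makes this kind of computation a natural fit for multi-threaded CPU or GPU execution.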
Prerequisite, Download, and Build
RelSim uses g++ and nvcc to compile C++ and CUDA code, respectively. The latest release of the RelSim framework is v1.1 (as of Sept. 2022). To obtain this version, use the following git command. Alternatively, omit the --branch option from the command below to get the latest stable copy of the framework from the master branch.
$ git clone --branch v1.1 https://github.com/yonsei-icsl/relsim
Try building and executing an example model using the following commands for NVIDIA GPUs.
$ cd relsim/
$ make
$ ./relsim
Alternatively, RelSim can be built for CPUs by adding a target=cpu option.
$ cd relsim/
$ make target=cpu
$ ./relsim
Documentation
RelSim v1.1 is the latest release (as of Sept. 2022). For instructions regarding the installation and execution of RelSim, visit the GitHub repository: https://github.com/yonsei-icsl/relsim. A related publication is currently under review.
@article{jung_icsl2022,
  author  = {S. Jung and Y. Chon and J. Hwang and B. Kim and A. Trivedi and W. Song},
  title   = {{Computational Framework for Lifetime Reliability Assessment of Heterogeneous Computing Processors}},
  journal = {under review},
  month   = {Feb.},
  year    = {2022},
}