Neural Processing Units
Nearing the end of Moore’s Law, computing systems are shifting toward specialization, executing a workload across multiple, heterogeneous processing units. With rapid advances in deep learning, neural networks have become important workloads for computer systems. These trends have sparked a race to develop neural processing units (NPUs), processors specialized for neural networks. As deep learning techniques mature, developing and analyzing NPUs becomes increasingly complicated. We tackle this challenge with three approaches: i) a neural network benchmark suite, ii) a full-system NPU architecture simulation framework, and iii) memory-centric and reconfigurable NPUs. The benchmark suite, named Nebula, implements both full-fledged neural networks and lightweight versions of them, where the compact benchmarks enable quick analysis of NPU hardware. NeuroSpector is an optimization framework that systematically identifies optimal scheduling schemes of NPUs for diverse neural networks. NPUsim is a full-system NPU architecture simulation framework that enables cycle-accurate timing simulation with in-place functional execution of deep neural networks. It supports fully reconfigurable dataflows, accurate energy and cycle cost calculations, and cycle-accurate functional simulation in conjunction with the sparsity, compression, and quantization techniques embedded in NPU hardware. We envision that these frameworks will propel a broad range of related research toward memory-centric, reconfigurable NPUs.
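As a toy illustration of the scheduling problem such frameworks target (the function names and cost assumptions below are ours, not NeuroSpector's actual API or cost model), the sketch enumerates tile sizes for a matrix multiplication and estimates off-chip traffic under a simple output-stationary dataflow, keeping only tilings that fit a given on-chip buffer:

```python
from itertools import product

def dram_traffic(M, N, K, tm, tn):
    """Estimated DRAM traffic (in elements) for a tiled M x K x N matmul
    under an output-stationary dataflow: each (tm x tn) output tile stays
    on-chip while its K-dimension input strips stream in once."""
    tiles_m = -(-M // tm)   # ceiling division
    tiles_n = -(-N // tn)
    a_reads = tiles_m * tiles_n * tm * K   # A strip re-read per output tile
    b_reads = tiles_m * tiles_n * tn * K   # B strip re-read per output tile
    c_writes = M * N                       # each output written back once
    return a_reads + b_reads + c_writes

def best_tiling(M, N, K, buffer_elems, candidates=(8, 16, 32, 64)):
    """Exhaustively search tile sizes that fit the on-chip buffer and
    minimize the estimated DRAM traffic."""
    best = None
    for tm, tn, tk in product(candidates, repeat=3):
        # Output tile plus one input strip of each operand must fit on-chip.
        footprint = tm * tn + tm * tk + tk * tn
        if footprint > buffer_elems:
            continue
        cost = dram_traffic(M, N, K, tm, tn)
        if best is None or cost < best[0]:
            best = (cost, (tm, tn, tk))
    return best
```

Even this toy search shows why scheduling matters: larger output tiles cut input re-reads, but only up to the buffer capacity, so the optimum depends jointly on layer shape and hardware.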
Scalable Memory Systems for High-Performance Computing
As emerging applications increasingly demand larger memory bandwidth and capacity, memory becomes the critical performance bottleneck of computing systems. In addition, the virtualization of heterogeneous processing units, including GPUs and accelerators, makes memory management increasingly complicated. To overcome the memory wall, we need innovative memory system designs, such as heterogeneous memory that pairs high-bandwidth on-package DRAM with large-capacity off-package memory, as well as processing in (or near) memory. Such changes in memory systems raise various engineering problems across computer architectures and operating systems. Our research interests include architectural and OS support for efficient memory management in high-performance computing systems.
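A minimal sketch of the kind of management policy heterogeneous memory calls for (the page granularity, access counts, and latency numbers here are all illustrative assumptions, not any specific OS mechanism): greedily place the hottest pages in the small fast tier and let the rest spill to the large slow tier.

```python
def place_pages(access_counts, fast_capacity):
    """Greedy hot-page placement: the most frequently accessed pages go
    into the limited-capacity fast tier (e.g., on-package DRAM); the
    rest spill to the large-capacity off-package tier."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])

def avg_access_time(access_counts, fast_pages, t_fast=1.0, t_slow=4.0):
    """Average memory access time under a placement, with illustrative
    per-tier latencies (in arbitrary units)."""
    total = sum(access_counts.values())
    fast_hits = sum(c for p, c in access_counts.items() if p in fast_pages)
    return (fast_hits * t_fast + (total - fast_hits) * t_slow) / total
```

Real systems must do this online with stale access information and pay a migration cost, which is exactly where architectural and OS support interact.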
GPU Microarchitecture for Deep Learning and Quantum Computing
Graphics processing units (GPUs) have become the de facto hardware for deep learning. Massively parallel execution in a single-instruction, multiple-thread (SIMT) fashion enables GPUs to achieve substantially greater throughput than conventional CPUs. However, GPUs still fall short of the throughput that deep learning workloads demand. Tensor cores, specialized execution units dedicated to matrix-multiply-accumulate operations, were introduced in recent NVIDIA GPUs to raise the operational intensity of neural operations. By operating directly on matrices instead of scalar data elements, tensor cores provide superior throughput to conventional CUDA cores. However, because tensor cores raise peak compute throughput much faster than memory bandwidth grows, they make DNN applications more memory-bound. Our research focuses on enhancing the memory management efficiency of GPUs with tensor cores for deep learning workloads and quantum computing simulations.
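The shift toward memory-boundedness follows directly from the standard roofline model, attainable throughput = min(peak compute, operational intensity x memory bandwidth). The sketch below uses illustrative peak and bandwidth numbers (not measurements of any particular GPU) to show how raising peak compute moves the ridge point:

```python
def roofline(peak_flops, mem_bw_bytes, op_intensity):
    """Attainable FLOP/s under the roofline model:
    min(peak compute, operational intensity * memory bandwidth)."""
    return min(peak_flops, op_intensity * mem_bw_bytes)

# Illustrative numbers only: scalar-core peak, tensor-core peak, bandwidth.
CUDA_PEAK, TC_PEAK, BW = 15e12, 120e12, 1.5e12   # FLOP/s, FLOP/s, B/s

# The ridge point (peak / bandwidth) is the intensity where a kernel
# stops being memory-bound; a higher peak pushes it to the right.
ridge_cuda = CUDA_PEAK / BW   # 10 FLOP/byte
ridge_tc = TC_PEAK / BW       # 80 FLOP/byte
```

A kernel with an operational intensity of 40 FLOP/byte is compute-bound under the scalar peak (it sits right of the 10 FLOP/byte ridge) but memory-bound under the tensor-core peak, capped at 40 x BW rather than the new peak.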
Power, Thermal, and Reliability
The operation of contemporary computer systems is constrained by power, temperature, and reliability limits. These physical limitations have pushed computer systems to enhance power efficiency (i.e., operations per watt), which requires innovative changes from devices and circuits to microarchitectures. Devising effective solutions for power, thermal, and reliability management requires a profound understanding of multi-physics phenomena. Microarchitectural activity in a processor consumes power, which dissipates as heat. The resulting temperature increase raises leakage power, creating a positive feedback loop between power and thermal dissipation. The temperature increase also adversely affects reliability (e.g., aging and failures). These complex multi-physics interactions ultimately constrain the operation and performance of computer systems. Our research interests lie in modeling multi-physics phenomena in heterogeneous computer systems, machine learning-based SoC power modeling and management solutions, and related topics.
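The power-thermal feedback loop described above can be captured with a minimal fixed-point model (the thermal resistance, leakage coefficients, and temperatures below are illustrative, not calibrated to any device): temperature is set by total power through a thermal resistance, while leakage power grows exponentially with temperature, so the two must be iterated to a steady state.

```python
import math

def steady_state(p_dyn, t_amb=45.0, r_th=0.5, p_leak0=10.0, k=0.02, iters=100):
    """Iterate the coupled power-temperature equations to a fixed point:
         T      = t_amb + r_th * (p_dyn + p_leak)        # thermal RC model
         p_leak = p_leak0 * exp(k * (T - t_amb))         # leakage grows with T
    Returns the steady-state temperature (deg C) and leakage power (W)."""
    T = t_amb
    p_leak = p_leak0
    for _ in range(iters):
        p_leak = p_leak0 * math.exp(k * (T - t_amb))
        T = t_amb + r_th * (p_dyn + p_leak)
    return T, p_leak
```

With these (assumed) coefficients, 50 W of dynamic power settles around 80 deg C, several degrees above the naive estimate that ignores the feedback, which is precisely why power, thermal, and reliability management must be modeled jointly.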