Energy is increasingly a first-order concern in computer systems. Exploiting energy-accuracy trade-offs is an attractive choice in applications that can tolerate inaccuracies. Recent work has explored exposing this trade-off in programming models. A key challenge, though, is how to isolate parts of the program that must be precise from those that can be approximated so that a program functions correctly even as quality of service degrades.
In this talk I will describe our effort on co-designing language and hardware to take advantage of approximate computing for significant energy savings. We use type qualifiers to declare data that may be subject to approximate computation. Using these types, the system automatically maps approximate variables to low-power storage, uses low-power operations, and even applies more energy-efficient algorithms provided by the programmer. In addition, the system can statically guarantee isolation of the precise program component from the approximate component. This allows a programmer to control explicitly how information flows from approximate data to precise data. Importantly, employing static analysis eliminates the need for dynamic checks, further improving energy savings. I will describe a micro-architecture that offers explicit approximate storage and computation and will briefly discuss our recent proposal on using neural networks as approximate accelerators for imperative programs. I will conclude with an overview of our current/future research directions in hardware support for disciplined approximation.
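The isolation idea in the abstract can be illustrated with a minimal sketch. The names `Approx`, `endorse`, and `precise_store` are hypothetical and chosen only for illustration. The sketch also checks the flow at run time, whereas the system described in the talk enforces it statically through type qualifiers, so this is a different mechanism used to show the same discipline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approx:
    """Marks a value as approximate (hypothetical wrapper, not the talk's API)."""
    value: float

    def __add__(self, other):
        # Any arithmetic that touches approximate data yields approximate data.
        other_value = other.value if isinstance(other, Approx) else other
        return Approx(self.value + other_value)

    __radd__ = __add__


def endorse(x: Approx) -> float:
    """Explicit, programmer-approved flow from approximate to precise data."""
    return x.value


def precise_store(x):
    """A precise sink: rejects approximate data that was not endorsed.

    This is a run-time check for illustration only; a static type system
    would reject the offending program before it runs.
    """
    if isinstance(x, Approx):
        raise TypeError("approximate value reached a precise sink without endorse()")
    return x


if __name__ == "__main__":
    pixel = Approx(1.5) + 2.0          # stays approximate
    print(precise_store(endorse(pixel)))  # allowed: explicit endorsement
    try:
        precise_store(pixel)           # rejected: implicit approx -> precise flow
    except TypeError as err:
        print("rejected:", err)
```

The point of the sketch is that approximate data never reaches a precise sink by accident; the only path is through an explicit `endorse` call that the programmer writes.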
Luis Ceze, Assistant Professor, joined the Computer Science and Engineering faculty in 2007. His research focuses on computer architecture, programming languages and OS to improve the programmability, reliability and energy efficiency of computer systems, with emphasis on parallel and distributed systems. See the SAMPA research group.
He has co-authored over 50 papers in these areas and had several papers selected as IEEE Micro Top Picks and CACM Research Highlights. He participated in the Blue Gene, Cyclops, and PERCS projects at IBM and is a recipient of several IBM awards. He is also a recipient of an NSF CAREER Award, a Sloan Research Fellowship and a Microsoft Research Faculty Fellowship. He co-founded Corensic, a UW CSE spin-off company.
He was born in São Paulo, Brazil. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign and his B.Eng. and M.Eng. in Electrical Engineering from the University of São Paulo, Brazil.
I will review the physics of the so-called "thermodynamic limit" on energy consumption in computation, and C. Bennett's idea of reversible computing, which allows that limit to be avoided. Unfortunately, even if implemented in hardware virtually free of static power consumption (such as Parametric Quantron circuits), a genuinely reversible computation would require exponentially large resources. Selective reversibility sacrifices may sharply reduce this hardware overhead, but still leave the circuit speed and defect tolerance relatively low. The implementation of reversible computing in CMOS circuits, with their finite static power consumption, poses additional challenges. I believe that the future of this concept will depend on the progress of IC patterning and 3D integration.
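For a sense of scale, the following sketch assumes that the "thermodynamic limit" mentioned above is the Landauer bound of k_B T ln 2 per irreversibly erased bit. That identification is an assumption, since the abstract does not name the bound. The function name is illustrative.

```python
import math

# Boltzmann constant in joules per kelvin (exact value in the 2019 SI).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy dissipated per irreversibly erased bit: k_B * T * ln 2."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Near room temperature the bound is on the order of 1e-21 J per bit.
    print(f"{landauer_limit_joules(300.0):.3e} J per erased bit at 300 K")
```

Reversible computing sidesteps this bound by avoiding the erasure of information in the first place, which is why the hardware overhead discussed in the abstract becomes the central issue.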
Konstantin K. Likharev received the Candidate (Ph.D.) degree from the Department of Physics of Lomonosov Moscow State University, Russia in 1969, and the habilitation degree of Doctor of Sciences from the Higher Attestation Committee of the U.S.S.R. in 1979. From 1969 to 1988 Dr. Likharev was a Member of Research Staff of Moscow State University, and from 1989 to 1991 the Head of the Laboratory of Cryoelectronics of that university. In 1991 he assumed a Professorship at Stony Brook University (Distinguished Professor since 2002). During his research career, Dr. Likharev worked in the fields of nonlinear classical and dissipative quantum dynamics, and solid-state physics and electronics, notably including superconductor electronics and nanoelectronics. He is an author of more than 250 original publications, 75 review papers and book chapters, 2 monographs, and several patents. Dr. Likharev is a Fellow of the APS and IEEE.
All of computing today relies on an abstraction where software expects hardware to behave flawlessly for all inputs, under all conditions. While this abstraction worked well historically, due to the relatively small magnitude of variations in hardware and environment, computing will increasingly be done with devices and circuits that are inherently stochastic because of how small they are, or whose behavior is stochastic due to manufacturing and environmental uncertainties. For such emerging circuits and devices, the cost of guaranteeing correctness will be prohibitive, and we will need to fundamentally rethink the correctness contract between hardware and software. Such rethinking becomes particularly compelling considering that a significant amount of energy is wasted in guaranteeing reliability even for applications that are inherently error tolerant.
The primary goal of my research has been to revisit the correctness contract between hardware and software to enable extremely energy-efficient computing. Instead of computing machines where hardware variations are always hidden from the software behind conservative design specifications, my research advocates computing machines (stochastic processors) where (a) these variations are opportunistically exposed to the highest layers of software in the form of hardware errors, and (b) software and hardware are optimized to maximize energy savings while delivering acceptable outputs, in spite of errors. In this talk, I will describe architecture and physical design-based approaches to build and optimize stochastic processors. I will also discuss our ongoing work on building applications for such processors. As a proof of concept, I will discuss an example prototype system based on commodity hardware that exploits application-level error tolerance to maximize system efficiency. Finally, I will outline some other promising approaches to energy-efficient computing for emerging applications.
John Sartori received a B.S. degree in electrical engineering, computer science, and mathematics from the University of North Dakota, Grand Forks and an M.S. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC). He is currently finishing a Ph.D. in electrical and computer engineering at UIUC. His research interests include stochastic computing, energy-efficient computing, and system architectures for emerging workloads. John's research has been recognized by a best paper award [CASES 2011] and a best paper award nomination [HPCA 2012] and has been the subject of several keynote talks and invited plenary lectures. His work has been chosen to be the cover feature for popular media sources such as BBC News and HPCWire, and has also been covered extensively by scientific press outlets such as the IEEE Spectrum and the Engineering and Technology Magazine. When not doing research, John enjoys outdoor activities in the balmy Champaign weather, playing music, and studying and discussing philosophy.
More than ten years ago, it was envisioned that interconnects would be the limiters for continued increase in compute performance. Now we know that it is not the interconnects but power and energy that have been the limiters. As technology scaling continues to provide an abundance of transistors, and new architectures continue to deliver performance in a given power envelope, we need to revisit the role of interconnects. This talk will touch on technology outlook, future architectures and design directions for continued performance towards Exascale, and the role of interconnects: whether they will help or hinder!
Shekhar Borkar graduated with an M.Sc. in Physics from the University of Bombay in 1979 and an MSEE from the University of Notre Dame in 1981, then joined Intel Corp, where he worked on the 8051 family of microcontrollers and Intel's supercomputers. His research interests are low power, high performance digital circuits, and high speed signaling. Shekhar is an Intel Fellow, an IEEE Fellow, and Director of Extreme-scale Technologies in Intel Labs.
Scaling multicores to thousands of cores efficiently requires significant innovation across the software-hardware stack. On one hand, to expose ample parallelism, many applications will need to be divided into fine-grain tasks of a few thousand instructions each, and scheduled dynamically in a manner that addresses the three major difficulties of fine-grain parallelism: locality, load imbalance, and excessive overheads. On the other hand, hardware resources must scale efficiently, even as some of them are shared among thousands of threads. In particular, the memory hierarchy is hard to scale in several ways: conventional cache coherence techniques are prohibitively expensive beyond a few tens of cores, and caches cannot be easily shared among multiple threads or processes. Ideally, software should be able to configure these shared resources to provide good overall performance and quality of service (QoS) guarantees under all possible sharing scenarios.
In this talk, I will present several techniques to scale both software and hardware. First, I will describe a scheduler that uses high-level information from the programming model about parallelism, locality, and heterogeneity to perform scheduling dynamically and at fine granularity to avoid load imbalance. This fine-grain scheduler can use lightweight, flexible hardware support to keep overheads small as we scale up. Second, I will present a set of techniques that, together, enable scalable memory hierarchies that can be shared efficiently: ZCache, a cache design that achieves high associativity cheaply (e.g., 64-way associativity with the latency, energy and area of a 4-way cache) and is characterized by simple and accurate analytical models; Vantage, a cache partitioning technique that leverages the analytical guarantees of ZCache to implement scalable and efficient partitioning, enabling hundreds of threads to share the cache in a controlled manner, providing configurability and isolation; and SCD, which leverages ZCache to implement scalable cache coherence with QoS guarantees.
Daniel Sanchez is a PhD candidate in the Electrical Engineering Department at Stanford University. His research focuses on large-scale multicores, specifically on scalable and dynamic fine-grain runtimes and schedulers, hardware support for scheduling, scalable and efficient memory hierarchies, and architectures with QoS guarantees. He has earned an MS in Electrical Engineering from Stanford, and a BS in Telecommunication Engineering from the Technical University of Madrid (UPM).
Life in the time of Dennard scaling was relatively easy for architects and the computer industry. Every process generation delivered twice as many transistors to a chip that could run at a 1.4 times faster clock rate and consume the same power as the previous generation. General purpose processors spent this bounty on deep pipelining for high clock rates, extreme out-of-order execution to mine instruction-level parallelism, and large on-chip caches. In today's post-Dennard scaling world that no longer benefits from voltage scaling, each generation of process technology doubles chip transistor count but requires 40% more power at the same clock rate as the previous generation. In this era, every computing device is energy or power limited and energy efficiency is equivalent to performance. This talk will describe the challenges facing computer architectures, ranging from mobile devices to high-performance computers. Using examples from contemporary architectures, it will then discuss strategies for creating energy-efficient computers, including extreme energy-efficient microarchitectures, parallelism, data locality, and specialization. The talk will conclude with a set of challenges for the architecture research community and discuss why this era is providing a renaissance of opportunity for innovative computer architectures.
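The per-generation figures quoted in the abstract compound quickly, as the short calculation below illustrates. It uses only the numbers stated above (transistor count doubles; power stays flat under Dennard scaling and grows 40% per generation afterwards at the same clock rate). The function names are illustrative, and the assumption that power is proportional to the active fraction of the chip is a simplification of mine, not a claim from the talk.

```python
# Per-generation factors taken from the abstract.
TRANSISTOR_GROWTH = 2.0           # transistor count doubles each generation
DENNARD_POWER_GROWTH = 1.0        # Dennard era: same power as previous generation
POST_DENNARD_POWER_GROWTH = 1.4   # post-Dennard: 40% more power at the same clock


def chip_power_factor(generations: int, power_growth: float) -> float:
    """Relative chip power after `generations` if every transistor stays active."""
    return power_growth ** generations


def active_fraction(generations: int, power_growth: float) -> float:
    """Fraction of the chip that fits in the original power budget.

    Simplifying assumption: power scales linearly with the active fraction.
    """
    return min(1.0, 1.0 / chip_power_factor(generations, power_growth))


if __name__ == "__main__":
    for gen in range(5):
        transistors = TRANSISTOR_GROWTH ** gen
        power = chip_power_factor(gen, POST_DENNARD_POWER_GROWTH)
        usable = active_fraction(gen, POST_DENNARD_POWER_GROWTH)
        print(f"gen {gen}: {transistors:.0f}x transistors, "
              f"{power:.2f}x power, {usable:.0%} usable at fixed power")
```

Under these assumptions, a fixed power budget leaves a shrinking share of the chip active with each generation, which is the pressure behind the energy-efficiency strategies listed in the abstract.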
Steve Keckler is the Director of Architecture Research at NVIDIA and Professor of both Computer Science and Electrical and Computer Engineering at the University of Texas at Austin. His research team at UT-Austin developed scalable parallel processor and memory system architectures, including non-uniform cache architectures; explicit data graph execution processors; and micro-interconnection networks to implement distributed processor protocols. At NVIDIA, Dr. Keckler focuses on parallel, energy-efficient architectures that span mobile through supercomputing platforms. He is a Fellow of the ACM, a Fellow of the IEEE, an Alfred P. Sloan Research Fellow, and a recipient of the ACM Grace Murray Hopper award, the President's Associates Teaching Excellence Award at UT-Austin, and the Edith and Peter O'Donnell award for Engineering. He earned a BS in Electrical Engineering from Stanford University and an MS and a Ph.D. in Computer Science from the Massachusetts Institute of Technology.
Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increased core-per-socket count will continue, despite a lack of parallel software. Future CMPs must deliver thread-level parallelism when software provides threads to run, but must also continue to deliver high single-thread performance — via instruction-level and memory-level parallelism — to mitigate sequential bottlenecks and/or to guarantee service-level agreements. However, power limitations will prevent conventional cores from exploiting both simultaneously.
The Wisconsin Multifacet project has recently developed two alternative scalable core architectures, which can scale their execution logic up to run single threads fast, or down to run multiple threads within a fixed power budget. WiDGET (Wisconsin Decoupled Grid Execution Tiles) decouples thread context management from a sea of simple execution units. WiDGET’s decoupled design provides flexibility to alter resource allocation for a particular power-performance target while turning off unallocated resources. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional instruction set architecture, using forward dependence pointers to guide instruction wakeup, selection, and issue. Forwardflow’s backend is organized into discrete units that can be individually (de-)activated, allowing each core’s performance to be scaled by system software at the architectural level.
Professor David A. Wood is a Professor in the Computer Sciences Department at the University of Wisconsin, Madison and has a joint appointment in Electrical and Computer Engineering.
Dr. Wood was named an ACM Fellow (2005) and IEEE Fellow (2004), received the University of Wisconsin's H.I. Romnes Faculty Fellowship (1999), received the National Science Foundation's Presidential Young Investigator award (1991), and earned his Ph.D. in Computer Sciences from the University of California, Berkeley (1990). Dr. Wood is Chair of the ACM Special Interest Group on Computer Architecture (SIGARCH), Area Editor (Computer Systems) of ACM Transactions on Modeling and Computer Simulation, and Associate Editor of ACM Transactions on Architecture and Compiler Optimization; he served as Program Committee Chairman of ASPLOS-X (2002) and has served on numerous program committees. Dr. Wood is a member of the IEEE Computer Society. He has published over 70 technical papers and is an inventor on twelve U.S. and International patents.
Dr. Wood co-leads the Wisconsin Multifacet Project with Prof. Mark Hill. The project explores techniques for improving the availability, designability, programmability, and performance of commercial multiprocessor and chip multiprocessor servers.
In this talk, I will discuss Berkeley's efforts at designing efficient architectures for ion-trap quantum computers and will present our Computer Aided Design (CAD) flow for quantum circuits. The CAD flow can automatically insert quantum error correction, partition and lay out quantum circuits, optimize the placement of teleportation and error correction operations, and evaluate the error properties of the resulting layout. With the CAD tool, we are able to study and optimize large quantum circuits (such as adders, Shor's factoring, etc). Among other things, I will argue that quantum circuits should be evaluated in the context of suitable metrics such as ADCR -- the probabilistic equivalent of the Area-Delay product. This talk will reinforce some of the important recent lessons in quantum computing, such as the fact that communication cost and errors significantly impact the behavior of quantum circuits -- so much so that a full layout of a target quantum circuit is desirable. I will also (1) present Qalypso, a quantum datapath architecture that optimizes ancilla generation, (2) discuss the design of routable teleportation networks, and (3) show how a simple error-correction optimization based on circuit retiming can improve ADCR by an order of magnitude or more. This latter optimization can produce circuits of greater reliability by removing error correction steps.
John Kubiatowicz is a Professor of EECS at the University of California at Berkeley. Prof. Kubiatowicz received a dual B.S. in Physics and Electrical Engineering (1987), as well as an MS in EECS (1993) and PhD in EECS (1998), all from MIT. Kubiatowicz was chosen as one of Scientific American's top 50 researchers in 2002, one of US News and World Report's "people to watch for 2004", and is the recipient of an NSF PECASE award (2000). Kubiatowicz's research interests include quantum computing design tools and architectures, manycore Operating Systems architecture and resource management, multiprocessor and manycore CPU designs, Internet-scale distributed systems, and long-term digital information preservation.
Luis Ceze, Assistant Professor, University of Washington
"Safe and General Energy-Aware Programming with Disciplined Approximation"
Konstantin K. Likharev, Professor, Stony Brook University (SUNY)
"Reversible Computing: Possibilities and Challenges"
John Sartori, University of Illinois, Urbana-Champaign
"Stochastic Computing: Embracing Errors in Architecture and Design of Processors and Applications"
Shekhar Borkar, Director, Extreme-scale Technologies, Intel Labs
"Will Interconnect Help or Limit the Future of Computing?"
Daniel Sanchez, Stanford University
"Scaling Software and Hardware for Thousand-Core Systems"
Stephen Keckler, Professor, University of Texas at Austin
"Life After Dennard and How I Learned to Love the Picojoule"
David A. Wood, Professor, University of Wisconsin, Madison
"Two Scalable Core Architectures for Power-Constrained CMPs"
John Kubiatowicz, Professor, University of California, Berkeley
"Optimizing the Layout and Error Properties of Quantum Circuits"