The Third International Workshop on Coarse-Grained Reconfigurable Architectures for High-Performance Computing (CGRA4HPC)


Introduction

With the end of Dennard scaling and the impending end of Moore's law, researchers are actively searching for alternative forms of computing to continue providing better, faster, and less power-hungry systems in the future. Today, several potential architectures are emerging to fill the widening void left by the end of Moore's law, including radical (and intrusive) systems such as quantum and neuromorphic computers. However, of the many proposed architectures, perhaps none is as salient an alternative as Coarse-Grained Reconfigurable Architectures/Arrays (CGRAs).

CGRAs belong to the programmable-logic family of architectures, which aim to provide some form of plasticity or reconfigurability. Such reconfigurability allows the silicon to be specialized for a particular application in order to reduce data movement and improve performance and energy efficiency. Unlike their cousins, Field-Programmable Gate Arrays (FPGAs), which are reconfigured at the fine granularity of individual logic elements, CGRAs provide reconfigurable Arithmetic Logic Units (ALUs) and a highly specialized yet versatile data path. This "coarsening" of reconfiguration allows CGRAs to achieve a significant (custom ASIC-like) reduction in power consumption and an increase in operating frequency compared to FPGAs. At the same time, they avoid the expensive von Neumann (instruction-decoding) overhead that traditional general-purpose processors (CPUs) suffer from. In short, CGRAs strike a seemingly ideal balance between the reconfigurability of FPGAs and the performance of CPUs, with power-consumption characteristics closer to those of custom ASICs.

CGRAs have a long research lineage, dating back some 25 years to their inception (with the underlying theory dating back to the 1970s). Recently, however, they have been garnering renewed interest and importance in High-Performance Computing (HPC). Today, we see an explosion in the number of custom-built AI accelerators intended for use in data centers and across the IoT-Edge-Cloud continuum. Many of these accelerators, such as those built by SambaNova or Cerebras, are CGRAs. More importantly, there is an active and growing effort to use these AI accelerators to accelerate scientific applications on supercomputers, and many HPC centers already include these CGRAs in their testbeds (e.g., Cerebras-1 at ORNL or EPCC).

This workshop provides a focused, interdisciplinary forum where CGRA hardware researchers and HPC/distributed-computing researchers from academia and industry can come together to discuss state-of-the-art CGRA research for use in emerging HPC systems.

Important Dates

Invited speaker: Patricia Gonzalez-Guerrero (Berkeley Lab)

Abstract

Massive parallelism and extreme heterogeneity are key to the future of high-performance computing (HPC). One of the most popular execution models for parallelism is shared memory, with hardware-based cache-coherence mechanisms that enforce atomicity, ensuring transparent data movement and memory consistency. However, as the levels of parallelism (up to 100M cores) and heterogeneity increase, the scalability of cache-coherence protocols is compromised. Hardware message queues (HMQs) might be key to practical massive parallelism and extreme heterogeneity because they offer a low-latency, direct path for inter-node communication that bypasses expensive cache-coherence protocols. Unlike general-purpose cache-coherence systems, the same HMQ mechanisms can serve general-purpose cores, such as RISC-V cores, and can also kick-start computation in specialized accelerators, such as Fast Fourier Transform (FFT) engines. In this work, we propose MoSAIC, a full-stack platform that facilitates the evaluation and design-space exploration of HMQs in parallel and heterogeneous architectures. Since field-programmable gate arrays (FPGAs) provide a cost-effective testbed for hardware exploration, we target a lightweight, flexible architecture optimized for FPGAs; however, MoSAIC could also target chiplets or SoCs/ASICs.

Bio

Patricia Gonzalez-Guerrero is a research scientist at Berkeley Lab. She earned her Ph.D. in 2019 and her M.Sc. in 2015, both from the University of Virginia in Charlottesville, VA, USA. She completed her bachelor's degree in 2008 at Pontifical Xavierian University in Bogota, Colombia. She worked as an ASIC Design and Verification Engineer at Hewlett-Packard in Costa Rica. Her research focuses on ultra-low power digital and mixed-signal circuit design for non-conventional computing paradigms aimed at reducing power and energy consumption. This includes areas such as synchronous and asynchronous stochastic computing, computing with sigma-delta streams, and race logic. Additionally, she has been exploring the use of FPGAs for the evaluation and exploration of high-performance computing and chiplet-based specialized accelerators.

Invited speaker: Andrew Schmidt (Advanced Micro Devices)

Abstract

As academia, research labs, and industry explore different computer architectures, such as Coarse-Grained Reconfigurable Arrays (CGRAs) and Neural Processing Units (NPUs), we will describe the AMD Ryzen AI™ platform and AMD’s NPUs. We present Riallto, an open-source exploration framework for first-time users of the NPU, developed by teams from the AMD Research and Advanced Development group and the AMD University Program. AMD Ryzen AI is the world’s first built-in AI engine on select x86 computers. This dedicated engine is built on the AMD XDNA™ spatial dataflow NPU architecture, which consists of a tiled array of AI Engine (AIE) processors and is designed to offer lower latency and better energy efficiency. Such processor arrays are also found in the Versal Adaptive SoC, enabling rapid development and evaluation across heterogeneous architectures. Offloading specific AI processing tasks, such as background blur, facial detection, and eye-gaze correction, to the NPU frees up CPU and GPU cycles and improves overall system efficiency. With Ryzen AI-powered laptops or mini PCs, you can develop innovative applications and productivity solutions for information search, summarization, transcription, and much more. Riallto lowers the barrier to entry for accessing the AIEs and includes a wealth of educational material, delivered via Jupyter Notebooks, that makes ML accelerators easier to understand and use in an increasingly heterogeneous environment. We are excited to share details of the hardware and software architecture with the CGRA4HPC community and to see how the technology can be leveraged in their work.

Workshop Program

CGRA4HPC 2024 will be held in conjunction with IPDPS 2024 in San Francisco, US, on Friday, May 31st.

Time | Session | Description | Authors
09:00 am | Introduction | Opening
09:15 am | Invited | Exploring Coarse-Grained Arrays with Riallto: An Open-Source Framework for AMD Neural Processing Units | Andrew Schmidt (Advanced Micro Devices)
10:00 am | Coffee break
10:30 am | Technical Paper 1 | An Architecture-Agnostic Dataflow Mapping Framework on CGRA | Jiangnan Li, Yazhou Yan, and Lingli Wang
11:00 am | Technical Paper 2 | TransMap: An Efficient CGRA Mapping Framework via Transformer and Deep Reinforcement Learning | Jingyuan Li, Yuan Dai, and Lingli Wang
11:30 am | Technical Paper 3 | Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches | Chihyo Ahn, Shinnung Jeong, Liam Paul Cooper, Nicholas Parnenzini, and Hyesoon Kim
12:00 pm | Lunch break
01:30 pm | Invited | MoSAIC: Modular system for Acceleration Integration in HPC using programmable logic | Patricia Gonzalez-Guerrero (Lawrence Berkeley National Laboratory)
02:30 pm | Technical Paper 4 | CGRA-ME 2.0: A Research Framework for Next-Generation CGRA Architectures and CAD | Omar Ragheb, Stephen Wicklund, Matthew Walker, Rami Beidas, Adham Ragab, Tianyi Yu, and Jason Anderson
03:00 pm | Coffee break
03:30 pm | Technical Paper 5 | A Scalable Mapping Method for Elastic CGRAs | Makoto Saito, Takuya Kojima, Hideki Takase, and Hiroshi Nakamura
04:00 pm | Technical Paper 6 | GIM (Ghost In the Machine): A Coarse-Grained Reconfigurable Compute-In-Memory Platform for Exploring Machine-Learning Architectures | Maya Borowicz, James Ding, Winnie Fan, Zhongqi Gao, Davis Jackson, Ares Lu, Sophia Rohlfsen, and Ray Simar
04:15 pm | Panel | AI for CGRAs -- how can AI improve CGRA architecture research and CAD?
05:00 pm | Ending | Workshop conclusion

Call for Papers

The call for papers is available for download HERE

Topics of Interest

Topics of interest include (but are not limited to) the following:

CGRA Hardware and Architectures

Hybrid Processor/CGRA Technology

Programming Models, Compilers, and Middleware

Use-Cases and Experiments

CGRAs and Generative Artificial Intelligence

  • Emerging CGRA-like architectures for Generative AI
  • Case studies and evaluations of CGRAs for (Generative) AI
Paper Submission

We welcome authors to contribute full-length research papers on the topics of interest described above. Contributions should be unpublished and not under consideration at other venues. Papers should not exceed eight (8) single-spaced pages, formatted in double-column style with a 10-point font on 8.5x11-inch pages (IEEE conference style). We adopt a single-blind review process. Accepted papers will be included in the workshop proceedings, which will be distributed at the conference and submitted for inclusion in the IEEE Xplore Digital Library after the conference. We also welcome presentations on new and emerging CGRA technologies from industry and startups; these will be presented in a special lightning session during the workshop. Please contact the workshop organizers (Send mail here) if you are interested in participating in this event.

Submit your paper HERE

Organization

CGRA4HPC Organizers

CGRA4HPC Program Committee