The Fourth International Workshop on Coarse-Grained Reconfigurable Architectures for High-Performance Computing and AI (CGRA4HPCA)

Submit your research paper here

Zoom link

Introduction

With the end of Dennard scaling and the impending termination of Moore's law, researchers are actively searching for alternative forms of computing to continue providing better, faster, and less power-hungry systems in the future. Today, several potential architectures are emerging to fill the widening void arising from the end of Moore's law, including radical (and intrusive) systems such as quantum and neuromorphic computers. However, out of the many proposed architectures, perhaps none is as salient an alternative as Coarse-Grained Reconfigurable Architectures/Arrays (CGRAs).

CGRAs belong to the programmable logic device family of architectures, where the architectures aspire to provide some form of plasticity or reconfigurability. Such reconfigurability allows the silicon to be specialized towards a particular application in order to reduce data movement and improve performance and energy efficiency. Unlike their cousins, the Field-Programmable Gate Arrays (FPGAs), CGRAs provide reconfigurable Arithmetic Logic Units (ALUs) and a highly specialized yet versatile data path. This "coarsening" of reconfiguration allows CGRAs to achieve a significant (custom ASIC-like) reduction in power consumption and increase in operating frequency compared to FPGAs. At the same time, they remedy and overcome the expensive von Neumann (instruction-decoding) overhead that traditional general-purpose processors (CPUs) suffer from. In short, CGRAs strike a seemingly perfect balance between the reconfigurability of FPGAs and the performance of CPUs, with power-consumption characteristics closer to custom ASICs.

CGRAs have a long research lineage that dates back to their inception some 25 years ago (with theory dating back to the 1970s). However, they have recently garnered renewed interest and importance in High-Performance Computing (HPC). Today, we see an explosion in the number of custom-built AI accelerators intended for use in data centers and the IoT-Edge-Cloud continuum. Many of these accelerators are CGRAs, such as those built by SambaNova or Cerebras. More importantly, there is an active and growing effort to use these AI accelerators to accelerate scientific applications on supercomputers, and many HPC centers are already including these CGRAs in their testbeds (e.g., Cerebras-1 at ORNL or EPCC).

This workshop provides a focused interdisciplinary forum for both CGRA hardware researchers and HPC/distributed computing researchers from academia or industry to come together to discuss state-of-the-art CGRA research for use in emerging HPC systems and Artificial Intelligence (AI).

Important Dates

Paper submission: ~~February 1st, 2025~~, ~~February 14th, 2025~~, February 21st, 2025 (FINAL)

Paper notification: ~~February 23rd, 2025~~ March 11th, 2025

Camera-ready due: ~~March 6th, 2025~~ March 16th, 2025

Registration link: HERE

Invited talks

Talk #1: Hardware and Software Co-Designed Neuron Array Processor for AI-IoT Applications

Atsutake Kosuge (The University of Tokyo, Tokyo, Japan)

Abstract

To apply AI algorithms to IoT applications, many types of low-power AI processors have been developed in recent years. Many of these achieve power consumption that is orders of magnitude lower than GPUs by minimizing memory access; however, they sacrifice versatility through extensive hardware optimization and specialization. To address this issue, we developed a neuron array that allows programmable synaptic wiring with stored weights on a template circuit. We demonstrate its general applicability to a wide range of wearable IoT applications. Additionally, we introduce a method for co-optimizing software and hardware to reduce the number of wirings, which is a key challenge of the neuron cell array.

Biography

Atsutake Kosuge received the Ph.D. degree in electrical engineering from Keio University, Yokohama, Japan, in 2016. From 2014 to 2017, he was a JSPS Research Fellow at Keio University. From 2017 to 2020, he held research positions at Hitachi Ltd., Tokyo, Japan, and Sony Corporation, Tokyo. In 2021, he joined the Systems Design Laboratory (d.lab) at The University of Tokyo, Tokyo, Japan, as an Assistant Professor. His research interests include energy-efficient computing and 3-D chip integration technologies.

Talk #2: Architecture/Compiler Co-Design of Tightly Coupled Processor Arrays

Frank Hannig (Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany)

Abstract

Parallel computing is ubiquitous and can be found in a wide variety of applications, from high-performance computing to embedded systems. A key factor across all these areas is energy efficiency, which denotes the number of computations that can be performed per unit of energy. Hence, customization and a tight co-design of architecture and compiler are crucial for scaling future systems further. This talk presents tightly coupled processor arrays (TCPAs), a class of massively parallel on-chip arrays of locally interconnected processing elements (PEs), as well as corresponding compilation concepts. TCPAs differ from coarse-grained reconfigurable arrays (CGRAs) in that the PEs are programmable, utilizing small instruction memories. They allow for the parallel execution of multiple loop dimensions, rather than just the innermost one, in many computationally intensive applications. Besides introducing the main architectural building blocks of these arrays, the presentation covers the corresponding application mapping, which starts from a functional programming language and involves symbolic loop compilation. In this approach, the loop bounds and number of available PEs can be unknown at compile time. Finally, the talk presents a concrete TCPA chip design called ALPACA, recently manufactured in 22 nm technology.

Biography

Frank Hannig received a Diploma degree in an interdisciplinary course of study in electrical engineering and computer science from the University of Paderborn, Germany, in 2000; a Ph.D. degree (Dr.-Ing.) and a Habilitation degree in computer science from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, in 2009 and 2018, respectively. He has led the Architecture and Compiler Design Group in the Computer Science Department at FAU since 2004. His primary research interests are the design of massively parallel architectures, ranging from dedicated hardware to multicore architectures, mapping methodologies for domain-specific computing, and architecture/compiler co-design. He has authored or co-authored more than 200 peer-reviewed publications. Dr. Hannig has served on the program committees of several international conferences (ARC, ASAP, CODES+ISSS, DAC, DATE, DASIP, SAC, SAMOS) and is an associate editor of the Journal of Real-Time Image Processing and IEEE Embedded Systems Letters. He is a Senior Member of the IEEE and an affiliate member of the European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC).

Workshop Program

CGRA4HPCA 2025 will be held in conjunction with IPDPS 2025 in Milano, Italy, on June 3rd.

Location: Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan.

The workshop program is below:

Time	Activity	Title	Authors
09:00 AM	Welcome session
09:15 AM	Technical Paper	Enabling Manual-Controllable Compilation for Dataflow CGRAs	Felix Böseler, Jörg Walter, Verena Klös
09:45 AM	Technical Paper	RAAP-CGRA: Placement for CGRAs with Restricted Routing Architectures	Anh Nguyen, Sebastian Czyrny, Takahide Yoshikawa, Jason Anderson
10:15 AM	Technical Paper	Benchmarking Floating Point Performance of Massively Parallel Dataflow Overlays on AMD Versal Compute Primitives	Mohamed Bouaziz, Suhaib A. Fahmy
10:45 AM	Coffee Break
11:00 AM	Technical Paper	PRNGine: Massively Parallel Pseudo-Random Number Generation and Probability Distribution Approximations on AMD AI Engines	Mohamed Bouaziz, Suhaib A. Fahmy
11:30 AM	Technical Paper	A Decoupled Coarse-Grained Reconfigurable Architecture by Introducing Data Flow Management Unit	Hisako Ito, Takuya Kojima, Hideki Takase, Hiroshi Nakamura
12:00 PM	Lunch Break
02:00 PM	Invited Talk	Hardware and Software Co-Designed Neuron Array Processor for AI-IoT Applications	Atsutake Kosuge
03:00 PM	Invited Talk	Architecture/Compiler Co-Design of Tightly Coupled Processor Arrays	Frank Hannig
04:00 PM	Coffee Break
04:30 PM	Panel Discussion
05:30 PM	Concluding Remarks

Call for Papers

The call for papers is available to download HERE

Topics of Interest

Topics of interest include (but are not limited to) the following:

Novel high-performance CGRA architectures for HPC and AI, including energy-efficient architectures (incl. asynchronous/clockless CGRAs, power-consumption optimizations, etc.)
Parallel programming language support for programming CGRA architectures (e.g., supporting OpenMP or CUDA/HIP for programming CGRA architectures)
Compilation strategies, algorithms, and methods for mapping computations and applications onto CGRAs
Smart middleware and runtime systems for support of CGRAs, including multi-CGRA systems for HPC and AI
Experience in porting scientific kernels and applications to state-of-the-art CGRAs (e.g., weather/climate codes, CFD, MD, etc.)
The use of CGRA frameworks (e.g., CGRA-ME and OpenCGRA) to generate and customize architectures
Software-programmable CGRAs (e.g., Xilinx ACAP Versal)
Processors with a tightly interconnected CGRA subsystem
Machine Learning applications and case studies, performance and power-efficiency comparisons between traditional systems (CPUs/GPUs) and CGRAs
Combination of CGRAs and other emerging post-Moore models (e.g., neuromorphic systems)
New emerging CGRA-like architectures for Generative AI
Case studies and evaluations of CGRAs for (Generative) AI
(New) AI and Machine Learning applications and case studies, performance and power-efficiency comparisons between traditional systems (CPUs/GPUs) and CGRAs

Paper Submission

We welcome authors to contribute full-length research papers on the topics of interest described above. Contributions should be unpublished and not under consideration at other venues. Papers should not exceed eight (8) single-spaced pages, formatted in double-column layout using a 10-point font on 8.5x11 inch pages (IEEE conference style). We adopt a single-blind review process. Accepted papers will be included in the workshop proceedings, which will be distributed at the conference and submitted for inclusion in the IEEE Xplore Digital Library after the conference. We also welcome presentations on new and emerging CGRA technologies from industry and startups. These will be presented at a special lightning session in the workshop. Please contact the workshop organizers (Send mail here) if you are interested in participating in this event.

Submit your paper HERE

Organization

Organizers

Artur Podobas (KTH, Sweden)
Kentaro Sano (RIKEN, Japan)
Jason Anderson (University of Toronto, Canada)
Tomohiro Ueno (RIKEN, Japan)

Program Committee

Andreas Koch, TU Darmstadt
Boma Anantasatya Adhi, RIKEN
Cheng Tan, Google/ASU
Christian Hochberger, TU Darmstadt
Elliot Delaye, AMD
Georgi Gaydadjiev, TU Delft
Hayden So, Univ. of Hong Kong
Jens Domke, RIKEN CCS
Lingli Wang, Fudan Univ.
Markus Weinhardt, HS Osnabrück
Nakashima Yasuhiko, Nara Institute of Science and Technology
Omar Ragheb, Fujitsu and Univ. of Toronto
Ryota Shioya, Univ. of Tokyo
Takuya Kojima, Univ. of Tokyo