Research Overview

The overall objective of my group's research is to develop new techniques, algorithms, and software systems that help (i) engineers construct complex systems in less time and with higher confidence in their correctness, and (ii) scientists develop models that can be used to gain deeper insights into physical and biological systems. The research lies at the intersection of (i) programming language and compiler research, (ii) real-time and cyber-physical systems, and (iii) machine learning with a focus on probabilistic methods.

A central part of this research is to develop programmatic modeling languages, that is, expressive formal modeling languages in which model developers describe what the system should do, not exactly how it is executed. Models can be constructed for a system before it is built (engineering perspective), or they can describe abstractions of an already existing system, such as a biological system, and then be used for analysis (science perspective).

Overall research questions include, but are not limited to:

(Q1) How can we formalize the semantics of such languages and prove correctness properties?
(Q2) How can we develop algorithms and compilation strategies that result in effective and scalable analysis tools (including inference and simulation)?
(Q3) How can language fragments and models be modularly defined and then composed in a sound and efficient manner?

The research spans several abstraction levels, including semantic aspects of languages, efficiency of compilers, and properties of target computer architectures, where the target is either custom research hardware or a standard computer architecture such as a graphics processing unit (GPU). As a consequence, from a computer architecture perspective, my research includes both hardware and software aspects, with a focus on the software.

My group's research is funded by a number of research projects, which include both theory (development of new algorithms and methods) and practice (development and dissemination of open-source software). Since we focus on fundamental research on algorithms and tools, we collaborate closely with domain experts, including both scientists (e.g., within evolutionary biology) and industry (engineering companies within telecom and manufacturing technology). The overall research can be divided into three research areas:

Research Area 1: Differentiable Probabilistic Programming Languages
Research Area 2: Modular and Efficient Meta-Programming Systems
Research Area 3: Predictable and Timed Systems

These three areas are presented below. Besides the technical research, I have also been involved in pedagogical research. See, for instance, our work on the company approach to software engineering projects [IEEE Transactions on Education 2012] and on assessment models for large project courses [SIGCSE 2014].

Research Area 1: Differentiable Probabilistic Programming Languages
This research area combines differentiable programming (the ability to make use of differentiable functions and automatic differentiation directly within programming languages) and probabilistic programming (a generalized approach to Bayesian networks where programmatic probabilistic models can be encoded within Turing-complete languages). Some highlights within this area are:

Probabilistic Programming and Machine Learning. We develop algorithms, semantics, and compilers for probabilistic programming languages (PPLs) in general. Recent results include work on delayed sampling [AISTATS 2018], proving correctness of Sequential Monte Carlo (SMC) inference within PPLs [ESOP 2021], and efficient compilation of universal probabilistic programming languages with SMC inference to GPUs [ESOP 2022]. We have earlier worked on non-PPL-based supervised and unsupervised learning methods in the context of software engineering and automated bug assignment [Empirical Software Engineering 2016], and on generalized methods for parallelization of Latent Dirichlet allocation models [Journal of Computational and Graphical Statistics 2017]. Recently, we showed for the first time how universal probabilistic programming offers a new, unique, and powerful approach to statistical phylogenetics, enabling rapid modeling that was not possible before [Communications Biology 2021].
Differentiable and Equation-Based Languages. We have for a long time been developing static and dynamic semantics for equation-based object-oriented (EOO) modeling languages, which are based on differential-algebraic equations. I have been part of the Modelica language design group since 2005. Some results and contributions concern types [Modelica 2006], [GPCE 2006], higher-order models [SNE 2009], and connection semantics [PADL 2012], together with some involvement in the open-source tool OpenModelica [SNE 2005]. We are currently working on incorporating automatic differentiation as a first-class citizen in languages, in a way that is both efficient and semantically correct.
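As a rough illustration of what combining the two ideas can look like, the Python sketch below writes a small probabilistic model as an ordinary function that returns a log density, differentiates it with forward-mode automatic differentiation (dual numbers), and uses the gradient to find the most probable parameter value. This is not code from Miking or from any of the cited papers; the names Dual, log_joint, grad, and map_estimate, the Gaussian model, and the step size are all assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. with derivative zero."""
    return x if isinstance(x, Dual) else Dual(float(x))


def log_joint(mu, data):
    """Unnormalized log density (assumed model): mu ~ N(0, 1), x ~ N(mu, 1)."""
    lp = -0.5 * mu * mu
    for x in data:
        lp = lp + (-0.5) * (x - mu) * (x - mu)
    return lp


def grad(f, x):
    """Derivative of f at x, obtained by seeding the input with dot = 1."""
    return f(Dual(x, 1.0)).dot


def map_estimate(data, steps=200, lr=0.05):
    """Plain gradient ascent on the log density (hypothetical settings)."""
    mu = 0.0
    for _ in range(steps):
        mu += lr * grad(lambda m: log_joint(m, data), mu)
    return mu
```

For this particular model the maximizer is known in closed form as the sum of the observations divided by (n + 1), so a call such as map_estimate([1.0, 2.0, 3.0]) can be checked against 6 / 4. The point of the sketch is only that the model is ordinary code and the gradient comes from the language machinery, not from a hand-derived formula.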

Key areas where we are actively doing research are: (i) defining a new domain-specific modeling language within phylogenetics, (ii) developing more domain-specific compilation techniques for both automatic differentiation and probabilistic inference, and (iii) developing reinforcement techniques that incorporate differentiable probabilistic programming concepts, including equation-based modeling and simulation.

Research Area 2: Modular and Efficient Meta-Programming Systems
The second research area focuses on theory and practical approaches for constructing modular programming systems, where language fragments, models, and model instances can be composed together in a sound and efficient manner.

The Miking System. Central to our group's current activities is the open-source platform Miking, which we use as a research platform (see the vision paper in [SLE 2019] or the GitHub repositories). We develop new techniques for composability, including a technique called Resolvable Ambiguity [CC 2021], as well as a model for interactive programmatic modeling [TECS 2021]. A central part of this line of work is the ability to compose language fragments so that new domain-specific languages can be rapidly defined without starting from scratch. For instance, the work on probabilistic programming in Area 1 (published in ESOP 2021 and 2022) is developed on top of the Miking framework. We base many aspects of the Miking framework on experiences and ideas from my previous system called Modelyze (see [PEPM 2018] and www.modelyze.org).
Co-simulation. Another aspect of modular systems is co-simulation. We have worked in this area for many years, which has resulted in several visible publications, including determinate composition for co-simulation [EMSOFT 2013], requirements for hybrid co-simulation [HSCC 2015], techniques for hybrid co-simulation [SoSym 2019], and the leading survey on the topic [ACM Computing Surveys 2019].

Key areas where we currently do research are: (i) automatic tuning techniques to improve compilation performance, (ii) acceleration and domain-specific compilation to GPUs, and (iii) theories for language fragment composition, at both the syntactic and semantic levels.

Research Area 3: Predictable and Timed Systems
The third research area includes languages and compilers specifically targeted at real-time systems, as well as hardware designed for predictability. These two abstractions meet at the compiler level, where compilers can be designed to improve predictability and timing aspects.

Timed Programming Languages and Compilers. A line of work that our group has recently been developing is language primitives for efficient and simple real-time programming in standard programming languages. In particular, we have developed a language extension called Timed C [RTAS 2018], where the standard C language is extended with a small set of timing primitives, together with end-to-end tools for Timed C [RTSS 2019]. We have also investigated WCET-aware code mapping techniques [TECS 2017] and models for approximate synchrony [CAV 2015].
Predictable Processors and Mixed-Criticality Systems. Another line of research has focused on what is called Precision Timed (PRET) machines, mainly developed at UC Berkeley. An essential step in this development is the FlexPRET architecture [RTAS 2014], a predictable RISC-V-based softcore processor for mixed-criticality systems. We have also developed predictable DRAM controllers [RTAS 2015] and models for relaxing the synchronous approach for mixed-criticality systems [RTAS 2014].
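To give a feeling for the kind of behavior that timing primitives in a timed programming language are meant to express, the Python sketch below simulates a periodic task that is released at absolute points in time and checks each job against its deadline. It does not use Timed C syntax or semantics and is not taken from the cited papers; SimClock, sleep_until, run_periodic, and the choice of deadline equal to period are hypothetical and exist only for this illustration. A simulated clock is used so that the example runs instantly and deterministically.

```python
class SimClock:
    """Hypothetical simulated clock; time only moves when we say so."""

    def __init__(self):
        self.now = 0.0

    def sleep_until(self, t):
        # Absolute delay: wait until time t, or return at once if t has passed.
        if t > self.now:
            self.now = t


def run_periodic(clock, period, work_times):
    """Run one job per period and report (release_time, met_deadline) per job.

    Each job's deadline is assumed to be the end of its own period.
    """
    log = []
    release = clock.now
    for cost in work_times:
        clock.now += cost                 # simulate the job's execution time
        deadline = release + period
        log.append((release, clock.now <= deadline))
        clock.sleep_until(deadline)       # wait for the next release point
        release = deadline                # releases stay on the nominal grid
    return log
```

With a period of 10.0 and simulated execution times of 3.0, 12.0, and 4.0, the second job overruns its deadline while the nominal release times remain 0.0, 10.0, and 20.0. Computing each release from the previous release, not from the time the job happened to finish, is what keeps the period from drifting.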

During the past years, the group's research has focused mostly on compiler and language perspectives in general, and especially on research related to Timed C. Out of the three research areas, there are fewer planned activities within Area 3. However, one important aspect is connecting predictable and timed programming with co-simulation aspects and differentiable programming. There are both interesting semantic problems and compiler challenges.

This is a personal web page.