# Scheduling and compiling rate-synchronous programs with end-to-end latency constraints

Timothy Bourke

Inria Paris — PARKAS Team École normale supérieure, PSL University

21 October 2025, Systèmes réactifs synchrones MPRI 2-23-1

### Plan

#### Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusio

#### Context

- Standard practice: design an application as a set of periodically executed tasks that communicate through shared variables.
- Read data from sensors via a bus, compute through sequences of cyclic tasks, and write to actuators via the bus.

### Context

- Standard practice: design an application as a set of periodically executed tasks that communicate through shared variables.
- Read data from sensors via a bus, compute through sequences of cyclic tasks, and write to actuators via the bus.

#### Airbus project "All-in-Lustre"

- Current system: task = Lustre node ( $\approx 5\,000$ ), separate constraints on order and latency.
- Desired system: "All-in-Lustre": compose nodes into a single Lustre program with new features for specifying periods and execution constraints.
- Generate sequential code for cyclic execution on a single-processor platform.
- Base period = 5ms. Tasks at 10ms, 20ms, 40ms, and 120ms.
- Tasks are already chopped up into small pieces.

```
node read() returns (y:int);
node write(x:int) returns ();
node filter(x:int) returns (y:int);
node main() returns ()
var s0, s1, s2, s3 : int :: 1/3;
let
    s0 = read();
    s1 = filter(s0):
    s2 = filter(s1);
    s3 = s1 + s2;
    () = write(s3);
tel
```

- Declare variables of rate 1/3 (period = 3)
- Calculate each one once every three cycles

```
$ presseail example1.ail -v --glpk --print
```

```
node read() returns (y:int);
node write(x:int) returns ();
node filter(x:int) returns (y:int);
node main() returns ()
var s0, s1, s2, s3 : int :: 1/3;
let
    s0 = read();
    s1 = filter(s0):
    s2 = filter(s1):
    s3 = s1 + s2;
    () = write(s3);
tel
```

- Declare variables of rate 1/3 (period = 3)
- Calculate each one once every three cycles

\$ presseail example1.ail --glpk --compile 1 --print

```
node read() returns (y:int);
node write(x:int) returns ();
node filter(x:int) returns (y:int);
node main() returns ()
var s0, s1, s2, s3 : int :: 1/3;
let
    s0 = read():
    s1 = filter(s0):
    s2 = filter(s1):
    s3 = s1 + s2;
    () = write(s3);
tel
```

- Declare variables of rate 1/3 (period = 3)
- Calculate each one once every three cycles
- The 5 calculations here are synchronous relative to the period
   even if they are not necessarily simultaneous relative to the base clock
- s3 = s1 + s2 is well clocked since s1 :: 1/3, s2 :: 1/3, and s3 :: 1/3.

```
$ presseail example1.ail --glpk --compile 1 --print
```

```
node read() returns (y:int);
node write(x:int) returns ();
node filter(x:int) returns (y:int);
node main() returns ()
var s0, s1, s2, s3 : int :: 1/3;
let
    s0 = read():
    s1 = filter(s0):
    s2 = filter(s1):
    s3 = s1 + s2;
    () = write(s3);
tel
```

- Declare variables of rate 1/3 (period = 3)
- Calculate each one once every three cycles
- The 5 calculations here are synchronous relative to the period
   even if they are not necessarily simultaneous relative to the base clock
- s3 = s1 + s2 is well clocked since s1 :: 1/3, s2 :: 1/3, and s3 :: 1/3.
- Causality applies 'across' a period and 'within' an instant:  $s_0 \rightarrow s_1 \rightarrow s_2 \rightarrow s_3 \rightarrow$  ()

```
$ presseail example1.ail --glpk --compile 1 --print
```

```
resource cpu : int
                                            x when c
 node filter(x : int)
                                             » c is for '(sampling) choice'
   returns (y : int);
                                             » sub-sampling of a stream
 node main(s0 : int)
                                             » fast-to-slow rate change
 returns (s4 : int)
 var s1, s2 : int :: 1/3;
                                            current(x, c)
     s3 : int :: 1/3 last = 0:
 let
                                             » stutter stream elements.
     s1 = filter(s0 when (0 % 3)):
                                             » must declare an initial last value
     s2 = filter(s1):
     s3 = filter(s2):
                                             » slow-to-fast rate change
     s4 = current(s3, (2 \% 3));
                                              y = merge c x ((0 fby y) when not c)
 tel
$ presseail example2.ail --glpk --compile 1
```







```
r = w when (i % n)
```

- (i % n): take the ith of every n elements.
- n is the rate of w relative to r
  E.g., for w :: 1/4 and r :: 1/8, n is 2.
- It can be deduced from the clocks, but is useful for readability.
- It implies a lower bound on the scheduling of the equation.

```
r = w \text{ when } (i \% n)
```

- (i % n): take the ith of every n elements.
- n is the rate of w relative to r
  E.g., for w :: 1/4 and r :: 1/8, n is 2.
- It can be deduced from the clocks, but is useful for readability.
- It implies a lower bound on the scheduling of the equation.

```
r = current(w, (i \% n))
```

- (i % n): repeat the initial last value i times, then repeat each w value n times.
- n is the rate of r relative to w
   E.g., for r :: 1/4 and w :: 1/8, n is 2.
- It implies an upper bound on the scheduling of the equation.

```
r = w \text{ when } (i \% n)
```

- (i % n): take the ith of every n elements.
- n is the rate of w relative to r
  E.g., for w :: 1/4 and r :: 1/8, n is 2.
- It can be deduced from the clocks, but is useful for readability.
- It implies a lower bound on the scheduling of the equation.

- r = current(w, (i % n))
- (i % n): repeat the initial last value i times, then repeat each w value n times.
- n is the rate of r relative to w
   E.g., for r :: 1/4 and w :: 1/8, n is 2.
- It implies an upper bound on the scheduling of the equation.

- Write (? % n) if we don't care when values are sampled/updated.
- The schedule decides when sampling/updating occur; fixed at compile time.



One slow tick is synchronous with three fast ones.



One slow tick is synchronous with three fast ones.



- Calculations are synchronous relative to their periods but not necessarily simultaneous relative to execution cycles
- The compiler assigns computations to phases, buffering values if necessary







- Like Prelude Forget, Boniol, Lesens, and Pagetti (2010):

  A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems

  But, no WCET, no deadlines, no real-time tasks
- Rates expressed as 1/n of the base clock



- Like Prelude Forget, Boniol, Lesens, and Pagetti (2010):

  A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems

  But, no WCET, no deadlines, no real-time tasks
- Rates expressed as 1/n of the base clock

- One or more step functions
- Called cyclically at the base rate



- Like Prelude Forget, Boniol, Lesens, and Pagetti (2010):

  A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems

  But, no WCET, no deadlines, no real-time tasks
- Rates expressed as 1/n of the base clock

- One or more step functions
- Called cyclically at the base rate



- Vertex = equation
- Arc from producer to consumer
- Independent of source language

- Like Prelude Forget, Boniol, Lesens, and Pagetti (2010):

  A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems

  But, no WCET, no deadlines, no real-time tasks
- Rates expressed as 1/n of the base clock

- One or more step functions
- Called cyclically at the base rate



- Vertex = equation
- Arc from producer to consumer
- Independent of source language

- Data dependencies
- Load balancing
- End-to-end latency

### Aside: fby or last

```
x = c fby e;
P

var nx : T last = c

nx = e;
P{last nx/x}
```

- c fby e initialized unit delay / register / delay c e
- last x previous value of initialized variable [Pouzet (2006): Lucid Synchrone, v. 3. ]
- Here: easier to work with last x



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | 2 | 10 | 11 | 12 | 23 | 24 | 25 | 39 |  |
|----|---|---|----|----|----|----|----|----|----|--|
| n  | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  |  |
| VS |   | 7 |    |    | 17 |    |    | 30 |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2 | 10 | 11 | 12 | 23 | 24 | 25 | 39 |  |
|----|---|----|----|----|----|----|----|----|----|--|
| n  | 1 | (2 | 3  | 4  | 5  | 6  | 7  | 8  | 9  |  |
| VS |   | 7  |    |    | 17 |    |    |    |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2  | 10         | 11 | 12 | 23 | 24 | 25 | 39 |  |
|----|---|-----|------------|----|----|----|----|----|----|--|
| n  | 1 | 2   | <b>J</b> 3 | 4  | 5  | 6  | 7  | 8  | 9  |  |
| VS |   | 17/ |            |    | 17 |    |    |    |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2  | 10         | 11 | ,12 | 23 | 24 | 25 | 39 |  |
|----|---|-----|------------|----|-----|----|----|----|----|--|
| n  | 1 | 2   | <b>J</b> 3 | 4  | 5   | 6  | 7  | 8  | 9  |  |
| VS |   | 17/ |            |    | 17  |    |    | 30 |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2 | 10             | 11 | ,12 | 23             | 24 | 25 | 39 |  |
|----|---|----|----------------|----|-----|----------------|----|----|----|--|
| n  | 1 | 2  | <del>]</del> 3 | 4  | 5   | <del>/</del> 6 | 7  | 8  | 9  |  |
| VS |   | 77 |                |    | 17/ | 7              |    | 30 |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2  | 10         | 11 | ,12 | 23         | 24 | ,25 | 39 |  |
|----|---|-----|------------|----|-----|------------|----|-----|----|--|
| n  | 1 | (2  | <i>J</i> 3 | 4  | 5   | <b>7</b> 6 | 7  | 8   | 9  |  |
| VS |   | 17/ |            |    | 17/ | 7          |    | 30  |    |  |



```
node eg1() returns ()
var vf : int :: 1;
    vs : int :: 1/3 last = 0;
    n : int :: 1 last = 0;
let
    n = (last n) + 1;
    vf = n + current(vs, (2 % 3));
    vs = (vf when (1 % 3)) + 5;
tel
```

| vf | 1 | ,2  | 10         | 11 | ,12 | 23         | 24 | ,25 | 39         |  |
|----|---|-----|------------|----|-----|------------|----|-----|------------|--|
| n  | 1 | 2   | <b>7</b> 3 | 4  | 5   | <b>7</b> 6 | 7  | 8   | <b>7</b> 9 |  |
| VS |   | 17/ |            |    | 17/ |            |    | 30/ |            |  |

### Syntax

```
eq := x = e \mid x^* = f(e^*)
e := c \mid x \mid \diamond e \mid e \oplus e \mid \text{if } e \text{ then } e \text{ else } e \mid \text{last } x
      x when s | (last x) when s | current(x, s)
s := (c \% c) \mid (? \% c)
p := (d :)^*
d ::= resource x : ty
       node f((x:ty;)^*) returns ((x:ty;)^*) requires ((x=c;)^*)
      node f((x:ty::ck [last = c]:)^*) returns ((x:ty::ck [last = c]:)^*)
       \operatorname{var}(x:ty::ck [\operatorname{last}=c];)^*\operatorname{let}(((\operatorname{pragmas}\ eq)\mid cst);)^+\operatorname{tel}
pragmas ::= [label(x)] [phase(c \% c)]
cst := resource balance x
         resource x rel c
        latency (exists | forward | backward) rel c (x, x(x)^*)
rel ::= <= | < | = | > | >=
```

# Valid programs are defined by clock typing

$$\frac{e_1\,::\,{}^1\!/{\scriptscriptstyle n}\qquad e_2\,::\,{}^1\!/{\scriptscriptstyle n}}{e_1\oplus e_2\,::\,{}^1\!/{\scriptscriptstyle n}}$$

$$\frac{x :: \frac{1}{n}}{\text{last } x :: \frac{1}{n}}$$

$$\frac{x :: \frac{1}{m}}{x \text{ when } (\cdot \% n) :: \frac{1}{mn}}$$

$$\frac{x :: 1/mn}{\text{current}(x, (\cdot \% n)) :: 1/m}$$

- No phase offsets in clock types, unlike
- » Prelude: rate(100, 0)
  [Forget, Boniol, Lesens, and Pagetti (2010): A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems
- » Lucy-n: (010), 00(00100)
  [Cohen, Duranton, Eisenbeis, Pagetti, Plateau, and Pouzet (2006): N-Synchronous Kahn networks: a relaxed model of synchrony for real-time systems
- » 1-Synchronous: [0, 2]

  Flooss, Pouzet, Cohen, Potop-Butucaru, Souyris, Bregeon, and Baufreton (2020): 1-Synchronous Programming of Large Scale, Multi-Periodic Real-Time Applications with Functional Degrees of Freedom
- » Simulink: [Ts, To]
- Dataflow semantics: independent of phase offsets
- Generated code: phase offsets implement data dependencies.

# Valid programs are defined by clock typing



$$\frac{x :: \frac{1}{m}}{x \text{ when } (\cdot \% n) :: \frac{1}{mn}}$$

$$\frac{x :: 1/mn}{\text{current}(x, (\cdot \% n)) :: 1/m}$$

- No phase offsets in clock types, unlike
- » Prelude: rate(100, 0)
  [Forget, Boniol, Lesens, and Pagetti (2010): A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems
- » Lucy-n: (010), 00(00100)
  [Cohen, Duranton, Eisenbeis, Pagetti, Plateau, and Pouzet (2006): N-Synchronous Kahn networks: a relaxed model of synchrony for real-time systems
- » 1-Synchronous: [0, 2]

  Tlooss, Pouzet, Cohen, Potop-Butucaru, Souyris, Bregeon, and Baufreton (2020): 1-Synchronous Programming of Large Scale, Multi-Periodic Real-Time Applications with Functional Degrees of Freedom
- » Simulink: [Ts, To]
- Dataflow semantics: independent of phase offsets
- Generated code: phase offsets implement data dependencies.

### Related Work: Lucy-n

- Model Cohen, Duranton, Eisenbeis, Pagetti, Plateau, and Pouzet (2006): N-Synchronous Kahn networks: a relaxed model of synchrony for real-time systems and language Mandel, Plateau, and Pouzet (2010): Lucy-n: a n-Synchronous extension of Lustre
- Flexible scheduling patterns (0010(010)) and buffering
- Notion of jitter with clock envelopes [Cohen, Mandel, Plateau, and Pouzet (2008): Abstraction of Clocks in Synchronous Data-flow Systems
- Sophisticated type-based analysis for causality and buffer sizes
- Less focus on code generation

#### Our work

- Less flexible scheduling
- Buffering is implicit and very limited
- Less clock typing, more causality
- Generate imperative code

#### Related Work: looss et al.

- "1-synchronous" programs
- Tooss, Pouzet, Cohen, Potop-Butucaru, Souyris, Bregeon, and Baufreton (2020): 1-Synchronous Programming of Large Scale, Multi-Periodic Real-Time Applications with Functional Degrees of Freedom
- Two-element clocks: [phase, period]  $(0^k 10^{n-k-1} \text{ or } 0^k (10^{n-1}), \text{ where } n \text{ is the period and } 0 \le k < n \text{ is the phase}$
- Related to work on affine clocks
  - Curic (2005): Implementing Lustre Programs on Distributed Platforms Lwith Real-Time Constraints
  - Smarandache, Gautier, and Le Guernic (1999): Validation of Mixed Signal-Alpha Real-Time Systems through Affine Calculus on Clock Synchronisation Constraints
- Several operators: when, current, delay, delayfby, buffer, bufferfby
- Prototype in Heptagon: introduces (lots of) whens and merges

#### Our work

- Simpler clocks, fewer operators, implicit buffering
- Generate imperative code directly

# Prelude: Multi-periodic Sync. Prog.

Pagetti, Forget, Boniol, Cordovilla, and

Lesens (2011): Multi-task implementation

of multi-periodic synchronous programs

- Forget, Boniol, Lesens, and Pagetti (2010): Language
  - and compiler A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems
- Extend Lustre with task periods/phases and WCET.
- Compose real-time primitives to express communication patterns.
- Generate and schedule a set of real-time tasks
  - WCET, release times, deadlines
- Adapt existing scheduling algorithms to respect data dependencies
- "Don't Care" [Wyss, Boniol, Forget, and Pagetti (2012): A Synchronous Language with Partial Delay Specification for Real-Time Systems Programming ], Let the compiler decide if  $c \, dc \, x$  ( $c \, fby? \, x$ ) is
  - c fby x

#### Stream-based Semantics

$$\llbracket e_1 \oplus e_2 
rbracket (i) = \llbracket e_1 
rbracket (i) \oplus \llbracket e_2 
rbracket (i)$$

$$\llbracket exttt{last } x 
rbracket (i) = egin{cases} x_{-1} & ext{if } i = 0 \ \llbracket x 
rbracket (i-1) & ext{otherwise} \end{cases}$$

$$\llbracket x \text{ when } (s\%_n) \rrbracket (i) = \llbracket x \rrbracket (n \cdot i + s)$$

- Recursive equations on streams:  $\mathbb{N} o \mathbb{V}$
- $x_{-1}$  is the initial last value
- No explicit presence or absence
- Cf. Prelude (tagged-signal model)

$$\llbracket \operatorname{current}(x,(s_{n})) \rrbracket(i) = egin{cases} x_{-1} & \text{if } i < s \\ \llbracket x \rrbracket(\lfloor i - s/n \rfloor) & \text{otherwise} \end{cases}$$

#### Declare and constrain resources

```
resource cpu : int
node read() returns (v:int);
node write(x:int) returns ();
                                      • Declare named weights to represent resources
node filter(x:int) returns (v:int)
                                        required per cycle
  requires (cpu = 4);
                                       » Simple proxies for worst-case execution time
node main() returns ()
                                       » Network bus accesses
var s0, s1, s2, s3 : int :: 1/3;
let

    Use to constrain scheduling

    resource cpu <= 4;
    s0 = read():
    s1 = filter(s0):
                                      • normally: resource balance cpu
    s2 = filter(s1);
    s3 = filter(s2):
    () = write(s3):
tel
```

#### Declare and constrain resources

```
resource cpu : int
node read() returns (v:int);
node write(x:int) returns ();
                                      • Declare named weights to represent resources
node filter(x:int) returns (v:int)
                                         required per cycle
  requires (cpu = 4);
                                        » Simple proxies for worst-case execution time
node main() returns ()
                                        » Network bus accesses
var s0, s1, s2, s3 : int :: 1/3;
let

    Use to constrain scheduling

    resource cpu <= 4;
    s0 = read():
    s1 = filter(s0):
                                       • normally: resource balance cpu
    s2 = filter(s1);
    s3 = filter(s2):
    () = write(s3):
tel
Also: trade-off resource balancing against latency:
```

latency\_chain forward <= 0 (s0 -> s1 -> s2 -> s3);

#### **Direct Communications**

$$r = f(w)$$

$$w \longrightarrow f$$

- D<sub>f</sub><sup>w</sup>: Direct Write-before-read (forward concomitance)
- Dependency constraint:  $p: w \le p: r$
- $0 \le p:r p:w$



r = f(last w)



- D<sub>b</sub><sup>r</sup>: Direct Read-before-write (backward concomitance)
- Dependency constraint:  $p:r \le p:w$
- $0 \le p: w p: r$



#### Rate Transitions

$$r = f(w when (1 \% 3))$$

(i % n): take value i of every n
(? % n): take any of every n values



 /nf: Fast-to-slow (forward concomitance)



$$r = f(current(w, (1 % 3)))$$

(i % n): i initial values, then repeat n times



 \*<sub>nf</sub> | \*<sub>nb</sub> | \*<sub>n?</sub>: Slow-to-fast (forward or backward concomitance)



## Macro-scheduling using Integer Linear Programming (ILP)

#### Usual Workflow

- 1. \$ presseail example2.ail --write-lp example2.lp writes the scheduling constraints to a file
- 2. Call cplex
- 3. \$ presseail example2.ail --read-sol example2.sol --compile 1 reads the solution and generates code

## Macro-scheduling using Integer Linear Programming (ILP)

#### Usual Workflow

- 1. \$ presseail example2.ail --write-lp example2.lp writes the scheduling constraints to a file
- 2. Call cplex
- 3. \$ presseail example2.ail --read-sol example2.sol --compile 1 reads the solution and generates code

#### Testing simple examples

• \$ presseail example2.ail --glpk --compile 1

```
Minimize
 rmax.equ
```

```
Subject to
```

```
pw.def0.filter: pw.0.filter + pw.1.filter + pw.2.filter = 1
pw.def1.filter: 2 pw.2.filter + pw.1.filter - p.filter = 0
. . .
```

depd.wr.p.read.p.filter\_5: p.filter - p.read >= 0

. . .

rbnd.cpu\_8: 5 pw.0.filter\_1 + 5 pw.0.filter\_0 + 5 pw.0.filter <= 8 rbnd.cpu\_7: 5 pw.1.filter\_1 + 5 pw.1.filter\_0 + 5 pw.1.filter <= 8

rbnd.cpu\_6: 5 pw.2.filter\_1 + 5 pw.2.filter\_0 + 5 pw.2.filter <= 8 Bounds General

 $0 \le p.read \le 3$ 

p.read p.filter ...

. . .

0 <= p.filter < 3</pre> Binary

End

pw.0.read pw.1.read pw.2.read pw.0.filter ...

### Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusio

# The ROSACE Case Study [Pagetti, Saussié, Gratia, Noulard, and Siron (2014): The ROSACE Case Study: From Simulink Specification to Multi/Many-Core Execution



# The ROSACE Case Study [Pagetti, Saussié, Gratia, Noulard, and Siron (2014): The ROSACE Case Study: From Simulink Specification to Multi/Many-Core Execution



```
resource ops : int
node alt_hold (h_c, h_f : float) returns (vz_c : float) requires (ops = 201);
. . .
const H200 : rate = 1 / 2 (* base clock = 400Hz *)
const H100 : rate = 1 / 4
const H50 : rate = 1 / 8
const H10 : rate = 1 / 40
node assemblage1( h_c : float :: H10 last = 0.; (* altitude command *)
                                                                                    elevator
                                                                                    deflection
                 va_c : float :: H10 last = 0.) (* airspeed command *)
                                                                                    command
returns (d_th_c : float :: H50 last = 1.6402; (* throttle command *)
         d_e_c : float :: H50 last = 0.0186) (* elevator deflection command *)
var h_f : float :: H100; (* altitude *)
    vz_c : float :: H50: (* vertical speed command *)
let
   h_f = h_filter(h when (? % 2));
  vz_c = alt_hold( current(h_c, (? % 5)), h_f when (? % 2) );
                                                                                    throttle
  resource balance ops:
                                                                                    command
  latency assemblage exists <= 2
    (dynamics, h_filter, alt_hold, vz_control, elevator);
tel
```

200 Hz

50 Hz

# The ROSACE Case Study Pagetti, Saussiè, Gratia, Noulard, and Siron (2014): The ROSACE Case Study Study: From Simulink Specification to Multi/Many-Core Execution



```
node assemblage(h_c, va_c : float rate(100, 0))
returns (d_th_c, d_e_c : float)
var vz_c : float;
    d_e, th, h, az, va, q, vz : float;
    vz_f, va_f, h_f, az_f, q_f : float;
let
  (* 2.00 Hz *)
  d_e = elevator(d_e_c *^4);
  th = engine(d_{th_c} *^4);
  (va, az, q, vz, h) = dynamics(1.6402 fby th, 0.0186 fby d_e);
  (* 100Hz *)
  h f = h filter(h /^2):
  az_f = az_filter(az /^2); ...
  (* 50Hz *)
  vz_c = alt_hold(h_c *^5, h_f /^2);
  d_e_c = vz_control(vz_c, vz_f /^2, q_f ^/ 2,
                      az f /^2 2):
  d_{th_c} = va_{control}(va_c *^5, va_f /^2,
                       q_f /^2, vz_f /^2;
  (* dynamics \rightarrow h_filter \rightarrow alt_hold \rightarrow vz_control \rightarrow elevator <= 2 *)
  (* scheduled as real-time tasks with WCET, precedence, deadlines *)
tel
                                                                               25 / 72
```

```
node assemblage(h_c, va_c : float :: 1/40 last = 0.)
returns (d_th_c, d_e_c : float :: 1/8 last = (1.6402, 0.0186))
var vz_c : float :: 1/8;
    d_e, th, h, az, va, q, vz : float :: 1/2;
    vz_f, va_f, h_f, az_f, q_f : float :: 1/4;
let
  (* 200 Hz = 1/2 *)
 d_e = elevator(current(d_e_c, (? % 4)));
  th = engine(current(d_{th_c}, (? % 4)));
  (va, az, q, vz, h) = dynamics(th, d_e);
  (* 100 Hz = 1/4 *)
  h_f = h_filter(h when (? % 2));
  az_f = az_filter(az when (? % 2)); ...
  (* 50 Hz = 1/8 *)
  vz_c = alt_hold(current(h_c, (? \% 5)), h_f when (? \% 2));
  d_e_c = vz_control(vz_c, vz_f when (? % 2), q_f when (? % 2),
                     az f when (? % 2)):
  d_{th_c} = va_{control(current(va_c, (? % 5)), va_f when (? % 2),
                      q_f when (? % 2), vz_f when (? % 2));
  latency exists <= 2 (dynamics, h_filter, alt_hold, vz_control, elevator);</pre>
 resource balance ops;
tel
```

26 / 72

```
h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? % 5)),
h_f when (? % 2));
h_f = h_filter \qquad h_f \qquad alt_hold \qquad v_z_c \qquad b_f \qquad alt_hold \qquad b_f \qquad
```

```
h_f = h_filter(h when (? % 2));
 vz_c = alt_hold(current(h_c, (? % 5)),
                       h_f when (? % 2));
                                     alt_hold
                                    50 \, \text{Hz} = \frac{1}{8}
           100 \, \text{Hz} = \frac{1}{4}
p:alt_hold = 0
```

```
static int c_30 = 0;
 h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? \% 5)),
                                                    void step0()
                   h_f when (? % 2));
                                                         if (c 30 % 2 == 0) {
                                                             if (c_30 % 4 == 2) {
                               alt hold
           h filter
                                                                  h_filter(); // ***
                              50 \, \text{Hz} = \frac{1}{8}
         100 \, \text{Hz} = \frac{1}{4}
                                                         } else {
                                                         switch (c_30) {
                                                         case 2: va_control(): break:
                                                         case 6: alt_hold();
p:alt hold = 0
                                                                  vz_control();
                                                                  break:
                                                         c_{30} = (c_{30} + 1) \% 8:
                                                                                        27 / 72
```

static int  $c_30 = 0$ ;

```
h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? \% 5)),
                                                      void step0()
                    h_f when (? % 2));
                                                           if (c 30 % 2 == 0) {
                                                               if (c_30 % 4 == 2) {
                                alt hold
                                                                    h filter(): // ***
                               50 \, \text{Hz} = \frac{1}{8}
         100 \, \text{Hz} = \frac{1}{4}
                                                           } else {
                                                           switch (c_30) {
                                                           case 2: va_control(); break;
                                                           case 6: alt_hold();
p:alt hold = 0
                                                                    vz_control();
                                                                    break:

    Source: dataflow semantics

                                                           c_{30} = (c_{30} + 1) \% 8:

    Target: C code implicitly writing and reading

   static variables
                                                                                           27 / 72
```

static int  $c_30 = 0$ :

```
h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? \% 5)),
                                                  void step0()
                  h_f when (? % 2));
                                                       if (c 30 % 2 == 0) {
                                                           if (c_30 % 4 == 2) {
                              alt hold
                                                                h filter(): // ***
                             50 \, \text{Hz} = \frac{1}{8}
        100 \, \text{Hz} = \frac{1}{4}
                                                      switch (c_30) {
                                                       case 2: va_control(); break;
                                                       case 6: alt_hold();
p:alt hold = 0 1
                                                                vz_control();
                                                                break:

    Source: dataflow semantics

                                                       c_{30} = (c_{30} + 1) \% 8:

    Target: C code implicitly writing and reading

  static variables
                                                                                     27 / 72
```

```
static int c_30 = 0;
 h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? \% 5)),
                                                     void step0()
                    h_f when (? % 2));
                                                          if (c 30 % 2 == 0) {
                                                               if (c_30 \% 4 == 2) {
                                alt hold
                                                                    h_filter();
                               50 \, \text{Hz} = \frac{1}{8}
         100 \, \text{Hz} = \frac{1}{4}
                                                          } else {
                                                          switch (c 30) {
                                                          case 2: va_control(); break;
                                                          case 6: alt_hold();
p:alt hold = 0
                                                                    vz_control();
                                                                    break:

    Source: dataflow semantics

                                                          c_{30} = (c_{30} + 1) \% 8:

    Target: C code implicitly writing and reading

   static variables
```

27 / 72

static int  $c_30 = 0$ ;

```
h_f = h_filter(h when (? % 2));
 vz_c = alt_hold(current(h_c, (? \% 5)),
                                                       void step0()
                    h_f when (? % 2));
                                                            switch (c 30) {
                                                            case 2: va_control(); break;
                                 alt hold
                                                            case \ alt_hold();
                                                                      vz_control();
                                50 \, \text{Hz} = \frac{1}{8}
                                                                      break:
          100 \, \text{Hz} = \frac{1}{4}
                                                            if (c_30 \% 2 == 0) {
                                                                 if (c_30 % 4 == 2)
                                                                      h filter():
p:alt hold = 0
                                                            } else {

    Source: dataflow semantics

                                                            c_{30} = (c_{30} + 1) \% 8:

    Target: C code implicitly writing and reading
```

static variables

```
static int c_30 = 0;
 h_f = h_filter(h when (? % 2));
vz_c = alt_hold(current(h_c, (? \% 5)),
                                                     void step0()
                    h_f when (? % 2));
                                                          switch (c 30) {
                                                          case 2: va_control(); break;
                                alt hold
                                                          case 6: alt_hold();
                                                                    vz_tontrol();
                               50 \, \text{Hz} = \frac{1}{8}
                                                                    break:
         100 \, \text{Hz} = \frac{1}{4}
                                                                       b (concomitance)
                                                          if (c_30 % 2 == 0) {
p:h_filter= 0
                                                               if (c_30 % 4 == 2) {
                                                                   h_filter(); // ***
p:alt hold = 0
                                                          } else {
                                                               . . .

    Source: dataflow semantics

                                                          c_{30} = (c_{30} + 1) \% 8:

    Target: C code implicitly writing and reading
```

static variables

#### Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusion



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
  - » Changing the concomitance
  - » Dropping data dependencies



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
  - » Changing the concomitance
  - » Dropping data dependencies



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
  - » Changing the concomitance
  - » Dropping data dependencies



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - $\gg$  r = last w + 1 also becomes w 
    ightarrow r
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
- » Changing the concomitance
- » Dropping data dependencies



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
- » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
  - » Changing the concomitance
  - » Dropping data dependencies



- Generate flowgraph from program
  - » r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
- » Changing the concomitance
- » Dropping data dependencies



- Generate flowgraph from program
  - > r = w + 1 becomes  $w \rightarrow r$
  - »  $r = last w + 1 also becomes <math>w \rightarrow r$
- Annotate edges with
  - » Rate-transitions and data dependencies
  - » Concomitance: order within cycle
- Analyze the graph to identify cycles (strongly-connected components)
- Eliminate cycles
- » Changing the concomitance
- » Dropping data dependencies

## Flowgraph links

#### Schedule

|                   | ops  | phase |                |
|-------------------|------|-------|----------------|
| elevator          | 98   | 1 % 2 | (p: <i>e</i> ) |
| engine            | 82   | 0 % 2 |                |
| dynamics          | 1174 | 1%2   | (p:d)          |
| h_filter          | 38   | 2 % 4 | (p:h)          |
| az_filter         | 37   | 2 % 4 |                |
| ${\tt q\_filter}$ | 37   | 2 % 4 |                |
| $vz_filter$       | 37   | 2 % 4 |                |
| va_filter         | 38   | 2 % 4 |                |
| alt_hold          | 201  | 6%8   | (p: <i>a</i> ) |
| vz_control        | 88   | 6%8   | (p:v)          |
| va_control        | 90   | 2%8   |                |
|                   |      |       |                |

- Our ROSACE implementation
  - » Cycle period =  $2.5 \, \text{ms} (400 \, \text{Hz})$
  - » Allow load balancing of fastest components (200 Hz)
- » The ops resource estimates the computations required
- Assign each component a phase relative to its period (in terms of base cycles)
- Balance ops per cycle
- Respect end-to-end latency

#### Schedule



latency exists <= 2 (dynamics, h\_filter, alt\_hold, vz\_control, elevator);
latency exists <= 2 (d, h, a, v, e);</pre>

31 / 72

### Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencin

Scheduling Complications

Multi-threaded Scheduling

Conclusio

## Minimum Pairwise Latency: same period



- $eq_w \xrightarrow{\mathsf{D}^\mathsf{w}_\mathsf{f}} eq_r$
- Write-before-read:  $p: w \le p: r$
- $0 \le p:r-p:w < period$

#### Minimum Pairwise Latency: same period



• 
$$eq_w \xrightarrow{D_f^w} eq_r$$

- Write-before-read:  $p: w \le p: r$
- $0 \le p:r p:w < period$

Delayed communication: 
$$r = last w$$



• 
$$eq_w \xrightarrow{D_b^r} eq_r$$

- Read-before-write:  $p:r \le p:w$
- $0 < p:r p:w + period \le period$

## Minimum Pairwise Latency: same period



•  $eq_w \xrightarrow{D_f^w} eq_r$ 

•  $ea... \xrightarrow{D_b^r} ea...$ 

- Write-before-read:  $p: w \le p: r$
- $0 \le p:r-p:w < period$

Delayed communication: r = last w



- Read-before-write:  $p:r \le p:w$
- $0 < p:r-p:w+period \le period$

Unconstrained communication (r = last? w)

- $eq_w \xrightarrow{D_f^2} eq_r$ :  $(p:r-p:w+period(w)) \mod period(w)$
- $eq_w \xrightarrow{D_b^2} eq_r$ :  $((p:r-p:w+period(w)-1) \mod period(w))+1$

#### Minimum Pairwise Latency: fast-to-slow



#### Minimum Pairwise Latency: fast-to-slow



#### Minimum Pairwise Latency: slow-to-fast

r = current(w, (1 % 3)) 
$$eq_w \xrightarrow{*_n f} eq_r$$
  
((branch · period(r) + period(w) + p:r - p:w - 1) mod period(w)) + 1



#### Minimum Pairwise Latency: slow-to-fast

```
r = current(w, (1 % 3)) eq_w \xrightarrow{*_n f} eq_r
((branch · period(r) + period(w) + p:r - p:w - 1) mod period(w)) + 1
```



```
r = current(last w, (? % 3))?
```

- Not allowed. Not enough 'memories'.
- Must be normalized to

```
r = current(t, (? % 3));
t = last w:
```

## End-to-End Latency: the wrong way

- Define pairwise latencies for each link type.
- Chain them together into a sequence.
- Difficult to handle branching and dead ends.
- Difficult to explain.
- Complicated formulas.
- There's a better way...



























$$lat_{d,h} + lat_{h,a} + lat_{a,v} + lat_{v,e} \leq 2$$

#### Chains of constraints

latency forward/backward  $\leq B(\ldots, w, r, \ldots)$ 

#### Chains of constraints

latency forward/backward  $\leq B \; (\ldots, w, r, \ldots)$ For each link  $w \xrightarrow{s,c} r$ ,

#### Chains of constraints

latency forward/backward  $\leq B(\ldots, w, r, \ldots)$ 

For each link  $w \xrightarrow{s,c} r$ ,

$$0 \leq i: r < hp/period(r)$$

$$0 \le lat_{w,r} < L$$
 for  $c = f$  where  $L = period(r)$  if forward  $0 < lat_{w,r} \le L$  for  $c = b$  and  $L = period(w)$  if backward

$$0 \leq \operatorname{wrap}_{w,r} \leq 1$$
 if forward and  $s \notin \{D^w, *_n\}$  or if backward and  $s \notin \{D^w, /_n\}$ 

$$\operatorname{period}(w) \cdot i : w + p : w + \operatorname{lat}_{w,r} - hp \cdot \operatorname{wrap}_{w,r} = \operatorname{period}(r) \cdot i : r + p : r.$$

- Each intermediate instance is both a reader and a writer, and thus constrained by two equations.
- forward: attach chains from the top
- backward: attach chains to the bottom

## Showlatency demo



https://www.di.ens.fr/~bourke/showlat/showlatency.html

#### End-to-End Latency



Feiertag, Richter, Nordlander, and Jonsson (2008): A Compositional Framework for End-to-End Path Delay Calculation of Automotive Systems under Different Path Semantics

Girault, Prévot, Quinton, Henia, and Sordon (2018): Improving and Estimating the Precision of Bounds on the Worst-Case Latency of Task Chains

#### Plan

Rate-Synchronous Mode

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusio

#### Manipulating flow graphs

#### Before scheduling

- 1. Construct flow graph from program
- 2. Remove potential cycles by flipping the concomitance
- 3. Use to generate ILP constraints (causality, end-to-end latency)

#### After scheduling: convert to dependency graph

- 1. Drop edges between equations that cannot execute in any phase.
- 2. Flip the cobackward (b) edges.
- 3. Use with standard algorithm to schedule equations within a step function.

# ROSACE example: flow graph → dependency graph

|                     | ops  | phase |
|---------------------|------|-------|
| elevator            | 98   | 1 % 2 |
| engine              | 82   | 0 % 2 |
| dynamics            | 1174 | 1 % 2 |
| h_filter            | 38   | 2 % 4 |
| az_filter           | 37   | 2 % 4 |
| $q_filter$          | 37   | 2 % 4 |
| $vz\_filter$        | 37   | 2 % 4 |
| va_filter           | 38   | 2 % 4 |
| alt_hold            | 201  | 6 % 8 |
| vz_control          | 88   | 6 % 8 |
| ${\tt va\_control}$ | 90   | 2 % 8 |



## ROSACE example: flow graph → dependency graph

|                     | ops  | phase |
|---------------------|------|-------|
| elevator            | 98   | 1 % 2 |
| engine              | 82   | 0 % 2 |
| dynamics            | 1174 | 1%2   |
| h_filter            | 38   | 2 % 4 |
| az_filter           | 37   | 2 % 4 |
| ${\tt q\_filter}$   | 37   | 2 % 4 |
| $vz\_filter$        | 37   | 2 % 4 |
| va_filter           | 38   | 2 % 4 |
| alt_hold            | 201  | 6%8   |
| vz_control          | 88   | 6%8   |
| ${\tt va\_control}$ | 90   | 2%8   |



- Remove all edges between equations that can never execute in the same phase.
- Reverse edges whose concomitance is b.

## ROSACE example: flow graph → dependency graph

|                      | ops  | phase |
|----------------------|------|-------|
| elevator             | 98   | 1%2   |
| engine               | 82   | 0 % 2 |
| dynamics             | 1174 | 1%2   |
| h_filter             | 38   | 2 % 4 |
| az_filter            | 37   | 2 % 4 |
| ${	t q}_{	t filter}$ | 37   | 2 % 4 |
| $vz\_filter$         | 37   | 2 % 4 |
| va_filter            | 38   | 2 % 4 |
| alt_hold             | 201  | 6%8   |
| vz_control           | 88   | 6%8   |
| $va\_control$        | 90   | 2%8   |



- Remove all edges between equations that can never execute in the same phase.
- Reverse edges whose concomitance is b.

## ROSACE example: generated code

```
static int c = 0:
static float h_c = 0, d_th_c = 1.6402, d_e_c = 0.0186, ...;
static float vz_c, ..., q_f;
void step0()
  if (c \% 2 == 0) {
    engine();
    if (c % 4 == 2) {
      vz_filter(); h_filter(); va_filter(); q_filter(); az_filter();
 } else {
    elevator(); dynamics();
  switch (c) {
  case 2: va_control(); break;
  case 6: alt_hold(); vz_control(); break;
  c = (c + 1) \% 8:
```

#### Code generation

#### Generalize the clock-directed scheme

Biernacki, Colaço, Hamon, and Pouzet (2008): Clock-directed modular code generation for synchronous data-flow languages

- --compile n generates n step functions
- » For the ith step function, step<sub>i</sub>, List .filter\_map equations by phase offset.
- Generate dependency graph ignoring variables not in step;
   macro-scheduling guarantees they will already have been calculated.
- » Micro-schedule equations in step; w.r.t. dependencies and phase offset/rate.
- Generate multiple Obc step methods, buffer values in state variables.
- Optimize the Obc by joining adjacent case statements.

#### Code generation: 2

#### Specialized case construct

```
case (state(c 3) mod 3) {
 0: { skip }
 1: { state(s2) := filter(state(s1)) }
 2: { skip }
                                                          case (state(c 3) mod 3) {
 else undefined
                                                            0: { state(s1) := filter(s0) }
                                                            1: { state(s2) := filter(state(s1)) }
                                                            2: { skip }
case (state(c 3) mod 3) {
 0: { state(s1) := filter(s0) }
                                                            else undefined
 1: { skip }
 2: { skip }
 else undefined
```

#### Code generation: 2

The 'else undefined' simplifies optimisation under (implicit) invariants

```
if (state(c 3) mod 24 = 7) {
case (state(c 3) mod 24) {
                                                           state(x) := read real()
 7: \{ state(x) := read real() \}
                                                    } else {
 23: { v := read real() }
                                                           y := read real()
 else undefined }
case (state(c 3) mod 24) {
                                                        case (state(c 3) mod 24) {
 7: \{ state(x) := read real() \}
                                                         7: \{ state(x) := read real() \}
 15: { skip }
                                                       15: { skip }
                                                          23: { y := read real() }
 23: \{ y := read \ real() \}
 else undefined }
                                                          else undefined }
```

#### Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusio

#### More Information

#### Scheduling and Compiling Rate-Synchronous Programs with End-To-End Latency Constraints

Timothy Bourke ⊠ ©

Ecole normale supérioure, PSL University, CNRS, Paris, France

Vincent Bregeon ⊠ Airbus Operations S.A.S., Toulouse, France

Ecole normale supérieure, PSL University, CNRS, Paris, Fennee Inria Paris. France

We present an extension of the synchronous-stactive model for specifying multi-rate systems. A set of periodically executed components and their communication dependencies are expressed in a are or personance constraint compensation and agree communication emphasization are expressed to a Lauter-like programming language with features for lead talenting, resource limiting, and specifying emblered latercies. The language abstracts from execution time and phase effects. This permits simple clock typing rules and a stream-based semantics, but requires each component to exceeds within an overall been period. A program is compiled to a single periodic task in two stages. First, Integer Linear Programming is used to determine plane offsets using standard encodings for dependencies and bad talasteing, and a nivel encoding for end-to-end latency. Second, a code coveration scheme is advanted to produce step functions. As a result, components are synchronous consistence occurring an acomposal to prosence stop interference, as a neutral, components are symmetrized relative to their respective rates, but not necessarily simultaneous relative to the base period. This approach has been implemented in a prototype compiler and validated on an industrial application.

2012 ACM Subject Classification Computer systems organization -> Renl-time languages; Computer systems organization -- Embedded software

Keywords and phrases synchronone-reactive, integer linear programming, code generation

Digital Object Identifier 10.4230/LIPIo.ECRTS-2023.1

#### 1 Introduction

Embedded control software is often designed as a set of components that each repeatedly sample inputs, compute a transition function, and update outputs. Such components must be scheduled so as to share processor resources while respecting timing and communication requirements. Scheduling determines how data propagates along chains of components from sensor acquisitions, through successive computations, to corresponding actuator emissions.

The end-to-end latencies of such chains are crucial to overall system performance. We characterize and extend an approach for developing avionics software based on the synchronous-reactive languages Lustre [29] and Scade [13]. Our application model comprises (i) a set of components whose execution rates are specified as unit fractions (1/n) of a base rate, and (ii) a graph of data flow between components. The Worst-Case Execution Time (WCET) of each component must be less than the base period. This is a significant restriction, but one that is acceptable for safety-critical axiomics applications. The implementation target is one of more acquential step functions called cyclically to, in turn, call individual component sten functions. Data is exchanged by reading and writing static variables. to and time behavior execution rates allow implementa-

#### **ECRTS 2023 Article**

- Details of language and ILP encoding
- Encoding for end-to-end latency constraints
- Adapt clock-directed modular code generation Biernacki, Colaco, Hamon, and Pouzet (2008): Clock-directed modular code generation for synchronous data-flow languages
- Decide concomitance before ILP scheduling

#### More Information

#### Scheduling and Compiling Rate-Synchronous Programs with End-To-End Latency Constraints

Timothy Bourke ⊠®

Ecole normale supérioure, PSL University, CNRS, Paris, France

Vincent Bregeon ⊠ Airbus Operations S.A.S., Toulouse, France

Ecole normale supérieure, PSL University, CNRS, Paris, Fennee

Incia Paris. France

chronous Programming-

#### Lustre, fast first and fresh

Timothy Bourke and Marc Pouzet

Abstract—The rate-synthronous model formatizes an indus-Assured—the case-systemotous most to-majors as most trial approach for comparing Lastre sodes that execute at true appreases yet companing causes gones that execute at different rates. Such programs are compiled to cyclic sequential no step functions are ordered for execution ycle of the generated code. By default, programs within a cycle or the generaton cone, by occasio, programs are deterministic; for any valid schedule, the generated code are neseronmone; for any same sensones are generated code; inhediates the values decreed by the source dataflow semantics at the specified rates. In practice, though, specifying precise values in the source program is sometimes unnecessary, impracticable. in the source program is someomes unincreasity, impostipation and overly constraining. In this case, the ILP constraints can and overty constraining, in the case, lost it, constraints of decides which dataflow semantics applies. Care is stiff required tolerates which canginos semantics appress. Care is sit to ensure that code generation remains deterministic. Index Terms\_Time-Centric Reactive Software, Dataflow Syn-

A Lustre [1] program comprises a hierarchy of nodes. which are functions that map streams of input values to streams of output values. Within a node, a set of equations defines the values of local and output signals using a simple grown a systematical point that the control of the careful point of the careful point of the careful point and memory. This is system. The fast-th-slow rate transition must then choose one



- II. A DETERMINISTIC RATE-SYNCHRONOUS LANGUAGE
- The rate-synchronous language was originally motivated by a flight control and guidance system comprising approximately 5000 nodes and over 120 000 named signals. The language's goal was to aflow the nodes to be instantiated within a single main node by providing constructions for specifying execution rates and resource constraints. The Prelude language [4], [5] was developed with the same goal and targets a set of tasks for execution by a real-time scheduler. In contrast, rate-synchrony abstracts from WCET and signal phases, and focuses on statically-scheduled sequential code generation.

g) Example: To illustrate the language, consider the simple system sketched in figure 1 using a syntax inspired by ectines the varies to next mis-suspen signate to-sig a single system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a soing a syntax imports of system suscepts in tigger a syntax in tigger a syntax in tigger a syntax imports of system suscepts in tigger a syntax in duation Simph whose directed area are labelled by signal and then processed by three successive filters  $f_1$ ,  $f_2$  and  $f_3$ . manes and whose vertexes specify computations. Nodes are given a synchronous scenarios so that they can be compiled fifted are to run three times slower than the base rate of the

#### **ECRTS 2023 Article**

- Details of language and ILP encoding
- Encoding for end-to-end latency constraints
- Adapt clock-directed modular code generation Biernacki, Colaco, Hamon, and Pouzet (2008): Clock-directed modular code generation for synchronous data-flow languages
- Decide concomitance before ILP scheduling

#### TCRS 2024 article

- Decide concomitance during ILP scheduling
- Extra constraints to avoid cycles in sequencing
- The fast first convention to avoid cycles

#### **Direct Communications**



$$p: w < p: r + cc_{w,r}$$

$$\begin{cases} p: w \leq p: r & \text{if } cc_{w,r} = 1 \text{ (forward)} \\ p: w < p: r & \text{if } cc_{w,r} = 0 \text{ (backward)} \end{cases}$$



$$r = f(last w)$$



$$p: r + cc_{w,r} \leq p: w$$

$$\begin{cases} p:r < p:w & \text{if } cc_{w,r} = 1 \text{ (forward)} \\ p:r \leq p:w & \text{if } cc_{w,r} = 0 \text{ (backward)} \end{cases}$$



#### Rate Transitions

$$r = f(w \text{ when } (1 \% 3))$$

(i % n): take value i of every n
(? % n): take any of every n values



$$r = f(current(w, (1 % 3)))$$

(i % n): i initial values, then repeat n times



$$(i-1) \cdot \operatorname{period}(r) + p:r \le p:w - \operatorname{cc}_{w,r}$$
  
 $< i \cdot \operatorname{period}(r) + p:r$ 



# Cycles

```
s1 = f1(s0 when (? % 3), s4 when (? % 3));
s2 = f2(s1);
s3 = f3(s2);
s4 = current(s3, (? % 3));
```

#### Cycles

```
s1 = f1(s0 \text{ when } (? \% 3), s4 \text{ when } (? \% 3));
s2 = f2(s1);
s3 = f3(s2);
s4 = current(s3, (? \% 3));

    Every flow has forward

                   concomitance (:f)
                 • If all equations are
         /3:f
         (s4)
                    scheduled in the same
                   phase, then code
                   generation fails because it
     *3:f
                   cannot break the cycle.
```

Classic encodings: see, e.g.,  $\begin{bmatrix} Baharev, Schichl, Neumaier, and Achterberg (2021): An \\ Exact Method for the Minimum Feedback Arc Set Problem \end{bmatrix}$  §3

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{: } i \text{ precedes } j \text{ in ordering)} \end{split}$$

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:

$$y_{j,i}=\mathbf{1}-y_{i,j}$$

- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

• Encode linear ordering as a graph:

- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not\in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

• Encode linear ordering as a graph:

- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not \in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not\in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{: } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}}$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not\in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering}) \end{split}$$

• Encode linear ordering as a graph:



- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Classic encodings: see, e.g., [Baharev, Schichl, Neumaier, and Achterberg (2021): An ] §3

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{: } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:







- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Grötschel, Jünger, and Reinelt (1984): A Cutting Plane Algorithm for the Linear Ordering Problem

$$\min_{y} \sum_{j=1}^{m} w_{j} y_{j}$$
s.t. 
$$\sum_{j=1}^{m} a_{ij} y_{j} \ge 1 \qquad \text{for each } i = 1, 2, \dots, \ell$$

$$y_{j} \text{ is binary}$$

- $y_i = 1$ : arc j is in the feedback arc set
- $a_{ij} = 1$ : arc j participates in cycle i
- I elementary cycles, worst case  $\approx O(2^m)...$
- branch-and-cut:

[Grötschel, Jünger, and Reinelt (2022): Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"

• cross fingers and enumerate?

Johnson (1975): Finding All the Elementary Circuits of a Directed Graph

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \end{bmatrix}}\ \S 3$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{ : } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:







- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Grötschel, Jünger, and Reinelt (1984): A Cutting Plane Algorithm for the Linear Ordering Problem

$$\min_y \sum_{j=1}^m w_j y_j$$
 s.t. 
$$\sum_{j=1}^m a_{ij} y_j \ge 1 \qquad \text{ for each } i=1,2,\dots,\ell$$
 
$$y_j \text{ is binary}$$

- $y_i = 1$ : arc j is in the feedback arc set
- $a_{ij} = 1$ : arc j participates in cycle i
- I elementary cycles, worst case  $\approx O(2^m)...$
- branch-and-cut:

[Grötschel, Jünger, and Reinelt (2022): Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"

• cross fingers and enumerate?

[Johnson (1975): Finding All the Elementary County of Director County of C

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \end{bmatrix}}\ \S 3$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = 0 \text{ if } (i,j) \not\in E \\ y_{i,j} = 0 \text{ : } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:







- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Grötschel, Jünger, and Reinelt (1984): A Cutting Plane Algorithm for the Linear Ordering Problem

$$\min_y \sum_{j=1}^m w_j y_j$$
 s.t. 
$$\sum_{j=1}^m a_{ij} y_j \ge 1 \qquad \text{for each } i=1,2,\dots,\ell$$
 
$$y_j \text{ is binary}$$

- $y_i = 1$ : arc j is in the feedback arc set
- $a_{ii} = 1$ : arc j participates in cycle i
- I elementary cycles, worst case  $\approx O(2^m)...$
- branch-and-cut:

Grötschel, Jünger, and Reinelt (2022): Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"

• cross fingers and enumerate?

Johnson (1975): Finding All the Elementary Circuits of a Directed Graph

 $Classic\ encodings:\ see,\ e.g.,\ {\tiny \begin{bmatrix} Baharev,\ Schichl,\ Neumaier,\ and\ Achterberg\ (2021):\ An\ \\ Exact\ Method\ for\ the\ Minimum\ Feedback\ Arc\ Set\ Problem\ \end{bmatrix}}\ \S 3$ 

$$\begin{split} \min_y \sum_{j=1}^n \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^n c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{: } i \text{ precedes } j \text{ in ordering)} \end{split}$$

Encode linear ordering as a graph:







- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Grötschel, Jünger, and Reinelt (1984): A Cutting Plane Algorithm for the Linear Ordering Problem

$$\min_y \sum_{j=1}^m w_j y_j$$
 s.t. 
$$\sum_{j=1}^m a_{ij} y_j \geq 1 \qquad \text{ for each } i=1,2,\ldots,\ell$$
 
$$y_j \text{ is binary}$$

- $y_i = 1$ : arc j is in the feedback arc set
- $a_{ii} = 1$ : arc j participates in cycle i
- I elementary cycles, worst case  $\approx O(2^m)...$
- branch-and-cut:

[Grötschel, Jünger, and Reinelt (2022): Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"

cross fingers and enumerate?
 Johnson (1975): Finding All the Elementary Circuits of a Directed Graph

Classic encodings: see, e.g., [Baharev, Schichl, Neumaier, and Achterberg (2021): An ] §3

$$\begin{split} \min_{y} \sum_{j=1}^{n} \left( \sum_{k=1}^{j-1} c_{k,j} y_{k,j} + \sum_{\ell=j+1}^{n} c_{\ell,j} (1-y_{j,\ell}) \right) \\ \text{subject to} \\ y_{i,j} + y_{j,k} - y_{i,k} \leq 1, \quad 1 \leq i < j < k \leq n \\ -y_{i,j} - y_{j,k} + y_{i,k} \leq 0, \quad 1 \leq i < j < k \leq n \\ y_{i,j} = \{0,1\}, \quad 1 \leq i < j \leq n. \\ (c_{i,j} = \mathbf{0} \text{ if } (i,j) \not\in E \\ y_{i,j} = \mathbf{0} \text{: } i \text{ precedes } j \text{ in ordering)} \end{split}$$

• Encode linear ordering as a graph:







- $O(n^2)$  binaries and  $O(n^3)$  constraints...
- branch-and-cut:

Grötschel, Jünger, and Reinelt (1984): A Cutting Plane Algorithm for the Linear Ordering Problem

$$\min_y \sum_{j=1}^m w_j y_j$$
 s.t. 
$$\sum_{j=1}^m a_{ij} y_j \geq 1 \qquad \text{ for each } i=1,2,\dots,\ell$$
 
$$y_j \text{ is binary}$$

- $y_i = 1$ : arc j is in the feedback arc set
- $a_{ij} = 1$ : arc j participates in cycle i
- I elementary cycles, worst case  $\approx O(2^m)...$
- branch-and-cut:

[Grötschel, Jünger, and Reinelt (2022): Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"

cross fingers and enumerate?

[Johnson (1975): Finding All the Elementary Circuits of a Directed Graph

# Cycles: --fast-first

```
s1 = f1(s0 \text{ when } (? \% 3), s4 \text{ when } (? \% 3));
s2 = f2(s1);
s3 = f3(s2);
s4 = current(s3, (? \% 3));

    Every flow has forward

                   concomitance (:f)
                • If all equations are
        /3·f
        (s4)
                   scheduled in the same
                   phase, then code
 f3
                   generation fails because it
    *3:f
                   cannot break the cycle.
```

#### Cycles: --fast-first

```
s1 = f1(s0 \text{ when } (? \% 3), s4 \text{ when } (? \% 3));
   = f2(s1);
s3 = f3(s2):
s4 = current(s3, (? \% 3));

    Every flow has forward

                   concomitance (:f)

    If all equations are

        (s4)
                   scheduled in the same
                   phase, then code
                   generation fails because it
```

cannot break the cycle.

\*3.f



- With --fast-first, the flow between the last two equations, has backward concomitance (:b).
- Even if all equations are scheduled in the same phase, code generation succeeds because there is no cycle.

#### Plan

Rate-Synchronous Mode

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

**Scheduling Complications** 

Multi-threaded Scheduling

Conclusio









| vf | 0 | 1 | 2 | 3 | 10 | 11 | 12 | 13 | 14 | 15 | 28 | 29 |  |
|----|---|---|---|---|----|----|----|----|----|----|----|----|--|
| С  | 0 | 1 | 2 | 3 | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 |  |
| VS |   |   |   | 6 |    |    | 18 |    |    |    |    |    |  |





| vf | 0 | 1, | 2 | 3 | 10         | 11 | 12 | 13 | 14 | 15 | 28          | 29 |  |
|----|---|----|---|---|------------|----|----|----|----|----|-------------|----|--|
| С  | 0 | 1  | 2 | 3 | <i>J</i> 4 | 5  | 6  | 7  | 8  | 9  | <b>/</b> 10 | 11 |  |
| VS |   |    | - | 6 |            |    |    |    |    |    |             |    |  |



# Causal dependencies C Vf Vs



| vf | 0 | 1 | 2 | 3 | 10         | 11 | 12 | 13 | 14 | 15 | 28 | 29 |  |
|----|---|---|---|---|------------|----|----|----|----|----|----|----|--|
| С  | 0 | 1 | 2 | 3 | <i>J</i> 4 | 5  | 6  | 7  | 8  | 9  | 10 | 11 |  |
| VS |   |   | - | 6 |            |    | 18 |    |    |    |    |    |  |

# Adding equations to relax constraints/add buffering

#### Hold slow around fast reads



#### Hold fast around fast writes



# Adding equations to relax constraints/add buffering

#### Hold slow around fast reads



#### Hold fast around fast writes





- Which programs are valid? I.e., which have a semantics?
- Consider causality across the least common multiple of periods.
- Implicit dependencies to past elements on same flow.

- Which programs are valid? I.e., which have a semantics?
- Consider causality across the least common multiple of periods.
- Implicit dependencies to past elements on same flow.

```
vf = current(vs, (1%2));
vs = vf when (0%2);
vf
```

- Which programs are valid? I.e., which have a semantics?
- Consider causality across the least common multiple of periods.
- Implicit dependencies to past elements on same flow.

```
vf = current(vs, (1%2)); vf = current(vs, (1%2)); vs = vf when (0%2); vs = vf when (1%2); vf vs = vf when (1%2); vs = vf when
```

- Which programs are valid? I.e., which have a semantics?
- Consider causality across the least common multiple of periods.
- Implicit dependencies to past elements on same flow.

```
vf = current(vs, (1%2)); vf = current(vs, (1%2)); vf = current(vs, (0%2)); vs = vf when (0%2); vs = vf when (1%2); vs = vf whe
```

Causality relations between  $x :: \alpha, y :: \frac{\alpha}{2}$ , and  $z :: \frac{\alpha}{4}$ .

## Causality: 2

Causality relations between  $x :: \alpha, y :: \frac{\alpha}{2}$ , and  $z :: \frac{\alpha}{4}$ .







# Causality: non-harmonic rates with 'shifting'



# Causality: non-harmonic rates with 'shifting'



# Causality: non-harmonic rates with 'shifting'



### Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusio

# Multi-threaded code generation: ongoing experiments

### Why not?...

- 1. Generate ILP: equation  $\mapsto$  thread & phase
- 2. Forbid inter-thread communication within a cycle

# Multi-threaded code generation: ongoing experiments

### Why not?...

- 1. Generate ILP: equation  $\mapsto$  thread & phase
- 2. Forbid inter-thread communication within a cycle



"42-cm M-Gerät 14 Kurze Marinekanone L/12"

## Multi-threaded code generation: ongoing experiments

### Why not?...

- 1. Generate ILP: equation  $\mapsto$  thread & phase
- 2. Forbid inter-thread communication within a cycle

#### because

- the number of constraints explodes and the ILP solver may not be able to find a solution
- delayed communications may accumulate and increase end-to-end latency



"42-cm M-Gerät 14 Kurze Marinekanone L/12"

## Threads and phases

```
Source program: w = e; r = f(w);
```

$$(t_w \neq t_r) \wedge (p_w = p_r)$$
: extra synchronization required

| thread 1                                     | thread 2                                           |
|----------------------------------------------|----------------------------------------------------|
| • • •                                        | •••                                                |
| if (c $\%$ 2 == 0) { w = e; sem_post(wok); } | if (c % 2 == 0) { $sem_wait(wok)$ ; $r = f(w)$ ; } |
| •••                                          | •••                                                |

## Threads and phases

Source program: 
$$w = e$$
;  $r = f(w)$ ;

$$(t_w \neq t_r) \land (p_w = p_r)$$
: extra synchronization required

| thread 1                                    | thread 2                                           |
|---------------------------------------------|----------------------------------------------------|
| •••                                         | •••                                                |
| if (c % 2 == 0) { $w = e; sem_post(wok); }$ | if (c % 2 == 0) { $sem_wait(wok)$ ; $r = f(w)$ ; } |
| •••                                         | •••                                                |

### $(p_w \neq p_r)$ : no race condition

| thread 1                      | thread 2                         |
|-------------------------------|----------------------------------|
| • • •                         | • • •                            |
| if (c % 2 == 0) { $w = e$ ; } | if (c % 2 == 1) { $r = f(w)$ ; } |
| • • •                         | • • •                            |

## Threads and phases

Source program: w = e; r = f(w);

$$(t_w \neq t_r) \land (p_w = p_r)$$
: extra synchronization required

| thread 1                                    | thread 2                                       |
|---------------------------------------------|------------------------------------------------|
| • • •                                       | •••                                            |
| if (c % 2 == 0) { $w = e; sem_post(wok); }$ | if (c % 2 == 0) { $sem_wait(wok); r = f(w); }$ |
| •••                                         | •••                                            |

### $(p_w \neq p_r)$ : no race condition

| thread 1                      | thread 2                         |
|-------------------------------|----------------------------------|
| • • •                         | • • •                            |
| if (c $\%$ 2 == 0) { w = e; } | if (c % 2 == 1) { $r = f(w)$ ; } |
| • • •                         | • • •                            |

for now, require:  $t_w = t_r \lor p_w \neq p_r$  (may not be possible)

## Constraints: same thread or different phase

require: 
$$t_w = t_r \lor p_w \neq p_r$$

- Try a completely boolean encoding using phase/thread weights?
- w :: 1/4, r :: 1/12, --nthreads 2

```
th:0:ph:0:w + th:1:ph:0:r + th:1:ph:4:r + th:1:ph:8:r \le 1
 th:0:ph:1:w + th:1:ph:1:r + th:1:ph:5:r + th:1:ph:9:r < 1
th:0:ph:2:w + th:1:ph:2:r + th:1:ph:6:r + th:1:ph:10:r < 1
 th:1:ph:0:w + th:0:ph:0:r + th:0:ph:4:r + th:0:ph:8:r < 1
 th:1:ph:1:w + th:0:ph:1:r + th:0:ph:5:r + th:0:ph:9:r < 1
th:1:ph:2:w + th:0:ph:2:r + th:0:ph:6:r + th:0:ph:10:r < 1
```

Not very linear. . .

#### Resource Constraints

```
resource cpu : int
node f1(x : int) returns (y : int) requires (cpu = 5);
node f2(x : int) returns (y : int) requires (cpu = 2);
node f3(x : int) returns (y : int) requires (cpu = 2);
s2 = f2(s1);
s3 = f3(s2);
s4 = current(s3, (2 % 3));

resource balance cpu;
tel
```

### Existing encoding: per cycle

```
pw.def0.f1: pw.ph.0.f1 + pw.ph.1.f1 + pw.ph.2.f1 = 1
pw.def1.f1: -1 p.f1 + 2 pw.ph.2.f1 + pw.ph.1.f1 = 0
...
rsum.ph.0.cpu: rsum.ph.0.cpu - 2 pw.ph.0.f3 - 2 pw.ph.0.f2 - 5 pw.ph.0.f1 = 0
rsum.ph.1.cpu: rsum.ph.1.cpu - 2 pw.ph.1.f3 - 2 pw.ph.1.f2 - 5 pw.ph.1.f1 = 0
rsum.ph.2.cpu: rsum.ph.2.cpu - 2 pw.ph.2.f3 - 2 pw.ph.2.f2 - 5 pw.ph.2.f1 = 0
```

#### Resource Constraints

#### New possibility: per thread per cycle

```
. . .
tw.def1.thread.0: tw.1.thread.0 - thread.0 = 0
tw.def0.thread.0: tw.0.thread.0 + tw.1.thread.0 = 1
. . .
pw.def0.f1: pw.th.0.ph.0.f1 + pw.th.0.ph.1.f1 + pw.th.0.ph.2.f1
            + pw.th.1.ph.0.f1 + pw.th.1.ph.1.f1 + pw.th.1.ph.2.f1 = 1
pw.def1.f1: -1 p.f1 + 5 pw.th.1.ph.2.f1 + 4 pw.th.1.ph.1.f1
            + 3 pw.th.1.ph.0.f1 + 2 pw.th.0.ph.2.f1 + pw.th.0.ph.1.f1
            -3 thread.0 = 0
. . .
rsum.th.0.ph.0.cpu: rsum.th.0.ph.0.cpu - 2 pw.th.0.ph.0.f3
                    -2 pw.th.0.ph.0.f2 - 5 pw.th.0.ph.0.f1 = 0
rsum.th.0.ph.1.cpu: rsum.th.0.ph.1.cpu - 2 pw.th.0.ph.1.f3
                    -2 pw.th.0.ph.1.f2 - 5 pw.th.0.ph.1.f1 = 0
rsum.th.1.ph.0.cpu: rsum.th.1.ph.0.cpu - 2 pw.th.1.ph.0.f3
                    -2 pw.th.1.ph.0.f2 - 5 pw.th.1.ph.0.f1 = 0
. . .
```

# Preliminary Experiments on largest Airbus case-study

- Oct. 2024: It does not work
  - » cplex: runs for days until out-of-memory
  - » set mip strategy file 3: runs for longer until dying

# Preliminary Experiments on largest Airbus case-study

- Oct. 2024: It does not work
  - » cplex: runs for days until out-of-memory
  - » set mip strategy file 3: runs for longer until dying
- Nov. 2024: It works a little bit
  - » Schedule on one thread and generate a MIP start file
  - » Scheduling on two cores then starts from a valid solution, which the optimizer can iteratively improve.
  - » Solution after  $\approx$ 35 hours

## Preliminary Experiments on largest Airbus case-study

- Oct. 2024: It does not work
- » cplex: runs for days until out-of-memory
- » set mip strategy file 3: runs for longer until dying
- Nov. 2024: It works a little bit
  - » Schedule on one thread and generate a MIP start file
  - » Scheduling on two cores then starts from a valid solution, which the optimizer can iteratively improve.
  - ≫ Solution after  $\approx$ 35 hours

- Mar. 2025: "Boolean" encoding for same-thread or different-phase
  - » cplex: 2 hours
  - » cp-sat (with L. Sylvestre): 20 minutes
- Ongoing: Pure SAT encoding (with L. Sylvestre)
  - » Translate pseudo-boolean constraints following Eén and Sörensson (suggested by K. Claessen and M. Sheeran)
  - » Re-encode end-to-end latency constraints



## Graph clustering prior to ILP

- Cluster flowgraph using metis<sup>1</sup> from Uni. Minnesota.
- Assign all equations in the same cluster to a single thread.
- Many partitions: not very effective at reducing solve time
- Few partitions: seems to preclude finding a valid schedule
- » E.g., with four partitions, we can rapidly test all combinations for feasability: 1100, 0110, 1010
- Cluster in a different way? Feedback from ILP solver?

https://github.com/karypislab/metis

## WiP: Inter-thread communications for fast equations

- Same thread or different phase: only possible if 1/n < 1.
- Prevents splitting writer/reader pairs when either is on the base clock

## WiP: Inter-thread communications for fast equations

- Same thread or different phase: only possible if 1/n < 1.
- Prevents splitting writer/reader pairs when either is on the base clock
- Current work: add a synchronization barrier

| thread 1                        | thread 2                           |
|---------------------------------|------------------------------------|
| w0 = e0;                        | • • •                              |
| • • •                           |                                    |
| sync();                         | sync();                            |
|                                 | r0 = f(w0);                        |
| if (c % 2 == 0) { $w1 = e1$ ; } | if (c % 2 == 1) { $r1 = f(w1)$ ; } |
| • • •                           | •••                                |

## WiP: Inter-thread communications for fast equations

- Same thread or different phase: only possible if 1/n < 1.
- Prevents splitting writer/reader pairs when either is on the base clock
- Current work: add a synchronization barrier

| thread 1                        | thread 2                                                |
|---------------------------------|---------------------------------------------------------|
| w0 = e0;                        | •••                                                     |
| • • •                           |                                                         |
| sync();                         | sync();                                                 |
|                                 | r0 = f(w0);                                             |
| if (c $\%$ 2 == 0) { w1 = e1; } | $sync();$ $r0 = f(w0);$ if (c % 2 == 1) { r1 = f(w1); } |
| • • •                           | • • •                                                   |

- Add (yet more) 0-1 variables to indicate before/after barrier.
- Add constraints for "same thread or different phase/side"
- Need to balance resources used before the barrier to minimize waiting.

## Plan

Rate-Synchronous Model

The Rosace Example

Flowgraphs and Scheduling

End-to-end Latency

Code Generation

Avoiding Cycles during Sequencing

Scheduling Complications

Multi-threaded Scheduling

Conclusion

#### Conclusion

- Block diagram language with deterministic stream semantics
- Scheduled using constraint solvers (cplex or cp-sat)
  - » Resource constraints and balancing
  - » End-to-end latency constraints
  - » Schedules have differing resource use, but calculate the same input/output function
- Prototype compiler with basic code generation.
- Tested on Airbus example with 5000 nodes (compiles in approx. 45 minutes).

#### References I

- Baharev, A., H. Schichl, A. Neumaier, and T. Achterberg (Apr. 2021). "An Exact Method for the Minimum Feedback Arc Set Problem". In: ACM J. Experimental Algorithmics 26.1, article 4.
- Biernacki, D., J.-L. Colaço, G. Hamon, and M. Pouzet (June 2008). "Clock-directed modular code generation for synchronous data-flow languages". In: Proc. 9th ACM SIGPLAN Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES 2008). Tucson, AZ, USA: ACM Press, pp. 121–130.
- Cohen, A., M. Duranton, C. Eisenbeis, C. Pagetti, F. Plateau, and M. Pouzet (Jan. 2006).
   "N-Synchronous Kahn networks: a relaxed model of synchrony for real-time systems". In: Proc. 33rd ACM SIGPLAN-SIGACT Symp. Principles of Programming Languages (POPL 2006). Charleston, SC, USA: ACM Press, pp. 180–193.
- Cohen, A., L. Mandel, F. Plateau, and M. Pouzet (Dec. 2008). "Abstraction of Clocks in Synchronous Data-flow Systems". In: Proc. 6th Asian Symp. Programming Languages and Systems (APLAS 2008).
   Ed. by G. Ramalingam. Vol. 5356. LNCS. Bangalore, India, pp. 237–254.
- Curic, A. (Sept. 2005). "Implementing Lustre Programs on Distributed Platforms with Real-Time Constraints". PhD thesis. Grenoble, France: Université Joseph Fourier.

I

#### References II

- Feiertag, N., K. Richter, J. Nordlander, and J. Jonsson (Nov. 2008). "A Compositional Framework for End-to-End Path Delay Calculation of Automotive Systems under Different Path Semantics". In: Workshop on Compositional Theory and Technology for Real-Time Embedded Systems (CRTS 2008, co-located with RTSS 2008). Barcelona, Spain.
- Forget, J., F. Boniol, D. Lesens, and C. Pagetti (Dec. 2008). "A Multi-Periodic Synchronous Data-Flow Language". In: Proc. 11th IEEE High Assurance Systems Engineering Symposium (HASE 2008). Nanjing, China, pp. 251–260.
- (Mar. 2010). "A Real-Time Architecture Design Language for Multi-Rate Embedded Control Systems".
   In: Proc. 25th ACM Symp. Applied Computing (SAC'10). Ed. by S. Y. Shin, S. Ossowski, M. Schumacher, M. J. Palakal, and C.-C. Hung. Sierre, Switzerland, pp. 527–534.
- Girault, A., C. Prévot, S. Quinton, R. Henia, and N. Sordon (Nov. 2018). "Improving and Estimating the Precision of Bounds on the Worst-Case Latency of Task Chains". In: IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 37.11, pp. 2578–2589.
- Grötschel, M., M. Jünger, and G. Reinelt (Nov. 1984). "A Cutting Plane Algorithm for the Linear Ordering Problem". In: *Operations Research* 32.6, pp. 1195–1220.

#### References III

- Grötschel, M., M. Jünger, and G. Reinelt (July 2022). "Comments on "An Exact Method for the Minimum Feedback Arc Set Problem"". In: ACM J. Experimental Algorithmics 27.1, article 3.
- looss, G., M. Pouzet, A. Cohen, D. Potop-Butucaru, J. Souyris, V. Bregeon, and P. Baufreton (Mar. 2020). "1-Synchronous Programming of Large Scale, Multi-Periodic Real-Time Applications with Functional Degrees of Freedom". preprint.
- Johnson, D. B. (Mar. 1975). "Finding All the Elementary Circuits of a Directed Graph". In: SIAM J. Computing 4.1, pp. 77–84.
- Mandel, L., F. Plateau, and M. Pouzet (June 2010). "Lucy-n: a n-Synchronous extension of Lustre". In: Proc. 10th Int. Conf. on Mathematics of Program Construction (MPC 2010). Ed. by C. Bolduc,
   J. Desharnais, and B. Ktari. Vol. 6120. LNCS. Québec City, Canada, pp. 288–309.
- Pagetti, C., J. Forget, F. Boniol, M. Cordovilla, and D. Lesens (Sept. 2011). "Multi-task implementation of multi-periodic synchronous programs". In: Discrete Event Dynamic Systems 21.3, pp. 307–338.

#### References IV

- Pagetti, C., D. Saussié, R. Gratia, E. Noulard, and P. Siron (Apr. 2014). "The ROSACE Case Study: From Simulink Specification to Multi/Many-Core Execution". In: 20th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2014). Berlin, Germany, pp. 309–318.
- Pouzet, M. (Apr. 2006). Lucid Synchrone, v. 3. Tutorial and reference manual. Université Paris-Sud.
- Smarandache, I. M., T. Gautier, and P. Le Guernic (Sept. 1999). "Validation of Mixed Signal-Alpha Real-Time Systems through Affine Calculus on Clock Synchronisation Constraints". In: *Proc. World Congress on Formal Methods in the Development of Computing Systems (FM'99)*. Ed. by J. M. Wing, J. Woodcock, and J. Davies. Vol. 1709. LNCS. Toulouse, France, pp. 1364–1383.
- Wyss, R., F. Boniol, J. Forget, and C. Pagetti (Dec. 2012). "A Synchronous Language with Partial Delay Specification for Real-Time Systems Programming". In: Proc. 10th Asian Symp. Programming Languages and Systems (APLAS 2012). Ed. by R. Jhala and A. Igarashi. Vol. 7705. LNCS. Kyoto, Japan, pp. 223–238.