#### **Worst Case Execution Time Prediction**





#### **Hard Real-Time Systems**

Controllers in planes, cars, plants, ... are expected to finish their tasks within reliable time bounds.





# **Timing Validation**

Schedulability analysis has to show that all timing requirements will be met

Takes into account:

System design (event based, time triggered, ...)

Outside world (maximal arrival rates, minimal interval between events, ...)

Scheduling policy (round robin, RMA, time triggered, RTOS, ...)

All results from the Scheduling Theory require the Worst Case Execution Time (WCET) of the tasks to be known



. . .



# Certification

Certificates

Stand alone tool for e.g. TÜVs

To proof timings to obtain certificates from TÜVs Certificate:

Program terminates in 82 ms on MicroS...





#### **Support during SW-Development**







# **Modern Hardware**

Multiple memories, caches, pipelines, branch prediction, ...

Performance depends on execution history. This makes the prediction difficult

No information means: assume the worst

Switching off caching reduces performance by a factor of 30 (EADS study)





# **General Problems with State Based Processor Features**

Problem: Modern hardware <=> predictability of execution time

Software monitoring, dual loop benchmark, direct measurement with logic analyzer, hardware simulation are no longer generally applicable.

Choosing the fastest available processor, praying, or crossing fingers is not a true alternative.





# **A Traditional Approach**

Partition the application into code snippets, Determine their worst-case inputs, Measure their runtime with these inputs, Combine these results to find the worst-case path and its runtime.

Error-prone and expensive!





# **Some Architectural Challenges**

The empty cache is not necessarily the "worst case cache"

The global round robin counter/PLRU state bits can be changed by interrupt routines

Unified cache => instructions and data interfere => measurements for all possibilities of interference necessary

A cache miss is not necessarily the worst case





# **Solution: Static WCET Analysis**

The WCET analyzer computes save upper bounds of the execution time of the tasks in a program for all inputs

Static program analysis based on Abstract Interpretation

The analysis design is proven to be correct





#### **WCET Analyzer**

Input: an executable program, starting points, loop iteration counts, call targets of indirect function calls, and a description of bus and memory speeds Computes **Worst-Case Execution Time** bounds of tasks

| WcetAl ?_ ×                                                                                                                                      |
|--------------------------------------------------------------------------------------------------------------------------------------------------|
| Files Options GDL Advanced Messages                                                                                                              |
| Executable<br>verPC/wcetppc/data/minmax.elf CFG<br>Start at: main<br>Supporting Files<br>AIP_File: n/PowerPC/wcetppc/T1.aip<br>Loop Counts File: |
|                                                                                                                                                  |
| Project File: ann/PowerPC/wcetppc/data/minmax                                                                                                    |
| A <u>b</u> out <u>A</u> nalyze <u>V</u> isualize <u>C</u> ancel                                                                                  |







# Scope

The WCET analyzer assumes no interference from the outside. Effects of interrupts, IO, timers, other (co-) processors are not reflected in the predicted runtime and have to be considered separately.





# Input

An executable (e.g. in ELF or COFF format).

User annotations

Call targets for all indirect function calls

Upper bounds on the iteration counts of all loops (and recursions)

A description of the (external) memories and buses (i.e. a list of memory areas with minimal and maximal access times)

To be provided once for a new board

A task

Can be arbitrarily selected by a start address or a function name A task denotes a sequentially executed piece of code (no threads, no parallelism, no waiting for external events)

The code of a task is compiled by a C-compiler from a restricted subset of ANSI-C (no dynamic data structures, no setjmp/longjmp)





#### Call Graph Calls contributing to the WCET are red







#### **Control Flow Graph** Worst case path is red



### Cycle-Wise Evolution of Cache/ Pipeline States





# Cache/ Pipeline **State** st



|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 17                                                                                       |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| <pre>ste_170 ste_170 ste_170 ste_170 ste_170 stepset s</pre> | <pre>JIT_LEN: 2<br/>SUCC: 0x1000a4<br/>IQ:<br/>FBPU_PREDLEVEL: 0<br/>FBPU_INSINCEX: NONE<br/>FBPU_INSINCEX: NONE<br/>FBPU_CRDEP: (NONE,NONE)<br/>DU_RRFREE: (G: 6, F: 6, CTR: 1, CR: 1, LR: 1)<br/>CQ:<br/>IU1: (R: NONE, W: NONE, C: 0)<br/>SRU: (R: NONE, W: NONE, C: 0)<br/>SRU: (R: NONE, W: NONE, C: 0)<br/>LSU: (R0: NONE, R1: NONE, EA: NONE, ACC: NONE)<br/>STORE[0]: (NONE, NIONE, L: 1, I: NONE)<br/>STORE[0]: (NONE, NONE, L: 1, I: NONE)<br/>STORE[1]: (NONE, NONE, L: 1, I: NONE)<br/>STORE[2]: (NONE, NONE, L: 1, I: NONE)<br/>LSU_SPENDING: 0<br/>LSU_SPENDING: 0<br/>LSU_MEMIDX: 0<br/>LSU_MEMIDX: 0<br/>LSU_MEMIDX: 0<br/>BU_LB: 0x1000a4(4) HH<br/>BU_EA: I C1:0, C2:2, CL:0x1000a0<br/>NEXTBRANCH: 0<br/>BRANCH: FFFFFFFFFFFFFFFFF<br/>SPEC: (31, 31)<br/>BRANCHES: []<br/>ACT_CTX: 0<br/>NOP_CNT: 0</pre>                                                               | instruction 0×0000000000000004<br>op_id: 0×7c000000<br>cycles = 63<br>flag = in progress |
| :<br>st<br>1: {{0×100020}}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | <pre>JIT_LEN: 3<br/>SUCC: 0x1000a4<br/>IQ:<br/>F0FU_PREOLEVEL: 0<br/>F0FU_STATE: WAIT(0x1000a4)<br/>F0FU_INSINDEX: NUME<br/>F0FU_CROEP: (NONE,NONE)<br/>OU_RRFREE: (G: 6, F: 6, CTR: 1, CR: 1, LR: 1)<br/>CQ:<br/>IU1: (R: NONE, W: NONE, C: 0)<br/>SRU: (R: NONE, W: NONE, C: 0)<br/>SRU: (R: NONE, W: NONE, C: 0)<br/>LSU: (R0: NONE, W: NONE, C: 0)<br/>SRU: (R: NONE, W: NONE, C: 0)<br/>LSU: (RO: NONE, R1: NONE, EA: NONE, ACC: NONE)<br/>STORE[0]: (NONE, NONE, L: 1, I: NONE)<br/>STORE[1]: (NONE, NONE, L: 1, I: NONE)<br/>STORE[2]: (NONE, NONE, L: 1, I: NONE)<br/>STORE[2]: (NONE, NONE, L: 1, I: NONE)<br/>LSU_STATE: IDLE<br/>FPU: (R: NONE, W0: NONE, W1: NONE, W2: NONE, C0: 0, C1: 0, C2: 0, B: 0)<br/>DU_LA: I C1:0, C2:1, CL:0x1000a0<br/>NEXTORANCH: 0<br/>BRANCH: FFFFFFFFFFFFFFFFFFFFFFF<br/>SPEC: (31, 31)<br/>BRANCHES: []<br/>ACT_CTX: 0<br/>NOP_SPEC_CNT: 0</pre> | instruction 0×0000000000000004<br>op_id: 0×7c000000<br>cycles = 64<br>flag = in progress |

# **Overall Structure**







# The ColdFire Pipeline

#### Fetch Pipeline of 4 stages

Instruction Address Generation (IAG)

Instruction Fetch Cycle 1 (IC1) Instruction Fetch Cycle 2 (IC2) Instruction Early Decode (IED)

# Instruction Buffer (IB) for 8 instructions

## Execution Pipeline of 2 stages

Decoding and register operand fetching (1 cycle) Memory access and execution

(1 – many cycles)







# Pipeline Model





# **PPC755** Pipeline

Complex branch prediction Superscalar: up to two instructions per cycle dispatched and retired Out-of-order execution Integer instructions, Floating point instructions and Loads may be reordered Speculative execution After predicted branches, instructions are executed speculatively Speculative loads may occur





# **Pipeline of the PPC755**







# Pipeline Model







# Analysis of Loops

Loops are analyzed like procedures This allows for Virtual inlining Virtual unrolling Better address resolution Burst accesses Selectable precision **Optional user** constraints



| routine: routine_00000000 |             |                                               |                                                              |  |  |
|---------------------------|-------------|-----------------------------------------------|--------------------------------------------------------------|--|--|
| ×182a8                    |             |                                               |                                                              |  |  |
| cmpi.1 #1, d0             |             |                                               |                                                              |  |  |
| beq                       | .w #0×182   | .d8 <cos.< th=""><th>_ftg_01&gt;</th></cos.<> | _ftg_01>                                                     |  |  |
|                           |             |                                               |                                                              |  |  |
|                           |             | 1                                             |                                                              |  |  |
| max t                     | :: 13       |                                               | Context 0: count=1, time=2                                   |  |  |
|                           |             |                                               | Context 1: count=1, time=13                                  |  |  |
|                           | <b>U</b> 18 | 3262                                          | Context 2: count=1, time=4                                   |  |  |
|                           | n - 1       | ,202<br>Го                                    | Context 3. count=1, time=4<br>Context 4: count=1 time=4      |  |  |
|                           |             | Ľ                                             | Context 5: count=1, time=4                                   |  |  |
|                           | ы           | t.ω #0x:                                      | Context 6: count=1, time=4                                   |  |  |
|                           |             |                                               | Context 7: count=1, time=4                                   |  |  |
|                           |             | nax #: 1                                      | Context 8: count=1, time=4                                   |  |  |
|                           |             | nax t: 1                                      | Context 9: count=1, time=4                                   |  |  |
|                           |             |                                               | Context IU: count=1, time=4<br>Context 11: count=1, time=4   |  |  |
|                           |             |                                               | Context 12: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 13: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 14: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 15: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 16: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 17: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 18: count=1, time=4<br>Context 19: count=1, time=4   |  |  |
|                           |             |                                               | Context 20: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 21: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 22: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 23: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 24: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 25: count=1, time=4<br>Context 26: count=1, time=4   |  |  |
|                           | 8           | 182cc                                         | Context 20. count=1, time=4<br>Context 27: count=1, time=4   |  |  |
|                           |             | adda .1                                       | Context 28: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 29: count=1, time=4                                  |  |  |
|                           |             | nove.l (                                      | Context 30: count=1, time=4                                  |  |  |
|                           |             |                                               | Context 31: count=1, time=4                                  |  |  |
|                           |             | Con                                           | text 0: count=1 time=52 -4                                   |  |  |
|                           |             | Con                                           | text 1: count=1. time=22 =4                                  |  |  |
|                           | ×182        | d4 Con                                        | text 2: count=1, time=22 ==4                                 |  |  |
|                           | bra         | .w Con                                        | text 3: count=1, time=22 ==4                                 |  |  |
|                           |             | Con                                           | text 4: count=1, time=22 e=4                                 |  |  |
|                           |             | Con                                           | text 5: count=1, time=22 ==4                                 |  |  |
|                           |             | M Copi                                        | text 6: COUNT=1, time=22 e=4                                 |  |  |
|                           |             | Cop                                           | text 8: count=1, time=22 p=4                                 |  |  |
|                           | 1.100       | Con                                           | text 9: count=1, time=22 ==4                                 |  |  |
| Cont                      |             | Con                                           | text 10: count=1, time=22e=4                                 |  |  |
| max Cont                  |             | ax Con                                        | text 11: count=1, time=22 <sub>e=4</sub>                     |  |  |
|                           | m           | ax Coni                                       | text 12: count=1, time=22 <sub>8=4</sub>                     |  |  |
|                           | L L         | Con                                           | text 13: count=1, time=22e=4                                 |  |  |
| Cont Cont                 |             |                                               | ιεχι 14: count=1, t1me=22β=4<br>text 15: count=1 =time=22L=4 |  |  |
|                           |             |                                               | text 16: count=1. time=226=4                                 |  |  |
|                           |             |                                               | ····· · · · · · · · · · · · · · · ·                          |  |  |

ว4

# Limitations

Well behaved code (ABI) No exceptions Only aligned accesses No VM settings Some instructions not to be used





# Advantages

The WCET analyzer allows you to:

inspect the timing behavior of (timing critical parts of) your code

- The analysis results
  - are determined without the need to change the code

hold for all executions (for the intrinsic cache and pipeline behavior)





# **Advantages**

- The results are **precise**
- The computation is **fast**
- The WCET analyzer is easy to use
- The WCET analyzer works on **optimized code**
- The WCET analyzer **saves development time** by avoiding the tedious and time consuming (instrument and) execute (or emulate) and measure cycle for a set of inputs over and over again





# **Planned Extensions**

#### Support for code generators

- E.g. recognition of generated patterns to avoid the need of user annotations on generated code
- Automatic detection of the number of loop iterations in the executable
- Further targets





# Support for a New Processor Model

Front end for executables Modeling of the pipeline according to the documentation

(Modeling of the cache)

Verification of the pipeline model on the real hardware or reliable emulator





# References

LCTRTS' 97. Ferdinand, Martin, and Wilhelm. Applying Compiler Techniques to Cache Behavior Prediction.

RTSS' 98. Theiling and Ferdinand. Combining Abstract Interpretation and ILP for Microarchitecture Modeling and Program Path Analysis.

STTT Journal. Martin. PAG -- An Efficient Program Analyzer Generator

LCTES' 99. Schneider and Ferdinand. Pipeline Behavior Prediction for Superscalar Processors by Abstract Interpretation.

RTS Journal. Ferdinand and Wilhelm. Efficient and Precise Cache Behavior Prediction for Real-Time Systems.

RTS Journal. Kästner and Thesing. Cache-Aware Pre-Runtime Scheduling.

RTS Journal. Theiling and Ferdinand. Fast and Precise WCET Prediction by Separated Cache and Path Analysis.

RTSS '00. Schneider. Cache and Pipeline Sensitive Fixed Priority Scheduling for Preemptive Real-Time Systems.

RTCSA' 00. Theiling. Extracting Safe and Precise Control Flow from Binaries

LCTES' 01. Theiling. Generating Decision Trees for Decoding Binaries

EMSOFT' 01. Ferdinand et al., Reliable and Precise WCET Determination for a Real-Life Procesor

MSP' 02. Heckmann. The Influence of Replacement Strategy on Static Prediction of Cache Contents

ECRTS' 02. Thesing, Langenbach, Heckmann. Pipeline Behavior Prediction Analysis for Real-Time Systems by Pipeline Modeling

EMSOFT' 02. Theiling. ILP-based Interprocedural Path Analysis

SAS' 02. Langenbach, Thesing, Heckmann. Pipeline Modeling for Timing Analysis WCET' 02. Langenbach, Ferdinand, Wilhelm. Worst Case Execution Time Prediction







email: info@AbsInt.com http://www.AbsInt.com