

# HW Implementation of Track Reconstruction for the New HADES LVL2 Trigger

Justus-Liebig-University in Giessen

Ming Liu



## Tracking in MDCs



- 4 MDCs
- Particle tracks are bent in the magnetic field region
- Tracks are straight lines from the target to MDC II, and from MDC III to MDC IV
- Inner and outer tracks point to the RICH (Johannes Roskoss) and TOF (Andreas Kopp) respectively, helping them to find patterns
- Inner tracking is currently being implemented in HW. Both inner and outer algorithms are ready in SW.



## Tracking Principle

#### MDC – Chamber (front view)












Special thanks to: Vladimir Pechenov, Daniel Kirschner, Geydar Agakishiev



#### HW Platform

- 5 FPGAs on a compute node (Shuo Yang)
- External and internal links
- Large memory capability
- Embedded hard-core PowerPC CPUs on the FPGA chip (running an embedded Linux OS)
- Peripheral cores on the FPGA
- Development work is done on commercial Xilinx boards with FPGAs of the same family



#### Bus-based Design in FPGA



- Bus-based platform
  - PLB (fast)
  - OPB (slow)
- PowerPC 405 CPU
- Algorithm processing engines (Tracking Processing Unit, TPU)
- Other peripherals:
  - Gigabit Ethernet
  - DDR memory
  - Flash memory
  - RS232
  - .....



#### LocalLink-based Design in FPGA



- LocalLink-based platform
- Multi-Port Memory Controller (8 ports)
  - Heavy traffic avoided on the PLB bus
- Direct access to the memory from the device
- Large performance improvement expected



#### TPU Design



- Address LUT
- Projection LUT
- Accumulate Unit
- Peak finder
- IP interface (IPIF)

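The slide names the TPU blocks without describing their data flow, so the following is only a minimal software sketch of one plausible reading: the address LUT locates each fired wire's entries in the projection LUT, the accumulate unit histograms the projected bins, and the peak finder reports bins crossed by enough wires. The table contents, the 2D bin layout, the function names, and the threshold are all hypothetical and chosen for illustration; they are not taken from the actual design.

```python
from collections import Counter

# Hypothetical, tiny LUT contents for illustration only.
# Address LUT: wire id -> (offset, length) into the projection LUT.
ADDR_LUT = {
    0: (0, 3),
    1: (3, 3),
    2: (6, 2),
    3: (8, 3),
}

# Projection LUT: flat list of projection-plane bins (x, y) covered by each wire.
PROJ_LUT = [
    (1, 1), (2, 1), (2, 2),  # wire 0
    (2, 2), (2, 3), (3, 3),  # wire 1
    (2, 2), (4, 4),          # wire 2
    (0, 0), (2, 2), (4, 4),  # wire 3
]


def accumulate(fired_wires):
    """Histogram the projection-plane bins covered by all fired wires."""
    hist = Counter()
    for wire in fired_wires:
        offset, length = ADDR_LUT[wire]
        for bin_xy in PROJ_LUT[offset:offset + length]:
            hist[bin_xy] += 1
    return hist


def find_peaks(hist, threshold):
    """Return (bin, count) pairs that reach the threshold and are not
    exceeded by any of their 8 neighbouring bins."""
    peaks = []
    for (x, y), count in hist.items():
        if count < threshold:
            continue
        neighbours = [
            hist.get((x + dx, y + dy), 0)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        ]
        if all(count >= n for n in neighbours):
            peaks.append(((x, y), count))
    return sorted(peaks)


if __name__ == "__main__":
    hist = accumulate([0, 1, 2, 3])
    print(find_peaks(hist, threshold=3))
```

In this toy data set all four wires cover bin (2, 2), so it is the only bin that reaches a threshold of 3 and would be reported as a track candidate.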


#### Implementation Results

| Resources        | TPU                       | Compute node platform     | PLB-IPIF                 | System with TPU (sum)      |
|------------------|---------------------------|---------------------------|--------------------------|----------------------------|
| 4-input LUTs     | 5175 out of 50560 (10.2%) | 8531 out of 50560 (16.9%) | 2900 out of 50560 (5.7%) | 16606 out of 50560 (32.8%) |
| Slice Flip-Flops | 1715 out of 50560 (3.4%)  | 5724 out of 50560 (11.3%) | 1640 out of 50560 (3.2%) | 9079 out of 50560 (18.0%)  |
| Block RAMs       | 41 out of 232 (17.7%)     | 18 out of 232 (7.8%)      | 0                        | 59 out of 232 (25.4%)      |
| DSP Slices       | 0                         | 8 out of 128 (6.3%)       | 0                        | 8 out of 128 (6.3%)        |

Table 1. Resource consumption

- Resource utilization is acceptable for the Virtex-4 FX60 FPGA.
- Timing limit: 125 MHz maximum clock frequency without optimization
- We chose 100 MHz to match the speed of the PLB.

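As a quick cross-check of Table 1, the sum column can be recomputed from the per-component counts. The sketch below only restates the table's numbers; the dictionary layout and function name are illustrative choices.

```python
# Used counts per component and device totals, copied from Table 1.
TOTALS = {
    "4-input LUTs": 50560,
    "Slice Flip-Flops": 50560,
    "Block RAMs": 232,
    "DSP Slices": 128,
}
USED = {
    "4-input LUTs": {"TPU": 5175, "platform": 8531, "PLB-IPIF": 2900},
    "Slice Flip-Flops": {"TPU": 1715, "platform": 5724, "PLB-IPIF": 1640},
    "Block RAMs": {"TPU": 41, "platform": 18, "PLB-IPIF": 0},
    "DSP Slices": {"TPU": 0, "platform": 8, "PLB-IPIF": 0},
}


def utilization(resource):
    """Return (total used, percentage of the device) for one resource type."""
    total_used = sum(USED[resource].values())
    return total_used, 100.0 * total_used / TOTALS[resource]


if __name__ == "__main__":
    for resource in USED:
        used, pct = utilization(resource)
        print(f"{resource}: {used} of {TOTALS[resource]} ({pct:.1f}%)")
```

The recomputed totals (16606 LUTs, 9079 flip-flops, 59 block RAMs, 8 DSP slices) agree with the table's last column.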


#### Performance Evaluation

- A C program running on a server (Xeon, 2.4 GHz) serves as the software reference.
- Measurement setup: 30 fired wires per sub-event, 5.7 kbit of LUT data per wire on average (1,510,256 bytes / 2110 wires)
- **SW performance** = 0.82K sub-events/s
- With the DMA\_done interrupt, HW performance = 0.83K sub-events/s
- With DMA\_done polling, HW performance = 4.5K sub-events/s (a 5.5-fold speedup)
- The PowerPC was engaged in the DMA initialization and the DMA\_done interrupt handler, which introduced a large software overhead.
- A HW master logic will replace the CPU+DMA solution to reduce overhead and raise performance.
- According to the simulation, a speedup of around 20 to 30 per module is expected in theory.

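The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the per-sub-event quantities are derived values, and the projected rate assumes that the expected factor of 20 to 30 is measured against the software reference, which the slide does not state explicitly.

```python
# Numbers quoted on the slide.
LUT_BYTES = 1_510_256          # total projection LUT size in bytes
WIRES = 2110                   # number of wires covered by the LUT
FIRED_WIRES_PER_SUBEVENT = 30
SW_RATE = 0.82e3               # sub-events/s, software reference
HW_RATE_INTERRUPT = 0.83e3     # sub-events/s, DMA_done interrupt
HW_RATE_POLLING = 4.5e3        # sub-events/s, DMA_done polling

# Average LUT data per wire: about 5726 bits, i.e. 5.7 kbit.
avg_lut_bits_per_wire = LUT_BYTES * 8 / WIRES

# Speedups relative to the software reference.
speedup_interrupt = HW_RATE_INTERRUPT / SW_RATE   # about 1.0
speedup_polling = HW_RATE_POLLING / SW_RATE       # about 5.5

# Derived: average LUT data read per sub-event, about 21.5 kB.
lut_bytes_per_subevent = FIRED_WIRES_PER_SUBEVENT * LUT_BYTES / WIRES

# Derived: processing time per sub-event in microseconds.
time_interrupt_us = 1e6 / HW_RATE_INTERRUPT       # about 1205 us
time_polling_us = 1e6 / HW_RATE_POLLING           # about 222 us

# Assumption: the expected factor of 20 to 30 is relative to SW_RATE.
projected_rates = (20 * SW_RATE, 30 * SW_RATE)

if __name__ == "__main__":
    print(f"LUT bits per wire:  {avg_lut_bits_per_wire:.0f}")
    print(f"speedup (polling):  {speedup_polling:.1f}")
    print(f"time per sub-event: {time_interrupt_us:.0f} us (interrupt), "
          f"{time_polling_us:.0f} us (polling)")
```

The gap of roughly 1 ms per sub-event between the interrupt and polling variants illustrates how much of the processing time is spent in CPU-side handling rather than in the TPU itself.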


#### Summary and Future Work

- The basic principle of inner track reconstruction was implemented in a Xilinx FPGA.
- Working as a processing engine in the compute nodes, the TPU finds track candidates.
- Implementing the entire system in the FPGA is feasible. A speedup of 20 to 30 is expected.
- Multiple TPU modules will be inserted in the system for higher processing speed.
- The inner tracking design is to be optimized. Outer tracking implementation is also future work.



# Thanks for your attention!