

## Optimizing Intel Data Direct I/O Technology for Multi-hundred-gigabit Networks

**Alireza Farshin<sup>+</sup>, Amir Roozbeh**<sup>+\*</sup>, Gerald Q. Maguire Jr.<sup>+</sup>, Dejan Kostic<sup>+</sup>

farshin@kth.se

amirrsk@kth.se maguire@kth.se dmk@kth.se

+ KTH Royal Institute of Technology (EECS)

**\*** Ericsson Research

## What is DDIO?

Data Direct I/O Technology (DDIO) transfers packets directly to the Last Level Cache (LLC) rather than main memory. DDIO updates a cache line if it is already present in the LLC; otherwise, it allocates the cache line in a limited portion of the LLC (i.e., **2 ways** in an n-way set-associative cache).
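To make the "2 ways out of n" limit concrete, the following minimal sketch estimates how many bytes of LLC DDIO can allocate into. The LLC size and associativity used in the example (22 MiB, 11-way) are hypothetical values chosen for illustration, not figures taken from this poster, and the function name is an assumption.

```python
def ddio_capacity_bytes(llc_bytes: int, total_ways: int, ddio_ways: int = 2) -> float:
    """Estimate the LLC capacity available for DDIO write allocations.

    Assumes every way holds an equal share of the LLC, so DDIO's share is
    simply ddio_ways / total_ways of the total cache size.
    """
    if not 0 < ddio_ways <= total_ways:
        raise ValueError("ddio_ways must be between 1 and total_ways")
    return llc_bytes * ddio_ways / total_ways


if __name__ == "__main__":
    # Hypothetical LLC: 22 MiB, 11-way set-associative.
    llc = 22 * 1024 * 1024
    capacity = ddio_capacity_bytes(llc, total_ways=11)
    print(f"DDIO capacity: {capacity / 2**20:.1f} MiB")
```

Under these assumed parameters, only about 18% of the LLC is available for newly allocated I/O cache lines.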

## How to Fine-tune DDIO

A little-discussed register called "**IIO LLC WAYS**" can be used to tune the capacity of DDIO. Fine-tuning DDIO enables us to process packets with a larger number of RX descriptors while providing the *same or better* performance.
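As a rough illustration of what tuning the DDIO capacity might look like, the sketch below builds a way bitmask. It assumes the register holds one bit per LLC way and that the DDIO ways are the contiguous highest-numbered ones. The poster does not specify the register layout, so this layout and the helper name `ddio_ways_mask` are hypothetical. The sketch only computes a value; it does not touch hardware.

```python
def ddio_ways_mask(total_ways: int, ddio_ways: int) -> int:
    """Return a bitmask with the `ddio_ways` highest way bits set.

    Assumption (not confirmed by the poster): one bit per LLC way,
    with DDIO ways allocated contiguously from the top.
    """
    if not 0 < ddio_ways <= total_ways:
        raise ValueError("ddio_ways must be between 1 and total_ways")
    return ((1 << ddio_ways) - 1) << (total_ways - ddio_ways)


if __name__ == "__main__":
    # Hypothetical 11-way LLC; sweep a few candidate DDIO capacities.
    for ways in (2, 4, 8):
        print(f"{ways} ways -> mask {ddio_ways_mask(11, ways):#05x}")
```

Sweeping the number of ways like this is one way to explore the trade-off between DDIO capacity and the cache space left for application data.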



DDIO was introduced to improve the performance of I/O applications by mitigating expensive DRAM accesses.

## DDIO Can Become a Bottleneck

Faster link speeds can cause DDIO to fail to provide the expected benefits, as new incoming packets can repeatedly evict previously received packets (i.e., not-yet-processed and already-processed packets) from the LLC. The probability of eviction is high with:

- A large number of receive (RX) descriptors
- A high load imbalance factor
- A receiving rate of ≈100 Gbps
- An I/O-intensive application
- A packet size $\geq$ 512 bytes
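A back-of-the-envelope check can show why many RX descriptors and large packets raise the eviction probability: once the buffers behind the RX ring no longer fit in the DDIO portion of the LLC, new packets must evict older ones. The sketch below is a deliberately simplified model. It ignores buffer alignment, metadata, and load imbalance, and the 4 MiB DDIO capacity in the example is an assumed value.

```python
def rx_footprint_bytes(rx_descriptors: int, packet_bytes: int, queues: int = 1) -> int:
    """Approximate memory touched by in-flight packets across all RX queues."""
    return rx_descriptors * packet_bytes * queues


def exceeds_ddio(rx_descriptors: int, packet_bytes: int,
                 ddio_bytes: int, queues: int = 1) -> bool:
    """True if the RX buffer footprint is larger than the DDIO capacity."""
    return rx_footprint_bytes(rx_descriptors, packet_bytes, queues) > ddio_bytes


if __name__ == "__main__":
    ddio = 4 * 1024 * 1024  # assumed DDIO capacity (4 MiB)
    for descriptors in (256, 1024, 4096):
        over = exceeds_ddio(descriptors, 1500, ddio)
        print(f"{descriptors} descriptors x 1500 B: exceeds DDIO = {over}")
```

In this simplified model, a single queue with 4096 descriptors and 1500-byte packets already exceeds the assumed capacity, while 256 descriptors fit comfortably.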





Different applications have different levels of sensitivity to DDIO.



Moreover, the performance of DDIO only matters when an application is **I/O bound**, rather than CPU/memory bound.

## Toward 200 Gbps

**Problem:** DDIO can degrade performance with faster link speeds, due to the higher cache injection rate.
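The effect of link speed on the cache injection rate can be estimated with simple arithmetic. The sketch below assumes 64-byte cache lines and treats the full line rate as payload written into the LLC, ignoring framing and protocol overhead. The 4 MiB DDIO capacity is again an assumed value.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def cache_lines_per_second(link_gbps: float) -> float:
    """Cache lines injected per second if the link is fully utilized."""
    return link_gbps * 1e9 / 8 / CACHE_LINE_BYTES


def fill_time_us(ddio_bytes: int, link_gbps: float) -> float:
    """Microseconds needed for line-rate traffic to overwrite the DDIO portion once."""
    bytes_per_second = link_gbps * 1e9 / 8
    return ddio_bytes / bytes_per_second * 1e6


if __name__ == "__main__":
    ddio = 4 * 1024 * 1024  # assumed DDIO capacity (4 MiB)
    for gbps in (100, 200):
        rate = cache_lines_per_second(gbps)
        print(f"{gbps} Gbps: {rate / 1e6:.1f} M lines/s, "
              f"DDIO portion refilled every {fill_time_us(ddio, gbps):.0f} us")
```

Doubling the link speed halves the time an application has to consume a packet before it risks being evicted from the DDIO portion of the LLC.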



**Approach:** The LLC could be bypassed for low-priority or DDIO-insensitive applications, thus making room for high-priority or highly DDIO-sensitive applications. **Bypassing** could be done via:

- Disabling DDIO for a specific I/O device, or
- Exploiting a remote processor's socket to DMA data

There is no **one-size-fits-all** approach to utilizing DDIO. Therefore, it is important to optimize DDIO based on the characteristics of applications and their workloads, especially for multi-hundred-gigabit networks.



