
Background

In our current system, the access performance of RAD (Runtime Adjustable Data) parameters—configuration variables that control features, frequency band settings, and algorithm thresholds in the data plane—has not met the expected target. Because these parameters are accessed thousands of times per slot in high-frequency L1/L2 processing, even minor memory access latency can accumulate into a significant performance bottleneck.

This article discusses three directions for optimization:


Optimization 1: Remove Function Calls

Current Architecture

The current system uses a static variable inside a function as a cache for RAD parameters.

class RadDb
{
public:
    RAD rad1;
    RAD rad2;
    RAD rad3;
    RAD rad4;
};

RadDb& GetRadValue()
{
    static RadDb radDb;   // function-local static, initialized on first call
    return radDb;
}

Compared to reading from the platform interface on every access, this approach gives a clear performance improvement: in the worst case, the access cost is reduced to a last-level cache (LLC) miss, which is already a solid baseline.

However, by examining the generated assembly, we found a problem that can be improved further: every access to a RAD parameter goes through a call to GetRadValue() to retrieve the RadDb instance, instead of reading the memory address of the specific member (rad1, rad2, etc.) directly.

Assembly Analysis

Below is the assembly code generated by the current implementation (to be added):

📋 [To be added] Assembly snippet of the current implementation (showing the function call path to GetRadValue())

As we can see, every RAD parameter access includes a full function-call sequence (e.g., a call or bl instruction) to GetRadValue(). The compiler cannot replace this call with the direct memory addresses of the individual parameters (rad1, rad2, etc.). Each access therefore pays extra: setting up a stack frame, jumping to the function, returning, and, because radDb is a function-local static, checking the thread-safe initialization guard on every call.

Solution: Promote to a File-scope Static Variable

The key limitation here is that the function GetRadValue() acts as an opaque barrier across translation units (if not aggressively inlined with LTO), forcing caller code to issue a function call to fetch the RadDb reference before it can access the specific RAD parameter.

By promoting the RadDb instance (or individual RAD parameters) to a file-scope (global) static or using an extern declaration, the compiler can see the parameters’ exact memory addresses at compile time. It can then replace the GetRadValue() function call with a direct memory access to rad1, rad2, etc., removing the call overhead entirely.
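A minimal sketch of the promoted layout, assuming the RAD and RadDb shapes from above (the g_radDb and ReadRad1 names are illustrative):

```cpp
#include <cstdint>

// Illustrative sketch; RAD here is reduced to its hot field for brevity.
struct RAD {
    uint32_t value;
};

struct RadDb {
    RAD rad1;
    RAD rad2;
    RAD rad3;
    RAD rad4;
};

// File-scope static: the compiler knows this object's address at compile time,
// so accesses compile to direct loads instead of a call through GetRadValue().
static RadDb g_radDb{};

// Reads become plain loads from a known address; no call, no init-guard check.
uint32_t ReadRad1() { return g_radDb.rad1.value; }
```

A file-scope static with static storage duration is zero-initialized before main() and needs no first-use guard, which is what lets the compiler fold the access into a single load.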

Assembly code after the optimization (to be added):

📋 [To be added] Assembly snippet after optimization (direct memory access, no function call)

Comparing the two assembly outputs, we can clearly see that the function call path is gone, the instruction count is lower, and the access latency is reduced.


Optimization 2: Reduce the Memory Footprint of RAD Parameters

Current Data Structure

The current RAD parameter is defined as follows:

struct RAD
{
    uint32_t radIndex;   // index of this RAD parameter
    uint32_t MaxValue;   // upper bound of the valid range
    uint32_t MinValue;   // lower bound of the valid range
    uint32_t value;      // the actual runtime value (the only field that changes)
};

In this structure, value is the only field that is read or written at runtime. The other three fields — radIndex, MaxValue, and MinValue — are fixed after system initialization and are rarely accessed during normal execution.

The problem is that all four fields are stored together in the same struct (Array of Structures, AoS layout). When we load a RAD parameter, all fields are pulled into the same Cache Line together. The three static fields take up space in the Cache Line but carry no useful information at runtime. When multiple RAD parameters exist in the same memory area, this waste becomes more significant and reduces the number of useful value fields that can fit in one Cache Line.
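To make the waste concrete, here is the arithmetic under the assumption of four 32-bit fields and a 64-byte cache line (the RadAoS name and the field widths are illustrative):

```cpp
#include <cstdint>

// Illustrative arithmetic; assumes 32-bit fields and a 64-byte cache line.
struct RadAoS {
    uint32_t radIndex;
    uint32_t maxValue;
    uint32_t minValue;
    uint32_t value;   // the only field that is hot at runtime
};

static_assert(sizeof(RadAoS) == 16, "four 32-bit fields, no padding");

constexpr unsigned kCacheLineBytes = 64;

// AoS: one cache line holds 64 / 16 = 4 structs, i.e. only 4 hot values.
constexpr unsigned kAosValuesPerLine = kCacheLineBytes / sizeof(RadAoS);

// SoA: a line of packed values holds 64 / 4 = 16 hot values, 4x the density.
constexpr unsigned kSoaValuesPerLine = kCacheLineBytes / sizeof(uint32_t);
```

Under these assumptions, three quarters of every cache line loaded on the hot path is occupied by fields that are never read at runtime.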

Solution: Separate Static Metadata from Dynamic Data

To fix this, we can use a separate array to hold radIndex, MaxValue, and MinValue for all RAD parameters. The active configuration data structure itself only keeps the value field (shifting towards a Structure of Arrays, SoA layout). When the static metadata fields are needed (e.g., for OAM configuration validation), they can be looked up by index.

This cleanly separates static metadata from dynamic data, placing them in different memory regions. When multiple RAD parameters are defined in the same area, loading any one of them only brings its value field into the cache. The static fields stay out of the picture. This means more value fields can fit in a single Cache Line, making full use of the available cache space and significantly boosting the cache hit rate during data plane processing.
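A minimal sketch of this split, with illustrative names (RadMeta, g_radMeta, g_radValue) and a fixed count of four parameters:

```cpp
#include <cstdint>
#include <cstddef>

constexpr std::size_t kRadCount = 4;   // illustrative parameter count

// Static metadata: fixed after initialization, kept in a separate (cold) array.
struct RadMeta {
    uint32_t radIndex;
    uint32_t maxValue;
    uint32_t minValue;
};
static RadMeta g_radMeta[kRadCount]{};

// Dynamic data: only the runtime values, packed densely so one cache line
// carries far more useful entries than the combined AoS layout.
static uint32_t g_radValue[kRadCount]{};

// Hot path: touches only g_radValue.
uint32_t GetValue(std::size_t i) { return g_radValue[i]; }

// Cold path (e.g., OAM configuration validation): metadata looked up by index.
bool IsInRange(std::size_t i, uint32_t v) {
    return v >= g_radMeta[i].minValue && v <= g_radMeta[i].maxValue;
}
```

The same index addresses both arrays, so the decoupling costs nothing on the hot path and only an extra indexed lookup on the cold path.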

Cache Line behavior comparison before and after optimization (to be added):

📋 [To be added] Diagram comparing Cache Line utilization before and after optimization


Optimization 3: Compile-Time RAD Parameters (A New Type of RAD)

Background

It is important to note that this is not a further upgrade to Optimizations 1 and 2, but rather the introduction of a completely different type of RAD parameter. Optimizations 1 and 2 target parameters that must remain adjustable at runtime. However, there is another category: feature flags used during development to gate new code.

For these feature flags, the performance goal is strict: when the new feature is not enabled, the new code should have zero impact on the final binary, as if it had never been written. We are willing to sacrifice runtime configurability for absolute zero overhead.

To achieve this, the compiler must be able to determine at compile time that a code branch is unreachable, and remove it completely. This requires the RAD parameter’s value to be perfectly known at compile time, eliminating the need for any memory access entirely.

Language features such as constexpr and constinit are the right tools for this new, compile-time configurable RAD.

Implementation: Template-based RAD Definition

By encoding all RAD metadata as template parameters, every field becomes a compile-time constant. The compiler can read the RAD value directly during compilation and use it to drive further optimizations:

template<uint32_t radIndex, uint32_t maxValue, uint32_t minValue, uint32_t defaultValue>
struct RAD {
    static_assert(radIndex < 2500, "radIndex Invalid.");
    static_assert(defaultValue <= maxValue && defaultValue >= minValue,
                  "value out of range.");

    constexpr static uint32_t kRadIndex  = radIndex;
    constexpr static uint32_t kMaxValue  = maxValue;
    constexpr static uint32_t kMinValue  = minValue;
    constexpr static uint32_t value      = defaultValue;
};

Effect: Dead Code Elimination by the Compiler

Because value is declared as constexpr, the compiler can evaluate any condition that depends on it at compile time. This process is called Constant Folding. When the condition is always false, the compiler removes the entire branch from the output — a process known as Dead Code Elimination.

Consider the following example:

if (rad.value) {           // rad.value is a compile-time constant: false
    doNewFeature();        // this branch is removed by the compiler
} else {
    doLegacyCalculation(); // only this remains in the final binary
}

When rad.value is compile-time false, the compiler drops the if branch entirely. The final binary contains only doLegacyCalculation(), which is exactly the same output as if the new feature code had never been written.
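Putting it together, here is a self-contained sketch of the flag in use. The instantiation parameters (index 42, range [0, 1], default 0) and the stub function bodies are illustrative; the sketch also uses if constexpr, which guarantees branch elimination at the language level, whereas the plain if above relies on the optimizer to fold the constant:

```cpp
#include <cstdint>

// Restates the template from the article; the instantiation below is illustrative.
template<uint32_t radIndex, uint32_t maxValue, uint32_t minValue, uint32_t defaultValue>
struct RAD {
    static_assert(radIndex < 2500, "radIndex Invalid.");
    static_assert(defaultValue <= maxValue && defaultValue >= minValue,
                  "value out of range.");
    constexpr static uint32_t value = defaultValue;
};

// Hypothetical feature flag: index 42, range [0, 1], disabled by default.
constexpr RAD<42, 1, 0, 0> kNewFeatureFlag{};

uint32_t doNewFeature()        { return 1; }   // stub for the gated new feature
uint32_t doLegacyCalculation() { return 2; }   // stub for the existing path

uint32_t Process() {
    // value is a compile-time constant (0): if constexpr discards the
    // doNewFeature() branch as a language rule, not just as an optimization.
    if constexpr (kNewFeatureFlag.value) {
        return doNewFeature();
    } else {
        return doLegacyCalculation();
    }
}
```

With the flag's default flipped to 1, the same code selects doNewFeature() instead; either way only one branch survives into the generated code.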

Assembly comparison (to be added):

📋 [To be added] Assembly comparison with and without compile-time RAD optimization (before and after branch elimination)

Bonus: Compile-time Validity Checks

The template-based definition brings one more benefit: static_assert moves the validity checks to compile time. Any out-of-range radIndex or invalid defaultValue will cause a build error immediately, rather than producing a hard-to-debug failure at runtime.


Summary

This article explored optimization strategies for RAD parameters across two different usage scenarios:

For Runtime-Configurable Parameters:

- Assembly level: By exposing the variable’s exact memory address rather than hiding it behind a function call (like GetRadValue()), the compiler can replace function calls with direct memory access, reducing per-access overhead.
- Memory layout: By separating static metadata (radIndex, MaxValue, MinValue) from the dynamic field (value) using a Structure of Arrays (SoA) or similar decoupling, Cache Line space is used more efficiently, significantly improving data density and cache hit rates.

For Compile-Time Feature Flags (A New Paradigm):

- Compile-time visibility: For parameters that do not need runtime adjustment, a template-based constexpr definition ensures the RAD value is fully known at compile time. The compiler can then apply Constant Folding and Dead Code Elimination, guaranteeing that disabled feature code has zero impact on the final binary.

These optimizations address the classic trade-off between runtime flexibility and extreme latency reduction, providing a comprehensive toolkit for high-performance parameter management.

📋 [To be added] Overall performance comparison before and after all optimizations