In our current system, the access performance of RAD (Runtime Adjustable Data) parameters—configuration variables that control features, frequency band settings, and algorithm thresholds in the data plane—has not met the expected target. Because these parameters are accessed thousands of times per slot in high-frequency L1/L2 processing, even minor memory access latency can accumulate into a significant performance bottleneck.
This article discusses three directions for optimization:
The current system uses a static variable inside a function as a cache for RAD parameters:

struct RadDb
{
    RAD rad1;
    RAD rad2;
    RAD rad3;
    RAD rad4;
};

auto& GetRadValue(uint32_t radIndex)
{
    static RadDb radDb;
    return radDb;
}

Compared to reading from the platform interface on every access, this approach gives a clear performance improvement — in the worst case, the access cost is reduced to a Last-Level Cache (LL Cache) miss, which is already a solid baseline.
However, inspecting the generated assembly revealed a problem that can be improved further: every access to a RAD parameter goes through a function call to GetRadValue() to retrieve the RadDb instance, instead of reading the specific member's (rad1, rad2, etc.) memory address directly.
Below is the assembly code generated by the current implementation (to be added):
📋 [To be added] Assembly snippet of the current implementation (showing the function call path to
GetRadValue())
As we can see, every RAD parameter access includes a full
function call sequence (e.g., call or bl
instructions) to GetRadValue(). The compiler cannot
replace this function call with the direct memory addresses of the
individual parameters (rad1, rad2, etc.).
This means extra work for each access: setting up a stack frame,
jumping to the function, and returning — all of which add up.
The key limitation here is that the function
GetRadValue() acts as an opaque barrier across
translation units (if not aggressively inlined with LTO), forcing
caller code to issue a function call to fetch the RadDb
reference before it can access the specific RAD
parameter.
By promoting the RadDb instance (or individual RAD
parameters) to a file-scope (global) static or using an
extern declaration, the compiler can see the
parameters’ exact memory addresses at compile time. It can then
replace the GetRadValue() function call with a direct
memory access to rad1, rad2, etc.,
removing the call overhead entirely.
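A minimal sketch of this promotion, assuming the RadDb members shown earlier (the g_radDb name and the header/source split in the comments are illustrative, not the actual implementation):

```cpp
#include <cstdint>

struct RAD {
    uint32_t value;
};

// rad.h (sketch): the instance is declared extern at file scope,
// so every translation unit sees its exact address at compile/link time.
struct RadDb {
    RAD rad1;
    RAD rad2;
    RAD rad3;
    RAD rad4;
};

extern RadDb g_radDb;   // visible address, no accessor call needed

// rad.cpp (sketch): the single definition of the instance.
RadDb g_radDb{};

// Caller code now compiles to a direct load from g_radDb.rad1.value,
// with no call/bl sequence through GetRadValue().
uint32_t ReadRad1()
{
    return g_radDb.rad1.value;
}
```

The same effect can sometimes be obtained by keeping GetRadValue() but making it inline in a header, though an extern object gives the compiler the address unconditionally, without depending on inlining heuristics or LTO.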
Assembly code after the optimization (to be added):
📋 [To be added] Assembly snippet after optimization (direct memory access, no function call)
Comparing the two assembly outputs, we can clearly see that the function call path is gone, the instruction count is lower, and the access latency is reduced.
The current RAD parameter is defined as follows (field types shown as uint32_t for illustration):

struct RAD {
    uint32_t radIndex;  // index of this RAD parameter
    uint32_t MaxValue;  // upper bound of the valid range
    uint32_t MinValue;  // lower bound of the valid range
    uint32_t value;     // the actual runtime value (the only field that changes)
};
In this structure, value is the only field that is
read or written at runtime. The other three fields —
radIndex, MaxValue, and
MinValue — are fixed after system initialization and
are rarely accessed during normal execution.
The problem is that all four fields are stored together in the
same struct (Array of Structures, AoS layout). When we load a RAD
parameter, all fields are pulled into the same Cache Line together.
The three static fields take up space in the Cache Line but carry no
useful information at runtime. When multiple RAD parameters exist in
the same memory area, this waste becomes more significant and
reduces the number of useful value fields that can fit
in one Cache Line.
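To make the waste concrete, here is a quick back-of-the-envelope check, assuming 32-bit fields and a 64-byte Cache Line (both assumptions for illustration; the real field widths and line size may differ):

```cpp
#include <cstddef>
#include <cstdint>

struct RAD {
    uint32_t radIndex;
    uint32_t MaxValue;
    uint32_t MinValue;
    uint32_t value;
};

// AoS: each RAD occupies 16 bytes, so a 64-byte cache line holds only
// 4 structs, i.e. 4 usable 'value' fields -- 16 bytes of hot payload
// and 48 bytes of cold metadata riding along in the same line.
static_assert(sizeof(RAD) == 16, "unexpected padding");
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kValuesPerLineAoS = kCacheLine / sizeof(RAD);

// SoA: a line packed with bare uint32_t values holds 16 of them,
// a 4x improvement in usable data density per cache line.
constexpr std::size_t kValuesPerLineSoA = kCacheLine / sizeof(uint32_t);
static_assert(kValuesPerLineSoA == 4 * kValuesPerLineAoS);
```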
To fix this, we can use a separate array to hold
radIndex, MaxValue, and
MinValue for all RAD parameters. The active
configuration data structure itself only keeps the
value field (shifting towards a Structure of Arrays,
SoA layout). When the static metadata fields are needed (e.g., for
OAM configuration validation), they can be looked up by index.
This cleanly separates static metadata from dynamic data, placing
them in different memory regions. When multiple RAD parameters are
defined in the same area, loading any one of them only brings its
value field into the cache. The static fields stay out
of the picture. This means more value fields can fit in
a single Cache Line, making full use of the available cache space
and significantly boosting the cache hit rate during data plane
processing.
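The separation described above can be sketched as follows (the array names, the parameter count, and the SetRadValue helper are assumptions for illustration):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumRads = 4;  // assumed parameter count

// Cold path: static metadata, fixed after initialization, kept in its
// own array and looked up by index only when needed (e.g. OAM validation).
struct RadMeta {
    uint32_t radIndex;
    uint32_t maxValue;
    uint32_t minValue;
};
std::array<RadMeta, kNumRads> g_radMeta{};

// Hot path: only the runtime values, packed contiguously so one
// cache line holds many more usable entries (SoA layout).
std::array<uint32_t, kNumRads> g_radValue{};

// The validation path touches metadata; the data plane reads g_radValue only.
bool SetRadValue(std::size_t i, uint32_t v)
{
    const RadMeta& m = g_radMeta[i];
    if (v < m.minValue || v > m.maxValue) {
        return false;  // reject out-of-range configuration
    }
    g_radValue[i] = v;
    return true;
}
```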
Cache Line behavior comparison before and after optimization (to be added):
📋 [To be added] Diagram comparing Cache Line utilization before and after optimization
It is important to note that this is not a further upgrade to Optimization 1 and 2, but rather the introduction of a completely different type of RAD parameter. Optimizations 1 and 2 target parameters that must remain adjustable at runtime. However, there is another category of parameters: feature flags used during development to hide new code.
For these feature flags, the performance goal is strict: when the new feature is not enabled, the new code should have zero impact on the final binary, as if the code had never been written. We are willing to sacrifice runtime configurability for absolute zero overhead.
To achieve this, the compiler must be able to determine at compile time that a code branch is unreachable, and remove it completely. This requires the RAD parameter’s value to be perfectly known at compile time, eliminating the need for any memory access entirely.
Language features such as constexpr and
constinit are the right tools for this new,
compile-time configurable RAD.
By encoding all RAD metadata as template parameters, every field becomes a compile-time constant. The compiler can read the RAD value directly during compilation and use it to drive further optimizations:
template<uint32_t radIndex, uint32_t maxValue, uint32_t minValue, uint32_t defaultValue>
struct RAD {
static_assert(radIndex < 2500, "radIndex Invalid.");
static_assert(defaultValue <= maxValue && defaultValue >= minValue,
"value out of range.");
constexpr static uint32_t kRadIndex = radIndex;
constexpr static uint32_t kMaxValue = maxValue;
constexpr static uint32_t kMinValue = minValue;
constexpr static uint32_t value = defaultValue;
};

Because value is declared as constexpr,
the compiler can evaluate any condition that depends on it at
compile time. This process is called Constant
Folding. When the condition is always false,
the compiler removes the entire branch from the output — a process
known as Dead Code Elimination.
Consider the following example:
if (rad.value) { // rad.value is a compile-time constant: false
doNewFeature(); // this branch is removed by the compiler
} else {
doLegacyCalculation(); // only this remains in the final binary
}

When rad.value is false at compile time,
the compiler drops the if branch entirely. The final
binary contains only doLegacyCalculation(), which is
exactly the same output as if the new feature code had never been
written.
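Since value is a compile-time constant, the same guarantee can also be made explicit with C++17's if constexpr, which discards the untaken branch during compilation rather than relying on the optimizer. A sketch, using a simplified version of the template above (doNewFeature/doLegacyCalculation are the hypothetical functions from the example, given trivial bodies here so the snippet is self-contained):

```cpp
#include <cstdint>

template<uint32_t radIndex, uint32_t maxValue, uint32_t minValue, uint32_t defaultValue>
struct RAD {
    constexpr static uint32_t value = defaultValue;
};

uint32_t doLegacyCalculation() { return 1; }  // stand-in for the legacy path
uint32_t doNewFeature()        { return 2; }  // stand-in for the hidden feature

using NewFeatureFlag = RAD<42, 1, 0, 0>;      // feature disabled at compile time

uint32_t Process()
{
    // The false branch is discarded at compile time, so the call to
    // doNewFeature() never appears in the generated code for Process().
    if constexpr (NewFeatureFlag::value != 0) {
        return doNewFeature();
    } else {
        return doLegacyCalculation();
    }
}
```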
Assembly comparison (to be added):
📋 [To be added] Assembly comparison with and without compile-time RAD optimization (before and after branch elimination)
The template-based definition brings one more benefit:
static_assert moves the validity checks to compile
time. Any out-of-range radIndex or invalid
defaultValue will cause a build error immediately,
rather than producing a hard-to-debug failure at runtime.
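As a quick illustration of these compile-time checks, using the template definition above (the instantiation aliases are hypothetical):

```cpp
#include <cstdint>

template<uint32_t radIndex, uint32_t maxValue, uint32_t minValue, uint32_t defaultValue>
struct RAD {
    static_assert(radIndex < 2500, "radIndex Invalid.");
    static_assert(defaultValue <= maxValue && defaultValue >= minValue,
                  "value out of range.");
    constexpr static uint32_t value = defaultValue;
};

using ValidFlag = RAD<100, 10, 0, 5>;       // compiles: all checks pass

// These would fail to build, not misbehave at runtime:
// using BadIndex = RAD<9999, 10, 0, 5>;    // error: "radIndex Invalid."
// using BadValue = RAD<100, 10, 0, 42>;    // error: "value out of range."
```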
This article explored optimization strategies for RAD parameters across two different usage scenarios:
For Runtime-Configurable Parameters:
- Assembly level: By exposing the variable's exact memory address rather than hiding it behind a function call (like GetRadValue()), the compiler can replace function calls with direct memory access, reducing per-access overhead.
- Memory layout: By separating static metadata (radIndex, MaxValue, MinValue) from the dynamic field (value) using a Structure of Arrays (SoA) or similar decoupling, Cache Line space is used more efficiently. This significantly improves data density and cache hit rates.

For Compile-Time Feature Flags (A New Paradigm):
- Compile-time visibility: For parameters that do not need runtime adjustment, using a template-based constexpr definition ensures the RAD value is fully known at compile time. The compiler can then apply Constant Folding and Dead Code Elimination, guaranteeing that disabled feature code has absolute zero impact on the final binary.
These optimizations address the classic trade-off between runtime flexibility and extreme latency reduction, providing a comprehensive toolkit for high-performance parameter management.
📋 [To be added] Overall performance comparison before and after all optimizations