This post describes a reference deployment of OCUDU on Kubernetes targeting a production-grade cluster.
It is not a minimal example, and it is not intended to be the fastest way to get a gNB running. Instead, it presents the intended feature surface of the ocudu-gnb Helm chart when all major capabilities are enabled and composed together in a single deployment.
Everything that follows in this series either simplifies this setup or builds on top of it; where later posts deviate from what works here, the deviation is deliberate.
Scope and assumptions
This post assumes that you already operate a Kubernetes cluster capable of running high-performance workloads.
Specifically:
- Kubernetes is installed and reasonably tuned
- SR-IOV is available on the worker nodes
- The SR-IOV CNI and device plugin are installed and functional
The scenario itself is intentionally narrow:
- A single gNB
- Single-sector deployment, split 7.2
- No high availability
- No SMO
This keeps the focus on mechanics and tradeoffs, rather than orchestration layers or architectural abstractions.
Why SR-IOV is the default dataplane
Throughout this series, SR-IOV combined with DPDK is treated as the default dataplane for Open Fronthaul (OFH) traffic of the OCUDU gNB on Kubernetes.
This does not mean it is always required, nor that it is always the right choice. It does mean that it is the configuration against which all other options are evaluated. In practice, it delivers the best performance and the most predictable behavior, and for that reason I strongly recommend using it whenever possible.
SR-IOV integrates cleanly into Kubernetes through two components:
- The SR-IOV CNI, which handles network attachment
- The SR-IOV device plugin, which exposes virtual functions as schedulable extended resources
From Kubernetes’ perspective, an interface provided by the SR-IOV device plugin is simply another resource request. From the gNB’s perspective, it is a high-performance network interface with predictable characteristics.
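For reference, on the CNI side the attachment is usually described by a NetworkAttachmentDefinition whose resourceName annotation ties it to the extended resource exposed by the device plugin. The sketch below is illustrative only: the ofh-dpdk name and the bare sriov CNI config are assumptions about your environment, not something the ocudu-gnb chart creates for you.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ofh-dpdk                              # placeholder name for the OFH attachment
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_dpdk
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "sriov",
      "name": "ofh-dpdk"
    }
Pods consume the attachment through the standard k8s.v1.cni.cncf.io/networks annotation; whether you add that annotation yourself or the chart does it for you depends on your chart values.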
Production deployment scenario
With the first configuration in this post, the goal is to demonstrate a production-grade deployment of the OCUDU gNB. The following sections explain what this requires and how it is configured.
To ensure proper isolation of network interfaces, the SR-IOV device plugin is used. This allows network interfaces dedicated to OFH traffic to be passed directly into the gNB pod.
The following section must be present in the values.yaml of the OCUDU gNB:
sriovConfig:
  enabled: true
  extendedResourceName: "intel.com/intel_sriov_dpdk"
This configuration enables all necessary mechanisms in the gNB entrypoint script to automatically update the network_interface and du_mac sections of the runtime configuration with values derived from the dynamically assigned network interface. Make sure that extendedResourceName matches the resource name used in the requests and limits of your values.yaml.
Using the SR-IOV plugin requires DPDK to be enabled in the gNB. This is done by adding the hal section to the configuration. The --lcores argument is updated dynamically at runtime and aligned with the CPU cores assigned to the gNB container. Make sure this section exists in the config section of your values.yaml.
hal:
  eal_args: "--lcores (0-1)@(0-11)"
To request the DPDK-enabled interface via the SR-IOV plugin, the following resource requests and limits are configured:
resources:
  limits:
    cpu: 16
    memory: 8Gi
    hugepages-1Gi: 2Gi
    intel.com/intel_sriov_dpdk: 1
  requests:
    cpu: 16
    memory: 8Gi
    hugepages-1Gi: 2Gi
    intel.com/intel_sriov_dpdk: 1
The intel.com/intel_sriov_dpdk resource corresponds to the extended resource name configured in the SR-IOV device plugin.
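That mapping is defined in the device plugin's own configuration, typically a ConfigMap whose resourceList entries concatenate a prefix and a name into the extended resource. The sketch below follows the upstream sriov-network-device-plugin conventions, but the ConfigMap name, namespace, Intel vendor ID, and vfio-pci driver selector are assumptions that must match your hardware and installation:
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config            # name used by the upstream device plugin manifests (assumption)
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [
        {
          "resourcePrefix": "intel.com",
          "resourceName": "intel_sriov_dpdk",
          "selectors": {
            "vendors": ["8086"],
            "drivers": ["vfio-pci"]
          }
        }
      ]
    }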
To run the gNB with the required privileges, the following security context is configured:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    add:
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RAWIO
      - NET_RAW
      - SYS_NICE
  privileged: false
If the core network runs outside the Kubernetes cluster, the N2 and N3 interfaces must be exposed through a LoadBalancer service. It is important to set loadBalancerIP explicitly; automatic discovery at runtime is not supported, and the chart will fail with an error that LB_IP is not set. An example configuration is shown below:
service:
  enabled: true
  type: LoadBalancer
  loadBalancerIP: "10.10.10.41"
  ports:
    n2:
      enabled: true
      port: 38412
    n3:
      enabled: true
      port: 2152
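The chart only creates the Service object; the cluster still needs a load balancer implementation to assign the IP. On bare metal this is commonly MetalLB. Purely as an illustration, assuming MetalLB is installed, an address pool covering the IPs used in this post could look like this (pool name, namespace, and range are placeholders):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: gnb-external              # placeholder pool name
  namespace: metallb-system
spec:
  addresses:
    - 10.10.10.40-10.10.10.50     # must include the loadBalancerIP values used in this post
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: gnb-external
  namespace: metallb-system
spec:
  ipAddressPools:
    - gnb-external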
With the OFH and core network connectivity in place, the next step is metrics and logging.
To expose the metrics interface, a dedicated service is created. This service can be configured as LoadBalancer, NodePort, or ClusterIP. In this scenario, the core network is external to the cluster, so a LoadBalancer is used.
metricsService:
  enabled: true
  type: LoadBalancer
  loadBalancerIP: "10.10.10.42"
  port: 8001
This service exposes the port, but the gNB must also be configured to listen on it. This is done in the runtime configuration section of the Helm chart:
remote_control:
  enabled: true
  port: 8001
  bind_addr: 0.0.0.0
It is important that the port defined in the remote_control section matches the port configured in metricsService.
Metrics reporting itself is enabled and configured as follows:
metrics:
  autostart_stdout_metrics: true
  enable_json: true
  enable_log: false
  periodicity:
    du_report_period: 1000
    cu_up_report_period: 1000
    cu_cp_report_period: 1000
  layers:
    enable_ru: true
    enable_sched: true
    enable_rlc: true
    enable_mac: true
    enable_pdcp: true
    enable_du_low: true
I typically set autostart_stdout_metrics: true so that console metrics are visible directly in the pod logs. I also recommend keeping enable_log: false to avoid uncontrolled growth of local storage. Enabling log-based metrics is suitable only for short-lived deployments or testing scenarios.
enable_json: true enables JSON-formatted metrics over the TCP socket. The reporting periods are specified in milliseconds, so the values above correspond to one report per second.
As a final step, persistence for logs can be enabled. This should be done carefully, especially for long-running deployments, as excessive log growth can cause disk pressure or storage exhaustion.
persistence:
  enabled: true
  type: pvc
  preserveOldLogs: false
  mountPath: "/tmp"
  pvc:
    storageClassName: ""
    accessMode: ReadWriteOnce
    size: 10Gi
If preserveOldLogs is set to true, the entrypoint script moves existing log files into a timestamped subdirectory on restart instead of overwriting them.
Finally, logging behavior is controlled via log levels. Ensure the log file path is located within the PVC mount path.
log:
  filename: /tmp/gnb.log
  all_level: warning
For detailed information about the OCUDU gNB runtime configuration, refer to the srsRAN Project Configuration Reference.
Example configs: ocudu-gnb-scenario1.zip
Test lab deployment scenario
The second configuration demonstrates a simplified deployment of the OCUDU gNB. It is not suitable for production use but is well suited for test labs where strict isolation and security are less critical.
The main difference compared to the production deployment is that the SR-IOV device plugin is not used. Instead, hostNetwork: true is enabled, giving the pod direct access to the host’s network stack.
network:
  hostNetwork: true
In this case, the pod is run in privileged mode with the necessary capabilities:
securityContext:
  allowPrivilegeEscalation: true
  capabilities:
    add:
      - SYS_NICE
      - NET_ADMIN
  privileged: true
As a sanity check, make sure SR-IOV is explicitly disabled:
sriovConfig:
  enabled: false
In this configuration, the entrypoint script does not update the network_interface or du_mac fields automatically, so these values must be set manually in the runtime configuration. The hal section is still required; CPU core assignment continues to be handled automatically.
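Purely as an illustration, a manually pinned OFH entry could look like the sketch below. The nesting under an OFH cell is an assumption on my part; the exact schema comes from your chart's config section and the configuration reference, and the PCI address and MAC address are placeholders:
ru_ofh:                                   # illustrative nesting inside the config section
  cells:
    - network_interface: "0000:51:01.0"   # placeholder PCI address of the DPDK-bound interface
      du_mac: "80:61:5f:0d:df:aa"         # placeholder DU-side MAC address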
Resource requests are simplified accordingly:
resources:
  limits:
    cpu: 16
    memory: 8Gi
    hugepages-1Gi: 2Gi
  requests:
    cpu: 16
    memory: 8Gi
    hugepages-1Gi: 2Gi
For metrics, a convenient option in lab setups is to expose the metrics interface via NodePort:
metricsService:
  enabled: true
  type: NodePort
  port: 8001
  targetPort: 8001
  nodePort: 30801
This exposes the metrics endpoint directly on the worker node. Alternatively, ClusterIP can be used if the metrics consumer runs inside the same cluster.
metricsService:
  enabled: true
  type: ClusterIP
  port: 8001
  targetPort: 8001
As before, ensure the port matches the configuration inside the gNB runtime config.
For log persistence in a lab environment, using hostPath is often sufficient and convenient:
persistence:
  enabled: true
  type: hostPath
  hostPath:
    path: /var/lib/ocudu-gnb
    pathType: DirectoryOrCreate
This creates the directory /var/lib/ocudu-gnb on the worker node if it does not already exist.
Example configs: ocudu-gnb-scenario2.zip
Key takeaway
This is the reference deployment for the series.
Every later post is either a simplification, a specialization, or an extension of what is shown here.
If something feels complex, that is intentional. RAN is complex, and Kubernetes does not remove that complexity. It gives it structure.
This post is not about selling Kubernetes. It is about showing how OCUDU behaves when all relevant knobs are available.
What comes next
In the next post, I’ll walk through deploying the SRS ONAP SMO Helm chart and configuring the OCUDU gNB to integrate with it.
Stay tuned!