Runners

Bazel Remote Execution Workers

Deploy and manage distributed build infrastructure with Bazel Remote Execution


Bazel Remote Execution (RBE) transforms your build process into a distributed system, allowing you to scale builds across multiple worker machines for faster, more efficient compilation and testing.

BuildBuddy Integration#

BuildBuddy is a fully managed Bazel platform providing build result streaming, remote caching, and remote build execution. It's free for individuals and open source projects, with an Enterprise tier for advanced features.

BuildBuddy Cloud Quickstart#

Get started with BuildBuddy Cloud in minutes by adding two lines to your .bazelrc:

```
# .bazelrc - BuildBuddy Cloud configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
```

After configuring, builds will display a URL for viewing results:

```
$ bazel build //src:main
INFO: Streaming build results to: https://app.buildbuddy.io/invocation/24a37b8f-4cf2-4909-9522-3cc91d2ebfc4
INFO: Build completed successfully, 42 total actions
```

BuildBuddy Authentication#

Configure API key authentication for private build logs:

```
# .bazelrc - Authenticated BuildBuddy configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY
```

Retrieve your API key from app.buildbuddy.io/docs/setup after creating an account.
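To keep the API key out of version control, Bazel's `try-import` directive can pull it from an untracked file. The `user.bazelrc` filename below is only a convention — any gitignored path works:

```
# .bazelrc (checked in)
try-import %workspace%/user.bazelrc

# user.bazelrc (gitignored)
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY
```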

BuildBuddy Remote Cache#

Enable remote caching to share build artifacts across your team:

```
# .bazelrc - Remote cache configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY

# Optional optimizations
build --remote_download_minimal
build --experimental_remote_cache_compression
```
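To check whether the cache is actually being hit, you can parse the process summary Bazel prints at the end of a build. The summary format below matches recent Bazel versions, but verify it against your own build output; this is an illustrative sketch, not a supported API:

```python
import re

def cache_hit_ratio(summary_line: str):
    """Parse Bazel's end-of-build process summary, e.g.
    'INFO: 42 processes: 30 remote cache hit, 12 linux-sandbox.'
    Returns the fraction of processes served from the remote cache,
    or None if the line doesn't match."""
    total = re.search(r"(\d+) processes:", summary_line)
    hits = re.search(r"(\d+) remote cache hit", summary_line)
    if not total or not hits:
        return None
    return int(hits.group(1)) / int(total.group(1))

line = "INFO: 42 processes: 30 remote cache hit, 12 linux-sandbox."
print(cache_hit_ratio(line))  # → 0.7142857142857143
```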

BuildBuddy Remote Build Execution#

For distributed builds, enable remote execution:

```
# .bazelrc - Full RBE configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_executor=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY

# Platform configuration
build --host_platform=@buildbuddy_toolchain//:platform
build --platforms=@buildbuddy_toolchain//:platform
build --extra_toolchains=@buildbuddy_toolchain//:cc_toolchain

# Performance tuning
build --jobs=50
build --remote_timeout=3600
```

BuildBuddy Self-Hosted Deployment#

For organizations requiring on-premises infrastructure, BuildBuddy offers self-hosted deployment options.

Docker Compose Deployment#

Deploy a minimal BuildBuddy instance with Docker Compose:

```yaml
# docker-compose.yaml
version: '3.8'

services:
  buildbuddy:
    image: gcr.io/flame-public/buildbuddy-app-onprem:latest
    ports:
      - "8080:8080"   # Web UI
      - "1985:1985"   # gRPC (BES + Remote Cache)
      - "1986:1986"   # gRPCS (TLS)
    volumes:
      - buildbuddy-data:/data
    environment:
      - BB_DATABASE_DATA_SOURCE=sqlite3:///data/buildbuddy.db
      - BB_STORAGE_DISK_ROOT_DIR=/data/storage
      - BB_CACHE_DISK_ROOT_DIR=/data/cache
      - BB_CACHE_MAX_SIZE_BYTES=10737418240  # 10GB
    restart: unless-stopped

volumes:
  buildbuddy-data:
```

Start the service:

```bash
docker-compose up -d

# Verify the service is running
curl http://localhost:8080/health
```

Configure Bazel to use your self-hosted instance:

```
# .bazelrc - Self-hosted BuildBuddy
build --bes_results_url=http://buildbuddy.internal:8080/invocation/
build --bes_backend=grpc://buildbuddy.internal:1985
build --remote_cache=grpc://buildbuddy.internal:1985
```

Kubernetes Deployment#

Deploy BuildBuddy on Kubernetes with Helm:

```bash
# Add BuildBuddy Helm repository
helm repo add buildbuddy https://helm.buildbuddy.io
helm repo update

# Install BuildBuddy
helm install buildbuddy buildbuddy/buildbuddy-enterprise \
  --namespace buildbuddy \
  --create-namespace \
  --set ingress.enabled=true \
  --set ingress.host=buildbuddy.your-domain.com \
  --set database.external.enabled=false \
  --set redis.enabled=true \
  --set executor.enabled=true \
  --set executor.replicas=4
```

Kubernetes values configuration:

```yaml
# values.yaml - BuildBuddy Kubernetes configuration
ingress:
  enabled: true
  host: buildbuddy.your-domain.com
  tls:
    enabled: true
    secretName: buildbuddy-tls

database:
  external:
    enabled: true
    datasource: "postgres://user:PASSWORD@YOUR_DB_HOST:5432/buildbuddy?sslmode=require"

redis:
  enabled: true
  replicas: 3

cache:
  disk:
    enabled: true
    rootDirectory: /data/cache
    maxSizeBytes: 107374182400  # 100GB

executor:
  enabled: true
  replicas: 8
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  nodeSelector:
    workload-type: bazel-executor
```

Self-Hosted with Remote Executors#

For distributed execution across multiple machines:

```yaml
# executor-config.yaml
executor:
  app_target: "grpc://buildbuddy-app.internal:1985"
  root_directory: "/data/executor"
  host_id: "executor-${HOSTNAME}"

local_cache:
  max_size_bytes: 10737418240  # 10GB
  root_directory: "/data/cache"

runner:
  pool:
    name: "default"
    runner_type: CONTAINER

container:
  default_image: "gcr.io/flame-public/executor-docker-default:latest"
  enable_dockerd: true
```

Deploy executor nodes:

```bash
#!/bin/bash
# deploy-executor.sh

# Mount the config file so the --config flag can find it inside the container
docker run -d \
  --name buildbuddy-executor \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/executor:/data/executor \
  -v /data/cache:/data/cache \
  -v "$(pwd)/executor-config.yaml":/etc/executor-config.yaml \
  -e HOSTNAME="$(hostname)" \
  gcr.io/flame-public/buildbuddy-executor:latest \
  --config=/etc/executor-config.yaml
```

DevOps Hub Integration for BuildBuddy#

Register your self-hosted BuildBuddy instance with DevOps Hub:

```python
#!/usr/bin/env python3
# register-buildbuddy.py
# Requires: pip install requests

import os
import requests

def register_buildbuddy_instance():
    """Register BuildBuddy instance with DevOps Hub."""
    config = {
        "platform": "buildbuddy",
        "deployment_type": "self-hosted",
        "instance_url": os.environ.get("BUILDBUDDY_URL", "http://buildbuddy.internal:8080"),
        "grpc_endpoint": os.environ.get("BUILDBUDDY_GRPC", "grpc://buildbuddy.internal:1985"),
        "capabilities": [
            "build-event-streaming",
            "remote-cache",
            "remote-execution"
        ],
        "executor_count": int(os.environ.get("EXECUTOR_COUNT", 4)),
        "cache_size_gb": int(os.environ.get("CACHE_SIZE_GB", 100))
    }

    response = requests.post(
        "https://assistance.bg/api/runners/register",
        headers={
            "Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}",
            "Content-Type": "application/json"
        },
        json=config,
        timeout=30
    )

    if response.status_code == 200:
        print("✅ BuildBuddy instance registered")
        return response.json()
    else:
        print(f"❌ Registration failed: {response.text}")
        return None

if __name__ == "__main__":
    register_buildbuddy_instance()
```

Platform Overview#

Bazel Remote Execution builds on the open Remote Execution API to run build actions on remote worker machines rather than locally. Instead of running builds on a single machine, RBE distributes work across a cluster of workers, providing:

  • Distributed Computing: Execute build actions across multiple machines simultaneously, dramatically reducing build times for large codebases.
  • Remote Caching: Share build artifacts across your team and CI/CD systems, eliminating redundant compilation work.
  • Consistent Environments: Run all builds in containerized environments, ensuring reproducible results regardless of the underlying infrastructure.
  • Horizontal Scaling: Add worker capacity on demand to handle varying build loads and peak development times.
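In Bazel terms, these capabilities map onto a handful of flags. The endpoints below are illustrative placeholders, not real services:

```
# Remote caching
build --remote_cache=grpcs://cache.example.com

# Remote (distributed) execution
build --remote_executor=grpcs://rbe.example.com

# Consistent containerized environments
build --remote_default_exec_properties=container-image=docker://ubuntu:20.04
```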

Prerequisites#

Before setting up Bazel RBE workers, ensure you have:

  • Bazel 6.0+ installed on client machines
  • Docker or Podman for containerized execution environments
  • gRPC networking knowledge for worker coordination
  • Container registry access for execution environment images
  • Network connectivity between workers and build clients
  • DevOps Hub account with runner management permissions
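A quick preflight check for these prerequisites might look like the following sketch. The version-parsing helper is the testable core; the command probes simply report what is installed on the machine:

```python
import re
import shutil
import subprocess

def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if a dotted version string (e.g. '6.2.1') is >= minimum."""
    parts = tuple(int(p) for p in re.findall(r"\d+", version)[:len(minimum)])
    return parts >= minimum

def preflight():
    """Report whether the required tools are on PATH."""
    for tool in ("bazel", "docker", "podman"):
        path = shutil.which(tool)
        print(f"{tool}: {'found at ' + path if path else 'not found'}")
    if shutil.which("bazel"):
        out = subprocess.run(["bazel", "--version"],
                             capture_output=True, text=True).stdout
        version = out.replace("bazel", "").strip()
        print(f"bazel >= 6.0: {meets_minimum(version, (6, 0))}")

if __name__ == "__main__":
    preflight()
```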

Infrastructure Setup#

Worker Cluster Deployment#

Deploy your RBE worker cluster across multiple machines for optimal performance:

```bash
# Create dedicated network for worker communication
docker network create bazel-rbe-cluster

# Deploy worker coordinator
docker run -d \
  --name rbe-coordinator \
  --network bazel-rbe-cluster \
  -p 8980:8980 \
  -e COORDINATOR_PORT=8980 \
  gcr.io/bazel-remote/bazel-remote:latest

# Launch worker instances
# (a digest reference must be a full sha256 hex value, so use the :latest tag)
for i in {1..4}; do
  docker run -d \
    --name rbe-worker-$i \
    --network bazel-rbe-cluster \
    --privileged \
    -v /tmp/worker-$i:/tmp \
    -e WORKER_ID=worker-$i \
    -e COORDINATOR_ENDPOINT=rbe-coordinator:8980 \
    gcr.io/flame-public/rbe-ubuntu16-04:latest
done
```

Network Configuration#

Configure network access for distributed builds:

```bash
# Configure firewall for gRPC communication
sudo ufw allow 8980/tcp   # Coordinator port
sudo ufw allow 8981/tcp   # Worker registration port
sudo ufw allow 9090/tcp   # Metrics endpoint

# Set up load balancer for worker pool
# (nginx needs http2 to proxy gRPC, plus a TLS certificate for the ssl listener)
sudo tee /etc/nginx/sites-available/bazel-rbe > /dev/null << 'EOF'
upstream rbe_workers {
    server worker-1:8980;
    server worker-2:8980;
    server worker-3:8980;
    server worker-4:8980;
}

server {
    listen 443 ssl http2;
    server_name cluster.assistance.bg;

    ssl_certificate     /etc/ssl/certs/cluster.assistance.bg.crt;
    ssl_certificate_key /etc/ssl/private/cluster.assistance.bg.key;

    location / {
        grpc_pass grpc://rbe_workers;
        grpc_set_header Host $host;
    }
}
EOF
```

Worker Installation#

Linux Worker Setup#

Install and configure RBE workers on Linux systems:

```bash
#!/bin/bash
# install-rbe-worker-linux.sh

# Download Bazel Remote Execution worker
# (adjust the URL to wherever your worker binary is published)
wget https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-linux-amd64
chmod +x worker-linux-amd64
sudo mv worker-linux-amd64 /usr/local/bin/bazel-rbe-worker

# Create worker configuration (writing to /etc requires root)
sudo tee /etc/bazel-rbe-worker.yaml > /dev/null << 'EOF'
worker:
  instance_name: "default_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

resources:
  cpu_count: 8
  memory_bytes: 34359738368  # 32GB

endpoints:
  execution: "grpcs://cluster.assistance.bg:443"
  cache: "grpcs://cache.assistance.bg:443"

logging:
  level: "INFO"
  file: "/var/log/bazel-rbe-worker.log"
EOF

# Create systemd service
sudo tee /etc/systemd/system/bazel-rbe-worker.service > /dev/null << 'EOF'
[Unit]
Description=Bazel Remote Execution Worker
After=network.target

[Service]
Type=simple
User=bazel-worker
ExecStart=/usr/local/bin/bazel-rbe-worker --config=/etc/bazel-rbe-worker.yaml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Create worker user and start the service
sudo useradd -r -s /bin/false bazel-worker
sudo systemctl enable bazel-rbe-worker
sudo systemctl start bazel-rbe-worker
```

macOS Worker Setup#

Configure RBE workers on macOS systems:

```bash
#!/bin/bash
# install-rbe-worker-macos.sh

# Download macOS worker binary
# (adjust the URL to wherever your worker binary is published)
sudo curl -L -o /usr/local/bin/bazel-rbe-worker \
  https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-darwin-amd64
sudo chmod +x /usr/local/bin/bazel-rbe-worker

# Create launch agent configuration
cat > ~/Library/LaunchAgents/com.bazel.rbe.worker.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bazel.rbe.worker</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/bazel-rbe-worker</string>
        <string>--config=/usr/local/etc/bazel-rbe-worker.yaml</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF

# Load the service
launchctl load ~/Library/LaunchAgents/com.bazel.rbe.worker.plist
```

Windows Worker Setup#

Set up RBE workers on Windows systems:

```powershell
# install-rbe-worker-windows.ps1

# Create install directory and download the worker binary
# (adjust the URL to wherever your worker binary is published)
New-Item -ItemType Directory -Force -Path "C:\Program Files\Bazel" | Out-Null
$workerUrl = "https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-windows-amd64.exe"
Invoke-WebRequest -Uri $workerUrl -OutFile "C:\Program Files\Bazel\bazel-rbe-worker.exe"

# Create worker configuration
$config = @"
worker:
  instance_name: "default_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Windows"
      - name: "container-image"
        value: "docker://mcr.microsoft.com/windows/servercore:ltsc2019"

resources:
  cpu_count: 8
  memory_bytes: 34359738368

endpoints:
  execution: "grpcs://cluster.assistance.bg:443"
  cache: "grpcs://cache.assistance.bg:443"
"@

$config | Out-File -FilePath "C:\Program Files\Bazel\worker-config.yaml" -Encoding UTF8

# Create Windows service (use sc.exe: plain "sc" is a PowerShell alias for Set-Content,
# and a path with spaces must be quoted inside the binPath value)
sc.exe create "BazelRBEWorker" binPath= '"C:\Program Files\Bazel\bazel-rbe-worker.exe" --config="C:\Program Files\Bazel\worker-config.yaml"'
sc.exe start "BazelRBEWorker"
```

Configuration#

Worker Pools#

Configure worker pools for different build requirements:

```yaml
# worker-pool-config.yaml
worker_pools:
  - name: "linux-x64-large"
    platform:
      properties:
        OSFamily: "Linux"
        Arch: "x86_64"
        cores: "16"
        memory: "64GB"

  - name: "linux-arm64"
    platform:
      properties:
        OSFamily: "Linux"
        Arch: "aarch64"
        cores: "8"
        memory: "32GB"

  - name: "macos-x64"
    platform:
      properties:
        OSFamily: "Darwin"
        Arch: "x86_64"
        cores: "8"
        memory: "32GB"
```
Resource Allocation#

Configure CPU and memory allocation per worker:

```bash
# Configure worker resource limits (writing to /etc requires root)
sudo mkdir -p /etc/bazel-worker
sudo tee /etc/bazel-worker/resource-limits.conf > /dev/null << 'EOF'
# CPU allocation (cores)
MAX_CPU_CORES=8
MIN_CPU_CORES=2

# Memory allocation (GB)
MAX_MEMORY_GB=32
MIN_MEMORY_GB=4

# Disk space (GB)
MAX_DISK_GB=500
MIN_DISK_GB=50

# Concurrent actions
MAX_CONCURRENT_ACTIONS=4
EOF
```

Execution Environments#

Set up containerized execution environments:

```dockerfile
# Dockerfile.rbe-worker-env
FROM ubuntu:20.04

# Avoid interactive prompts (e.g. tzdata) during image build
ARG DEBIAN_FRONTEND=noninteractive

# Install build dependencies (gnupg is needed for the gpg step below)
RUN apt-get update && apt-get install -y \
    build-essential \
    clang \
    python3 \
    python3-pip \
    nodejs \
    npm \
    default-jdk \
    git \
    curl \
    gnupg

# Install Bazel
RUN curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg \
    && mv bazel.gpg /etc/apt/trusted.gpg.d/ \
    && echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" > /etc/apt/sources.list.d/bazel.list \
    && apt-get update && apt-get install -y bazel

WORKDIR /workspace
CMD ["/bin/bash"]
```

DevOps Hub Integration#

Worker Registration#

Register RBE workers with DevOps Hub for centralized management:

```python
#!/usr/bin/env python3
# register-rbe-worker.py
# Requires: pip install requests psutil docker

import os
import requests
import platform
import psutil
import docker

def get_system_info():
    """Gather system information for worker registration."""
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "cpu_cores": psutil.cpu_count(),
        "memory_gb": round(psutil.virtual_memory().total / (1024**3)),
        "disk_gb": round(psutil.disk_usage('/').total / (1024**3))
    }

def get_docker_info():
    """Get Docker daemon information."""
    try:
        client = docker.from_env()
        info = client.info()
        return {
            "docker_available": True,
            "docker_version": info.get("ServerVersion"),
            "storage_driver": info.get("Driver"),
            "containers_running": info.get("ContainersRunning", 0)
        }
    except Exception:
        return {"docker_available": False}

def register_worker_with_devops_hub():
    """Register this worker with DevOps Hub."""
    system_info = get_system_info()
    docker_info = get_docker_info()

    worker_config = {
        "worker_id": os.environ.get('WORKER_ID', system_info['hostname']),
        "platform": "bazel-rbe",
        "capabilities": [
            f"{system_info['os'].lower()}-{system_info['architecture'].lower()}",
            "docker" if docker_info["docker_available"] else "no-docker",
            "remote-cache",
            "distributed-builds"
        ],
        "resources": {
            "cpu_cores": system_info["cpu_cores"],
            "memory_gb": system_info["memory_gb"],
            "disk_gb": system_info["disk_gb"]
        },
        "system_info": system_info,
        "docker_info": docker_info,
        "endpoints": {
            "execution": "grpc://0.0.0.0:8980",
            "health": "http://0.0.0.0:9090/health"
        }
    }

    try:
        response = requests.post(
            "https://assistance.bg/api/runners/register",
            headers={
                "Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}",
                "Content-Type": "application/json"
            },
            json=worker_config,
            timeout=30
        )

        if response.status_code == 200:
            print(f"✅ Worker {worker_config['worker_id']} registered successfully")
            return response.json()
        else:
            print(f"❌ Registration failed: {response.status_code} - {response.text}")
            return None

    except requests.exceptions.RequestException as e:
        print(f"❌ Registration error: {e}")
        return None

if __name__ == "__main__":
    if not os.environ.get('DEVOPS_HUB_TOKEN'):
        print("❌ DEVOPS_HUB_TOKEN environment variable required")
        exit(1)

    registration = register_worker_with_devops_hub()
    if registration:
        print(f"🎉 Worker registered with ID: {registration.get('worker_id')}")
```

Build Metrics Reporting#

Monitor and report build performance metrics:

```python
#!/usr/bin/env python3
# build-metrics-reporter.py
# Requires: pip install requests

import os
import time
import requests
from datetime import datetime

class BuildMetricsReporter:
    def __init__(self, devops_hub_token, worker_id):
        self.token = devops_hub_token
        self.worker_id = worker_id
        self.base_url = "https://assistance.bg/api/runners"
        self.start_time = time.time()

    def report_build_metrics(self, build_data):
        """Report build execution metrics to DevOps Hub."""
        metrics = {
            "worker_id": self.worker_id,
            "timestamp": datetime.utcnow().isoformat(),
            "build_id": build_data.get("build_id"),
            "actions_executed": build_data.get("actions_executed", 0),
            "actions_cached": build_data.get("actions_cached", 0),
            "total_duration_seconds": build_data.get("duration"),
            "peak_memory_mb": build_data.get("peak_memory"),
            "cpu_utilization_percent": build_data.get("cpu_usage"),
            "cache_hit_rate": build_data.get("cache_hit_rate", 0.0),
            "artifacts_produced": build_data.get("artifacts_count", 0)
        }

        try:
            response = requests.post(
                f"{self.base_url}/metrics",
                headers={
                    "Authorization": f"Bearer {self.token}",
                    "Content-Type": "application/json"
                },
                json=metrics,
                timeout=30
            )

            if response.status_code == 200:
                print(f"✅ Build metrics reported for build {build_data.get('build_id')}")
            else:
                print(f"⚠️ Metrics reporting failed: {response.status_code}")

        except Exception as e:
            print(f"❌ Error reporting metrics: {e}")

    def report_health_status(self, status="healthy"):
        """Report worker health status."""
        health_data = {
            "worker_id": self.worker_id,
            "status": status,
            "timestamp": datetime.utcnow().isoformat(),
            "uptime_seconds": time.time() - self.start_time
        }

        requests.post(
            f"{self.base_url}/health",
            headers={"Authorization": f"Bearer {self.token}"},
            json=health_data,
            timeout=30
        )

# Example usage
if __name__ == "__main__":
    reporter = BuildMetricsReporter(
        os.environ['DEVOPS_HUB_TOKEN'],
        os.environ['WORKER_ID']
    )

    # Example build metrics
    build_metrics = {
        "build_id": "build_12345",
        "actions_executed": 150,
        "actions_cached": 75,
        "duration": 180,  # 3 minutes
        "peak_memory": 2048,  # MB
        "cpu_usage": 85.5,
        "cache_hit_rate": 0.67,
        "artifacts_count": 25
    }

    reporter.report_build_metrics(build_metrics)
```
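The build_data dict fed into the reporter can be derived from Bazel's Build Event Protocol output (`bazel build --build_event_json_file=bep.json`). A minimal sketch that reads the action count from the `buildMetrics` event follows; the exact event shape varies by Bazel version, so treat the field names as assumptions to verify against your own BEP output:

```python
import json

def summarize_bep(lines):
    """Pull a simple action count out of newline-delimited BEP JSON.
    Field names follow the BEP 'buildMetrics' event; check them against
    the output of your Bazel version before relying on this."""
    summary = {"actions_executed": 0}
    for line in lines:
        event = json.loads(line)
        metrics = event.get("buildMetrics")
        if metrics:
            summary["actions_executed"] = int(
                metrics.get("actionSummary", {}).get("actionsExecuted", 0))
    return summary

# Sample event shaped like a BEP buildMetrics record
sample = ['{"id": {"buildMetrics": {}}, "buildMetrics": {"actionSummary": {"actionsExecuted": "150"}}}']
print(summarize_bep(sample))  # → {'actions_executed': 150}
```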

Multi-Platform Support#

Container-Based Execution#

Configure execution environments for different platforms:

```yaml
# execution-environments.yaml
platforms:
  linux_x64:
    container_image: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"
    env_vars:
      - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    exec_properties:
      - "container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

  linux_arm64:
    container_image: "docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"
    env_vars:
      - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    exec_properties:
      - "container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"

  windows_x64:
    container_image: "docker://mcr.microsoft.com/windows/servercore:ltsc2019"
    env_vars:
      - "PATH=C:\\Windows\\System32;C:\\Windows"
    exec_properties:
      - "container-image=docker://mcr.microsoft.com/windows/servercore:ltsc2019"
```

Architecture Support#

ARM and x64 Configuration#

Configure workers for different CPU architectures:

```bash
# Configure ARM64 worker (writing to /etc requires root)
sudo mkdir -p /etc/bazel-rbe
sudo tee /etc/bazel-rbe/arm64-worker.yaml > /dev/null << 'EOF'
worker:
  instance_name: "arm64_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "Arch"
        value: "aarch64"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"

resources:
  cpu_count: 8
  memory_bytes: 17179869184  # 16GB
EOF

# Configure x64 worker
sudo tee /etc/bazel-rbe/x64-worker.yaml > /dev/null << 'EOF'
worker:
  instance_name: "x64_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "Arch"
        value: "x86_64"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

resources:
  cpu_count: 16
  memory_bytes: 34359738368  # 32GB
EOF
```
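The memory_bytes values in these configs are plain gibibyte multiples; a tiny helper avoids typos when generating them:

```python
def gb_to_bytes(gb: int) -> int:
    """Convert gibibytes to bytes for memory_bytes config fields."""
    return gb * 1024 ** 3

print(gb_to_bytes(16))  # → 17179869184
print(gb_to_bytes(32))  # → 34359738368
```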

Test Build#

DevOps Hub Integration Example#

Create a test build with DevOps Hub integration:

```
# .bazelrc - Bazel configuration for DevOps Hub RBE
# Remote execution configuration, scoped under --config=remote so that
# plain "bazel build" stays local
build:remote --remote_executor=grpcs://cluster.assistance.bg:443
build:remote --remote_cache=grpcs://cache.assistance.bg:443
build:remote --remote_timeout=3600
build:remote --remote_default_exec_properties=container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04:latest

# Authentication with DevOps Hub
build:remote --remote_header=Authorization="Bearer YOUR_DEVOPS_HUB_TOKEN"
build:remote --remote_instance_name=default_instance

# Optimization settings
build:remote --remote_download_minimal
build:remote --experimental_remote_cache_compression
build:remote --experimental_remote_merkle_tree_cache

# Platform mapping for cross-compilation
# (these platform targets must be defined in your own workspace;
#  the @bazel_tools//platforms package was removed in Bazel 4)
build --platforms=//platforms:linux_x86_64
build:arm64 --platforms=//platforms:linux_aarch64
build:macos --platforms=//platforms:darwin_x86_64

# DevOps Hub specific settings
build:remote --remote_header=X-DevOps-Hub-Project=your-project-id
build:remote --remote_header=X-Worker-Pool=default

# Logging and monitoring
build:remote --experimental_remote_grpc_log=/tmp/grpc.log
build:remote --remote_print_execution_messages=failure
```

```python
# BUILD - Example Bazel build file
load("@rules_python//python:defs.bzl", "py_binary", "py_test")

py_binary(
    name = "hello_world",
    srcs = ["hello_world.py"],
    deps = [
        "//lib:common_utils",
    ],
)

py_test(
    name = "hello_world_test",
    srcs = ["hello_world_test.py"],
    deps = [
        ":hello_world",
        "@pip//pytest",
    ],
    # Run on specific worker pool
    exec_properties = {
        "worker-pool": "linux-x64-large",
        "requires-network": "true",
    },
)

# Cross-platform binary
py_binary(
    name = "hello_world_arm64",
    srcs = ["hello_world.py"],
    deps = [
        "//lib:common_utils",
    ],
    target_compatible_with = [
        "@platforms//cpu:aarch64",
        "@platforms//os:linux",
    ],
)
```

Run the test build:

```bash
# Test local build
bazel build //src:hello_world

# Test remote execution
bazel build //src:hello_world --config=remote

# Test ARM64 cross-compilation
bazel build //src:hello_world_arm64 --config=arm64

# Test with specific worker pool
bazel build //src:hello_world \
  --remote_default_exec_properties=worker-pool=linux-x64-large

# Monitor build with metrics
bazel build //... \
  --experimental_remote_grpc_log=/tmp/rbe-grpc.log \
  --remote_print_execution_messages=all
```

Production Deployment#

Scaling Strategy#

Implement auto-scaling for your RBE worker cluster:

```bash
#!/bin/bash
# auto-scale-rbe-workers.sh

METRICS_URL="https://assistance.bg/api/runners/metrics"
MIN_WORKERS=2
MAX_WORKERS=20
TARGET_CPU_UTILIZATION=70

# Get current worker metrics
get_worker_metrics() {
  curl -s -H "Authorization: Bearer $DEVOPS_HUB_TOKEN" \
    "$METRICS_URL" | jq -c '.workers[] | select(.platform == "bazel-rbe")'
}

# Calculate average CPU utilization (guard against an empty worker list)
avg_cpu=$(get_worker_metrics | jq -r '.cpu_utilization_percent' \
  | awk '{sum+=$1; count++} END {if (count > 0) print sum/count; else print 0}')
current_workers=$(get_worker_metrics | jq -s 'length')

# Scale up if CPU utilization is high
if (( $(echo "$avg_cpu > $TARGET_CPU_UTILIZATION" | bc -l) )); then
  if [ "$current_workers" -lt "$MAX_WORKERS" ]; then
    echo "🔄 Scaling up: CPU at ${avg_cpu}%"
    # Name new workers rbe-worker-* so the scale-down path can find them
    new_worker_id="rbe-worker-$(date +%s)"

    docker run -d \
      --name "$new_worker_id" \
      --network bazel-rbe-cluster \
      --privileged \
      -e WORKER_ID="$new_worker_id" \
      -e DEVOPS_HUB_TOKEN="$DEVOPS_HUB_TOKEN" \
      gcr.io/flame-public/rbe-ubuntu16-04:latest

    python3 register-rbe-worker.py
  fi
fi

# Scale down if CPU utilization is low
if (( $(echo "$avg_cpu < 30" | bc -l) )); then
  if [ "$current_workers" -gt "$MIN_WORKERS" ]; then
    echo "🔄 Scaling down: CPU at ${avg_cpu}%"
    # Gracefully shut down the last-listed worker
    oldest_worker=$(docker ps --format '{{.Names}}' | grep "rbe-worker" | tail -1)
    docker stop "$oldest_worker"
    docker rm "$oldest_worker"
  fi
fi
```

Monitoring and Health Checks#

Set up comprehensive monitoring:

```python
#!/usr/bin/env python3
# rbe-health-monitor.py
# Requires: pip install requests docker prometheus-client

import os
import time
import requests
import docker
from prometheus_client import start_http_server, Gauge, Counter

# Prometheus metrics
worker_status = Gauge('rbe_worker_status', 'Worker health status', ['worker_id'])
build_duration = Gauge('rbe_build_duration_seconds', 'Build duration', ['worker_id'])
cache_hit_rate = Gauge('rbe_cache_hit_rate', 'Cache hit rate', ['worker_id'])
actions_executed = Counter('rbe_actions_total', 'Total actions executed', ['worker_id'])

def calculate_cpu_percent(stats):
    """Calculate CPU usage percentage from Docker stats."""
    try:
        cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - \
            stats['precpu_stats']['cpu_usage']['total_usage']
        system_delta = stats['cpu_stats']['system_cpu_usage'] - \
            stats['precpu_stats']['system_cpu_usage']
    except KeyError:
        # The first stats sample may lack precpu data
        return 0.0

    if system_delta > 0:
        return (cpu_delta / system_delta) * 100.0
    return 0.0

def monitor_workers():
    """Monitor RBE worker health and performance."""
    client = docker.from_env()

    while True:
        try:
            # Check worker containers
            workers = client.containers.list(filters={'name': 'rbe-worker'})

            for worker in workers:
                worker_id = worker.name

                # Check container health
                if worker.status == 'running':
                    worker_status.labels(worker_id=worker_id).set(1)
                else:
                    worker_status.labels(worker_id=worker_id).set(0)

                # Get worker stats
                stats = worker.stats(stream=False)
                cpu_usage = calculate_cpu_percent(stats)

                # Report to DevOps Hub
                health_data = {
                    "worker_id": worker_id,
                    "status": "healthy" if worker.status == 'running' else "unhealthy",
                    "cpu_usage": cpu_usage,
                    "timestamp": time.time()
                }

                requests.post(
                    "https://assistance.bg/api/runners/health",
                    headers={"Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}"},
                    json=health_data,
                    timeout=30
                )

            time.sleep(30)  # Check every 30 seconds

        except Exception as e:
            print(f"❌ Monitoring error: {e}")
            time.sleep(60)

if __name__ == "__main__":
    # Start Prometheus metrics server
    start_http_server(9090)
    print("🔍 RBE health monitor started on port 9090")

    monitor_workers()
```

Fault Tolerance#

Implement failover and recovery mechanisms:

```bash
#!/bin/bash
# rbe-failover.sh

BACKUP_REGIONS=("us-west-2" "eu-central-1" "ap-northeast-1")
PRIMARY_ENDPOINT="cluster.assistance.bg:443"

check_primary_health() {
  grpc_health_probe -addr="$PRIMARY_ENDPOINT" -service=""
}

failover_to_backup() {
  local region=$1
  echo "🔄 Failing over to backup region: $region"

  # Update .bazelrc to use the backup endpoint (scheme-agnostic match)
  sed -i "s|cluster.assistance.bg:443|$region-cluster.assistance.bg:443|g" .bazelrc

  # Notify DevOps Hub of failover
  curl -X POST "https://assistance.bg/api/runners/failover" \
    -H "Authorization: Bearer $DEVOPS_HUB_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"primary_endpoint\": \"$PRIMARY_ENDPOINT\", \"backup_endpoint\": \"$region-cluster.assistance.bg:443\"}"
}

# Main failover logic
if ! check_primary_health; then
  echo "❌ Primary RBE cluster unhealthy"

  for region in "${BACKUP_REGIONS[@]}"; do
    if grpc_health_probe -addr="$region-cluster.assistance.bg:443" -service=""; then
      failover_to_backup "$region"
      break
    fi
  done
else
  echo "✅ Primary RBE cluster healthy"
fi
```

Your Bazel Remote Execution cluster is now configured for production use with DevOps Hub integration. The distributed workers provide scalable build infrastructure while maintaining visibility and control through centralized monitoring and management.