# Bazel Remote Execution Workers

Deploy and manage distributed build infrastructure with Bazel Remote Execution.
Bazel Remote Execution (RBE) transforms your build process into a distributed system, allowing you to scale builds across multiple worker machines for faster, more efficient compilation and testing.
## BuildBuddy Integration
BuildBuddy is a fully managed Bazel platform providing build result streaming, remote caching, and remote build execution. It's free for individuals and open source projects, with an Enterprise tier for advanced features.
### BuildBuddy Cloud Quickstart
Get started with BuildBuddy Cloud in minutes by adding two lines to your .bazelrc:
```
# .bazelrc - BuildBuddy Cloud configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
```

After configuring, builds will display a URL for viewing results:
```
$ bazel build //src:main
INFO: Streaming build results to: https://app.buildbuddy.io/invocation/24a37b8f-4cf2-4909-9522-3cc91d2ebfc4
INFO: Build completed successfully, 42 total actions
```

### BuildBuddy Authentication
Configure API key authentication for private build logs:
```
# .bazelrc - Authenticated BuildBuddy configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY
```

Retrieve your API key from app.buildbuddy.io/docs/setup after creating an account.
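To keep the API key out of version control, one common pattern is to generate a git-ignored rc file from an environment variable rather than hard-coding the key. A minimal sketch (the script name and `BUILDBUDDY_API_KEY` variable are illustrative, not part of BuildBuddy's tooling):

```python
# generate-user-bazelrc.py (illustrative) - emit the BuildBuddy API key
# from an environment variable into a git-ignored rc file so the secret
# never lands in the checked-in .bazelrc.
import os


def write_user_bazelrc(path="user.bazelrc"):
    """Write the API-key header flag to `path`; raises KeyError if unset."""
    key = os.environ["BUILDBUDDY_API_KEY"]
    with open(path, "w") as f:
        f.write(f"build --remote_header=x-buildbuddy-api-key={key}\n")
    return path
```

The checked-in `.bazelrc` can then pull it in with `try-import %workspace%/user.bazelrc`, which Bazel silently skips when the file is absent.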
### BuildBuddy Remote Cache
Enable remote caching to share build artifacts across your team:
```
# .bazelrc - Remote cache configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY

# Optional optimizations
build --remote_download_minimal
build --experimental_remote_cache_compression
```

### BuildBuddy Remote Build Execution
For distributed builds, enable remote execution:
```
# .bazelrc - Full RBE configuration
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --bes_backend=grpcs://remote.buildbuddy.io
build --remote_executor=grpcs://remote.buildbuddy.io
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_header=x-buildbuddy-api-key=YOUR_API_KEY

# Platform configuration
build --host_platform=@buildbuddy_toolchain//:platform
build --platforms=@buildbuddy_toolchain//:platform
build --extra_toolchains=@buildbuddy_toolchain//:cc_toolchain

# Performance tuning
build --jobs=50
build --remote_timeout=3600
```

### BuildBuddy Self-Hosted Deployment
For organizations requiring on-premises infrastructure, BuildBuddy offers self-hosted deployment options.
#### Docker Compose Deployment
Deploy a minimal BuildBuddy instance with Docker Compose:
```yaml
# docker-compose.yaml
version: '3.8'

services:
  buildbuddy:
    image: gcr.io/flame-public/buildbuddy-app-onprem:latest
    ports:
      - "8080:8080"   # Web UI
      - "1985:1985"   # gRPC (BES + Remote Cache)
      - "1986:1986"   # gRPCS (TLS)
    volumes:
      - buildbuddy-data:/data
    environment:
      - BB_DATABASE_DATA_SOURCE=sqlite3:///data/buildbuddy.db
      - BB_STORAGE_DISK_ROOT_DIR=/data/storage
      - BB_CACHE_DISK_ROOT_DIR=/data/cache
      - BB_CACHE_MAX_SIZE_BYTES=10737418240  # 10GB
    restart: unless-stopped

volumes:
  buildbuddy-data:
```

Start the service:
```bash
docker-compose up -d

# Verify the service is running
curl http://localhost:8080/health
```

Configure Bazel to use your self-hosted instance:
```
# .bazelrc - Self-hosted BuildBuddy
build --bes_results_url=http://buildbuddy.internal:8080/invocation/
build --bes_backend=grpc://buildbuddy.internal:1985
build --remote_cache=grpc://buildbuddy.internal:1985
```

#### Kubernetes Deployment
Deploy BuildBuddy on Kubernetes with Helm:
```bash
# Add BuildBuddy Helm repository
helm repo add buildbuddy https://helm.buildbuddy.io
helm repo update

# Install BuildBuddy
helm install buildbuddy buildbuddy/buildbuddy-enterprise \
  --namespace buildbuddy \
  --create-namespace \
  --set ingress.enabled=true \
  --set ingress.host=buildbuddy.your-domain.com \
  --set database.external.enabled=false \
  --set redis.enabled=true \
  --set executor.enabled=true \
  --set executor.replicas=4
```

Kubernetes values configuration:
```yaml
# values.yaml - BuildBuddy Kubernetes configuration
ingress:
  enabled: true
  host: buildbuddy.your-domain.com
  tls:
    enabled: true
    secretName: buildbuddy-tls

database:
  external:
    enabled: true
    datasource: "postgres://user:[email protected]:5432/buildbuddy?sslmode=require"

redis:
  enabled: true
  replicas: 3

cache:
  disk:
    enabled: true
    rootDirectory: /data/cache
    maxSizeBytes: 107374182400  # 100GB

executor:
  enabled: true
  replicas: 8
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  nodeSelector:
    workload-type: bazel-executor
```

#### Self-Hosted with Remote Executors
For distributed execution across multiple machines:
```yaml
# executor-config.yaml
executor:
  app_target: "grpc://buildbuddy-app.internal:1985"
  root_directory: "/data/executor"
  host_id: "executor-${HOSTNAME}"

  local_cache:
    max_size_bytes: 10737418240  # 10GB
    root_directory: "/data/cache"

  runner:
    pool:
      name: "default"
      runner_type: CONTAINER

  container:
    default_image: "gcr.io/flame-public/executor-docker-default:latest"
    enable_dockerd: true
```

Deploy executor nodes:
```bash
#!/bin/bash
# deploy-executor.sh

docker run -d \
  --name buildbuddy-executor \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/executor:/data/executor \
  -v /data/cache:/data/cache \
  -e HOSTNAME=$(hostname) \
  gcr.io/flame-public/buildbuddy-executor:latest \
  --config=/etc/executor-config.yaml
```

#### DevOps Hub Integration for BuildBuddy
Register your self-hosted BuildBuddy instance with DevOps Hub:
```python
#!/usr/bin/env python3
# register-buildbuddy.py

import os

import requests


def register_buildbuddy_instance():
    """Register BuildBuddy instance with DevOps Hub."""
    config = {
        "platform": "buildbuddy",
        "deployment_type": "self-hosted",
        "instance_url": os.environ.get("BUILDBUDDY_URL", "http://buildbuddy.internal:8080"),
        "grpc_endpoint": os.environ.get("BUILDBUDDY_GRPC", "grpc://buildbuddy.internal:1985"),
        "capabilities": [
            "build-event-streaming",
            "remote-cache",
            "remote-execution"
        ],
        "executor_count": int(os.environ.get("EXECUTOR_COUNT", 4)),
        "cache_size_gb": int(os.environ.get("CACHE_SIZE_GB", 100))
    }

    response = requests.post(
        "https://assistance.bg/api/runners/register",
        headers={
            "Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}",
            "Content-Type": "application/json"
        },
        json=config
    )

    if response.status_code == 200:
        print("✅ BuildBuddy instance registered")
        return response.json()
    else:
        print(f"❌ Registration failed: {response.text}")
        return None


if __name__ == "__main__":
    register_buildbuddy_instance()
```

## Platform Overview
Bazel Remote Execution, built on Google's Remote Execution API, executes build actions on remote worker machines rather than locally. Instead of running builds on a single machine, RBE distributes work across a cluster of workers, providing:
- **Distributed Computing**: Execute build actions across multiple machines simultaneously, dramatically reducing build times for large codebases.
- **Remote Caching**: Share build artifacts across your team and CI/CD systems, eliminating redundant compilation work.
- **Consistent Environments**: Run all builds in containerized environments, ensuring reproducible results regardless of the underlying infrastructure.
- **Horizontal Scaling**: Add worker capacity on demand to handle varying build loads and peak development times.
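The remote-caching point deserves a closer look, since it is what makes cache hits portable across machines. Conceptually, the cache key for an action is a digest over its command line plus the digests of its inputs, so any worker that has executed the identical action can serve the result. A simplified sketch (the real Remote Execution API hashes serialized `Action`/`Command` protobufs and a Merkle tree of the input directory, but the principle is the same):

```python
# Conceptual sketch of remote action-cache keys: same command line plus
# same input contents yields the same key, hence a cache hit on any worker.
import hashlib


def blob_digest(data: bytes) -> str:
    """Content digest of a single input blob."""
    return hashlib.sha256(data).hexdigest()


def action_cache_key(argv, inputs):
    """argv: command line; inputs: mapping of file path -> contents (bytes)."""
    h = hashlib.sha256()
    for arg in argv:
        h.update(arg.encode())
        h.update(b"\0")  # separator so ["ab","c"] != ["a","bc"]
    for path in sorted(inputs):  # sorted: key must not depend on dict order
        h.update(path.encode())
        h.update(blob_digest(inputs[path]).encode())
    return h.hexdigest()
```

Changing a single input byte changes the key, which is why RBE caches are safe to share between developers and CI without extra invalidation logic.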
## Prerequisites
Before setting up Bazel RBE workers, ensure you have:
- Bazel 6.0+ installed on client machines
- Docker or Podman for containerized execution environments
- gRPC networking knowledge for worker coordination
- Container registry access for execution environment images
- Network connectivity between workers and build clients
- DevOps Hub account with runner management permissions
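The tooling checks above can be automated before any cluster wiring begins; a small preflight sketch (script name illustrative, checks only what the list requires):

```python
# preflight-check.py (illustrative) - sanity-check client prerequisites
# before connecting the machine to an RBE cluster.
import re
import shutil
import subprocess


def bazel_version_ok(min_major=6):
    """Return True if `bazel --version` reports at least `min_major`."""
    bazel = shutil.which("bazel")
    if not bazel:
        return False
    out = subprocess.run([bazel, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"bazel (\d+)\.", out)  # e.g. "bazel 7.1.0"
    return bool(match) and int(match.group(1)) >= min_major


def container_runtime():
    """Return the first available container runtime on PATH, or None."""
    return next((t for t in ("docker", "podman") if shutil.which(t)), None)


if __name__ == "__main__":
    print(f"bazel >= 6: {bazel_version_ok()}")
    print(f"container runtime: {container_runtime() or 'MISSING'}")
```

Registry access and network reachability are environment-specific, so they are better verified against your actual endpoints once the cluster addresses are known.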
## Infrastructure Setup
### Worker Cluster Deployment
Deploy your RBE worker cluster across multiple machines for optimal performance:
```bash
# Create dedicated network for worker communication
docker network create bazel-rbe-cluster

# Deploy worker coordinator
docker run -d \
  --name rbe-coordinator \
  --network bazel-rbe-cluster \
  -p 8980:8980 \
  -e COORDINATOR_PORT=8980 \
  gcr.io/bazel-remote/bazel-remote:latest

# Launch worker instances
for i in {1..4}; do
  docker run -d \
    --name rbe-worker-$i \
    --network bazel-rbe-cluster \
    --privileged \
    -v /tmp/worker-$i:/tmp \
    -e WORKER_ID=worker-$i \
    -e COORDINATOR_ENDPOINT=rbe-coordinator:8980 \
    gcr.io/flame-public/rbe-ubuntu16-04:latest  # pin a sha256 digest in production
done
```

### Network Configuration
Configure network access for distributed builds:
```bash
# Configure firewall for gRPC communication
sudo ufw allow 8980/tcp  # Coordinator port
sudo ufw allow 8981/tcp  # Worker registration port
sudo ufw allow 9090/tcp  # Metrics endpoint

# Set up load balancer for worker pool
cat > /etc/nginx/sites-available/bazel-rbe << 'EOF'
upstream rbe_workers {
    server worker-1:8980;
    server worker-2:8980;
    server worker-3:8980;
    server worker-4:8980;
}

server {
    listen 443 ssl http2;  # gRPC requires HTTP/2
    server_name cluster.assistance.bg;

    location / {
        grpc_pass grpc://rbe_workers;
        grpc_set_header Host $host;
    }
}
EOF
```

## Worker Installation
### Linux Worker Setup
Install and configure RBE workers on Linux systems:
```bash
#!/bin/bash
# install-rbe-worker-linux.sh

# Download Bazel Remote Execution worker
wget https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-linux-amd64
chmod +x worker-linux-amd64
sudo mv worker-linux-amd64 /usr/local/bin/bazel-rbe-worker

# Create worker configuration (sudo tee: /etc is root-owned)
sudo tee /etc/bazel-rbe-worker.yaml > /dev/null << 'EOF'
worker:
  instance_name: "default_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

resources:
  cpu_count: 8
  memory_bytes: 34359738368  # 32GB

endpoints:
  execution: "grpc://cluster.assistance.bg:443"
  cache: "grpc://cache.assistance.bg:443"

logging:
  level: "INFO"
  file: "/var/log/bazel-rbe-worker.log"
EOF

# Create systemd service
sudo tee /etc/systemd/system/bazel-rbe-worker.service > /dev/null << 'EOF'
[Unit]
Description=Bazel Remote Execution Worker
After=network.target

[Service]
Type=simple
User=bazel-worker
ExecStart=/usr/local/bin/bazel-rbe-worker --config=/etc/bazel-rbe-worker.yaml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# Create the worker user, then enable and start the service
sudo useradd -r -s /bin/false bazel-worker
sudo systemctl enable bazel-rbe-worker
sudo systemctl start bazel-rbe-worker
```

### macOS Worker Setup
Configure RBE workers on macOS systems:
```bash
#!/bin/bash
# install-rbe-worker-macos.sh

# Download macOS worker binary
curl -L -o /usr/local/bin/bazel-rbe-worker \
  https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-darwin-amd64
chmod +x /usr/local/bin/bazel-rbe-worker

# Create launch agent configuration
cat > ~/Library/LaunchAgents/com.bazel.rbe.worker.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bazel.rbe.worker</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/bazel-rbe-worker</string>
        <string>--config=/usr/local/etc/bazel-rbe-worker.yaml</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF

# Load the service
launchctl load ~/Library/LaunchAgents/com.bazel.rbe.worker.plist
```

### Windows Worker Setup
Set up RBE workers on Windows systems:
```powershell
# install-rbe-worker-windows.ps1

# Download Windows worker binary
$workerUrl = "https://github.com/bazelbuild/remote-apis-sdks/releases/latest/download/worker-windows-amd64.exe"
Invoke-WebRequest -Uri $workerUrl -OutFile "C:\Program Files\Bazel\bazel-rbe-worker.exe"

# Create worker configuration
$config = @"
worker:
  instance_name: "default_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Windows"
      - name: "container-image"
        value: "docker://mcr.microsoft.com/windows/servercore:ltsc2019"

resources:
  cpu_count: 8
  memory_bytes: 34359738368

endpoints:
  execution: "grpc://cluster.assistance.bg:443"
  cache: "grpc://cache.assistance.bg:443"
"@

$config | Out-File -FilePath "C:\Program Files\Bazel\worker-config.yaml" -Encoding UTF8

# Create Windows service (sc.exe, not the PowerShell `sc` alias; quote paths with spaces)
sc.exe create "BazelRBEWorker" binPath= '"C:\Program Files\Bazel\bazel-rbe-worker.exe" --config="C:\Program Files\Bazel\worker-config.yaml"'
sc.exe start "BazelRBEWorker"
```

## Configuration
### Worker Pools
Configure worker pools for different build requirements:
```yaml
# worker-pool-config.yaml
worker_pools:
  - name: "linux-x64-large"
    platform:
      properties:
        OSFamily: "Linux"
        Arch: "x86_64"
        cores: "16"
        memory: "64GB"

  - name: "linux-arm64"
    platform:
      properties:
        OSFamily: "Linux"
        Arch: "aarch64"
        cores: "8"
        memory: "32GB"

  - name: "macos-x64"
    platform:
      properties:
        OSFamily: "Darwin"
        Arch: "x86_64"
        cores: "8"
        memory: "32GB"
```

### Resource Allocation
Configure CPU and memory allocation per worker:
```bash
# Configure worker resource limits
cat > /etc/bazel-worker/resource-limits.conf << 'EOF'
# CPU allocation (cores)
MAX_CPU_CORES=8
MIN_CPU_CORES=2

# Memory allocation (GB)
MAX_MEMORY_GB=32
MIN_MEMORY_GB=4

# Disk space (GB)
MAX_DISK_GB=500
MIN_DISK_GB=50

# Concurrent actions
MAX_CONCURRENT_ACTIONS=4
EOF
```

### Execution Environments
Set up containerized execution environments:
```dockerfile
# Dockerfile.rbe-worker-env
FROM ubuntu:20.04

# Install build dependencies (gnupg is needed below for gpg --dearmor)
RUN apt-get update && apt-get install -y \
    build-essential \
    clang \
    python3 \
    python3-pip \
    nodejs \
    npm \
    default-jdk \
    git \
    curl \
    gnupg

# Install Bazel
RUN curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg \
    && mv bazel.gpg /etc/apt/trusted.gpg.d/ \
    && echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" > /etc/apt/sources.list.d/bazel.list \
    && apt-get update && apt-get install -y bazel

WORKDIR /workspace
CMD ["/bin/bash"]
```

## DevOps Hub Integration
### Worker Registration
Register RBE workers with DevOps Hub for centralized management:
```python
#!/usr/bin/env python3
# register-rbe-worker.py

import os
import platform

import docker
import psutil
import requests


def get_system_info():
    """Gather system information for worker registration."""
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "cpu_cores": psutil.cpu_count(),
        "memory_gb": round(psutil.virtual_memory().total / (1024**3)),
        "disk_gb": round(psutil.disk_usage('/').total / (1024**3))
    }


def get_docker_info():
    """Get Docker daemon information."""
    try:
        client = docker.from_env()
        info = client.info()
        return {
            "docker_available": True,
            "docker_version": info.get("ServerVersion"),
            "storage_driver": info.get("Driver"),
            "containers_running": info.get("ContainersRunning", 0)
        }
    except Exception:
        return {"docker_available": False}


def register_worker_with_devops_hub():
    """Register this worker with DevOps Hub."""
    system_info = get_system_info()
    docker_info = get_docker_info()

    worker_config = {
        "worker_id": os.environ.get('WORKER_ID', system_info['hostname']),
        "platform": "bazel-rbe",
        "capabilities": [
            f"{system_info['os'].lower()}-{system_info['architecture'].lower()}",
            "docker" if docker_info.get("docker_available") else "no-docker",
            "remote-cache",
            "distributed-builds"
        ],
        "resources": {
            "cpu_cores": system_info["cpu_cores"],
            "memory_gb": system_info["memory_gb"],
            "disk_gb": system_info["disk_gb"]
        },
        "system_info": system_info,
        "docker_info": docker_info,
        "endpoints": {
            "execution": "grpc://0.0.0.0:8980",
            "health": "http://0.0.0.0:9090/health"
        }
    }

    try:
        response = requests.post(
            "https://assistance.bg/api/runners/register",
            headers={
                "Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}",
                "Content-Type": "application/json"
            },
            json=worker_config,
            timeout=30
        )

        if response.status_code == 200:
            print(f"✅ Worker {worker_config['worker_id']} registered successfully")
            return response.json()
        else:
            print(f"❌ Registration failed: {response.status_code} - {response.text}")
            return None

    except requests.exceptions.RequestException as e:
        print(f"❌ Registration error: {e}")
        return None


if __name__ == "__main__":
    if not os.environ.get('DEVOPS_HUB_TOKEN'):
        print("❌ DEVOPS_HUB_TOKEN environment variable required")
        exit(1)

    registration = register_worker_with_devops_hub()
    if registration:
        print(f"🎉 Worker registered with ID: {registration.get('worker_id')}")
```

### Build Metrics Reporting
Monitor and report build performance metrics:
```python
#!/usr/bin/env python3
# build-metrics-reporter.py

import os
import time
from datetime import datetime

import requests


class BuildMetricsReporter:
    def __init__(self, devops_hub_token, worker_id):
        self.token = devops_hub_token
        self.worker_id = worker_id
        self.base_url = "https://assistance.bg/api/runners"
        self.start_time = time.time()

    def report_build_metrics(self, build_data):
        """Report build execution metrics to DevOps Hub."""
        metrics = {
            "worker_id": self.worker_id,
            "timestamp": datetime.utcnow().isoformat(),
            "build_id": build_data.get("build_id"),
            "actions_executed": build_data.get("actions_executed", 0),
            "actions_cached": build_data.get("actions_cached", 0),
            "total_duration_seconds": build_data.get("duration"),
            "peak_memory_mb": build_data.get("peak_memory"),
            "cpu_utilization_percent": build_data.get("cpu_usage"),
            "cache_hit_rate": build_data.get("cache_hit_rate", 0.0),
            "artifacts_produced": build_data.get("artifacts_count", 0)
        }

        try:
            response = requests.post(
                f"{self.base_url}/metrics",
                headers={
                    "Authorization": f"Bearer {self.token}",
                    "Content-Type": "application/json"
                },
                json=metrics
            )

            if response.status_code == 200:
                print(f"✅ Build metrics reported for build {build_data.get('build_id')}")
            else:
                print(f"⚠️ Metrics reporting failed: {response.status_code}")

        except Exception as e:
            print(f"❌ Error reporting metrics: {e}")

    def report_health_status(self, status="healthy"):
        """Report worker health status."""
        health_data = {
            "worker_id": self.worker_id,
            "status": status,
            "timestamp": datetime.utcnow().isoformat(),
            "uptime_seconds": time.time() - self.start_time
        }

        requests.post(
            f"{self.base_url}/health",
            headers={"Authorization": f"Bearer {self.token}"},
            json=health_data
        )


# Example usage
if __name__ == "__main__":
    reporter = BuildMetricsReporter(
        os.environ['DEVOPS_HUB_TOKEN'],
        os.environ['WORKER_ID']
    )

    # Example build metrics
    build_metrics = {
        "build_id": "build_12345",
        "actions_executed": 150,
        "actions_cached": 75,
        "duration": 180,  # 3 minutes
        "peak_memory": 2048,  # MB
        "cpu_usage": 85.5,
        "cache_hit_rate": 0.67,
        "artifacts_count": 25
    }

    reporter.report_build_metrics(build_metrics)
```

## Multi-Platform Support
### Container-Based Execution
Configure execution environments for different platforms:
```yaml
# execution-environments.yaml
platforms:
  linux_x64:
    container_image: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"
    env_vars:
      - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    exec_properties:
      - "container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

  linux_arm64:
    container_image: "docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"
    env_vars:
      - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    exec_properties:
      - "container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"

  windows_x64:
    container_image: "docker://mcr.microsoft.com/windows/servercore:ltsc2019"
    env_vars:
      - "PATH=C:\\Windows\\System32;C:\\Windows"
    exec_properties:
      - "container-image=docker://mcr.microsoft.com/windows/servercore:ltsc2019"
```

## Architecture Support
### ARM and x64 Configuration
Configure workers for different CPU architectures:
```bash
# Configure ARM64 worker
cat > /etc/bazel-rbe/arm64-worker.yaml << 'EOF'
worker:
  instance_name: "arm64_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "Arch"
        value: "aarch64"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04-arm64:latest"

resources:
  cpu_count: 8
  memory_bytes: 17179869184  # 16GB
EOF

# Configure x64 worker
cat > /etc/bazel-rbe/x64-worker.yaml << 'EOF'
worker:
  instance_name: "x64_instance"
  platform:
    properties:
      - name: "OSFamily"
        value: "Linux"
      - name: "Arch"
        value: "x86_64"
      - name: "container-image"
        value: "docker://gcr.io/flame-public/rbe-ubuntu16-04:latest"

resources:
  cpu_count: 16
  memory_bytes: 34359738368  # 32GB
EOF
```

## Test Build
### DevOps Hub Integration Example
Create a test build with DevOps Hub integration:
```
# .bazelrc - Bazel configuration for DevOps Hub RBE
# Remote execution configuration
build --remote_executor=grpc://cluster.assistance.bg:443
build --remote_cache=grpc://cache.assistance.bg:443
build --remote_timeout=3600
build --remote_default_exec_properties=container-image=docker://gcr.io/flame-public/rbe-ubuntu16-04:latest

# Authentication with DevOps Hub
build --remote_header=Authorization=Bearer
build --remote_instance_name=default_instance

# Optimization settings
build --remote_download_minimal
build --experimental_remote_cache_compression
build --experimental_remote_merkle_tree_cache

# Platform mapping for cross-compilation
build --platforms=@bazel_tools//platforms:linux_x86_64
build:arm64 --platforms=@bazel_tools//platforms:linux_aarch64
build:macos --platforms=@bazel_tools//platforms:darwin_x86_64

# DevOps Hub specific settings
build --remote_header=X-DevOps-Hub-Project=your-project-id
build --remote_header=X-Worker-Pool=default

# Logging and monitoring
build --experimental_remote_grpc_log=/tmp/grpc.log
build --remote_print_execution_messages=failure
```

```python
# BUILD - Example Bazel build file
load("@rules_python//python:defs.bzl", "py_binary", "py_test")

py_binary(
    name = "hello_world",
    srcs = ["hello_world.py"],
    deps = [
        "//lib:common_utils",
    ],
)

py_test(
    name = "hello_world_test",
    srcs = ["hello_world_test.py"],
    deps = [
        ":hello_world",
        "@pip//pytest",
    ],
    # Run on specific worker pool
    exec_properties = {
        "worker-pool": "linux-x64-large",
        "requires-network": "true",
    },
)

# Cross-platform binary
py_binary(
    name = "hello_world_arm64",
    srcs = ["hello_world.py"],
    deps = [
        "//lib:common_utils",
    ],
    target_compatible_with = [
        "@platforms//cpu:aarch64",
        "@platforms//os:linux",
    ],
)
```

Run the test build:
```bash
# Test local build
bazel build //src:hello_world

# Test remote execution
bazel build //src:hello_world --config=remote

# Test ARM64 cross-compilation
bazel build //src:hello_world_arm64 --config=arm64

# Test with specific worker pool
bazel build //src:hello_world \
  --remote_default_exec_properties=worker-pool=linux-x64-large

# Monitor build with metrics
bazel build //... \
  --experimental_remote_grpc_log=/tmp/rbe-grpc.log \
  --remote_print_execution_messages=all
```

## Production Deployment
### Scaling Strategy
Implement auto-scaling for your RBE worker cluster:
```bash
#!/bin/bash
# auto-scale-rbe-workers.sh

METRICS_URL="https://assistance.bg/api/runners/metrics"
MIN_WORKERS=2
MAX_WORKERS=20
TARGET_CPU_UTILIZATION=70

# Get current worker metrics
get_worker_metrics() {
  curl -s -H "Authorization: Bearer $DEVOPS_HUB_TOKEN" \
    "$METRICS_URL" | jq -r '.workers[] | select(.platform == "bazel-rbe")'
}

# Calculate average CPU utilization
avg_cpu=$(get_worker_metrics | jq -r '.cpu_utilization_percent' | awk '{sum+=$1; count++} END {print sum/count}')
current_workers=$(get_worker_metrics | jq -s 'length')

# Scale up if CPU utilization is high
if (( $(echo "$avg_cpu > $TARGET_CPU_UTILIZATION" | bc -l) )); then
  if [ "$current_workers" -lt "$MAX_WORKERS" ]; then
    echo "🔄 Scaling up: CPU at ${avg_cpu}%"
    new_worker_id="worker-$(date +%s)"

    docker run -d \
      --name "$new_worker_id" \
      --network bazel-rbe-cluster \
      --privileged \
      -e WORKER_ID="$new_worker_id" \
      -e DEVOPS_HUB_TOKEN="$DEVOPS_HUB_TOKEN" \
      gcr.io/flame-public/rbe-ubuntu16-04:latest

    python3 register-rbe-worker.py
  fi
fi

# Scale down if CPU utilization is low
if (( $(echo "$avg_cpu < 30" | bc -l) )); then
  if [ "$current_workers" -gt "$MIN_WORKERS" ]; then
    echo "🔄 Scaling down: CPU at ${avg_cpu}%"
    # Gracefully shut down the last-listed worker
    last_worker=$(docker ps --format "{{.Names}}" | grep "rbe-worker" | tail -1)
    docker stop "$last_worker"
    docker rm "$last_worker"
  fi
fi
```

### Monitoring and Health Checks
Set up comprehensive monitoring:
```python
#!/usr/bin/env python3
# rbe-health-monitor.py

import os
import time

import docker
import requests
from prometheus_client import start_http_server, Gauge, Counter

# Prometheus metrics
worker_status = Gauge('rbe_worker_status', 'Worker health status', ['worker_id'])
build_duration = Gauge('rbe_build_duration_seconds', 'Build duration', ['worker_id'])
cache_hit_rate = Gauge('rbe_cache_hit_rate', 'Cache hit rate', ['worker_id'])
actions_executed = Counter('rbe_actions_total', 'Total actions executed', ['worker_id'])


def monitor_workers():
    """Monitor RBE worker health and performance."""
    client = docker.from_env()

    while True:
        try:
            # Check worker containers
            workers = client.containers.list(filters={'name': 'rbe-worker'})

            for worker in workers:
                worker_id = worker.name

                # Check container health
                if worker.status == 'running':
                    worker_status.labels(worker_id=worker_id).set(1)
                else:
                    worker_status.labels(worker_id=worker_id).set(0)

                # Get worker stats
                stats = worker.stats(stream=False)
                cpu_usage = calculate_cpu_percent(stats)

                # Report to DevOps Hub
                health_data = {
                    "worker_id": worker_id,
                    "status": "healthy" if worker.status == 'running' else "unhealthy",
                    "cpu_usage": cpu_usage,
                    "timestamp": time.time()
                }

                requests.post(
                    "https://assistance.bg/api/runners/health",
                    headers={"Authorization": f"Bearer {os.environ['DEVOPS_HUB_TOKEN']}"},
                    json=health_data
                )

            time.sleep(30)  # Check every 30 seconds

        except Exception as e:
            print(f"❌ Monitoring error: {e}")
            time.sleep(60)


def calculate_cpu_percent(stats):
    """Calculate CPU usage percentage from Docker stats."""
    cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - \
        stats['precpu_stats']['cpu_usage']['total_usage']
    system_delta = stats['cpu_stats']['system_cpu_usage'] - \
        stats['precpu_stats']['system_cpu_usage']

    if system_delta > 0:
        return (cpu_delta / system_delta) * 100.0
    return 0.0


if __name__ == "__main__":
    # Start Prometheus metrics server
    start_http_server(9090)
    print("🔍 RBE health monitor started on port 9090")

    monitor_workers()
```

### Fault Tolerance
Implement failover and recovery mechanisms:
```bash
#!/bin/bash
# rbe-failover.sh

BACKUP_REGIONS=("us-west-2" "eu-central-1" "ap-northeast-1")
PRIMARY_ENDPOINT="cluster.assistance.bg:443"

check_primary_health() {
  grpc_health_probe -addr="$PRIMARY_ENDPOINT" -service=""
  return $?
}

failover_to_backup() {
  local region=$1
  echo "🔄 Failing over to backup region: $region"

  # Update .bazelrc to use backup endpoint
  sed -i "s|grpc://cluster.assistance.bg:443|grpc://$region-cluster.assistance.bg:443|g" .bazelrc

  # Notify DevOps Hub of failover
  curl -X POST "https://assistance.bg/api/runners/failover" \
    -H "Authorization: Bearer $DEVOPS_HUB_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"primary_endpoint\": \"$PRIMARY_ENDPOINT\", \"backup_endpoint\": \"$region-cluster.assistance.bg:443\"}"
}

# Main failover logic
if ! check_primary_health; then
  echo "❌ Primary RBE cluster unhealthy"

  for region in "${BACKUP_REGIONS[@]}"; do
    if grpc_health_probe -addr="$region-cluster.assistance.bg:443" -service=""; then
      failover_to_backup "$region"
      break
    fi
  done
else
  echo "✅ Primary RBE cluster healthy"
fi
```

Your Bazel Remote Execution cluster is now configured for production use with DevOps Hub integration. The distributed workers provide scalable build infrastructure while maintaining visibility and control through centralized monitoring and management.
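As a final sanity check, you can confirm that the endpoints used throughout this guide accept TCP connections before pointing developers at the cluster. A minimal sketch (script name illustrative; hostnames are the example values from the configuration above):

```python
# smoke-test-endpoints.py (illustrative) - confirm the cluster endpoints
# accept TCP connections; this only checks reachability, not gRPC health.
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


ENDPOINTS = [
    ("cluster.assistance.bg", 443),  # remote executor
    ("cache.assistance.bg", 443),    # remote cache
]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        state = "reachable" if tcp_reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {state}")
```

For a deeper check, `grpc_health_probe` (as used in the failover script above) verifies the gRPC health service rather than just the TCP handshake.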