Troubleshooting
This page uses symptom -> cause -> diagnostic -> fix. Start here when doctor, target setup, metrics, or LLM calls fail.
Quick Table
| Symptom | Likely cause | Diagnostic command | Fix |
|---|---|---|---|
| sysbench not found | Package missing | command -v sysbench | sudo apt-get install -y sysbench |
| No perf metrics | perf unavailable or blocked | perf stat true | Install matching linux-tools; check kernel perf permissions. |
| RAPL power metrics missing | CPU/kernel lacks powercap or permission | ls /sys/class/powercap | Run without power metrics or use supported hardware/kernel. |
| Scheduler path missing | debugfs not mounted or kernel lacks files | ls /sys/kernel/debug/sched | sudo mount -t debugfs none /sys/kernel/debug |
| CPU frequency path missing | cpufreq driver unavailable | ls /sys/devices/system/cpu/cpufreq | Remove cpufreq params or load/use supported driver. |
| Governor rejected | Governor not available for policy | cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors | Use an available governor. |
| GEMINI_API_KEY missing under sudo | Environment not preserved | sudo --preserve-env=GEMINI_API_KEY env \| grep GEMINI | Add the key to --preserve-env. |
| OPENROUTER_API_KEY missing under sudo | Environment not preserved | sudo --preserve-env=OPENROUTER_API_KEY env \| grep OPENROUTER | Add the key to --preserve-env. |
| BenchBase jar missing | Submodule not built | ls deps/benchbase/target/benchbase-postgres/benchbase.jar | Build BenchBase with PostgreSQL profile. |
| PostgreSQL connection failure | DB/user/password/service mismatch | PGPASSWORD=password psql -h localhost -U admin -d benchbase -c 'select 1;' | Re-run sudo scripts/setup_tpcc_postgres.sh. |
| BenchBase timeout | DB load issue, slow host, too-short timeout | Inspect benchbase_windows/window_<n>/ logs | Increase benchbase_timeout_buffer_seconds; verify DB health. |
| Sysbench ends early | Continuous duration too short or process failed | Inspect sysbench_cpu_windows/continuous_sysbench.log | Increase sysbench_continuous_duration; check sysbench stderr. |
| Missing metric in history | Parser did not emit it | jq '.history[0].metrics' results/<run>/optimization_history_*.json | Fix parser/config metric name before changing tuner. |
| Optional tuner import error | Extra dependency missing | Read traceback for smac, mlos, or torch | Install .[bayesian], .[mlos], .[dqn], or .[tuners]. |
Missing sysbench
sudo apt-get install -y sysbench
command -v sysbench
sysbench --version
If taskset: failed to execute sysbench appears, sysbench is not installed or
is not on the PATH that sudo uses.
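To tell a missing package from a PATH difference, it can help to test where sysbench resolves under two different PATH values. The sketch below is illustrative only: `on_path` is a hypothetical helper, not part of SemaTune.

```shell
# Hypothetical helper: succeed if command $1 resolves on PATH string $2.
# The subshell keeps the PATH change from leaking into the caller.
on_path() {
    ( PATH="$2"; command -v "$1" >/dev/null 2>&1 )
}

# Example (commented out because it needs sudo):
# sudo_path=$(sudo sh -c 'printf %s "$PATH"')
# on_path sysbench "$PATH"      && echo "sysbench visible to your shell"
# on_path sysbench "$sudo_path" && echo "sysbench visible under sudo"
```

If only the first check succeeds, the package is installed but sudo is running with a different PATH.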
perf Metrics Missing
perf stat true
sudo perf stat true
If these fail, install matching kernel tools:
sudo apt-get install -y linux-tools-common linux-tools-generic linux-tools-$(uname -r)
Some hosts restrict perf through kernel settings. SemaTune can still run when some system metrics are unavailable, but proxy-metric prompts will have less context.
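The kernel setting most often involved is kernel.perf_event_paranoid, readable at /proc/sys/kernel/perf_event_paranoid. The sketch below maps its value to a rough hint. The function name and the three labels are hypothetical, and values above 2 come from distribution patches, so treat the last branch as an assumption about your distro.

```shell
# Hypothetical helper: classify a kernel.perf_event_paranoid value.
#   below 2 : unprivileged users may count user and kernel events per process
#   2       : unprivileged users are limited to user-space events
#   above 2 : distro-specific; unprivileged perf is likely blocked entirely
perf_paranoid_hint() {
    level="$1"
    if [ "$level" -lt 2 ]; then
        echo "permissive"
    elif [ "$level" -eq 2 ]; then
        echo "user-only"
    else
        echo "restricted"
    fi
}

# Example:
# perf_paranoid_hint "$(cat /proc/sys/kernel/perf_event_paranoid)"
```

A "restricted" result usually means perf stat only works under sudo on that host.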
RAPL Missing
ls /sys/class/powercap
find -L /sys/class/powercap -maxdepth 2 -name energy_uj
RAPL availability depends on CPU, kernel, firmware, and permissions. Missing RAPL should not block target parsing unless your config depends on power as the primary metric.
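Because permissions are one of the factors, it can be useful to check whether the energy counters are readable by the current user, not only whether they exist. The following is a minimal sketch; `count_readable_energy` is a hypothetical helper, and the optional directory argument exists only so the function can be tried against a test directory.

```shell
# Hypothetical helper: count energy_uj files one level below a powercap
# directory that the current user can read. Defaults to the real sysfs path.
count_readable_energy() {
    root="${1:-/sys/class/powercap}"
    n=0
    for f in "$root"/*/energy_uj; do
        # An unmatched glob stays literal and fails the -r test, giving 0.
        if [ -r "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example: compare the unprivileged count with the count under sudo.
# count_readable_energy
```

A count of 0 as a normal user alongside a non-zero count under sudo points to permissions, not to missing hardware support.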
Scheduler Debugfs Missing
ls /sys/kernel/debug/sched
sudo mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/sched
If scheduler files still do not exist, the kernel may not expose those debugfs knobs. Remove scheduler parameters from the config or use a compatible kernel.
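To separate "debugfs is not mounted" from "the kernel does not expose the files", one option is to look for the mount in /proc/mounts. This is a sketch under that assumption; `debugfs_mounted` is a hypothetical helper, and its optional argument lets it read a different mounts file for testing.

```shell
# Hypothetical helper: succeed if a debugfs filesystem is mounted at
# /sys/kernel/debug according to a mounts table (default: /proc/mounts).
# Fields in /proc/mounts: device, mount point, filesystem type, options, ...
debugfs_mounted() {
    mounts="${1:-/proc/mounts}"
    awk '$2 == "/sys/kernel/debug" && $3 == "debugfs" { found = 1 }
         END { exit !found }' "$mounts"
}

# Example:
# if debugfs_mounted; then echo "mounted"; else echo "not mounted"; fi
```

If debugfs is mounted and /sys/kernel/debug/sched is still absent, mounting again will not help.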
CPU Frequency Path Missing
ls /sys/devices/system/cpu/cpufreq
for f in /sys/devices/system/cpu/cpufreq/policy*/scaling_available_governors; do
  echo "$f: $(cat "$f")"
done
If cpufreq is unavailable, remove scaling_governor, scaling_min_freq,
scaling_max_freq, and epp from the config. For Intel P-state controls,
check:
ls /sys/devices/system/cpu/intel_pstate
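Before putting a governor in the config, you can check it against the list the kernel reports for a policy. The sketch below assumes the standard space-separated format of scaling_available_governors. `governor_available` is a hypothetical helper, not a SemaTune command.

```shell
# Hypothetical helper: check whether governor $1 is listed in a
# scaling_available_governors file (default: policy0).
# Returns 0 if listed, 1 if not listed, 2 if the file is unreadable.
governor_available() {
    gov="$1"
    file="${2:-/sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors}"
    [ -r "$file" ] || return 2
    for g in $(cat "$file"); do
        if [ "$g" = "$gov" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
# governor_available performance && echo "performance is available"
```

A return code of 2 points to a missing cpufreq driver; a return code of 1 means the governor is not offered for that policy.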
LLM API Key Errors
Set only the key for the provider you use:
export GEMINI_API_KEY="<your Gemini API key>"
export OPENROUTER_API_KEY="<your OpenRouter API key>"
Preserve it under sudo:
sudo --preserve-env=PATH,OS_PARAM_TUNING_ROOT,GEMINI_API_KEY \
OS_PARAM_TUNING_ROOT="$(pwd)" \
.venv/bin/os-param-tuning run \
--config config/examples/sysbench_cpu_llm_single.json
Do not add real keys to JSON configs.
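When checking whether a key reached the environment, it is safer not to echo its value to the terminal or shell history. Here is a minimal sketch that reports presence only; `key_status` is a hypothetical helper.

```shell
# Hypothetical helper: report whether an exported environment variable
# is set to a non-empty value, without printing the value itself.
key_status() {
    if [ -n "$(printenv "$1")" ]; then
        echo "$1: set"
    else
        echo "$1: missing"
    fi
}

# Example:
# key_status GEMINI_API_KEY
# The same presence check under sudo, again without printing the key:
# sudo --preserve-env=GEMINI_API_KEY printenv GEMINI_API_KEY >/dev/null \
#   && echo "GEMINI_API_KEY preserved under sudo"
```

Because printenv only sees exported variables, a key assigned without export also shows up as missing, which matches what a child process would see.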
BenchBase Jar Missing
git submodule update --init deps/benchbase
cd deps/benchbase
./mvnw -P postgres -DskipTests package
tar -xzf target/benchbase-postgres.tgz -C target
cd ../..
Then verify:
ls deps/benchbase/target/benchbase-postgres/benchbase.jar
PostgreSQL Connection Failures
The TPCC XML expects user admin with password password on database benchbase
at localhost:5432.
sudo scripts/setup_tpcc_postgres.sh
PGPASSWORD=password psql -h localhost -U admin -d benchbase -c 'select 1;'
sudo systemctl is-active postgresql
If your host uses a non-systemd PostgreSQL service, start PostgreSQL with the
host's service manager, then rerun the psql check.
Sysbench Continuous Duration Too Short
sysbench_cpu runs one live interval-reporting process by default. If LLM calls
are slow, the tuning run can outlast that process when
sysbench_continuous_duration is too short.
Check:
tail -n 40 results/<run>/sysbench_cpu_windows/continuous_sysbench.log
Fix:
{
  "sysbench_interval_reporting": true,
  "sysbench_continuous_duration": 3600
}
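A rough way to pick the duration is to estimate the total wall-clock time of the run. The sketch below assumes each tuning iteration consists of one measurement window plus one LLM call, and adds a 10% margin. The function and its three inputs are hypothetical names for illustration, not SemaTune config keys.

```shell
# Hypothetical helper: estimate a lower bound for
# sysbench_continuous_duration in seconds.
#   $1 = number of tuning iterations
#   $2 = measurement window per iteration, in seconds
#   $3 = expected worst-case LLM latency per iteration, in seconds
# Assumes iterations run back to back; adds a 10% safety margin.
min_continuous_duration() {
    iterations="$1"
    window_s="$2"
    llm_latency_s="$3"
    echo $(( iterations * (window_s + llm_latency_s) * 110 / 100 ))
}

# Example: 50 iterations, 30 s windows, up to 20 s per LLM call.
# min_continuous_duration 50 30 20
```

With those example inputs the estimate is 2750 seconds, so the 3600 shown above leaves headroom.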
Missing Metrics
Inspect the optimization history:
jq '.history[0].metrics' results/<run>/optimization_history_*.json
If a metric is not there, fix the target parser or use a metric that exists. For LLM prompt metrics, also inspect:
jq '{iteration, response_text}' results/<run>/llm_api_logs/llm_responses.jsonl
Optional Tuner Import Errors
pip install -e ".[bayesian]"
pip install -e ".[mlos]"
pip install -e ".[dqn]"
pip install -e ".[tuners]"
Base pytest, docs build, fixed runs, Q-learning, and replay should not require
SMAC, MLOS Core, or PyTorch.
Last-Resort Restore
If a host is left in an unknown state, use Safety and Restore for manual commands. On a disposable benchmark host, rebooting is often the cleanest reset.