Behavioral Score Diffusion:
Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data

Shihao Li, Jiachen Li, Jiamin Xu, Dongmei Chen
The University of Texas at Austin
CDC 2026
BSD method overview

BSD computes diffusion score functions directly from trajectory data via kernel-weighted estimation—no dynamics model, no neural network training.

Abstract

Diffusion-based trajectory optimization has emerged as a powerful planning paradigm, but existing methods require either learned score networks trained on large datasets or analytical dynamics models for score computation. We introduce Behavioral Score Diffusion (BSD), a training-free and model-free trajectory planner that computes the diffusion score function directly from a library of trajectory data via kernel-weighted estimation.

At each denoising step, BSD retrieves relevant trajectories using a triple-kernel weighting scheme—diffusion proximity, state context, and goal relevance—and computes a Nadaraya-Watson estimate of the denoised trajectory. The diffusion noise schedule naturally controls kernel bandwidths, creating a multi-scale nonparametric regression: broad averaging of global behavioral patterns at high noise, fine-grained local interpolation at low noise. Safety is preserved by applying shielded rollout on kernel-estimated state trajectories.

We prove pointwise consistency of the kernel score estimate for arbitrary continuous dynamics, characterize its MSE rate, and establish formal equivalence to regularized DeePC for LTI systems. Empirically, BSD achieves 98.5% of the model-based baseline's average reward across four robotic systems (3D–6D state spaces) using only 1,000 pre-collected trajectories, and substantially outperforms nearest-neighbor retrieval (18–63% improvement).

  • 98.5% of model-based reward (no dynamics model needed)
  • Zero training required (training-free)
  • 18–63% improvement over nearest-neighbor retrieval
  • 4 formal theoretical propositions

Method

BSD replaces Model-Based Diffusion's dynamics rollout with a kernel regression over stored trajectory data. At each denoising step i, given a noisy trajectory Y_i, BSD computes kernel weights over the dataset using three kernels:

  • Diffusion kernel: Gaussian similarity between the noisy trajectory and stored controls, with bandwidth coupled to the diffusion noise level.
  • Context kernel: Matches the current initial state to stored trajectory start states.
  • Goal kernel: Scores alignment of stored trajectory endpoints with the desired goal.

The denoised trajectory is then a Nadaraya-Watson weighted average of stored trajectories. This creates a natural multi-scale structure: at high noise (early denoising), broad kernels average over diverse trajectories, capturing global behavioral patterns; at low noise (late denoising), narrow kernels interpolate locally between nearby trajectories. A multi-sample selection mechanism draws K = 20,000 candidates and applies a reward-weighted softmax, biasing the denoised estimate toward high-reward samples (exploitation).
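A minimal sketch of one denoising step under these definitions, assuming Gaussian kernels throughout. Function, variable, and bandwidth names here are illustrative, not the paper's exact parameterization:

```python
import numpy as np

def bsd_denoise_step(Y_noisy, x0, goal, U_data, X0_data, Xf_data, sigma_i,
                     h_ctx=0.5, h_goal=0.5):
    """One BSD denoising step: Nadaraya-Watson average of stored controls.

    Hypothetical sketch of the triple-kernel weighting scheme.
    U_data:  (N, T, m) stored control trajectories
    X0_data: (N, n)    stored initial states
    Xf_data: (N, n)    stored final states
    """
    # Diffusion kernel: similarity of the noisy trajectory to stored
    # controls, with bandwidth tied to the current noise level sigma_i
    log_w = -np.sum((U_data - Y_noisy) ** 2, axis=(1, 2)) / (2 * sigma_i**2)
    # Context kernel: match the current initial state to stored start states
    log_w += -np.sum((X0_data - x0) ** 2, axis=1) / (2 * h_ctx**2)
    # Goal kernel: alignment of stored endpoints with the desired goal
    log_w += -np.sum((Xf_data - goal) ** 2, axis=1) / (2 * h_goal**2)
    # Normalize in log space for numerical stability
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Nadaraya-Watson estimate: convex combination of stored trajectories
    return np.tensordot(w, U_data, axes=1)
```

Because the weights are nonnegative and sum to one, the estimate is always a convex combination of stored trajectories, which is what makes the multi-scale averaging behavior fall out of the noise schedule.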

Safety shielding from Safe-MPD transfers directly: the shield operates on kernel-estimated states identically to dynamics-predicted ones. We prove this formally (Proposition 4: Safety Inheritance).
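As a toy illustration of this point, consider a shield that projects planned states into a box-shaped safe set (a deliberate simplification; Safe-MPD's actual shield is more involved). The shield consumes a state trajectory and is agnostic to whether that trajectory came from a dynamics rollout or a kernel estimate:

```python
import numpy as np

def box_shield(X_plan, lo, hi):
    """Minimal projection shield, assuming a box-shaped safe set [lo, hi]^n.

    Illustrative only: the shield operates purely on the planned state
    trajectory, so kernel-estimated and dynamics-predicted states are
    treated identically.
    """
    return np.clip(X_plan, lo, hi)
```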

Demonstrations

Denoising Process

BSD iteratively refines a noisy trajectory into a goal-reaching parking maneuver. At early steps (high noise), kernel weights average broadly over the trajectory dataset; at late steps (low noise), weights concentrate on nearby high-quality trajectories.

BSD denoising process animation

Bicycle system: 100 denoising steps, 1,000 stored trajectories
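The concentration of kernel weights over the denoising schedule can be illustrated numerically. The effective sample size of the Gaussian weights (a standard diagnostic, not from the paper) shrinks as the noise level decreases, mirroring the broad-to-local transition described above:

```python
import numpy as np

# Hypothetical illustration: as the diffusion noise level sigma shrinks,
# Gaussian kernel weights concentrate on fewer stored trajectories.
rng = np.random.default_rng(0)
dists = rng.uniform(0.0, 1.0, size=1000)  # distances to 1,000 stored trajectories

for sigma in [1.0, 0.3, 0.05]:  # high noise -> low noise
    w = np.exp(-dists**2 / (2 * sigma**2))
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)  # effective number of contributing trajectories
    print(f"sigma={sigma:.2f}  effective sample size={ess:.0f}")
```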


Comparison: MBD vs. BSD-fix vs. NN

Side-by-side comparison of planned trajectories on the Bicycle parking task. MBD (model-based) and BSD-fix (ours, data-only) produce smooth, direct paths to the target space. NN (nearest-neighbor, no diffusion) retrieves stored trajectories without refinement, yielding less directed paths.

Method comparison animation

Across Vehicle Systems

BSD-fix generalizes across four vehicle systems of increasing state dimensionality (3D–6D), producing smooth parking trajectories from data alone on each system.

Multi-system BSD-fix trajectories

Results

Main results across four systems
Figure 1. Main results across four robotic systems of increasing state dimensionality (3D–6D). Each dot shows the mean reward; whiskers indicate bootstrapped 95% confidence intervals (10,000 resamples). Vertical dashed lines mark the MBD (model-based) reference. BSD-fix nearly matches MBD on all systems while substantially outperforming the no-diffusion baseline (NN).
Per-trial reward distributions
Figure 2. Per-trial reward distributions across all four systems (50 trials each). Half-violins show kernel density estimates; individual dots represent single trials; diamonds mark the mean. BSD-fix (red) closely matches MBD (blue) in both location and spread, while NN (grey) exhibits substantially lower and more dispersed rewards.
Dimensionality scaling
Figure 3. Performance relative to MBD (%) vs. state dimensionality. BSD-fix maintains near-parity through 5D; NN degrades steeply. The widening gap demonstrates that diffusion denoising becomes more valuable as complexity increases.
Paired trial scatter
Figure 4. Paired per-trial reward comparison (same random seeds). High Pearson correlations (r ≥ 0.70, up to 0.99) indicate BSD-fix tracks MBD faithfully on individual trials, not just in aggregate.
Trajectory comparisons
Figure 5. Planned trajectories for Bicycle parking (5 trials each). BSD-fix produces smooth goal-reaching paths comparable to MBD. NN stops short of the goal.
Safety vs planning time
Figure 6. Safety rate vs. planning time across all systems. BSD methods achieve comparable safety to MBD at moderate computational overhead.

Theoretical Contributions

We establish four formal results for BSD's kernel-based score estimation:

  1. Pointwise Consistency (Proposition 1): Under standard regularity conditions, BSD's Nadaraya-Watson estimate converges in probability to the true conditional expectation for any continuous dynamics—no linearity or parametric assumptions.
  2. MSE Bound (Proposition 2): Bias-variance decomposition with bias² = O(h^4) and variance = O(1/(N h^d)). The optimal bandwidth h* = O(N^(-1/(d+4))) yields MSE rate O(N^(-4/(d+4))). Explains why adaptive bandwidth degrades performance: shrinking h at late steps inflates variance faster than it reduces bias.
  3. DeePC Equivalence (Proposition 3): For LTI systems with persistently exciting Hankel data and Gaussian kernels, BSD reduces to regularized DeePC with an entropic (KL) regularizer. Formalizes the connection between diffusion planning and behavioral systems theory.
  4. Safety Inheritance (Proposition 4): Safety is a shield property, not a planner property. BSD inherits any correct shield from model-based diffusion without modification.
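The rates in Proposition 2 follow the standard nonparametric bias-variance trade-off; minimizing the bound over the bandwidth h gives the stated rate (a textbook sketch, with C_1, C_2 denoting problem-dependent constants):

```latex
\mathrm{MSE}(h) \approx \underbrace{C_1 h^{4}}_{\text{bias}^2}
  + \underbrace{\frac{C_2}{N h^{d}}}_{\text{variance}},
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial h} = 0
  \;\Rightarrow\;
  h^{*} = \left(\frac{d\, C_2}{4\, C_1\, N}\right)^{1/(d+4)} = O\!\left(N^{-1/(d+4)}\right),
```

and substituting h* back into either term gives MSE(h*) = O(N^(-4/(d+4))).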

BibTeX

@article{li2026behavioral,
  title     = {Behavioral Score Diffusion: Model-Free Trajectory Planning
               via Kernel-Based Score Estimation from Data},
  author    = {Li, Shihao and Li, Jiachen and Xu, Jiamin and Chen, Dongmei},
  journal   = {arXiv preprint arXiv:2604.00391},
  year      = {2026},
  url       = {https://arxiv.org/abs/2604.00391},
}