
Spiking Neural Networks for Reinforcement Learning Control

Foundational Research | June 2024 | 12 min read

Presented at ICNCE 2024 in Aachen

Research Arc

This is the foundational work in my SNN control research. It establishes the core findings that led to the NeurIPS 2025 paper on adaptive surrogate gradients.

A2C Research Overview: Noise Robustness, Surrogate Gradients, and Training Performance

In autonomous micro-robotics, tight computational and energy budgets motivate the exploration of novel algorithms. This work explores the challenges and opportunities of deploying spiking neural networks (SNNs) in actor-critic reinforcement learning, progressing from simulated benchmarks to real-world drone control.

Part 1: Comparative Study on CartPole

The first phase compared SNN and ANN performance using the A2C (Advantage Actor-Critic) algorithm on the CartPole task, establishing fundamental insights about training dynamics and computational efficiency.

Why A2C for SNNs?

Unlike DQN, which depends on an experience-replay buffer and batch processing, A2C collects on-policy experience from multiple workers running in parallel, so no replay memory is needed. This opens the door to learning on resource-constrained systems, where storing past experiences and batch training would be prohibitive. A minimal sketch of the update follows.
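For context, here is a hedged sketch of the single-step A2C loss for one worker transition. The variable names and the 0.5 critic weight are illustrative choices, not details from the original codebase:

```python
import torch
import torch.nn.functional as F

def a2c_loss(log_prob, value, next_value, reward, done, gamma=0.99):
    """Single-step A2C loss for one worker transition.

    log_prob:   log pi(a|s) from the actor head
    value:      V(s) from the critic head
    next_value: V(s'), detached so no gradient flows through the bootstrap
    """
    # Bootstrapped return: R = r + gamma * V(s') for non-terminal states.
    target = reward + gamma * next_value.detach() * (1.0 - done)
    advantage = target - value

    # Policy-gradient term: the advantage acts as a baseline-corrected signal.
    actor_loss = -(log_prob * advantage.detach()).mean()
    # Critic regresses V(s) toward the bootstrapped return.
    critic_loss = F.mse_loss(value, target)
    return actor_loss + 0.5 * critic_loss
```

Because the transitions come from parallel on-policy workers rather than a replay buffer, this loss can be computed and backpropagated immediately after each step.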

Network Architecture

For a fair comparison, both the ANN and the SNN use identical architectures; the only difference is that the activation functions are replaced with Leaky Integrate-and-Fire (LIF) neurons. We tested two configurations (a minimal sketch follows the list):

  • Single Layer: 4 inputs → 246 hidden neurons → actor/critic outputs
  • Two Layer: 4 inputs → 128 → 128 → actor/critic outputs
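Below is a minimal sketch of the single-layer configuration, assuming an snnTorch-style LIF layer; the class name, the use of snnTorch, and the two-action output are my assumptions for CartPole, not details from the paper:

```python
import torch
import torch.nn as nn
import snntorch as snn

class SpikingActorCritic(nn.Module):
    """Single-layer configuration: 4 inputs -> 246 LIF neurons -> actor/critic heads.

    The ANN baseline is identical except that the LIF layer is replaced
    by a standard activation such as nn.ReLU.
    """
    def __init__(self, n_hidden=246, n_actions=2, beta=0.65, learn_beta=False):
        super().__init__()
        self.fc = nn.Linear(4, n_hidden)
        # Leaky integrate-and-fire neurons; beta is the membrane leak.
        self.lif = snn.Leaky(beta=beta, learn_beta=learn_beta)
        self.actor = nn.Linear(n_hidden, n_actions)   # action logits
        self.critic = nn.Linear(n_hidden, 1)          # state value

    def forward(self, obs, mem):
        spk, mem = self.lif(self.fc(obs), mem)
        return self.actor(spk), self.critic(spk), mem
```

The membrane state would be initialized once per episode (e.g. with `mem = model.lif.init_leaky()` in snnTorch) and threaded through every forward call, which is what gives the network its temporal dynamics.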
Training comparison: ANN vs. SNN with learnable leak (converges to β=0) vs. SNN with fixed leak (β=0.65). SNNs converge more slowly but reach comparable final performance.
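For reference, β in the caption is the membrane leak of the LIF neurons. In a standard discrete-time formulation (my notation, not taken from the paper), the membrane potential evolves as:

```latex
U[t] = \beta \, U[t-1] + W X[t] - S[t-1]\,\theta,
\qquad
S[t] = \Theta\big(U[t] - \theta\big)
```

where U is the membrane potential, θ the firing threshold, S the binary spike output, and Θ the Heaviside step. With β = 0 the membrane retains nothing between timesteps, so the learnable-leak network effectively converged to a stateless binary unit, while β = 0.65 carries part of the previous potential forward.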

Training with Surrogate Gradients

Because spikes are discrete and non-differentiable, standard backpropagation cannot be applied directly. Surrogate gradients replace the derivative of the spike function with a continuous, differentiable approximation during the backward pass.

The step function (forward pass) and the surrogate function (backward pass) used to train SNNs with backpropagation.
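A minimal example of the idea, using a fast-sigmoid surrogate; the specific surrogate shape and the slope value are illustrative assumptions, not the settings used in the study:

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside step in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, mem_minus_thresh, slope=25.0):
        ctx.save_for_backward(mem_minus_thresh)
        ctx.slope = slope
        # Forward: hard threshold -- spike if membrane potential exceeds threshold.
        return (mem_minus_thresh > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Backward: derivative of a fast sigmoid, 1 / (1 + slope*|x|)^2,
        # stands in for the undefined derivative of the step function.
        surrogate_grad = 1.0 / (1.0 + ctx.slope * x.abs()) ** 2
        return grad_output * surrogate_grad, None
```

In use, `spk = SurrogateSpike.apply(mem - threshold)` behaves like a hard threshold at inference time while still letting gradients flow during training.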

Key Finding: Aggressive Pruning

One of the most significant findings was how effectively SNNs can be pruned. Dead neurons (0% spike rate) and saturated neurons (100% spike rate) can be removed without significant performance degradation:

Distribution of neuron activity in the trained SNN, showing the dead and saturated neurons that enable aggressive pruning.

Pruning Results

  • SNN (β=0): 246 → 11 neurons (95.5% reduction)
  • SNN (β=0.65): 246 → 21 neurons (91.5% reduction)
  • Minimal performance degradation after pruning
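A sketch of how such pruning can be implemented: record per-neuron spike rates over evaluation rollouts, then drop the weight rows and columns for neurons that never fire or always fire. The function and variable names are hypothetical, and it assumes the layer layout from the architecture sketch above:

```python
import torch

def prune_by_spike_rate(spike_counts, n_steps, fc, actor, critic):
    """Remove dead (0% rate) and saturated (100% rate) hidden neurons.

    spike_counts: per-neuron spike totals accumulated over evaluation rollouts
    n_steps:      total number of timesteps observed
    """
    rate = spike_counts / n_steps
    keep = (rate > 0.0) & (rate < 1.0)          # neither dead nor saturated

    # Shrink the hidden projection: drop its output rows.
    fc.weight = torch.nn.Parameter(fc.weight[keep])
    fc.bias = torch.nn.Parameter(fc.bias[keep])
    # Shrink the heads: drop the matching input columns.
    # (Note: nn.Linear metadata such as in_features is not updated here;
    # this is a sketch, not production pruning code.)
    actor.weight = torch.nn.Parameter(actor.weight[:, keep])
    critic.weight = torch.nn.Parameter(critic.weight[:, keep])
    return int(keep.sum())
```

Because the pruned neurons carry no information (always-off or always-on), removing them leaves the network's function essentially unchanged, which is why reductions above 90% are possible.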

Noise Robustness

An important finding for real-world deployment: SNNs with temporal dynamics (leaky neurons) are more robust at higher noise levels than both ANNs and non-leaky SNNs.

Noise robustness analysis: SNNs with temporal dynamics (leaky neurons) outperform ANNs at noise levels above 0.04.
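Assuming "noise level" denotes the standard deviation of zero-mean Gaussian noise added to the observations (my reading, not stated explicitly above), an evaluation loop for a Gymnasium-style CartPole environment and the `SpikingActorCritic` sketch might look like:

```python
import torch

@torch.no_grad()
def evaluate_with_noise(model, env, sigma, n_episodes=20):
    """Average episode return under Gaussian observation noise of std sigma."""
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        mem = model.lif.init_leaky()            # reset membrane state per episode
        done, total = False, 0.0
        while not done:
            obs_t = torch.as_tensor(obs, dtype=torch.float32)
            noisy = obs_t + sigma * torch.randn_like(obs_t)
            logits, _, mem = model(noisy, mem)
            action = int(logits.argmax())       # greedy evaluation policy
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += reward
        returns.append(total)
    return sum(returns) / n_episodes
```

Sweeping `sigma` over a range of values and plotting the mean return reproduces the kind of robustness curve shown above.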

Part 2: Real-World Drone Landing

Building on the CartPole findings, we extended the research to autonomous drone landing, a more challenging task that requires estimating velocity from altitude sequences and culminates in real-world deployment.

The Challenge

The drone controller receives only sonar altitude readings and must learn to:

  • Estimate velocity from altitude sequences (no explicit velocity input)
  • Modulate thrust based on height and velocity
  • Land safely within velocity constraints

Critical Insight: Pre-training Required

Without supervised pre-training, SNNs struggled to develop velocity representations. Pre-training on a velocity estimation task reduced training time by an order of magnitude and enabled successful learning.
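A sketch of what such supervised pre-training can look like: step an SNN with a single regression head (hypothetical signature `model(h, mem) -> (velocity_estimate, mem)`) through a recorded altitude sequence and regress against finite-difference velocities. The data pipeline and shapes are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def pretrain_velocity(model, altitudes, dt, optimizer, epochs=100):
    """Supervised pre-training: predict velocity from an altitude sequence.

    altitudes: tensor of shape (T, 1), a recorded descent trajectory
    dt:        timestep between altitude readings
    """
    # Finite-difference targets: v[t] ~ (h[t] - h[t-1]) / dt
    targets = (altitudes[1:] - altitudes[:-1]) / dt

    for _ in range(epochs):
        mem = model.lif.init_leaky()
        preds = []
        for h in altitudes[1:]:                 # step through the sequence
            out, mem = model(h, mem)            # membrane state carries history
            preds.append(out)
        loss = F.mse_loss(torch.stack(preds), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()
```

The point of this stage is to force the membrane dynamics to encode velocity from altitude history before reinforcement learning begins, which is what cut training time by an order of magnitude.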

SNN controller performance in simulation. The network modulates thrust based on altitude and internally estimated velocity.

Real-World Deployment

The trained SNN was deployed on a physical Parrot Bebop 2 MAV without additional tuning. The same network that worked in simulation achieved successful landings on real hardware, a significant validation of the approach.

The SNN controller deployed on a physical Parrot Bebop 2, demonstrating successful sim-to-real transfer.

Summary: What We Learned

Key Takeaways

  • Training: SNNs train more slowly than ANNs but achieve comparable performance
  • Pruning: SNNs can be pruned by 90%+ due to dead/saturated neurons
  • Noise Robustness: Leaky SNNs outperform ANNs under noisy conditions
  • Pre-training: Supervised pre-training is critical for complex tasks
  • Sim-to-Real: SNNs can transfer from simulation to real hardware

What Came Next

This foundational work revealed that surrogate gradient settings significantly impact training. This insight led to our NeurIPS 2025 research on adaptive surrogate gradients, which develops principled methods for gradient scheduling and addresses the warm-up period challenge for sequential SNN training.

Resources