Within the world of autonomous micro robotics, constraints on computational resources and energy usage are a pressing issue, motivating the exploration of novel algorithms to bring intelligence to these systems. This work explores the challenges and opportunities that arise when deploying spiking neural networks as workers in actor-critic deep reinforcement learning, specifically using the A2C algorithm on the CartPole task.
Project Overview and Motivation
Reinforcement learning (RL) has experienced significant advancements in recent years, enabling robots to achieve human-like performance on complex tasks. Actor-critic algorithms have proven pivotal for training efficiently in challenging environments, offering potential for higher sample efficiency through parallel training.
Concurrently, neuromorphic algorithms such as spiking neural networks (SNNs) have garnered attention in the field of deep learning. These bio-inspired algorithms have demonstrated utility in handling temporal data and offer high energy efficiency when run on specialized hardware. Combining the ability to learn from experience through reinforcement learning with the energy efficiency of spiking neural networks holds great promise for small mobile robotics, where computational resources are constrained.
The primary objective was to compare the performance and computational complexity of spiking neural networks against traditional artificial neural networks in actor-critic reinforcement learning, specifically addressing:
- Training Efficiency: Comparing convergence patterns between SNN and ANN architectures
- Computational Complexity: Analyzing energy efficiency and operational requirements
- Noise Robustness: Evaluating performance under realistic sensor noise conditions
- Pruning Opportunities: Exploring model compression techniques unique to SNNs
Technical Implementation
The project employed a comprehensive comparison framework using the A2C (Advantage Actor-Critic) algorithm on the CartPole task:
A2C Reinforcement Learning
The A2C algorithm was chosen for its advantages over DQN-based approaches. Unlike DQN, which relies on an experience replay buffer and batch processing, A2C uses multiple workers operating in parallel: the parallel rollouts decorrelate the training data, eliminating the need for experience replay. This opens the door to learning on resource-constrained systems, where the overhead of storing past experiences and training in batches can be detrimental to system performance.
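For concreteness, here is a minimal sketch of the A2C loss for a batch of worker transitions, written in PyTorch. This illustrates the standard algorithm rather than the project's exact code; the 0.5 critic weight and 0.01 entropy coefficient are common defaults assumed here.

```python
import torch
import torch.nn.functional as F

def a2c_loss(policy_logits, values, actions, returns, entropy_coef=0.01):
    """A2C loss over a batch of transitions gathered from parallel workers.

    policy_logits: (B, n_actions) raw actor outputs
    values:        (B,) critic estimates V(s)
    actions:       (B,) actions the workers took
    returns:       (B,) n-step bootstrapped returns
    """
    # Advantage: how much better the return was than the critic expected.
    advantages = returns - values

    log_probs = F.log_softmax(policy_logits, dim=-1)
    chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Actor is trained on the advantage; detach so it does not move the critic.
    actor_loss = -(chosen_log_probs * advantages.detach()).mean()
    # Critic regresses V(s) toward the bootstrapped return.
    critic_loss = advantages.pow(2).mean()
    # Entropy bonus keeps exploration alive across the parallel workers.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()

    return actor_loss + 0.5 * critic_loss - entropy_coef * entropy
```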
CartPole Task
The CartPole task consists of balancing a pole on a cart for as long as possible, receiving a +1 reward for every timestep the pole stays balanced. The actor can apply a force to the cart, either to the left or to the right, while observing the cart's position and velocity and the pole's angle and angular velocity. For all models, the environment runs at a frequency of 20 Hz.
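For reference, a minimal interaction loop with the Gymnasium CartPole environment. Note that Gymnasium's default CartPole integrates at 0.02 s per step (50 Hz); the assumption below is that the project's 20 Hz corresponds to a 0.05 s physics timestep.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
# Observation: [cart position, cart velocity, pole angle, pole angular velocity]
# Action: 0 = push left, 1 = push right
# Assumption: 20 Hz is achieved by setting the physics timestep to 0.05 s
# (Gymnasium's default is 0.02 s, i.e. 50 Hz).
env.unwrapped.tau = 0.05

obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # random-policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```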
Network Architecture and Neuron Models
To compare ANNs to SNNs, two models with the same architecture were used; the only difference is that the SNN replaces the regular activation functions with spiking neurons. The architectures are kept small to avoid long training times, a known issue for SNNs.
Single Layer Architecture: Input layer with 4 neurons for the continuous observations, one hidden layer with 246 neurons, and two output heads representing the actor and critic outputs.
Two Layer Architecture: Input layer with 4 neurons, two hidden layers with 128 neurons each, and two output heads.
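As a concrete sketch of the single-layer pair, here are the two models side by side, assuming snnTorch's `Leaky` neuron for the LIF dynamics described below. The layer sizes come from the text; the β value and other details are illustrative.

```python
import torch.nn as nn
import snntorch as snn

class ActorCriticANN(nn.Module):
    def __init__(self, obs_dim=4, hidden=246, n_actions=2):
        super().__init__()
        self.fc = nn.Linear(obs_dim, hidden)
        self.act = nn.ReLU()                        # regular activation
        self.actor = nn.Linear(hidden, n_actions)   # policy head
        self.critic = nn.Linear(hidden, 1)          # value head

    def forward(self, x):
        h = self.act(self.fc(x))
        return self.actor(h), self.critic(h)

class ActorCriticSNN(nn.Module):
    def __init__(self, obs_dim=4, hidden=246, n_actions=2, beta=0.65):
        super().__init__()
        self.fc = nn.Linear(obs_dim, hidden)         # encoding layer
        self.lif = snn.Leaky(beta=beta)              # LIF replaces the activation
        self.actor = nn.Linear(hidden, n_actions)
        self.critic = nn.Linear(hidden, 1)

    def forward(self, x, mem):
        # mem can be initialized with self.lif.init_leaky()
        spk, mem = self.lif(self.fc(x), mem)         # spikes + updated membrane
        return self.actor(spk), self.critic(spk), mem
```

Note how the two models are identical apart from the activation: the SNN threads a membrane state `mem` through time, which is what gives it temporal dynamics.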
Figure 1: Training performance comparison between SNN and ANN architectures with single hidden layer (246 neurons). The SNN shows different convergence characteristics due to its spiking nature.
Leaky Integrate-and-Fire (LIF) Neuron Model
The LIF neuron is a first-order model where the input current directly charges the membrane potential, which leaks over time at a rate β (the leakage parameter). When the potential exceeds a threshold, the neuron spikes and the membrane potential is reset. The dynamics are modeled by:
U[t+1] = β U[t] + I_in[t+1] - R·U_thr
where R is 1 when the membrane potential exceeds the threshold U_thr (the neuron spikes and the reset term is subtracted) and 0 otherwise.
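Written out in code, the update rule is a direct transcription of the equation above (illustrative, not the project's implementation):

```python
import numpy as np

def lif_step(u, i_in, beta=0.65, u_thr=1.0):
    """One discrete-time step of a leaky integrate-and-fire neuron.

    Direct transcription of U[t+1] = beta*U[t] + I_in[t+1] - R*U_thr,
    with R = 1 when the membrane exceeded the threshold (a spike).
    """
    spike = (u > u_thr).astype(float)    # R: 1 where the neuron fires
    u = beta * u + i_in - spike * u_thr  # leak, integrate, subtract reset
    return spike, u

# Example: a constant input current charges the membrane until it spikes,
# after which the reset term pulls the potential back down.
u = np.zeros(1)
for t in range(8):
    spike, u = lif_step(u, i_in=0.5)
    print(t, int(spike[0]), round(float(u[0]), 3))
```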
Figure 2: The step function, surrogate function, and its gradient used for training spiking neural networks with backpropagation.
Training with Surrogate Gradients
Due to the discrete spiking of neurons, traditional backpropagation cannot be applied directly: the spike is a step function whose derivative is zero almost everywhere. Surrogate gradients provide a solution by substituting a continuous, differentiable approximation of the spiking behavior during the backward pass. The surrogate used here is the fast sigmoid function with a slope of 25.
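A hand-rolled version of this trick, mirroring what snnTorch's `surrogate.fast_sigmoid` does: the forward pass emits a hard spike, while the backward pass substitutes the derivative of the fast sigmoid x / (1 + slope·|x|).

```python
import torch

class FastSigmoidSpike(torch.autograd.Function):
    """Heaviside step forward, fast-sigmoid gradient backward."""
    slope = 25  # steepness of the surrogate, as used in the project

    @staticmethod
    def forward(ctx, mem_minus_thr):
        ctx.save_for_backward(mem_minus_thr)
        return (mem_minus_thr > 0).float()  # non-differentiable spike

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Derivative of the fast sigmoid x / (1 + slope*|x|):
        surrogate_grad = 1.0 / (FastSigmoidSpike.slope * x.abs() + 1.0) ** 2
        return grad_output * surrogate_grad
```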
Results and Performance Analysis
The research revealed several key insights about training spiking neural networks with reinforcement learning:
Training Characteristics
As expected, for both the deeper model and the model with only one hidden layer, the neuromorphic solution converges more slowly. This can partly be explained by how the surrogate gradient works: in conventional neural networks virtually all weights contribute to the output at every timestep, whereas in spiking neural networks neurons below the threshold do not spike and therefore contribute little to the gradient flow.
Figure 3: Training of an ANN and SNN with one hidden layer of size 246. The first SNN learns to reduce β to zero (no temporal dynamics), while the second SNN has a fixed leak β=0.65.
Figure 4: Training of ANN and SNN with two hidden layers of size 128. The SNN has leaks of β=0.95 for both layers.
Computational Complexity Analysis
Using NeuroBench, a comprehensive comparison was made between the trained models. The results show where SNN models excel compared to their ANN equivalents (a sketch of how activation sparsity can be measured by hand follows the list below):
- Activation Sparsity: SNNs achieve 68-92% activation sparsity compared to 0% for ANNs
- Synaptic Operations: SNNs require only accumulates (ACs) rather than multiply-accumulates (MACs) after spiking layers
- Effective Operations: Significant reduction in computational effort due to sparsity
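NeuroBench reports these metrics automatically; to make activation sparsity concrete, a hand-rolled measurement over a sequence of observations could look like the following. The `model` interface matches the SNN sketch shown earlier and is an assumption.

```python
import torch

@torch.no_grad()
def activation_sparsity(model, observations):
    """Fraction of hidden activations that are exactly zero.

    For the SNN this is the fraction of timesteps on which neurons stay
    silent; every silent neuron means a skipped multiply-accumulate in
    the layer that follows.
    """
    mem = model.lif.init_leaky()
    total, zeros = 0, 0
    for obs in observations:                 # one observation per timestep
        spk, mem = model.lif(model.fc(obs), mem)
        zeros += (spk == 0).sum().item()
        total += spk.numel()
    return zeros / total
```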
Figure 5: Distribution of neuron activity in the trained SNN, showing the presence of dead and saturated neurons that enable pruning.
Pruning Results
One of the most significant findings was the effectiveness of pruning in spiking neural networks. Dead neurons (0% spiking activity) and saturated neurons (100% spiking activity) can be removed without significant performance degradation, as sketched in the code further below:
- Single Layer SNN (β=0): Reduced from 246 to 11 neurons (95.5% reduction)
- Single Layer SNN (β=0.65): Reduced from 246 to 21 neurons (91.5% reduction)
- Performance Impact: Minimal degradation in average performance while maintaining reasonable risk levels
This activity-based pruning criterion is specific to spiking neural networks and provides significant advantages for deployment on resource-constrained devices.
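As an illustration of the criterion, here is a minimal pruning routine under the assumption that per-neuron firing rates were collected over evaluation episodes. Folding the constant contribution of saturated neurons into the next layer's bias is one reading of how they can be removed safely; the project's exact procedure may differ.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_hidden_layer(fc_in, fc_out, firing_rates):
    """Remove hidden neurons that never spike (dead) or always spike (saturated).

    fc_in:  Linear layer feeding the spiking hidden layer
    fc_out: Linear layer reading the hidden layer's spikes (e.g. an output head)
    firing_rates: per-neuron fraction of timesteps with a spike, in [0, 1]
    """
    keep = (firing_rates > 0.0) & (firing_rates < 1.0)  # neither dead nor saturated
    saturated = firing_rates >= 1.0

    # A saturated neuron spikes every step, so its effect on the next layer is
    # a constant that can be folded into that layer's bias (assumption; see text).
    folded_bias = fc_out.bias + fc_out.weight[:, saturated].sum(dim=1)

    new_in = nn.Linear(fc_in.in_features, int(keep.sum()))
    new_in.weight.copy_(fc_in.weight[keep])
    new_in.bias.copy_(fc_in.bias[keep])

    new_out = nn.Linear(int(keep.sum()), fc_out.out_features)
    new_out.weight.copy_(fc_out.weight[:, keep])
    new_out.bias.copy_(folded_bias)

    return new_in, new_out
```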
Noise Robustness Analysis
An important aspect of the research was analyzing the robustness of controllers to sensor noise, which reflects real-world deployment conditions:
Figure 6: Noise robustness analysis of single layer models. SNNs with temporal dynamics (leaky neurons) show improved robustness at higher noise levels.
Figure 7: Noise robustness analysis of two-layer models. SNNs surpass ANN performance at noise levels above 0.04.
The results show that SNNs with temporal dynamics (leaky neurons) tend to be more robust to noise than both their non-leaky counterparts and traditional ANNs: at noise levels above 0.15 the leaky SNN models outperform the non-leaky ones, and at noise levels above 0.04 they surpass the ANN.
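The noise model is not spelled out above; assuming zero-mean Gaussian noise added to each observation, with the "noise level" read as its standard deviation, the evaluation could use a simple Gymnasium observation wrapper:

```python
import gymnasium as gym
import numpy as np

class SensorNoiseWrapper(gym.ObservationWrapper):
    """Add zero-mean Gaussian noise to every observation, mimicking sensor noise.

    Assumption: the "noise level" in Figures 6-7 is the standard deviation
    of this Gaussian.
    """
    def __init__(self, env, noise_std=0.05):
        super().__init__(env)
        self.noise_std = noise_std

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.noise_std, size=obs.shape)

# Evaluate a trained controller at increasing noise levels:
# for sigma in [0.0, 0.04, 0.15]:
#     env = SensorNoiseWrapper(gym.make("CartPole-v1"), noise_std=sigma)
#     ...
```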
Key Findings and Implications
The research revealed several important insights:
- Training Challenges: SNN training is noisier and slower than ANN training due to surrogate gradient limitations
- Computational Efficiency: SNNs achieve significant reductions in computational complexity through sparsity and efficient operations
- Pruning Potential: SNNs can be pruned by 90%+ without significant performance degradation
- Noise Robustness: SNNs with temporal dynamics show superior performance under noisy conditions
- Encoding Overhead: The encoding layer (linear layer to first spiking layer) accounts for the largest computational cost
Future Research Directions
Several promising avenues for future work have been identified:
- Efficient Encoding: Developing more efficient spiking encoders to reduce computational overhead
- Deeper Networks: Exploring pruning techniques for deeper SNN architectures
- Hardware Optimization: Leveraging specialized neuromorphic hardware for further efficiency gains
- Training Improvements: Developing better surrogate gradient methods for faster, more stable training
Impact and Significance
This work represents a significant contribution to the field of neuromorphic reinforcement learning, demonstrating that spiking neural networks can be effectively trained for control tasks while offering substantial advantages in computational efficiency and noise robustness.
The findings have important implications for autonomous micro robotics, where computational resources and energy efficiency are critical constraints. The ability to achieve comparable performance with significantly reduced computational complexity makes SNNs an attractive option for deployment on resource-constrained devices.
The research also highlights the unique advantages of spiking neural networks, particularly their ability to be aggressively pruned without performance degradation and their superior noise robustness compared to traditional artificial neural networks.