This repository contains the implementation of Neural Architecture Codesign (NAC), a framework for optimizing neural network architectures for physics applications with hardware efficiency in mind. NAC employs a two-stage optimization process to discover models that balance task performance with hardware constraints.
The framework combines neural architecture search and network compression in a two-stage approach:
- Global Search Stage: Explores diverse architectures while considering hardware constraints
- Local Search Stage: Fine-tunes and compresses promising candidates
- FPGA Synthesis (optional): Converts optimized models to FPGA-deployable code
The framework is demonstrated through two case studies:
- BraggNN: Fast X-ray Bragg peak analysis for materials science
  - Convolutional architecture with attention mechanisms
  - Optimizes for peak position prediction accuracy and inference speed
- Jet Classification: Deep Sets architecture for particle physics
  - Permutation-invariant architecture for particle classification
  - Optimizes classification accuracy and hardware efficiency
- Create a conda environment:
conda create --name NAC_env python=3.10.10
conda activate NAC_env
- Install dependencies:
pip install -r requirements.txt
- Download datasets:
- For BraggNN:
python data/get_dataset.py
- For Deep Sets: Download `normalized_data3.zip` and extract it to `data/normalized_data3/`
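Optionally, you can sanity-check the environment and data locations before running a search. This is a small illustrative snippet (not part of the repository) that assumes PyTorch is installed via `requirements.txt` and uses the paths from the steps above:

```python
# Optional sanity check for the environment and dataset locations.
# The paths below follow the installation steps above; adjust them if you
# extracted the data elsewhere.
import os

import torch

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

for path in ("data", "data/normalized_data3"):
    status = "found" if os.path.isdir(path) else "missing"
    print(f"{path}: {status}")
```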
Run architecture search for either BraggNN or Deep Sets:
python global_search.py
The script will output results to `global_search.txt`. For the Deep Sets model, results will be in `Results/global_search.txt`.
Run model compression and optimization:
python local_search.py
Results will be saved in `Results/deepsets_search_results.txt` or `Results/bragg_search_results.txt`.
.
├── data/ # Dataset handling
├── examples/ # Example configs and search spaces
│ ├── BraggNN/
│ └── DeepSets/
├── models/ # Model architectures
├── utils/ # Utility functions
├── global_search.py # Global architecture search
├── local_search.py # Local optimization
└── requirements.txt
The global search explores a wide range of model architectures to find promising candidates that balance performance and hardware efficiency. This stage:

- Example Model Starting Points:
  - Uses pre-defined model configurations in `*_model_example_configs.yaml` as initial reference points
  - For BraggNN: includes baseline architectures such as OpenHLS and the original BraggNN
  - For Deep Sets: includes baseline architectures of varying sizes (tiny to large)
- Explores Architecture Space:
  - The search space defined in `*_search_space.yaml` specifies the possible model variations
  - For BraggNN: explores combinations of convolutional, attention, and MLP blocks
  - For Deep Sets: varies network widths, aggregation functions, and MLP architectures
- Multi-Objective Optimization:
  - Uses the NSGA-II algorithm to optimize both task performance and hardware efficiency (a Pareto-selection sketch follows the run command below)
  - Evaluates models based on accuracy/mean distance and bit operations (BOPs); a BOPs estimate is sketched after this list
  - Maintains a diverse population of candidate architectures
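Bit operations (BOPs) count multiply-accumulates weighted by the operand bit widths, so quantized and pruned layers score lower. The helpers below (`linear_bops`, `conv2d_bops`) are a rough per-layer estimate under the common BOPs ≈ MACs × weight-bits × activation-bits approximation; they are illustrative, not the metric implementation used in this repository:

```python
def linear_bops(in_features: int, out_features: int,
                weight_bits: int = 32, act_bits: int = 32,
                sparsity: float = 0.0) -> float:
    """Rough BOPs estimate for a dense layer: MACs x weight bits x activation bits,
    scaled by the fraction of weights that survive pruning."""
    macs = in_features * out_features
    return macs * (1.0 - sparsity) * weight_bits * act_bits


def conv2d_bops(c_in: int, c_out: int, kernel: int, out_h: int, out_w: int,
                weight_bits: int = 32, act_bits: int = 32,
                sparsity: float = 0.0) -> float:
    """Same estimate for a square-kernel 2D convolution."""
    macs = c_in * c_out * kernel * kernel * out_h * out_w
    return macs * (1.0 - sparsity) * weight_bits * act_bits


# Example: a 4-bit, 50%-pruned layer costs far fewer BOPs than its float baseline.
baseline = linear_bops(64, 32)
compressed = linear_bops(64, 32, weight_bits=4, act_bits=8, sparsity=0.5)
print(f"baseline: {baseline:.3g} BOPs, compressed: {compressed:.3g} BOPs")
```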
Run global search with:
python global_search.py
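Conceptually, the multi-objective selection keeps non-dominated (Pareto-optimal) candidates: an architecture survives if no other candidate is at least as good on both objectives and strictly better on one. The dependency-free sketch below shows only that filtering step (NSGA-II itself adds non-dominated sorting into fronts and crowding-distance selection); the candidate values are hypothetical:

```python
from typing import List, Tuple

# Each candidate is (task_error, bops); both objectives are minimized.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep candidates that are not dominated by any other candidate."""
    return [c for c in population
            if not any(dominates(other, c) for other in population if other != c)]


# Example: task error vs. BOPs for a few hypothetical architectures.
candidates = [(0.20, 1.0e9), (0.22, 2.0e8), (0.35, 5.0e7), (0.30, 3.0e8)]
print(pareto_front(candidates))  # (0.30, 3.0e8) is dominated by (0.22, 2.0e8)
```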
The local search takes promising architectures from the global search and optimizes them further through:

- Training Optimization:
  - Fine-tunes hyperparameters using tree-structured Parzen estimation (see the sketch after this list)
  - Optimizes learning rates, batch sizes, and regularization
- Model Compression:
  - Quantization-aware training (4-32 bits)
  - Iterative magnitude pruning (20 iterations, removing 20% of parameters each iteration; see the pruning sketch below)
  - Evaluates trade-offs between model size, accuracy, and hardware efficiency
- Architecture Selection:
  - Identifies the best models across different operating points
  - Balances accuracy, latency, and resource utilization
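Tree-structured Parzen estimation is the sampler behind hyperparameter-tuning libraries such as Optuna and Hyperopt. The sketch below shows the general shape of such a study using Optuna's TPE sampler; the parameter ranges and the `train_and_evaluate` helper are illustrative placeholders, not the repository's actual tuning code:

```python
import optuna


def train_and_evaluate(lr: float, batch_size: int, weight_decay: float) -> float:
    """Placeholder for training a candidate model and returning its validation metric."""
    # Dummy objective surface so the sketch runs end to end; the real flow would
    # train the architecture selected by the global search and return its loss.
    return (lr - 1e-3) ** 2 + weight_decay


def objective(trial: optuna.Trial) -> float:
    # Hypothetical hyperparameter ranges for illustration only.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256, 512])
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    return train_and_evaluate(lr=lr, batch_size=batch_size, weight_decay=weight_decay)


study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
```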
Run local search with:
python local_search.py
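The iterative magnitude pruning step can be illustrated with PyTorch's built-in pruning utilities: in the standard scheme, each iteration removes 20% of the smallest-magnitude weights still remaining in each layer, with fine-tuning between iterations. This is a sketch of the technique on a stand-in model, not the compression code used by `local_search.py`:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in model; the real candidates come from the global search.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
prunable = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

for iteration in range(20):
    # Remove 20% of the remaining weights in each layer (magnitude-based, L1).
    for module, name in prunable:
        prune.l1_unstructured(module, name=name, amount=0.2)

    # ... fine-tune the pruned model here before the next pruning round ...

    total = sum(module.weight.nelement() for module, _ in prunable)
    zeros = sum(int((module.weight == 0).sum()) for module, _ in prunable)
    print(f"iteration {iteration + 1:2d}: sparsity = {zeros / total:.1%}")

# Make the pruning permanent by folding the masks into the weights.
for module, name in prunable:
    prune.remove(module, name)
```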
The framework achieves:
- For BraggNN:
  - 0.5% improved accuracy with 5.9× fewer BOPs (large model)
  - 3% accuracy decrease for 39.2× fewer BOPs (small model)
  - 4.92 μs latency with <10% FPGA resource utilization
- For jet classification (Deep Sets):
  - 1.06% improved accuracy with 7.2× fewer BOPs (medium model)
  - 2.8% accuracy decrease for 30.25× fewer BOPs (tiny model)
  - 70 ns latency with <3% FPGA resource utilization