RepCNN: Micro-Sized, Mighty Models for Wakeword Detection
Authors: Arnav Kundu, Prateeth Nayak, Priyanka Padmanabhan, Devang Naik
Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits both the model's capacity to learn and the effectiveness of standard training algorithms at finding the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger, redundant multi-branched architecture. For inference, we then algebraically re-parameterize the trained model into its single-branched form, which has fewer parameters and therefore a lower memory footprint and compute cost. Using this technique, we show that our always-on wakeword detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. The re-parameterized RepCNN is 43% more accurate than a uni-branch convolutional model with the same runtime. RepCNN also matches the accuracy of complex architectures such as BC-ResNet while using 2x less peak memory and running 10x faster.
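The abstract describes the merge only at a high level. As one concrete illustration, below is a minimal PyTorch sketch (not the authors' code) of this kind of structural re-parameterization for 1-D convolutions: two parallel branches, a 3-tap conv and a 1-tap conv, are trained jointly and then algebraically folded into a single 3-tap conv for inference. The class name `MultiBranchBlock` and the exact branch composition are illustrative assumptions; the paper's blocks may differ (for instance, per-branch BatchNorm, whose affine parameters fold in analogously, is omitted here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchBlock(nn.Module):
    """Training-time block: a 3-tap conv plus a parallel 1-tap conv."""
    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Redundant multi-branch computation, used only during training.
        return self.conv3(x) + self.conv1(x)

    def reparameterize(self):
        """Algebraically fold both branches into one 3-tap conv."""
        fused = nn.Conv1d(self.conv3.in_channels, self.conv3.out_channels,
                          kernel_size=3, padding=1)
        with torch.no_grad():
            # Zero-pad the 1-tap kernel to width 3 so the kernels align,
            # then sum: conv3(x) + conv1(x) == conv_fused(x).
            w1 = F.pad(self.conv1.weight, (1, 1))
            fused.weight.copy_(self.conv3.weight + w1)
            fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused

# The single-branch conv matches the multi-branch block's output exactly
# (up to float round-off), so inference pays for only one branch.
block = MultiBranchBlock(channels=8)
x = torch.randn(1, 8, 100)
assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)
```

Because convolution is linear, the sum of parallel convolutions over the same input equals a single convolution whose kernel is the (shape-aligned) sum of the branch kernels; this identity is what lets the extra training-time branches be removed at no inference cost.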
Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge
January 9, 2026 · Research areas: Knowledge Bases and Search, Methods and Algorithms
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a fraction is used per prompt, and impractical for edge devices with limited inference-time memory and compute. We address this shortcoming with a memory-augmented architecture and a pretraining strategy aligned with…
Learning to Branch for Multi-Task Learning
June 9, 2020 · Research areas: Computer Vision, Methods and Algorithms · Conference: ICML
Training multiple tasks jointly in one deep network yields lower inference latency and better performance than the single-task counterpart by sharing certain layers of the network. However, over-sharing a network could erroneously enforce over-generalization, causing negative knowledge transfer across tasks. Prior works rely on human intuition or pre-computed task relatedness scores for ad hoc branching structures. They provide suboptimal…