Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language

AuthorsAngie Boggust†, Donghao Ren, Yannick Assogba, Dominik Moritz, Arvind Satyanarayan†, Fred Hohman

Automated interpretability aims to translate large language model (LLM) features into human understandable descriptions. However, these natural language feature descriptions are often vague, inconsistent, and require manual relabeling. In response, we introduce semantic regexes, structured language descriptions of LLM features. By combining primitives that capture linguistic and semantic feature patterns with modifiers for contextualization, composition, and quantification, semantic regexes produce precise and expressive feature descriptions. Across quantitative benchmarks and qualitative analyses, we find that semantic regexes match the accuracy of natural language while yielding more concise and consistent feature descriptions. Moreover, their inherent structure affords new types of analyses, including quantifying feature complexity across layers, scaling automated interpretability from insights into individual features to model-wide patterns. Finally, in user studies, we find that semantic regex descriptions help people build accurate mental models of LLM feature activations.

† MIT CSAIL

Diagram illustrating the semantic regex language, showing primitives at the top and modifiers at the bottom used to compose feature activation patterns. — Figure 1: The semantic regex language consists of a set of primitives (top) that can be applied independently or combined with modifiers (bottom) to express diverse feature activation patterns.

Related readings and updates.

Rescribe: Authoring and Automatically Editing Audio Descriptions

October 8, 2020research area Accessibility, research area Human-Computer Interactionconference UIST

Audio descriptions make videos accessible to those who cannot see them by describing visual content in audio. Producing audio descriptions is challenging due to the synchronous nature of the audio description that must fit into gaps of other video content. An experienced audio description author will produce content that fits narration necessary to understand, enjoy, or experience the video content into the time available. This can be especially…

Can Global Semantic Context Improve Neural Language Models?

September 27, 2018research area Speech and Natural Language Processing

Entering text on your iPhone, discovering news articles you might enjoy, finding out answers to questions you may have, and many other language-related tasks depend upon robust natural language processing (NLP) models. Word embeddings are a category of NLP models that mathematically map words to numerical vectors. This capability makes it fairly straightforward to find numerically similar vectors or vector clusters, then reverse the mapping to get relevant linguistic information. Such models are at the heart of familiar apps like News, search, Siri, keyboards, and Maps. In this article, we explore whether we can improve word predictions for the QuickType keyboard using global semantic context.

Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language

Related readings and updates.

Rescribe: Authoring and Automatically Editing Audio Descriptions

Can Global Semantic Context Improve Neural Language Models?

Discover opportunities in Machine Learning.