Learning to Optimize Black-Box Evaluation Metrics

Authors: Chen Huang, Shuangfei Zhai, Pengsheng Guo, Josh Susskind

We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function is easily pluggable into existing optimizers like SGD and Adam, and is effective for rapidly finetuning a pre-trained model. This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection. Solid benefits are found over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.
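To make the idea of a learned, differentiable value function concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy linear classifier whose two weights play the role of the compact parameters, and it swaps in a ridge regression on quadratic features as the value function, since the abstract does not specify its form. All names (`misclassification_rate`, `fit_value_function`, `value_grad`) and all hyperparameters are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 2-D points labeled by a hidden linear rule.
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.0, -0.5]) > 0).astype(int)


def misclassification_rate(theta):
    """Black-box, non-differentiable metric: only its value is observed."""
    preds = (X @ theta > 0).astype(int)
    return float(np.mean(preds != y))


def features(theta):
    """Bias, linear and squared terms, so the surrogate can model curvature."""
    return np.concatenate(([1.0], theta, theta ** 2))


def fit_value_function(thetas, metrics, ridge=1e-3):
    """Ridge regression from parameter features to observed metric values."""
    phi = np.stack([features(t) for t in thetas])
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ np.asarray(metrics))


def value(w, theta):
    """Differentiable estimate of the metric at theta."""
    return float(features(theta) @ w)


def value_grad(w, theta):
    """Analytic gradient of value(w, theta) with respect to theta."""
    d = theta.size
    return w[1:1 + d] + 2.0 * w[1 + d:] * theta


# Stand-in for pre-trained parameters (assumption).
theta = np.array([0.2, 0.6])

# Collect (parameters, metric) observations around the starting point.
samples = [theta + 0.5 * rng.normal(size=2) for _ in range(64)]
observed = [misclassification_rate(t) for t in samples]
w = fit_value_function(samples, observed)

# Finetune with plain gradient descent on the learned value function.
finetuned = theta.copy()
for _ in range(20):
    finetuned -= 0.1 * value_grad(w, finetuned)

print("metric before:", misclassification_rate(theta))
print("metric after: ", misclassification_rate(finetuned))
```

The key design point is that the metric itself is only ever queried for its value, while the update step uses the gradient of the fitted surrogate. In practice the surrogate gradient would be combined with the gradient of the usual training loss, and nothing in this toy setup guarantees that the metric improves.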

Related readings and updates.

Evaluating Evaluation Metrics — The Mirage of Hallucination Detection

October 27, 2025 | Research areas: Data Science and Annotation, Speech and Natural Language Processing | Conference: EMNLP

Hallucinations pose a significant obstacle to the reliability and widespread adoption of language models, yet their accurate measurement remains a persistent challenge. While many task- and domain-specific metrics have been proposed to assess faithfulness and factuality concerns, the robustness and generalization of these metrics are still untested. In this paper, we conduct a large-scale empirical evaluation of 6 diverse sets of hallucination…

Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment

May 15, 2019 | Research area: Methods and Algorithms | Conference: ICML

In most machine learning training paradigms a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample efficient reinforcement learning approach for adapting the loss dynamically during training. We empirically show how this formulation improves…
