ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
Authors: JingYuan Zhu, Shiyu Li, Andy Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan
Modern diffusion-based image generative models have made significant progress and are a promising way to enrich training data for object detection. However, generation quality and controllability remain limited for complex scenes that contain objects of multiple classes or dense, occluded objects. This paper presents ODGEN, a novel method for generating high-quality images conditioned on bounding boxes, thereby facilitating data synthesis for object detection. Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit the target distributions. We then propose to control the diffusion model using synthesized visual prompts with spatial constraints and object-wise textual descriptions. ODGEN is robust in handling complex scenes and specific domains. Further, we design a dataset synthesis pipeline and evaluate ODGEN on 7 domain-specific benchmarks to demonstrate its effectiveness. Adding training data generated by ODGEN improves mAP@.50:.95 by up to 25.3% with object detectors such as YOLOv5 and YOLOv7, outperforming prior controllable generative methods. In addition, we design an evaluation protocol based on COCO-2014 to validate ODGEN in general domains and observe an advantage of up to 5.6% in mAP@.50:.95 over existing methods.
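For readers unfamiliar with the mAP@.50:.95 notation used in the results above, the following is a minimal Python sketch of the box-overlap test that underlies it. It is not taken from the paper: the function names `iou` and `thresholds_passed` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration. The metric averages average precision over the ten IoU thresholds 0.50, 0.55, ..., 0.95; the full computation also ranks detections by confidence and integrates a precision-recall curve per class, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds implied by the ".50:.95" suffix (step 0.05).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Hypothetical helper: thresholds at which a prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A prediction that overlaps its ground-truth box loosely counts as correct only at the lower thresholds, so averaging over all ten rewards precise localization as well as correct classification.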
Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping
October 24, 2025 | research area: Computer Vision | conference: NeurIPS
We revisit scene-level 3D object detection as the output of an object-centric framework capable of both localization and mapping using 3D oriented boxes as the underlying geometric primitive. While existing 3D object detection approaches operate globally and implicitly rely on the a priori existence of metric camera poses, our method, Rooms from Motion (RfM), operates on a collection of un-posed images. By replacing the standard 2D keypoint-based…
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch
June 12, 2023 | research areas: Computer Vision, Methods and Algorithms | conference: IJCV
This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges but has rarely been studied in the literature. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture with multi-stage detection heads and progressive confidence thresholds. To avoid manually tuning the thresholds, we design a new adaptive…