Event · May 28, 2026

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Apple is presenting new research at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which takes place in person in Denver at the Colorado Convention Center from June 3 to June 7. We are proud to sponsor the conference, which brings together the scientific and industrial research communities in computer vision and pattern recognition. Below is an overview of Apple’s participation at CVPR 2026.

Jump to a section:

Schedule
Poster Presentations at the Apple Booth
Accepted Papers
Acknowledgements

Schedule

Stop by the Apple booth (#231) during exhibition hours. All times listed in MDT (local time):

Friday, June 5: 10:00 AM – 6:00 PM
Saturday, June 6: 10:00 AM – 6:00 PM
Sunday, June 7: 10:00 AM – 3:00 PM

Wednesday, June 3

AFFINITY EVENT
LatinX in Computer Vision (LXCV/LXAI)
8:00 AM - 12:00 PM, Room 106
Kriti Goyal, Mohmed Hussein, and Prateek Singhal will be representing Apple at the LXCV/LXAI Mentoring Hour.

INVITED TALK
Efficient Deep Learning for Computer Vision (ECV) Workshop 2026
8:00 AM - 5:00 PM, Room 502
Oncel Tuzel will be giving an invited talk during the workshop.

KEYNOTE TALK
Generative AI for Sign Language (GenSign) Workshop
9:00 AM - 1:00 PM, Room 112
Colin Lea will be giving a keynote talk during the workshop.

INVITED TALK
Efficient and On-Device Generation (EDGE) Workshop 2026
1:00 PM - 5:00 PM, Room 203
Oncel Tuzel and Lu Jiang will be giving invited talks during the workshop.

AFFINITY EVENT
Women in Computer Vision (WiCV)
6:00 PM - 8:00 PM, Room 708 (workshop); Mentorship Dinner Offsite
Hsin-Ping (Cindy) Huang and Maggie Xiao will be representing Apple at the WiCV Mentorship Dinner.

Thursday, June 4

INVITED TALK
Video Large Language Models (VidLLMs) Workshop 2026
8:30 AM - 5:00 PM, Room 3A-3D
Afshin Dehghan will be giving an invited talk during the workshop.

Friday, June 5

SPOTLIGHT POSTER, AWARD CANDIDATE
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
4:00 PM - 6:00 PM, Exhibit Hall A & F, Poster Session 2, #178
Jiatao Gu, Ying Shen (University of Illinois Urbana-Champaign), Tianrong Chen, Laurent Dinh, Yuyang Wang, Miguel Angel Bautista, David Berthelot, Josh Susskind, Shuangfei Zhai

POSTER
From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
4:00 PM - 6:00 PM, Exhibition Hall A & F, Poster Session 2, #453
Le Zhang (Mila - Quebec AI Institute, Université de Montréal), Jihan Yang (New York University), Soundarya Krishnan, Jimit Majmudar, Hugh Ge, Prasoon Puri, Prathamesh Saraf, Shruti Bhargava, Dhivya Piraviperumal, Yinan Ling, Cindy Pan, Hong Yu, Aishwarya Agrawal (Mila - Quebec AI Institute, Université de Montréal), Andy Tseng

SPOTLIGHT POSTER, AWARD CANDIDATE
What Matters in Practical Learned Image Compression
4:00 PM - 6:00 PM, Exhibition Hall A & F, Poster Session 2, #457
Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel

Saturday, June 6

FINDINGS POSTER
Bootstrapping Sign Language Annotations with Sign Language Models
7:30 AM - 9:00 AM, Exhibit Hall A, Findings Posters, #035
Colin Lea, Vassilis Baltatzis, Raja Kushalnagar (Gallaudet University), Lorna Quandt (Gallaudet University), Leah Findlater, Connor Gillis

POSTER
Velox: Learning Representations of 4D Geometry and Appearance
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 3, #527
Anagh Malik (University of Toronto), Xiaoming Zhao, Dorian Chan, David Lindell (University of Toronto), Oncel Tuzel, Rick Chang

POSTER
AMusE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
4:45 PM - 6:45 PM, Exhibition Hall A, Poster Session 4, #146
Sanjoy Chowdhury, Karren D. Yang (Nuance Labs), Chun-Liang Li, Xudong Liu, Fartash Faghri, Pavan Kumar Anasosalu Vasu, Oncel Tuzel, Dinesh Manocha (University of Maryland, College Park), Raviteja Vemulapalli

Sunday, June 7

FINDINGS POSTER
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
7:30 AM - 9:00 AM, Exhibit Hall A, Findings Posters, #298
Pavan Kumar Anasosalu Vasu, Cem Koc, Fartash Faghri, Chun-Liang Li, Brian Feng, Jeff Lai, Meng Cao, Oncel Tuzel, Hadi Pour Ansari

ORAL
AToken: A Unified Tokenizer For Vision
9:00 AM - 10:15 AM, Four Seasons Ballroom, Oral Session 5B: Generalization and Adaptation
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang

POSTER
AToken: A Unified Tokenizer For Vision
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #007
Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang

POSTER
UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #069
Rui Tian (Fudan University), Mingfei Gao, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu (Fudan University), Afshin Dehghan

POSTER
TrajTok: Learning Trajectory Tokens enables better Video Understanding
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 5, #240
Chenhao Zheng (University of Washington), Jieyu Zhang (University of Washington), Oncel Tuzel, Chun-Liang Li, Ranjay Krishna (University of Washington)

POSTER
DSO: Direct Steering Optimization for Bias Mitigation
11:45 AM - 1:45 PM, Exhibition Hall F, Poster Session 6, #288
Lucas Monteiro Paes, Niv Sivakumar, Yinong Wang (Carnegie Mellon University), Masha Fedzechkina Donaldson, Barry Theobald, Luca Zappella, Nick Apostoloff

POSTER
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #098
Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jiasen Lu, Ashley Tong, Yinfei Yang, Wenze Hu, Zhe Gan

POSTER
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #141
Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan

POSTER
Learning Long-term Motion Embeddings for Efficient Kinematics Generation
3:30 PM - 5:30 PM, Exhibition Hall A, Poster Session 6, #595
Nick Stracke (Ludwig Maximilian University of Munich), Kolja Bauer (Ludwig Maximilian University of Munich), Stefan Andreas Baumann (Ludwig Maximilian University of Munich), Joshua Susskind, Miguel Angel Bautista, Björn Ommer (Ludwig Maximilian University of Munich)

Poster Presentations at the Apple Booth

Friday, June 5, 10:00 AM – 12:00 PM
Pavan Kumar Anasosalu Vasu will present VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models.

Friday, June 5, 2:00 PM – 4:00 PM
Byeongjoo Ahn and Jiasen Lu will present AToken: A Unified Tokenizer For Vision.
Sanjoy Chowdhury will present AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding.

Saturday, June 6, 10:00 AM – 12:00 PM
Jiatao Gu will present STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows.

Saturday, June 6, 2:00 PM – 4:00 PM
Rick Chang will present Velox: Learning Representations of 4D Geometry and Appearance.
Di Feng will present SO-Bench: A Structural Output Evaluation of Multimodal LLMs.

Accepted Papers

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Computer Vision, Methods and Algorithms · 2026

Sanjoy Chowdhury†, Karren D. Yang**, Xudong Liu, Fartash Faghri, Pavan Kumar Anasosalu Vasu, Oncel Tuzel, Dinesh Manocha†**, Chun-Liang Li**, Raviteja Vemulapalli

AToken: A Unified Tokenizer for Vision

Computer Vision, Methods and Algorithms · 2025

Jiasen Lu, Liangchen Song, Mingze Xu, Byeongjoo Ahn, Yanjun Wang, Chen Chen, Afshin Dehghan, Yinfei Yang

Bootstrapping Sign Language Annotations with Sign Language Models

Accessibility, Computer Vision · 2026

Colin Lea, Vasileios Baltatzis, Connor Gillis, Raja Kushalnagar†**, Lorna Quandt†**, Leah Findlater

DSO: Direct Steering Optimization for Bias Mitigation

Fairness, Methods and Algorithms · 2026

Lucas Monteiro Paes‡, Nivedha Sivakumar‡, Oliver Wang†‡**, Masha Fedzechkina, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff

From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs

Computer Vision · 2026

Le Zhang†**, Jihan Yang‡, Soundarya Krishnan, Jimit Majmudar, Xiou Ge, Prasoon Puri, Prathamesh Saraf, Shruti Bhargava, Dhivya Piraviperumal, Yinan Ling, Cindy Pan, Hong Yu, Aishwarya Agrawal†, Bo-Hsiang Tseng

Learning Long-Term Motion Embeddings for Efficient Kinematics Generation

Computer Vision, Methods and Algorithms · 2026

Nick Stracke†‡, Kolja Bauer†‡, Stefan Andreas Baumann†‡, Miguel Ángel Bautista, Josh Susskind, Björn Ommer†‡

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Computer Vision · 2025

Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan

SO-Bench: A Structural Output Evaluation of Multimodal LLMs

Computer Vision, Speech and Natural Language Processing · 2025

Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan

STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows

Computer Vision, Methods and Algorithms · 2026

Jiatao Gu†, Ying Shen‡**, Tianrong Chen, Laurent Dinh, Yuyang Wang, Miguel Ángel Bautista, David Berthelot, Josh Susskind, Shuangfei Zhai

TrajTok: Learning Trajectory Tokens enables better Video Understanding

Computer Vision · 2026

Chenhao Zheng†‡, Jieyu Zhang†‡, Jianing Zhang†, Weikai Huang†‡, Ashutosh Kumar§, Quan Kong§, Oncel Tuzel, Chun-Liang Li, Ranjay Krishna†‡

UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

Computer Vision · 2025

Rui Tian†, Mingfei Gao§‡, Haiming Gang, Jiasen Lu, Zhe Gan, Yinfei Yang, Zuxuan Wu†§, Afshin Dehghan

Velox: Learning Representations of 4D Geometry and Appearance

Computer Vision · 2026

Anagh Malik†, Dorian Chan, Xiaoming Zhao, David B. Lindell†, Oncel Tuzel, Jen-Hao Rick Chang

VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models

Computer Vision, Data Science and Annotation · 2026

Pavan Kumar Anasosalu Vasu*, Cem Koc*, Fartash Faghri*, Chun-Liang Li, Bo Feng, Zhengfeng Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari*

What Matters in Practical Learned Image Compression

Computer Vision · 2026

Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel

Acknowledgements

Alex Colburn and Qi Shan are recognized as Outstanding Area Chairs.

Byeongjoo Ahn, Chen Chen, Fartash Faghri, Oncel Tuzel, and Xiaoming Zhao are Area Chairs.

Roman Bachmann is a Workshop Co-Organizer for “Workshop On Any-to-Any Multimodal Learning 2026”.

Jeffrey Bigham is a Workshop Co-Organizer for “VizWiz Grand Challenge Workshop 2026”.

Yingxue Zhou is a Workshop Co-Organizer for “Workshop on Deployment of Foundation Models for Embodied AI 2026”.

Sanjoy Chowdhury, Barry-John Theobald, Santhosh Kumar Ramakrishnan, and Raviteja Vemulapalli are recognized as Outstanding Reviewers.

Vassilis Baltatzis, Rick Chang, Dian Chen, Honor Chen, Di Feng, Peter (Zhe) Fu, Haiming Gang, Mingfei Gao, Kriti Goyal, Amin Karimi Monsefi, Mridul Khurana, Pavan Kumar Anasosalu Vasu, Colin Lea, Xianhang Li, Henry Liu, Xudong Liu, Yongxi Lu, Paul Rötzer, Prateek Singhal, Vasu Singla, and Huangjie Zheng are Reviewers.
