
Despite increasing awareness of the need to support accessibility in mobile apps, many still lack support for key accessibility features. Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. However, manual testing can be tedious, often has an overwhelming scope, and test passes can be difficult to time amongst other development milestones. Recently, Large Language Models (LLMs) have been used for a variety of tasks, including automation of UIs; however, none have yet explored their use in controlling assistive technologies for the purpose of supporting accessibility testing. In this paper, we explore the requirements of a natural-language-based accessibility testing workflow through a formative study. Based on this, we present a system that takes as input a manual accessibility test (e.g., "Search for a show in VoiceOver") and uses an LLM combined with pixel-based UI understanding models to convert the test into a chaptered, navigable video. In each video, we apply heuristics to detect and flag accessibility issues (e.g., text size not increasing with Large Text enabled, VoiceOver navigation loops) to help QA testers more easily pinpoint issues. We evaluate this system through a 10-participant user study with accessibility QA professionals, who indicated that the tool would be very useful in their current work and gave us several promising directions for future work.
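The flagging heuristics are described here only by example. As a rough, hypothetical sketch of the kind of check involved, the Python below flags a VoiceOver navigation loop in a recorded focus trace; the FocusEvent schema and function names are assumptions for illustration, not AXNav's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FocusEvent:
    """One VoiceOver focus change observed while repeatedly swiping forward (hypothetical schema)."""
    screen: str       # identifier of the screen being navigated
    element_id: str   # stable identifier of the focused element

def find_navigation_loop(trace: list[FocusEvent]) -> list[FocusEvent] | None:
    """Flag a potential navigation loop: while swiping forward on a single screen,
    VoiceOver focus returns to an element it has already visited."""
    first_seen: dict[tuple[str, str], int] = {}
    for i, event in enumerate(trace):
        key = (event.screen, event.element_id)
        if key in first_seen:
            return trace[first_seen[key]:i + 1]  # the repeated cycle of focus moves
        first_seen[key] = i
    return None

# Example: focus cycles back to the search field without reaching the end of the screen.
trace = [FocusEvent("search", "search_field"),
         FocusEvent("search", "results_list"),
         FocusEvent("search", "search_field")]
assert find_navigation_loop(trace) is not None
```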

Figure 1: AXNav interprets accessibility test instructions specified in natural language, executes them on a remote cloud device using an LLM-based multi-agent planner, and produces a chaptered video of the test annotated with heuristics that highlight potential accessibility issues. To execute a test, AXNav provisions a cloud iOS device; stages the device by installing the target app to be tested and enabling a specified assistive feature; synthesizes a tentative step-by-step plan to execute the test from the test instructions; executes each step of the plan, updating the plan as needed; and annotates a screen recording of the test with chapter markers and visual elements that point out potential accessibility issues.
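To make the sequencing described above concrete, here is a minimal sketch of the pipeline as code. The CloudDevice and Planner interfaces and every method on them are hypothetical placeholders under assumed semantics; AXNav's real implementation is not published on this page.

```python
from typing import Protocol, Sequence

class CloudDevice(Protocol):
    """What the pipeline needs from a provisioned cloud iOS device (hypothetical interface)."""
    def install(self, bundle_id: str) -> None: ...
    def enable_feature(self, feature: str) -> None: ...
    def perform(self, step: str) -> str: ...      # acts on the UI, returns an observation
    def start_recording(self) -> None: ...
    def stop_recording(self) -> str: ...          # returns the path of the screen recording

class Planner(Protocol):
    """LLM-backed planner and evaluation agents (hypothetical interface)."""
    def plan(self, test: str) -> Sequence[str]: ...
    def succeeded(self, step: str, observation: str) -> bool: ...
    def replan(self, test: str, failed_step: str) -> Sequence[str]: ...

def run_test(device: CloudDevice, planner: Planner,
             bundle_id: str, feature: str, test: str) -> str:
    """Mirror the stages of Figure 1: stage the device, plan, execute, record."""
    device.install(bundle_id)                 # install the target app
    device.enable_feature(feature)            # enable the assistive feature under test
    steps = list(planner.plan(test))          # tentative step-by-step plan
    device.start_recording()
    while steps:
        step = steps.pop(0)
        observation = device.perform(step)
        if not planner.succeeded(step, observation):
            steps = list(planner.replan(test, step))   # recover with a new remaining plan
    return device.stop_recording()            # to be chaptered and annotated afterwards
```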

1) Title: iOS: VoiceOver: Search for a Show

  1. Go to Settings > Accessibility > VoiceOver, and enable VoiceOver (VO)

  2. Launch the TV app
  3. Search for a show and verify that everything works as expected and there are accurate labels

  4. Turn off VO and verify that searching for a show works as expected

2) iOS: Podcasts: Dynamic Text in Search Tab

  1. In Settings > Accessibility > Display & Text Size, enable larger text and set to maximum size

  2. Launch Podcasts
  3. Verify all text (titles, headers, etc.) font size has adjusted consistently

  4. Set text size to minimum and repeat step 3
  5. Reset text size to default and verify all text returns to normal

3) iOS: Podcasts: Button Shapes across app

Expected Result: When testing button shapes, we want to make sure that all text (not emojis or glyphs) gets underlined if it is NOT already inside a button shape. If text that is already within a button shape also gets underlined, it is a bug!

Figure 2: Three samples of manual accessibility test cases that AXNav can interpret and replay, from an internal regression testing database of manual tests. These tests validate the accessibility features of VoiceOver, Dynamic Type, and Button Shapes. Testing instructions typically consist of a title naming the app and feature under test, followed by a set of manual test instructions in natural language. The tests may also contain expected result descriptions. Some tests have specific, low-level instructions (1, 2), while others give only a high-level instruction (3).
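As an illustration of how a test in this shape might be represented programmatically, the sketch below models the structure the caption describes: a title naming the platform, feature, and scenario, plus optional steps and an expected result. The schema is a guess for illustration, not the internal test database format.

```python
from dataclasses import dataclass, field

@dataclass
class ManualTest:
    """One natural-language test case in the shape shown in Figure 2 (hypothetical schema)."""
    platform: str                  # e.g., "iOS"
    feature: str                   # e.g., "VoiceOver", "Dynamic Text", "Button Shapes"
    name: str                      # the scenario under test
    steps: list[str] = field(default_factory=list)   # may be empty for high-level tests like (3)
    expected_result: str | None = None

def parse_title(title: str) -> ManualTest:
    """Split a title like "iOS: VoiceOver: Search for a Show" into its parts."""
    platform, feature, name = (part.strip() for part in title.split(":", 2))
    return ManualTest(platform=platform, feature=feature, name=name)

# Example: test case (1) from Figure 2.
test = parse_title("iOS: VoiceOver: Search for a Show")
test.steps = [
    "Go to Settings > Accessibility > VoiceOver, and enable VoiceOver (VO)",
    "Launch the TV app",
    "Search for a show and verify that everything works as expected and there are accurate labels",
    "Turn off VO and verify that searching for a show works as expected",
]
```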
Figure 3: Planning and replanning workflow of AXNav's LLM-based multi-agent planner. The planner agent creates a tentative plan to navigate the app, and AXNav steps through the plan using tools to act on the UI, including interacting through VoiceOver's accessibility service. The evaluation agent determines whether each action succeeds and triggers replanning if necessary. AXNav continues executing steps until the tentative plan is complete.
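One plausible way the planner's tools could be wired to device commands, including VoiceOver interactions, is a small dispatch table like the one below. The tool names and command strings are invented for illustration; the actual tool set is not documented here.

```python
from typing import Callable

# (tool name, arguments) as the planner agent might emit them
Action = tuple[str, tuple]

def dispatch(action: Action, perform: Callable[[str], str]) -> str:
    """Route a planner-chosen action to a low-level device command.
    The command vocabulary here is hypothetical."""
    name, args = action
    commands: dict[str, Callable[..., str]] = {
        "tap":           lambda x, y: f"tap {x} {y}",
        "type_text":     lambda text: f"type {text}",
        "vo_swipe_next": lambda: "vo next",       # move VoiceOver focus to the next element
        "vo_activate":   lambda: "vo activate",   # activate the currently focused element
    }
    return perform(commands[name](*args))

# Example: move VoiceOver focus forward on a device stub that just echoes commands.
assert dispatch(("vo_swipe_next", ()), lambda cmd: cmd) == "vo next"
```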
