Accessibility engineering project

ScreenRecognition: VoiceOver support for apps that don't have any.

Some professional software is completely invisible to VoiceOver. No buttons, no labels, nothing. I built a macOS app that detects UI elements with machine learning, reads their text with OCR, and creates an accessible overlay so VoiceOver can interact with them.

Platform macOS
Technologies Swift, Objective-C, YOLO, Vision, NSAccessibility
Type Standalone application
Status Active development

What this project shows

I needed to use software that wasn't accessible, so I built a way in.

ScreenRecognition uses a YOLO model to find UI elements on screen, Vision framework OCR to read their text, and a transparent overlay of NSAccessibilityElements so VoiceOver can navigate and interact with them.

Skills demonstrated

  • macOS accessibility API implementation (NSAccessibility protocol)
  • Machine learning model training and integration (YOLO object detection)
  • Accessible UI architecture from first principles
  • OCR-based label extraction and intelligent label association
  • Built from daily lived experience: I use VoiceOver every day

How it works

Three layers working together to turn a visual-only interface into something VoiceOver can navigate.

Detection

A YOLO object detection model trained to recognize five common UI control types: buttons, disclosure triangles, images, links, and text areas. The model runs against a screen capture of the target application.
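A minimal sketch of this step, assuming the YOLO model has been converted to a compiled Core ML model and is run through Vision (the function name and model URL parameter are illustrative, not the project's actual API):

```swift
import Vision
import CoreML

// Sketch: run a YOLO-style Core ML detector over a capture of the target app.
// `modelURL` points at a compiled .mlmodelc bundle (an assumption here).
func detectElements(in windowImage: CGImage, modelURL: URL) throws -> [VNRecognizedObjectObservation] {
    let mlModel = try MLModel(contentsOf: modelURL)
    let vnModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: windowImage, options: [:])
    try handler.perform([request])

    // Each observation carries a normalized bounding box plus class labels
    // (e.g. button, disclosure triangle, image, link, text area).
    return request.results as? [VNRecognizedObjectObservation] ?? []
}
```

The normalized bounding boxes then get mapped back into screen coordinates for the recognition and overlay layers.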

Recognition

Apple's Vision framework performs OCR on detected elements and their surrounding context to extract labels, placeholder text, and nearby descriptive content. A label association algorithm matches text to the correct control using spatial proximity and layout heuristics.
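The OCR step could look roughly like this, using Vision's text recognizer restricted to a region of interest around a detected element (the function shape is a sketch, not the project's actual code):

```swift
import Vision

// Sketch: OCR a region around a detected element with Vision's text recognizer.
// `region` is in Vision's normalized coordinates (origin at bottom-left).
func recognizeText(in image: CGImage, region: CGRect) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.regionOfInterest = region

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Keep the top candidate string for each recognized line of text.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```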

Overlay

A transparent window sits over the target application. Each detected element becomes an interactive NSAccessibilityElement with the correct role, label, and click action. VoiceOver sees and navigates these elements as if the app had built-in accessibility support.
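The overlay layer can be sketched as a small NSAccessibilityElement subclass that forwards VoiceOver's press action as a synthetic click into the target app (class and function names here are illustrative assumptions):

```swift
import AppKit

// Sketch: a proxy element VoiceOver can navigate and press. The press is
// forwarded as a synthetic mouse click at the element's on-screen position.
final class OverlayElement: NSAccessibilityElement {
    var clickPoint: CGPoint = .zero

    override func accessibilityPerformPress() -> Bool {
        let down = CGEvent(mouseEventSource: nil, mouseType: .leftMouseDown,
                           mouseCursorPosition: clickPoint, mouseButton: .left)
        let up = CGEvent(mouseEventSource: nil, mouseType: .leftMouseUp,
                         mouseCursorPosition: clickPoint, mouseButton: .left)
        down?.post(tap: .cghidEventTap)
        up?.post(tap: .cghidEventTap)
        return true
    }
}

// Building one element per detection, with role, label, and screen frame.
func makeElement(label: String, frame: NSRect,
                 clickPoint: CGPoint, parent: Any) -> OverlayElement {
    let element = OverlayElement()
    element.setAccessibilityRole(.button)
    element.setAccessibilityLabel(label)
    element.setAccessibilityFrame(frame)   // screen coordinates
    element.setAccessibilityParent(parent)
    element.clickPoint = clickPoint
    return element
}
```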

The result

Apps that were blank to VoiceOver become partially navigable. Buttons can be clicked, text fields identified, links followed. It's not the same as native accessibility, but it makes inaccessible software usable.

Technical decisions

Design choices shaped by actually using VoiceOver every day.

Label association

YOLO often classifies nearby text labels as buttons. Rather than treating this as a model error, the label association algorithm identifies when a detected "button" is actually a label for an adjacent text field, using tight row-alignment tolerances to avoid picking up section headers.
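The row-alignment check reduces to simple geometry. A sketch of the heuristic, with tolerance values that are assumptions rather than the project's tuned numbers:

```swift
import Foundation

// Sketch of the row-alignment heuristic: a detected "button" immediately to
// the left of a text field, on the same row within a tight vertical tolerance,
// is reinterpreted as that field's label. Tolerances here are assumptions.
struct Box {
    let x, y, width, height: Double
    var midY: Double { y + height / 2 }
}

func isLabelFor(candidate: Box, field: Box,
                rowTolerance: Double = 8, maxGap: Double = 60) -> Bool {
    // Tight vertical tolerance rejects section headers sitting above the row.
    let sameRow = abs(candidate.midY - field.midY) <= rowTolerance
    // The candidate must sit just left of the field, not overlapping it.
    let gap = field.x - (candidate.x + candidate.width)
    let adjacent = gap >= 0 && gap <= maxGap
    return sameRow && adjacent
}
```

A section header a few rows above the field fails the `sameRow` check even when it is the nearest text horizontally, which is exactly the false positive the tight tolerance is there to avoid.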

Proximity fallback

When the ML model misses a label entirely, a fallback system uses OCR results and spatial proximity to find the nearest text that could plausibly describe a control. This catches cases the model was not trained on.
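The fallback can be sketched as a nearest-neighbor search over OCR results, capped by a maximum distance so a control with no plausible nearby text stays unlabeled (names and the distance cap are assumptions):

```swift
import Foundation

// Sketch of the proximity fallback: when detection yields no label, pick the
// OCR line whose center is closest to the control's center, within a cap.
struct TextLine {
    let string: String
    let centerX, centerY: Double
}

func dist(_ line: TextLine, _ p: (x: Double, y: Double)) -> Double {
    let dx = line.centerX - p.x
    let dy = line.centerY - p.y
    return (dx * dx + dy * dy).squareRoot()
}

func nearestLabel(for controlCenter: (x: Double, y: Double),
                  among lines: [TextLine],
                  maxDistance: Double = 120) -> String? {
    let best = lines.min { dist($0, controlCenter) < dist($1, controlCenter) }
    guard let line = best, dist(line, controlCenter) <= maxDistance else {
        return nil   // nothing close enough to plausibly describe the control
    }
    return line.string
}
```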

Text input handling

Text fields use NSTextView rather than NSTextField to avoid a VoiceOver bug where the first visit to a text field announces it as static text instead of an editable field. I found this because I use VoiceOver every day.
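Configuring an NSTextView to behave like a single-line field might look like this sketch (the exact configuration the project uses is not shown here; this is one plausible setup):

```swift
import AppKit

// Sketch: an NSTextView standing in for a single-line text field, with its
// accessibility role set explicitly so VoiceOver announces it as editable.
func makeFieldView(frame: NSRect) -> NSTextView {
    let view = NSTextView(frame: frame)
    view.isRichText = false
    view.isFieldEditor = true                   // Return/Tab end editing, like a field
    view.textContainer?.maximumNumberOfLines = 1
    view.setAccessibilityRole(.textField)       // announce as a text field
    return view
}
```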

Why this matters

This is a different kind of accessibility work.

Most accessibility work is auditing and recommending fixes. This is building an entire accessibility layer from scratch for apps that have none, combining machine learning, platform APIs, and a lot of trial and error.

Built with

  • Swift and Objective-C
  • YOLO v8 object detection (trained on custom UI element dataset)
  • Apple Vision framework for OCR
  • NSAccessibility protocol for accessible element creation
  • Core Graphics for screen capture and coordinate mapping

Want to see more of my work?

The portfolio includes web and mobile accessibility audits alongside these engineering projects.