Recommendations for April 2025
Prepared by the Course Content Multimedia Working Group:
- Laura Baldwin, Service Manager, UW-IT: Service Management (chair)
- Sara Berk, Manager of Program Operations, Department of Biology
- Chelsea Elkins, Access & Advocacy Coordinator, School of Public Health
- Jon Keib, Director, Instructional Services, FSB: Associate Dean Academic Affairs
- Paul Lovelady, Manager of Program Operations, UWT: Information Technology
- Lutz Maibaum, Assistant Teaching Professor, Department of Chemistry
- Dr. Alexis Prybutok, Assistant Teaching Professor, Department of Chemical Engineering
- Jason Smith, Multimedia Consulting Lead, Learning Technologies, Academic & Student Affairs
- Terrill Thompson, Technology Manager, UW-IT: Accessible Technology Services
Table of Contents
- Table of Contents
- Background
- Objective
- Requirements
- Current State
- Future State – Brainstorming Options, Identifying Levers
- Prioritization Guidelines for Transcripts and Captions
- Prioritization Guidelines for Audio Description
- Who determines the priority of recordings?
- Decreasing the number of recordings
- Options for Support
- Options for Training/Education
- Faculty Incentives
- Options for Policies
- Options for Technology/Tools
- Options for Data/Analytics
- Conclusions
- Acknowledgements
Background
In 2024, the Department of Justice (DOJ) issued new standards that require the University’s web content, including academic course content, to be accessible by April 24, 2026. To address this need, the University created a task force and multiple action teams. The Course Content Accessibility Action Team met to identify and prioritize key areas to address in course content. One of those areas is multimedia content.
Objective
The multimedia working group is charged with the following:
- Validate the current state of inaccessible video/audio files used in courses (documented below).
- Provide additional data and analytics to explain the extent of the issue.
- Brainstorm multiple options for ways UW can address the issue by:
- a. Using existing support structures and workflows in new or different ways.
- b. Creating new support structures and workflows.
The group was not required to assess the viability of the options, prioritize, or make recommendations, but was tasked with identifying a suite of options to inform leadership’s decision-making around how the UW can improve digital accessibility in courses in the near term and proactively comply with the new DOJ guidelines in the future.
Requirements
The Department of Justice’s ruling requires enterprise-wide and local unit efforts to ensure content complies with the technical standards. This proactive approach includes:
- Creating course content that is accessible to all students from the first day of class.
- Remediating inaccessible existing course content that will be used in courses after April 24, 2026.
- Archiving inaccessible course content that will not be used after April 24, 2026.
The Department of Justice (DOJ) has established specific requirements for video accessibility under the Web Content Accessibility Guidelines (WCAG) 2.1 AA. Here are the key points:
- Transcripts: Audio content must include transcripts to ensure that students who are deaf or hard of hearing can access the content.
- Captions: Video content must include captions to ensure that students who are deaf or hard of hearing can access the audio information. Transcripts alone are insufficient for video content, as the text must be synchronized with the video.
- Audio Descriptions: Videos must provide audio descriptions for key visual content, making them accessible to individuals who are blind or have low vision.
- For more information, review:
- 1.2.1 Audio-only and Video-only (Prerecorded) (Level A)
- 1.2.2 Captions (Prerecorded) (Level A)
- 1.2.3 Audio Description or Media Alternative (Prerecorded) (Level A)
- 1.2.4 Captions (Live) (Level AA)
- 1.2.5 Audio Description (Prerecorded) (Level AA)
- 1.4.2 Audio Control (Level A)
Current State
UW’s strategy for digital accessibility is primarily reactive, relying on individual remediation requests rather than proactive content development. Support services, such as Disability Resources for Students (DRS) and Accessible Technology Services (ATS), provide limited remediation for students with documented accommodations and high-priority cases. However, the scale of inaccessible content exceeds current support capacity.
Instructors can create audio and video content using many different tools; Panopto and Zoom are most commonly used and are supported centrally. In Canvas, instructors can upload videos directly, or they can embed video content from many different sources, including YouTube and Vimeo. Canvas is also centrally supported by UW-IT.
Captioning (prerecorded)
In Panopto, Zoom, and YouTube, automated, machine-generated captions (using Automatic Speech Recognition, or ASR) are added to recordings for free. If they are accurate, they may require no intervention from instructors. However, machine-generated captions are rarely accurate enough to meet accessibility requirements. Therefore, they should be reviewed and, if necessary, edited to ensure they are sufficiently accurate.
All of the previously listed platforms provide tools for editing captions. If machine-generated captions contain errors, these can be corrected. Depending on the number and severity of the errors, this can be a cost-effective way to attain a high level of accessibility, but requires that reviewing and editing machine-generated captions be added to the video publication workflow.
Instructors can create manual captions from scratch; however, this process is highly time-consuming. Estimates suggest that captioning a video can take between four and ten minutes per minute of recording, making it an inefficient use of instructors’ time.
Alternatively, recordings can be captioned from scratch by a vendor. Washington State higher education institutions have a contract with 3Play Media for manually captioning and transcribing recordings, negotiated by the State Board for Community and Technical Colleges. As of March 1, 2025, UW accounts have slightly better rates for captioning and transcription than those available through the state contract:
3Play Media Manual Captioning Service | Per Minute | Per Hour |
---|---|---|
Standard Turnaround (4 days) | $1.85 | $111.00 |
Expedited Turnaround Surcharge (2 days) | + $0.50 | + $30.00 |
Rush Turnaround Surcharge (1 day) | + $1.00 | + $60.00 |
Same Day Turnaround Surcharge (8 hours) | + $2.25 | + $135.00 |
Two Hour Turnaround Surcharge (2 hours) | + $5.00 | + $300.00 |
Difficult Audio Surcharge | + $1.00 | + $60.00 |
Extended Turnaround Discount (10 days) | -$0.15 | -$9.00 |
Panopto: Panopto offers a manual captioning service where they advertise 99+% accuracy:
Panopto Manual Captioning Service | Per Minute | Per Hour |
---|---|---|
Standard Turnaround (4 days) | $1.00 | $60.00 |
Expedited Turnaround Surcharge (2 days) | $1.25 | $75.00 |
Rush Turnaround Surcharge (1 day) | $1.50 | $90.00 |
Videos longer than 60 minutes (for the one-day turnaround) or 120 minutes (for the longer turnarounds), as well as videos with challenging audio (such as poor quality or heavily accented speakers), may take longer. UW’s three-year contract, which ends June 30, 2026, comes with $8,400 of “free” captioning, and there is currently $7,499 remaining. Panopto’s manual captioning service requires payment up front in increments of $2,000.
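To make the rate differences concrete, the following sketch estimates the cost of captioning a single recording under each contract, using the standard-turnaround per-minute rates from the tables above. The helper function and figures are illustrative only; actual invoices depend on turnaround tier, surcharges, and contract terms.

```python
# Illustrative cost estimate for manually captioning one lecture, using
# the standard-turnaround per-minute rates quoted in the tables above.
# Figures are for illustration; actual billing depends on turnaround
# tier, audio-difficulty surcharges, and contract terms.

RATES_PER_MINUTE = {
    "3Play Media (standard, 4 days)": 1.85,
    "Panopto (standard, 4 days)": 1.00,
}

def captioning_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimated cost of captioning a recording of the given length."""
    return minutes * rate_per_minute

if __name__ == "__main__":
    lecture_minutes = 50  # a typical single lecture
    for vendor, rate in RATES_PER_MINUTE.items():
        print(f"{vendor}: ${captioning_cost(lecture_minutes, rate):.2f}")
    # Prints $92.50 for 3Play Media and $50.00 for Panopto
```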
No automated, machine-generated captions are created when recordings are uploaded directly into Canvas, so in this case, captions must be generated via other means and uploaded to Canvas along with the media file. Instructure (Canvas) does plan to remedy this.
Captioning (Live)
The tools described in the Captioning (pre-recorded) section (e.g., Panopto, Zoom, and YouTube) are all used for hosting live online events. In Zoom, 9,148 meetings were scheduled in the Canvas integration and recorded to Zoom Cloud in Autumn 2024. The number of live course-related events that were scheduled in other platforms, including YouTube and Panopto, is unknown.
Zoom supports live captions using automatic speech recognition software (ASR), and also provides a mechanism for integrating the services of live human captioners. (For details, review the Zoom help article Using a third-party closed captioning service).
Similarly, YouTube supports adding captions to live streams. (For details, review YouTube’s help page on Live caption requirements). Panopto also provides live streaming; however, UW-IT generally doesn’t recommend that instructors use it due to the time delays. Panopto does not provide live auto captions, but they work with vendors who can provide either manual or auto captions for live events. UW Event Services occasionally uses Panopto to stream and record events.
Another current option for UW instructors is the BigBlueButton tool in Canvas; it may be used for synchronous meetings and events that may also be recorded. (Note: this is open-source software that ships with Canvas, called “Conferences” in the documentation). BigBlueButton has not been endorsed by UW-IT or Learning Technologies (LT), and it is not centrally supported at the UW. According to the BigBlueButton help page on Using Closed Captions, meeting participants can be promoted to “Moderator”, after which they should have access to features that enable them to serve as a real-time captioner. However, this feature does not currently seem to be working. Their documentation also recommends using their application within the Chrome browser and utilizing Google’s built-in automatic speech recognition software functionality. (They do not provide this functionality natively).
Real-time captioning services at the UW are typically provided in collaboration with the University’s Disability Services Office (DSO). For details, review the DSO page on Interpreting and Real-Time Captioning. Current rates for this service are $195/hour with a two-hour minimum.
Transcripts
Transcripts are a text version of the audio content presented as standalone content, not necessarily synchronized with the playback of the recording. For audio-only recordings, transcripts are required for accessibility.
For captioned video, transcripts are not required by WCAG 2.1 Level AA. However, transcripts are recommended for all videos since they are more accessible than captions for people who are deaf-blind. They are also recommended because they provide an alternative means of consuming video content for people with slow Internet connections or other technical issues, people who want to quickly scan or search the video’s content, and people who simply prefer text.
Since a transcript is an alternative to the video, it should include both audio content and descriptions of important visual information.
Audio Description
Audio description is a separate narrative audio track that accompanies a video, describing important visual content for people who are blind or have low vision and are therefore unable to see the video. People who are blind can understand much of a video’s content by listening to its audio. However, if a video includes content that is only presented visually (e.g., on-screen text, images, or key actions that are not obvious from the audio), this visual information must be described.
Videos vary widely in their need for audio description. If a lecturer is good at describing visual information as it’s presented, no audio description is necessary. In some cases, the need for audio description can be reduced or eliminated by providing supplemental materials such as slides, notes, and other digital documents that contain any information that is presented visually during the video, and clearly communicating which supplemental sources to refer to for particular context (e.g., “Let’s now examine Example 3 in the handout”). In contrast, audio description is critical for videos that contain primarily visual content, with little or no spoken audio. Otherwise, these recordings are completely inaccessible to people who are unable to see the video.
There are also a variety of methods for delivering audio description. For example, in Panopto, audio description is entered as text, to be read aloud using synthesized speech at designated times in the video. Users can toggle description on/off via a “Description” button on the Panopto media player.
Similarly, Able Player, an open source HTML media player developed and maintained by UW-IT Accessible Technology Services (ATS), supports audio description as a timed text file. Users can toggle audio description on or off using a “Description” button on the player control bar. If description is on, text is voiced by the web browser. Users can change voice characteristics in a Preferences dialog, and can control whether the video is automatically paused during description playback. Able Player can be added directly to websites and is also available in a WordPress plugin and Drupal module. Currently, there is no Able Player app (LTI) for Canvas.
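To illustrate description as timed text: players such as Able Player read description cues from a standard WebVTT track. The sketch below writes such a file from a list of cues; the cue times and description text are hypothetical examples, and the exact authoring workflow varies by platform.

```python
# Minimal sketch: write audio description cues as a WebVTT file, the
# timed-text format that players such as Able Player can voice with
# synthesized speech. Cue times and text here are hypothetical.

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def write_description_track(cues, path="descriptions.vtt"):
    """cues: list of (start_sec, end_sec, description_text) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n")
        for start, end, text in cues:
            f.write(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n")
            f.write(f"{text}\n\n")

write_description_track([
    (12.0, 16.5, "The instructor writes the ideal gas law on the whiteboard."),
    (45.0, 50.0, "A slide shows a scatter plot of pressure versus volume."),
])
```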
Most other platforms do not support audio description via timed text and synthesized speech. Instead, the most common means of delivering audio description is to create two separate videos, one with audio description and one without. Then, wherever the non-described video is available, a link must be provided to the “audio described version”.
Some vendors who provide both captioning and audio description services (e.g., 3Play Media, Echo Labs) provide their own media players that are capable of playing videos from other sources, with captions and audio description accessible through the one interface that they provide.
The American Council of the Blind has compiled a comprehensive list of commercial services for producing audio description. UW-IT Accessible Technology Services (ATS) narrowed the scope of the list to organizations that provide description services at prices and turnaround times that seem to be a good match for higher education institutions. In addition to vendors like 3Play Media and Echo Labs who offer audio description as an add-on to their captioning service, there are vendors who specialize in audio description production, such as Audio Eyes, the Center for Inclusive Design and Innovation (CIDI, at Georgia Tech), and WGBH Media Access Group.
The typical deliverables provided by professional audio description services are either an audio file with soundtrack and description mixed together, or an audio described version of the video, with the described audio replacing the original program audio. In either case, these can be made available to users by announcing that the video is “Also available with audio description”, where “audio description” is a link to the described version.
Very few videos at the UW are currently audio described.
Current Data: Captioning
Information in this section is organized by data source.
Anthology Ally
According to the Anthology Ally Institutional Report, there were 7,799 videos without captions or with automated captions across all UW Canvas courses in Autumn quarter 2024. Ally currently has limitations in what it is able to track related to videos:
- Ally does not check videos that are uploaded directly to Canvas.
- Ally does not check Panopto videos that are embedded in Canvas.
- Ally does check for captions in embedded YouTube videos, but we have questions about how accurate this is. (We tested this with videos that had only machine-generated captions, and Ally did not recognize them as being problematic).
- Ally does not report on the number of videos in the Ally Institutional Reports.
Panopto and Zoom
UW enabled automated machine-generated (ASR) captions by default in Panopto starting in Autumn 2022 and in Zoom starting in October 2021. New recordings created after those dates are automatically captioned, and older recordings can be reprocessed to add the automated captions. Instructors may set up Zoom recordings to be automatically moved into Panopto.
Between recordings created in Panopto, Zoom recordings scheduled in Canvas**, and recordings uploaded into Canvas, nearly 47,000 recordings were created or uploaded into Autumn Quarter 2024 courses. Of these recordings, over 60% had auto captions, and less than two percent had manual captions. Review the Autumn 2024 Partial Summary of Course Recordings table below for more information. The majority of current Panopto and Zoom course recordings have ASR captions. However, while ASR captions are improving, they currently do not meet accessibility standards.
Panopto
The following data describes the overall volume of content available on Panopto.
- Total recordings and hours as of March 19, 2025:
- 621,825 recordings in Panopto.
- 101,000+ recordings never watched.
- 356,316 hours in storage and 153,740 hours archived.
- 510,603 hours (storage, archives, recycle bin).
- 2024 data:
- 75,626 recordings created (including 7,710 Zoom recordings) in 2024.
- 68,537 hours recorded in 2024.
- 709,148 hours viewed in 2024.
- Autumn 2024: 20,736 hours were created or edited in Autumn 2024. Of these, approximately 591 (2.6%) were captioned by DRS.*
- Public: Since July 1, 2023, 469 hours of content were viewed by the public. This option is locked down and only available to administrators.
- Zoom recordings: Many Zoom recordings are moved to Panopto and are included in the Panopto numbers.
- Limitations: Currently, we are unable to obtain caption and audio description data (e.g., the source of captions and audio descriptions) or information about whether the captions have been edited.
- Update on ASR Captions: We have 198,194 recordings in Panopto with ASR captions.
Zoom
Autumn quarter 2024 data:
- 22,696 meetings scheduled in Canvas LTI.
- 596,736 meetings not scheduled in Canvas LTI.
- 717,197 minutes of Zoom meetings scheduled in Canvas LTI and recorded to Zoom Cloud.
- Limitations: We cannot tell which recordings are for courses, and Zoom lacks captioning and audio description data.
Canvas
- 4,100 audio recordings were uploaded in Autumn 2024.
- 10,847 video recordings were uploaded in Autumn 2024.
- Limitations: Canvas data was obtained using the Canvas API and does not include embedded videos and audios. We only have information about the number of recordings, not the number of recorded hours. We also lack detailed captioning and audio description source data.
Autumn 2024 Partial Summary of Course Recordings
Tool | Number of recordings autumn quarter | Auto-captioned recordings (ASR) | Manually-captioned recordings |
---|---|---|---|
Canvas uploads | 15,047 | | 257 |
Panopto | 122,771 | All by default | 591* |
Zoom | 9,148 Zoom meetings scheduled in Canvas LTI and recorded to Zoom Cloud | All by default | Included in Panopto’s number |
Notes:
*This is the number of captioned recordings ordered by DRS. It doesn’t count recordings ordered by other units, which could also be for courses. It includes Zoom recordings, if any.
**Zoom recordings scheduled outside the Canvas LTI could also be used in courses.
3Play Media
During Autumn Quarter 2024, 3Play Media was hired to caption 807 recordings. Of these, DRS paid $76,000+ for 591 recordings.
UW Libraries
Instructors also use video content from UW Libraries. The Libraries’ media collections include approximately 150 license agreements for streaming content from various filmmakers. Many recordings are already captioned. If they aren’t captioned to meet standards, the contract requests that the vendor grant the “UW Libraries permission to modify or copy the film to make it accessible.” This report does not focus on library content because UW Libraries already has a system for captioning their materials.
Vimeo
Vimeo is used both for course recordings and public recordings. In March 2025, a crawl of all UW websites using Little Forest (a vendor used for website domain discovery) found 44 unique Vimeo accounts, including College of Education, College of Built Environments, Department of Environmental and Occupational Health Sciences, Biology Department, and other academic units. UW does not currently have a way of measuring the number of videos on UW Vimeo accounts or whether they are captioned.
YouTube
YouTube is used both for UW course recordings and UW public recordings. There are 95 known UW-affiliated YouTube channels, including Evans School of Public Policy & Governance, College of Engineering, iSchool, School of Law, School of Social Work, Human Centered Design & Engineering, Paul G. Allen School, and UW Applied Physics Laboratory, and other academic units. YouTube Caption Auditor, an open source tool developed by UW-IT’s ATS, is able to collect data for the videos in these channels using the YouTube Data API. As of January 1, 2025, the 95 channels combined included 12,534 public-facing videos, and only 4,167 (33.2%) were captioned. The uncaptioned videos included 3,641 hours of content. Most videos are auto-captioned by YouTube, but these require editing in order to be considered viable for people who depend on captions. YouTube flags their videos as “not captioned” in the YouTube Data API if the videos have unedited automatic captions.
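For reference, the kind of audit YTCA performs can be approximated with a short script against the YouTube Data API v3: list a channel’s uploads, then read each video’s contentDetails.caption flag. The following is a simplified sketch rather than YTCA itself; it assumes a valid API key and omits quota and error handling.

```python
# Simplified sketch of the kind of audit YTCA performs: count captioned
# vs. uncaptioned public videos in a YouTube channel via the YouTube
# Data API v3. Not YTCA itself; assumes a valid API key and omits quota
# handling. Requires google-api-python-client.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder
youtube = build("youtube", "v3", developerKey=API_KEY)

def channel_video_ids(channel_id):
    """Yield video IDs from a channel's uploads playlist."""
    resp = youtube.channels().list(part="contentDetails", id=channel_id).execute()
    uploads = resp["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
    token = None
    while True:
        page = youtube.playlistItems().list(
            part="contentDetails", playlistId=uploads,
            maxResults=50, pageToken=token).execute()
        for item in page["items"]:
            yield item["contentDetails"]["videoId"]
        token = page.get("nextPageToken")
        if not token:
            break

def caption_counts(channel_id):
    """Return (captioned, uncaptioned) counts. Note: the API reports
    videos with only unedited automatic captions as caption='false'."""
    ids = list(channel_video_ids(channel_id))
    captioned = 0
    for i in range(0, len(ids), 50):  # videos().list accepts up to 50 IDs
        resp = youtube.videos().list(
            part="contentDetails", id=",".join(ids[i:i + 50])).execute()
        captioned += sum(
            1 for v in resp["items"]
            if v["contentDetails"]["caption"] == "true")
    return captioned, len(ids) - captioned
```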
There is no known data source for how many UW videos on YouTube include audio description. Anecdotally, we know that very few video content producers are including audio description in any of their videos. Which of these YouTube videos are used in academic courses is unknown. However, since all of these videos are in scope under the ADA Title II rule, we recommend including them in the UW’s accessibility plan.
BigBlueButton
This application, which ships with Canvas, has only been opened 5,463 times historically. In Winter quarter 2025, it was only opened 631 times, as measured by unique page views that included actual participation (e.g., hitting a button, viewing a recording). It only retains recordings for 7 days, and the recordings can’t be downloaded or shared.
Current Data: Audio Description
Audio description (AD) is not a frequently requested accommodation, and the number of recordings with AD varies from year to year. Instead of requesting audio descriptions, students with the accommodation may choose to work with a scribe who can personalize the descriptions.
3Play Media
The following units ordered audio descriptions from 3Play Media between 2015 and March 2025.
Unit | Number of recordings |
---|---|
U of Washington – DRS | 135 |
U.Washington – MBA Program | 2 |
Univ. of Washington | 68 |
Univ. of Washington – DO-IT | 19 |
University of Washington | 2 |
UW – Simpson Center | 1 |
UW Allen School | 7 |
UW ATS Testing | 30 |
UW DO-IT Audio Description | 13 |
This data shows that demand for audio description as an accommodation is low and varies significantly from quarter to quarter, driven by specific students who prefer audio description and are enrolled in courses with video content.
Panopto
Audio description data is currently not available from Panopto, but we have requested it.
Other platforms
Canvas, Zoom, YouTube, and Vimeo do not include support for audio description within their media players, and they have no means of tracking whether recordings include audio description mixed into the program audio (i.e., “audio described versions”). Therefore, there is no data available to determine the extent to which audio description is available on UW videos through these platforms. Anecdotally, we believe the number of audio described recordings to be extremely small.
Challenges
Captions
- The accuracy of machine-generated captions and transcripts varies greatly depending on the content, the audio quality, and how clearly the instructor can be understood.
- Many instructors record their lectures and post the recordings after class. The turnaround for manual captioning may not enable students to stay current with content. Automated captions, however, are available much sooner.
- The Accessible teaching methods page recommends developing a library of short, reusable, well-captioned video lectures on core content. However, the instructors in our group expressed concerns about keeping content current. They value in-person lectures where they can make announcements, and students can ask questions, allowing them to explain concepts in more detail when needed.
- Canvas does not currently provide machine-generated captions as a starting point. Therefore, recordings uploaded directly into Canvas need to be captioned via other means, prior to uploading into Canvas. Automated captions are on Instructure’s (Canvas) roadmap with an anticipated beta release date of April 19, 2025.
- Using a vendor to provide high-quality captions for content in more than one language is challenging, though not a common occurrence; for example, a beginning language course could have content in both English and French.
- Panopto has a custom dictionary which would help with spelling in captions, but it is shared by the entire UW and is limited to 1,000 words per language and a maximum file size of 5 MB. Words may only be added by a Panopto administrator, but Panopto administrators in UW-IT and Learning Technologies (LT) don’t have the bandwidth to create and maintain the dictionary.
- Captioning data is difficult to obtain: Panopto has limited caption data available in their interface, but it does not work for the UW because we have too much data; UW-IT engineers can access data on recordings uploaded into Canvas, but it doesn’t include the length of the recording; Zoom doesn’t provide any caption data.
Transcripts
Some media platforms provide transcripts by splicing captions together into a continuous block of readable text. Zoom and YouTube both support this. Panopto has an option to download captions, but the downloaded file includes timestamps, which make it difficult to read, and the setting is turned off for viewers.
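Where a platform only offers a timestamped caption download, the file can be flattened into a readable transcript with a short script. The following is a minimal sketch, assuming a standard WebVTT caption file (the filename lecture.vtt is a hypothetical example):

```python
# Minimal sketch: flatten a downloaded WebVTT caption file into a plain
# transcript by dropping the header, cue timings, and numeric cue
# identifiers. Assumes a standard WebVTT file; other caption formats
# (e.g., SRT) would need slightly different rules.

def vtt_to_transcript(path: str) -> str:
    lines = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if (not line or line.startswith("WEBVTT")
                    or "-->" in line      # cue timing line
                    or line.isdigit()):   # numeric cue identifier
                continue
            lines.append(line)
    return " ".join(lines)

print(vtt_to_transcript("lecture.vtt"))
```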
There is no known data source for whether audio recordings include transcripts on any media platforms.
Audio description
Audio description is rarely requested as an accommodation from Disability Resources for Students (DRS), and expensive to implement. 3Play Media provides audio description as an optional add-on to their captioning service, with current pricing of $7.35 per video minute (standard) or $11.00 per video minute (extended). Extended description (in which the video is paused temporarily while description is being voiced) is necessary for videos that have insufficient gaps in the program audio in which to insert description.
Audio description is a specialized skill, with description providers trained to know how best to describe content using language that avoids changing the meaning or disrupting the flow of the video. Although Panopto makes it possible for instructors or other authorized users to add their own descriptions, this ability should be exercised with caution.
DRS has found that scribes can work as well as or better than audio description for accommodating individual students with vision-related disabilities, because scribes can use their knowledge of the subject area to tailor descriptions to the individual student’s needs, and the student can ask questions. However, because scribes provide individualized assistance, they would be difficult to scale for the entire university system. Also, audio description is required by WCAG 2.1 at Level AA and is therefore required by the ADA Title II rule.
The only audio description data we were able to obtain was from 3Play Media. None of the other vendors provide it.
Resource constraints
- Faculty are already overburdened, and learning how to make their recordings accessible takes time away from their teaching and other responsibilities. Requiring them to monitor and edit their captions and audio descriptions could have the unintended consequence of causing faculty to avoid using media at all in their course content, which would make their courses less accessible for students who benefit from multimodal delivery of content.
- Departments have uneven levels of support and accessibility expertise. Solutions should be equitable, with similar support and accessible content provided campus-wide. It is not a viable solution to provide accessible content only to departments that teach large classes or have accommodation requests.
Current Self-Service by Units
Some units have staff with accessibility and instructional design expertise to help instructors, while most do not.
Captioning
Faculty and department staff have several options for adding captions to their recordings. The easiest option is to do nothing and use the automated captions provided by default in several platforms. However, without reviewing and editing the captions, they generally do not meet accessibility requirements.
Another option is to use automated captions as a starting point, but also review and edit them if needed using the built-in caption editor provided by the platform.
Additional options include manually captioning recordings using free tools, or outsourcing recordings to a captioning vendor such as 3Play Media.
Audio Description
Instructors can avoid the need for audio description altogether by describing visual content during the lecture or filming of the video. If visual content is described within the program audio, audio description does not need to be added post-production.
To add audio description to videos, with some exceptions (described under Current Central Support), departments need to contact the service provider and arrange for use of their services, including covering the cost, themselves.
In Panopto, faculty and staff can manually add audio description by typing text in the audio description editor. (Details are available on Panopto’s help page on How to Add Audio Descriptions).
Current Central Support
UW provides captioning without charge for accessibility accommodations and selected videos.
Disability Resources for Students (DRS) makes recordings accessible for individual students with documented accommodations. The manual captions created for accommodations are available to everyone with access to the recordings in Panopto.
UW-IT ATS will caption and/or audio describe a limited number of UW videos without charge through their Captioning and Audio Description Service. Individuals, departments, and other units at the UW are encouraged to apply for funding to caption highly-visible, high-impact, multiple-use, and/or strategic videos. Examples include:
- Videos available to the public on a high-use website.
- Videos that will be used multiple times in a course.
- Videos developed by several faculty members to be used in several different classes.
ATS also provides Digital Accessibility Consulting, which can include answering questions about audio and video accessibility and helping units to think strategically about how to address their accessibility needs.
Current Training/Education
The UW community has several options for accessibility training, including training on audio and video accessibility. Historically, ATS has offered a 1-hour webinar at least twice annually on video accessibility that covers techniques for both captioning and audio description. An archived recording of the webinar is available on the ATS website.
Other available trainings are asynchronous. Deque University (a library of asynchronous online courses on digital accessibility available to anyone with a UW NetID) includes two courses (basic and advanced) on “Multimedia, Animation, and Motion”. Both trainings are detailed and may be too advanced for users with no experience with these tools or other accessibility workflows.
A variety of UW web resources are available on video accessibility, including the following:
- ATS website: The Audio and video page from the IT Accessibility Checklist provides an entry point for accessing additional information.
- The Making Content Accessible module of the 2020 Teaching with UW Technologies and Designing an Accessible Syllabus 202 Canvas courses have limited information about captioning.
- UW Bothell’s accessible multimedia content page provides background information and links to accessibility resources.
- Teaching @UW: Making videos and recorded lectures accessible provides high level information with links to more specific documentation.
Vendors also provide documentation related to their video accessibility features. For example:
- Panopto provides documentation about its accessibility features, as well as creating and editing captions and audio descriptions and removing content.
- Zoom’s accessibility page describes accessibility features and has links to how-to documentation.
- Canvas provides a help page on how to add captions in the rich content editor.
- YouTube Help pages include documentation on how to add captions and how to edit or remove captions.
- Vimeo provides a variety of help articles on the subject of captions and subtitles.
Current Policies
Under the Student Governance and Policies, specifically Chapter 208, the university establishes guidelines for providing reasonable accommodations to students with disabilities. These policies are designed to ensure that students with disabilities have equitable access to educational resources and can fully participate in university life, with necessary adjustments or support made to accommodate their needs.
Otherwise, the UW does not currently have a formal policy related to digital accessibility. ATS has published IT Accessibility Standards since 2015, which declare video and audio content to be in scope, and state that “The UW looks to the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA, developed by the World Wide Web Consortium (W3C), for guidance in meeting its IT accessibility commitments.”
Current Technology/Tools
UW offers several tools which instructors can use to improve course accessibility.
- Able Player: An open source HTML media player developed and maintained by ATS. It was designed with accessibility in mind and includes support for captions and audio descriptions. Able Player can be added directly to websites and is also available in a WordPress plugin and Drupal module, but is not currently available as an external app (LTI) for Canvas.
- Anthology Ally: Integrated with Canvas, Ally automatically checks for accessibility issues, provides alternative formats, and offers feedback on how to improve content accessibility. Its ability to check for accessibility of video and audio is currently extremely limited.
- Native Canvas Video Editor: Canvas offers a built-in video editor that supports basic editing functions including adding captions, making it easy to create and edit videos directly within the platform.
- Panopto: Panopto provides auto captioning and allows users to request 3Play Media captions and audio descriptions directly within the interface.
- YouTube: YouTube is a video-sharing platform where users can upload, view, and share videos. Most videos uploaded to YouTube are automatically captioned. These can be replaced with captions uploaded from other sources, or edited using YouTube’s caption editor.
- YouTube Caption Auditor (YTCA): YTCA is an open-source tool developed by ATS for collecting and reporting data on videos within designated YouTube channels.
- UW YouTube Caption Report: Built from YTCA, this tool provides reports for YouTube channel owners at the UW, helping them to track and prioritize their captioning efforts.
- Zoom Workplace: Zoom records meetings and automatically provides machine-generated captions and transcripts. Meeting owners can assign someone to type captions during meetings. It also supports sign language interpreters and third-party captioning using their Closed Captioning REST API.
Current Approach
The University’s current approach to audio and video accessibility is largely ad hoc. Some individual faculty and staff, academic departments, and colleges use the methods described in the Current Self-Service by Units section of this report, and others utilize the services described above in the Current Central Support section, where applicable.
Evidence suggests that some units have mature workflows for ensuring all their videos are captioned. For example, UW Professional & Continuing Education and the Center for Neurotechnology are among a small number of units that have captioned 100% of the videos on their YouTube channels, while UW School of Public Health, Human Centered Design and Engineering, and UW School of Law are among a small number of units that have captioned over 90% of their videos.
However, these mature models are isolated. There is no cohesive University-wide strategy for ensuring audio and video are accessible.
Future State – Brainstorming Options, Identifying Levers
Making audio and video content accessible is expensive. Outsourcing transcription, captioning, and audio description to suppliers who provide these services using human labor is prohibitively expensive, given the vast number of hours of content available at the UW. Doing the work internally would also be expensive, even if student employees were hired for this purpose, as it would require a significant investment in resources to manage, train employees, and provide logistics. Rapid advancements in AI technology, especially AI-generated captions, may eventually offer comparable quality at a significantly lower cost. Already, AI solutions could greatly reduce the up-front cost of transcribing audio and captioning video, but the output still needs to be reviewed and edited, either by the supplier or by UW staff or students.
To move towards substantial compliance, the Multimedia Working Group recommends considering a combination of solutions, including the following:
- Prioritize recordings for transcription, captioning and audio description. Review the Prioritization Guidelines for specific recommendations.
- Decrease the number of recordings: Limit the number of stored recordings by establishing retention policies and archiving recordings following archival guidance that’s included in the ADA rule.
- Dedicate time for creating and editing: Require UW faculty, staff, and students to allocate time to editing captions and creating audio descriptions.
- Provide training: Provide training for faculty to effectively describe visual material and create higher quality audio.
- Contract with vendors: Engage vendors to provide captions and audio descriptions.
- Leverage AI solutions: Engage vendors who provide AI captions and (potentially) AI descriptions at scale for lower costs than if these services are performed by humans. Develop workflows for reviewing and editing the AI output, as determined by priority.
Each of these strategies offers a pathway to compliance, with varying implications for time, effort, and financial resources.
We understand that many faculty members are already overwhelmed, and we aim to minimize the additional workload. To assist in this effort, we have outlined various options for implementing captions and audio descriptions for faculty, sorted by level of effort required.
Prioritization Guidelines for Transcripts and Captions
The Multimedia Working Group recommends the following approach to prioritization.
High priority recordings must have captions with 99% accuracy or higher when:
- A student has requested an accessibility accommodation for captions. Students are already using the DRS workflow for this.
- The recordings are available to the public (rare for course content) and are high traffic or new.
Medium priority recordings should have captions with 97% accuracy or higher when:
- Viewers report a recording with poor or no captioning.
- The same audio or video files are used repeatedly in courses over time.
- Video lectures are the primary source of learning (i.e., in a “flipped classroom” setting).
- The course has a large number of students, the recording is among the most viewed, or the course has one of the highest enrollments in a department.
Low priority recordings can use automatic speech recognition (ASR) captions (without editing) when:
- Recordings are used once, for a single course that doesn’t meet the previously listed criteria.
- Recordings are created by students for an assignment that will be viewed by a small audience.
Prioritization Guidelines for Audio Description
Videos vary in their need for audio description. Therefore, the following criteria can be applied for prioritizing videos for audio description.
- High need: The content of the video cannot be understood with audio alone.
- Medium need: The video is generally understandable, but critical details are lost.
- Low need: Some information is lost, but it isn’t critical.
- No need: All content is accessible from the audio alone.
The priorities for audio description are also impacted by the high cost of audio description compared to captioning and transcription.
Top priority recordings must have audio description when:
- A student has requested an accessibility accommodation for audio description. Students are already using the DRS workflow for this.
- The recordings are available to the public (rare for course content) and have a high need for audio description.
High priority recordings should have audio description when:
- The same audio or video files are used repeatedly in courses over time.
- Video lectures are a primary source of learning (i.e., in a “flipped classroom” setting).
- The course has a large number of students, the recording is among the most viewed, or the course has one of the highest enrollments in a department.
- The recordings have a high need for audio description.
Medium priority recordings should have audio description when:
- The recordings meet all of the conditions of either Top or High priority recordings, and have a medium need for audio description.
Low priority recordings are:
- Recordings that are used once, for a single course that doesn’t meet the previously listed criteria.
- Recordings that are created by students for an assignment which will be viewed by a small audience.
- Recordings that have a low need for audio description.
Given the high cost of adding audio description to all videos, the goal should be to describe the top, high, and medium priority recordings. Low priority videos can be described if students with disabilities request this as an accommodation through DRS (at which point they become top priority videos).
Who determines the priority of recordings?
- Centralized Decisions: Establishing centralized criteria and empowering a central authority to declare the priority level of videos would facilitate standardization so that criteria are applied consistently throughout the University. This central authority could consider variables such as those identified under Prioritization Guidelines, and would need to take care to ensure the process was equitable across departments of varying sizes and resources.
- Unit Decisions: Allowing departments to make these decisions grants them greater control. However, some departments may lack the necessary resources to implement these measures effectively.
The Website and Mobile Action Team took a centralized approach in identifying the 500 top priority websites at the UW. They collected a variety of data on a large inventory of websites and applied a formula to calculate a “Priority Score” for each site. Although this was largely a centralized decision, all website owners (over 200 contacts) were asked to prioritize their own websites, and that information was included in the final calculation.
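A similar centralized scoring approach could be applied to course recordings. The sketch below computes a hypothetical “Priority Score” from variables like those identified under the Prioritization Guidelines; the weights, caps, and field names are invented for illustration and are not the Action Team’s actual formula.

```python
# Hypothetical priority score for a recording, loosely modeled on the
# Website and Mobile Action Team's approach. The weights and fields
# below are invented for illustration; any real formula would need to
# be calibrated and applied equitably across units of varying sizes.
from dataclasses import dataclass

@dataclass
class Recording:
    enrollment: int      # students in the course
    views: int           # total views to date
    reused: bool         # used in multiple courses or quarters
    public: bool         # available to the public

def priority_score(r: Recording, accommodation: bool = False) -> float:
    if accommodation:
        return float("inf")  # DRS accommodations are always top priority
    score = 0.0
    score += min(r.enrollment / 100, 5)  # cap enrollment's influence
    score += min(r.views / 500, 5)       # cap view count's influence
    score += 3 if r.reused else 0
    score += 4 if r.public else 0
    return score

print(priority_score(Recording(240, 1800, True, False)))  # 9.0
```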
Decreasing the number of recordings
One solution for reducing the number of non-compliant recordings is to reduce the total number of recordings. An unfortunate side-effect of the ADA requirements is that some faculty might choose to create fewer new videos, or remove videos that still provide value for student learning. In 2017, this happened at UC Berkeley when the Department of Justice ordered them to make their public videos and audio lectures accessible. This is not a desirable outcome, since audio and video provide accessibility benefits to many students who benefit from multimodal delivery of content. Further, UW students have been requesting more recordings. For example, the Associated Students of the University of Washington (ASUW) has submitted several proposals to require instructors to record all lecture courses.
While the goal should not be to eliminate multimedia content solely to avoid the need for accessibility compliance, there is nevertheless value in deleting or archiving recordings that are no longer in use. The ADA Title II rule provides an exemption and specific guidance related to archived content, and the UW Office of the ADA Coordinator has published Archived Web Content Guidance to help with determining whether content meets the conditions.
Additionally, retention policies save storage space and cost. Zoom and Canvas have well established retention policies, while Panopto’s policy will be implemented soon.
- Zoom: Recordings stored in the Zoom cloud are moved to the Trash 120 days after the meeting, where they remain for 30 days before being permanently deleted. Course recordings may be moved to Panopto. (Review UW Zoom cloud recording storage).
- Canvas: Content created in the UW’s Canvas Learning Management System (LMS), including uploaded videos, is retained for five years after the end of the academic year in which the course was offered. UW-IT recommends that instructors review expiring courses and archive any needed content. Expiring courses will be permanently deleted from the vendor’s servers within six months and are not recoverable. (Review the IT Knowledge Base article on Preserving Canvas course content).
- Panopto: Panopto’s retention policy is currently on hold as they work on improving notifications and the user interface. Once implemented, recordings not viewed in the last five years will be permanently deleted, and those not viewed in the last 24 months will be archived, with the exception of source recordings with reference copies.
Options for Support
Self-Service by Units
Many staff and instructors prefer to receive information from colleagues within their own unit, whom they already know and trust. This approach allows for efficient and tailored assistance, as the support can be customized to the individual’s specific needs. However, those providing support within the unit may already be overwhelmed and require additional assistance themselves, particularly if they are faculty members who have developed expertise in accessible technology.
Hub-and-spoke models
Hub-and-spoke models offer central support for accessibility, but with local liaisons whose specific role is to offer personalized services and support within their schools or departments.
The following two models could complement each other.
Liaisons model
In a liaisons model, each unit would have trained accessibility experts who provide assistance in their units, while leveraging central support. This model embeds individuals with digital accessibility knowledge and skills across colleges, departments, and units, and provides these individuals with a high level of direct central support.
This model would ensure all university college, department, and unit heads understand and adhere to digital accessibility policies, regulations and standards; would empower individual colleges, departments, and units with tailored digital accessibility knowledge, solutions and support; and would enhance institutional capacity to support compliance and inclusion by making connections, expanding expertise, and training liaisons in technical, legal, and practical aspects of accessibility.
Specifically related to audio and video accessibility, the overall strategies and workflows, vendor contracts, and tools would be managed centrally. The central team would also provide training and support to liaisons, while the liaisons would oversee the work that happens locally within their unit.
The model could include multiple tiers, with one or more lead liaisons identified at the college level, and department-level liaisons turning to the college-level liaisons for technical leadership and support.
The UW currently has a voluntary IT Accessibility Liaisons network, founded in 2017, with approximately 150 members who have self-identified as having an interest in digital accessibility and a willingness to promote and support accessibility efforts within their units. This network could be formalized so that each college, department, and unit throughout the University would be represented; and the roles and responsibilities of all parties could be clearly defined.
Student Accessibility Team Model
This model is similar to the Liaisons model, but consists of student employees who are charged with implementation of the processes needed for making audio and video accessible. In this model:
- Students within each college, department, or unit would perform the work of submitting recordings to vendors for captioning or audio description services, then as needed, review and edit the output from these services prior to publication. Local students are likely to have the subject matter knowledge and vocabulary needed for understanding the content and conducting this work.
- A centralized team of staff with specialized expertise in audio and video accessibility would provide logistics, training, and support to all students in the network. In some cases, they would also provide a centralized captioning and audio description service, with clearly defined guidelines for which videos would be escalated to the central team.
Central Support
Centralized support can enhance efficiency by standardizing and scaling the dissemination of information, training, and services. A central team specializing in accessibility would have expertise to provide training, documentation, help desk support, guidance on how to prioritize recordings, and/or centralized transcription, captioning, and audio description services. ATS currently serves all of these functions, but does not have the capacity to do so at scale. With additional resources, ATS could serve as the hub in either or both of the hub-and-spoke models previously described.
Example Captioning Process with Escalation to Human Captions
The following process involves both central and localized support, as well as vendors. This process depends on the selected vendor being able to reliably estimate the percentage accuracy of its AI captions. Currently, 3Play Media is the only vendor who is known to have this capability.
- Audio and video content is prioritized, either centrally or locally (review Who determines the priority of recordings? above).
- All videos that need captions based on their priority level are captioned by a selected vendor using ASR.
- If the vendor provides a trusted estimate of its accuracy percentage, that value provides a threshold for determining whether humans are needed to review and edit the AI captions.
- If the video’s priority level is high enough to warrant a high degree of captioning accuracy, it is automatically escalated for review and editing by the vendor’s team of human captioners at a higher cost.
Example Captioning Process with Internal Review/Editing
The steps in this process would be the same as above; however, the vendor’s estimate of its accuracy percentage would be used internally. Rather than automatically upgrading the service purchased from the vendor, UW employees would review and edit the vendor’s output based on a combination of the video’s priority level and the vendor’s accuracy estimate. The employees responsible for this function could be a network of student employees, structured under a hub-and-spoke model as described above. This process depends on the selected vendor exposing its percentage accuracy value to the end user. Currently, 3Play Media does not do this (the value is used solely for internal escalation); however, they say they intend to do so in a future release. A sketch of the routing logic appears after the list below.
- Audio and video content is prioritized, either centrally or locally.
- All videos that need captions based on their priority level are captioned by a selected vendor using ASR. The output provided by the vendor includes a trusted estimate of its accuracy percentage.
- If the video has an accuracy estimate that meets or exceeds a specified threshold (e.g., 98% for high priority videos, 97% for medium priority, 95% for low priority), the captions are published without further review.
- If the video has an accuracy estimate that is below the acceptable threshold defined by its priority level, it is added to a queue for human review by the UW team.
- A UW employee (either within the local unit or within a central support unit) is assigned the task of reviewing and editing the captions. They do so using a caption editor tool that is provided, then publish the captions when finished.
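The following is a minimal sketch of the routing step in this process, assuming the vendor returns a trusted accuracy estimate with each ASR caption file (a capability 3Play Media does not yet expose to customers). The threshold values come from the example above; the video identifiers are hypothetical.

```python
# Minimal sketch of the internal review routing described above.
# Assumes the vendor returns a trusted accuracy estimate with each ASR
# caption file, which 3Play Media does not yet expose to customers.

# Thresholds from the example above, keyed by priority level.
ACCURACY_THRESHOLDS = {"high": 0.98, "medium": 0.97, "low": 0.95}

review_queue = []  # videos awaiting human review and editing

def route_captions(video_id: str, priority: str, accuracy_estimate: float):
    """Publish captions that meet their priority's threshold; queue the
    rest for review and editing by UW staff or student employees."""
    if accuracy_estimate >= ACCURACY_THRESHOLDS[priority]:
        print(f"{video_id}: publish as-is ({accuracy_estimate:.1%})")
    else:
        review_queue.append(video_id)
        print(f"{video_id}: queued for human review ({accuracy_estimate:.1%})")

route_captions("chem162-lecture07", "high", 0.989)   # published
route_captions("bio180-lab-intro", "medium", 0.952)  # queued
```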
Example Audio Description Process
Multiple captioning vendors now include AI audio description among their service offerings. This technology is not yet mature enough to be useful for people who depend on audio description in order to understand the content of video. However, it might be useful enough to provide a starting point, from which more meaningful audio description can be created with review and editing.
- Video content is prioritized, either centrally or locally.
- Videos that are determined to need high quality audio description are outsourced to a vendor who provides this service.
- Videos that are determined to need audio description, but not high quality audio description, are described by a selected vendor using AI audio description. No AI audio description is currently reliable enough to be published without review. In the future, technological advancements might make it feasible to publish some descriptions at this step, without the following review.
- A UW employee (either within the local unit or within a central support unit) is assigned the task of reviewing and editing the audio description. They do so using an audio description editor tool that is provided by the vendor, then publish the video with audio description when finished.
Options for Training/Education
There are a variety of perspectives within our Working Group as to whether training should be required of faculty. On one hand, faculty are already overwhelmed and should not be required to take on additional tasks; any recordings they create should be made accessible with little additional effort on their part, thereby minimizing or eliminating the need for training. On the other hand, there is a baseline level of knowledge that all UW employees, including faculty, should have. Training specifically on audio and video accessibility may not be required, but some basic knowledge could be included in a training that covers digital accessibility more broadly.
Key training objectives for faculty/instructors
- Understand their responsibilities and priorities.
- Recognize the importance of captions and audio descriptions.
- Learn techniques to optimize audio tracks for improved captions.
- Understand when and how to describe images effectively.
- Describe slides only when needed to understand the concepts. For example, if you have a table, you probably don’t need to describe every cell.
- Provide an equivalent experience for people who cannot review the slides. Accessibility experts differ in their opinions on how much to describe.
- Familiarize themselves with the process of working with DRS.
- Understand how to prioritize recordings for captions and audio descriptions.
- Potentially acquire skills to edit captions and audio descriptions.
Training objectives for support staff (including student employees)
- Understand their responsibilities and priorities.
- Recognize the importance of captions and audio descriptions.
- Understand how to prioritize recordings for captions and audio descriptions.
- Learn how to send recordings to a vendor for captioning and audio descriptions.
- Learn how to edit captions and audio descriptions.
- Learn how to publish captions and audio descriptions.
The hub-and-spoke models could be applied to training. In these models, each unit has one or more designated experts who are trained centrally and can teach the basics to others in their unit (i.e., a “train the trainer” model). In the spirit of universal design for learning, training could be offered using multiple methods, as described in the following sections.
Asynchronous Training
Asynchronous training offers the advantage of flexibility and the opportunity to learn materials as needed.
- ATS Webinar: Video Accessibility could be divided into tool-specific, short recordings that allow individuals to learn how to perform specific tasks.
- UW could create and record a mandatory webinar, with an option for in-person training, covering general information about the ADA requirements, UW’s implementation plan, high-level priorities, and available resources and training. This could be a training that is required of all faculty and staff responsible for creating and delivering course content including new hires, TAs, and visiting faculty.
- The Accessibility 101 training currently in Canvas could be posted on a website for easier access by faculty.
Documentation
- Document the legal requirements and UW’s implementation plan covering the same content as the required training.
- Note: We have heard that faculty are stressed about not doing the right thing and are worried about getting sued. Information about the legal requirements and guidance should be provided, but it doesn’t need to be front and center.
- Create concise documentation with similar information to the Tips for accessible recordings page from the Perkins School and Best practices from Harvard.
- Share recommendations for more accurate captions and improved accessibility:
- Speak into a microphone and don’t walk away from it.
- Introduce yourself before speaking if the recording has several speakers.
- Describe meaningful graphics and visual content.
- Read slide titles and numbers.
- Repeat audience questions prior to answering.
- If the recording tool has a dictionary used at the course or department level, add difficult-to-caption words, such as your name and specialized vocabulary, to it.
- Publish recommendations for getting started:
- Start small by making a short one- to two-minute introduction video in Zoom or Panopto and trying out the auto-generated captions. View and correct the captions.
- Some instructors find that ASR is less accurate at captioning discipline-specific vocabulary, strong accents, and audio of less-than-optimal quality. Practicing with short videos can help them determine which tool and process is most comfortable and learn what works best for them and their videos.
- Document how to determine whether recordings are high, medium, or low risk and how to decide which remediation approach to use.
Synchronous Training
Consider offering a series of regularly scheduled accessibility training workshops — not presentations — during which faculty have the opportunity to actually apply the skills they are learning. Here are some sample topics:
- Accessible Videos 101 — universal best practices.
- Captions and Audio descriptions in different tools (YouTube, Zoom, Panopto, etc. in separate sessions, not all in one session).
- Presentation delivery skills — how teaching behaviors enhance accessibility of lecture recordings including narration, verbal description, pacing, etc.
- Advanced topics in video — beyond lecture recordings:
- Video editing.
- Capturing live demonstrations in captions and transcripts — i.e., when a professor draws a diagram or does live programming or math solutions on a white board.
- Audio description for complex classroom interactions — i.e., group work or performance, etc.
Developing a series of regularly scheduled workshops offers numerous benefits. Many adults find facilitator-led training more effective than asynchronous learning methods, and a regular schedule makes it easier for instructors to plan ahead and for departments to encourage or require multiple faculty to attend. These workshops give participants opportunities to repeat courses, solidify skills, and ask more questions, fostering continuous learning. Additionally, facilitators can better understand the challenges participants face based on the questions they ask. This approach not only enhances individual learning but also provides insights that can improve future training sessions.
Workshops could be created and taught by LT, ATS, Center for Teaching and Learning, the tri-campus partnership with IT & Digital Learning Colleagues at UW Bothell and UW Tacoma, a centralized Digital Accessibility Center, or through an outside vendor. Workshops could be taught online or on all three campuses.
Faculty Incentives
Faculty should be recognized or compensated for improving the accessibility of their courses. Faculty who take on a more specific or time-consuming role, such as providing direct accessibility support to colleagues or facilitating cross-unit collaboration, could also be recognized through release time or additional compensation.
Options for Policies
The Working Group feels that policy solutions should be aimed at encouraging best accessibility practices rather than discouraging effective teaching practices for the sake of accessibility compliance. Given that faculty have strong pedagogical reasons for using recordings in their courses, we recommend avoiding policies that might limit content creation.
Ideally, university policies provide faculty and staff with a list of clear and realistic requirements, and define the supports that are available to assist with compliance.
Implementing retention policies helps reduce the number of inaccessible recordings. To effectively remove and archive content, the university should continue to implement retention policies, using software to identify unused or outdated content (see the sketch below). Instructors can also manually sift through files and selectively copy and delete course materials. Zoom and Canvas have already implemented retention policies, and Panopto’s retention policy will take effect after key functionality is released.
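As a minimal sketch, such software could apply a simple “not viewed within the retention window” rule to identify candidates for archiving. The field names and the three-year window below are illustrative assumptions, not UW’s actual policy terms:

```python
# Illustrative retention filter: flag recordings whose most recent view
# falls outside an assumed retention window. Field names and the
# three-year window are hypothetical, not UW's actual policy terms.
from datetime import date, timedelta

RETENTION_WINDOW = timedelta(days=3 * 365)  # assumed three-year window

def flag_for_archive(recordings: list[dict], today: date) -> list[dict]:
    """Return recordings not viewed within the retention window."""
    cutoff = today - RETENTION_WINDOW
    return [r for r in recordings if r["last_viewed"] < cutoff]

# Made-up example data; real dates would come from the platform's usage logs.
recordings = [
    {"title": "BIO 180 Week 1", "last_viewed": date(2021, 10, 4)},
    {"title": "CSE 142 Review", "last_viewed": date(2024, 12, 2)},
]
# Only the 2021 recording is flagged for archiving.
print(flag_for_archive(recordings, today=date(2025, 4, 1)))
```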
Encouraging instructors to record their lectures can significantly improve accessibility, as recordings with ASR captions are generally better than having no recordings at all. However, this approach increases the number of recordings that are out of compliance.
The Working Group had some discussion on whether a policy might require use of specific platforms for deploying recordings. For example, UW could require use of Panopto because it provides ASR captioning by default, has audio description playback built into its user interface, and has demonstrated a commitment to accessibility. Requiring a single platform would also provide a streamlined source of data for monitoring, tracking, and reporting on progress.
The UW should strongly encourage instructors to use a templated accessible syllabus that includes standardized language regarding course accessibility. Instructors should inform students that efforts have been made to ensure course accessibility and request that students report issues, including captioning errors.
Options for Technology/Tools
ATS has worked closely with Panopto, Zoom, and Instructure (Canvas) over many years to ensure these tools are accessible. Due in part to our collaboration with Panopto, their product includes a variety of accessibility features, such as high-contrast caption colors, variable text size, and an interface that works for screen reader and keyboard users, and it is one of the few media publishing environments that supports audio description. While the tools meet many accessibility standards, they could be improved: additional features would allow us to track our progress better and make it easier to create accessible content. We have asked for several new features to improve the captioning process.
All of the tools described in the Current Technology/Tools section continue to be viable as part of our future strategy. We are anticipating upgrades to some of these tools in the near future, with features that may be especially helpful for our multimedia accessibility needs.
Native Canvas Video Editor
Instructure plans to enable the auto-captioning beta in Canvas by April 19, 2025. Based on updates from a recent beta rollout, they appear to be planning to roll out a scaled-down version of their Canvas Studio player to all Canvas users, which should make it easier to generate and update captions.
Panopto
ATS has requested several features to streamline its workflow, provide better captioning data, improve content management, and increase options for assigning roles through Canvas.
The following feature requests are likely to be deployed this year:
- Captions on reference copies: A reference copy is a copy of an original Panopto video that refers to the original video’s streams, table of contents, captions, quizzes, and timeline edits, while allowing creators to modify certain settings. Currently, administrators and creators can’t request or download reference copy captions. We have asked Panopto to add the ability to request and edit captions on reference copies.
- If UW decides to caption re-used content, ATS will request an option to automatically provide high-quality captions on reference copies of recordings, with the ability to opt out.
- If the original and reference copy have different owners, the new owner won’t be able to access the captions unless they unlink the recordings.
- Retention policy notifications: Add retention policy notifications so UW-IT can notify creators which recordings will be permanently deleted. The policy has been communicated, but we will need to send the new implementation date and reminders months in advance. Removing and archiving recordings will decrease the amount of inaccessible content available.
ATS will continue to push Panopto for the following additional features but does not anticipate that they will be available in the near future:
- Caption request screen: Improve the captioning request screen so it can scale. The current interface gives all captioners access to all accounts, including those of other departments, and is already unwieldy and poorly organized for multiple accounts; it would become even less user friendly if UW were to create more accounts. Panopto has indicated this isn’t a top priority for them.
- Data: Provide caption data, including the number of recordings with captions and the types of captions (manual, automated, edited), audio description data, and data on audio files. Limited caption data is available in theory, but the request times out before we can retrieve any information; Panopto is working on fixing this bug. We have also requested caption and audio description data via the API.
- Bulk download and delete: Allow instructors to download and delete recordings from multiple folders at a time.
- Download caption settings: Add more precise caption download settings so UW-IT can turn them on by default but allow instructors to opt out at the folder level if needed.
- Subaccount caption roles: To automatically manage role assignments, add roles based on the Canvas sub-account hierarchy. The roles should provide access to caption reports at the unit and folder levels, so department administrators and instructors, respectively, can view caption data. The captioning role should also be assigned based on the sub-account hierarchy, so department administrators can send recordings to vendors for manual captioning.
- Dictionaries: Allow creators and department administrators to create caption dictionaries at the school, department, and folder level.
- Report caption issues: Provide a button on the viewing screen that viewers can use to report issues with captions. The report would capture the timestamp along with an optional comment and send it to either the instructor or a central support address.
Vendor captioning and audio description
The cost of manually captioning all University of Washington videos is prohibitive, so UW-IT is evaluating AI solutions for video captioning. The accuracy of AI-generated captions (ASR) is rapidly improving, making these solutions increasingly viable. Adopting AI for captioning could significantly reduce costs while maintaining high accessibility standards. Here are some promising options:
3Play Media
3Play Media is launching a suite of solutions tailored to higher education and has requested our feedback on pricing, packaging, performance, and positioning. 3Play Media pioneered tools that combine AI with human intervention for its internal captioning services and is now offering those tools to customers. One key feature of their solution is the ability to predict captioning quality: customers can set a threshold for caption accuracy, for example 98%, and if the AI detects that captioning accuracy falls below the threshold, they can either automatically upgrade to human review or make that decision manually. This flexibility would allow UW to ensure high-quality captions while managing costs effectively.
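To make the threshold rule concrete, here is a minimal sketch of the decision logic as we understand it; the function and field names are our own illustration, not 3Play Media’s actual API:

```python
# Illustrative sketch of a caption-quality threshold rule like the one
# 3Play Media describes. Names and values are our own, not their API.

ACCURACY_THRESHOLD = 0.98  # example threshold from the text (98%)

def route_caption_job(predicted_accuracy: float, auto_upgrade: bool = True) -> str:
    """Decide how an ASR caption file should be handled based on its
    predicted accuracy."""
    if predicted_accuracy >= ACCURACY_THRESHOLD:
        return "publish ASR captions as-is"
    if auto_upgrade:
        return "upgrade to human review automatically"
    return "flag for a manual upgrade decision"

# A lecture whose predicted accuracy is 95% would be routed to human review.
print(route_caption_job(0.95))
```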
They have also developed a similar toolkit for audio description (AD). From our testing, AI description is not yet accurate enough to generate descriptions that are useful for people who depend on them, but it can make the audio description process more efficient by creating a first draft of the description, which can then be reviewed and edited either by the vendor (at a higher cost than AI alone) or by UW employees as part of our audio description workflow.
UW currently has a state contract with 3Play Media, and they have been in business for 17 years. They support integrations with Panopto, Canvas, YouTube, Vimeo, and Zoom.
3Play Media offers ASR at $0.10 per minute. They regularly evaluate all leading AI ASR engines and choose the one that performs best for use in their own service offering. In the past they used OpenAI Whisper, but recent tests have led them to switch to AssemblyAI (they say Speechmatics is also very good).
Echo Labs
The UW is currently participating in an Internet2 NET+ service evaluation of Echo Labs, which uses high-quality AI as a first step in generating captions and audio description and supplements that with human review and editing. A subset of our Working Group is participating in the evaluation and has found the company to be responsive, and their caption and audio description quality is high. We are waiting for information about their Panopto integration. We have some concerns about their management and policies. A UW-IT Contracts Administration Manager reviewed and approved the contract.
They have an appealing plan to decrease turnaround time: a premium service with a 12-hour turnaround, with the expectation that 12 hours will eventually become the standard and a 6-hour plan the premium.
Panopto and Zoom
We have not tested the quality difference between the ASR provided by 3Play Media and Echo Labs and the ASR offered by Panopto and Zoom.
Pricing comparison, cost per minute (standard 4-day turnaround time):

| Vendor | ASR captioning | AI description | Human captioning | Human description (standard) | Human description (extended) |
|---|---|---|---|---|---|
| 3Play Media | $0.10 | $1.00 (standard), $1.50 (extended) | $1.85 | $7.35 | $11.00 |
| Echo Labs | $0.45 (includes human captioning) | $2.00 (includes human description) | Included in ASR price | Included in AI price | Included in AI price |
| Panopto | Free | N/A | $1.00 | N/A | N/A |
| Zoom | Free | N/A | N/A | N/A | N/A |

Pricing comparison for 2 million minutes of video content:

| Vendor | ASR captioning | AI description | Human captioning | Human description (standard) | Human description (extended) |
|---|---|---|---|---|---|
| 3Play Media | $200,000 | $2 million (standard), $3 million (extended) | $3.7 million | $14.7 million | $22 million |
| Echo Labs | $900,000 (includes human captioning) | $4 million (includes human description) | Included in ASR price | Included in AI price | Included in AI price |
| Panopto | Free | N/A | $2 million | N/A | N/A |
| Zoom | Free | N/A | N/A | N/A | N/A |
Notes:
- UW created close to two million minutes of content in Autumn Quarter 2024 in Panopto and Zoom. This doesn’t include recordings uploaded into Canvas.
- The previous table is provided for reference only, to assist in determining the overall cost of making recordings accessible at scale. Actual prices would likely be somewhat lower due to the availability of volume discounts. The sketch below shows how the table’s figures are derived from the per-minute rates.
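The 2-million-minute figures are simple rate-times-volume products. The following sketch reproduces them from the per-minute rates in the first table (volume discounts ignored):

```python
# Reproduce the 2-million-minute cost table from the per-minute rates above.
# Rates come from the pricing table; volume discounts are ignored.

TOTAL_MINUTES = 2_000_000

rates_per_minute = {
    "3Play Media ASR captioning": 0.10,
    "3Play Media AI description (standard)": 1.00,
    "3Play Media AI description (extended)": 1.50,
    "3Play Media human captioning": 1.85,
    "3Play Media human description (standard)": 7.35,
    "3Play Media human description (extended)": 11.00,
    "Echo Labs captioning (ASR + human)": 0.45,
    "Echo Labs description (AI + human)": 2.00,
    "Panopto human captioning": 1.00,
}

for service, rate in rates_per_minute.items():
    print(f"{service}: ${rate * TOTAL_MINUTES:,.0f}")
# e.g. "3Play Media ASR captioning: $200,000"
```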
Options for Data/Analytics
Data and analytics can help UW prioritize its efforts and track progress. However, most of the platforms currently used to deploy recordings, and the tools used to monitor them, are limited in the data they provide about captions and audio descriptions. Given the current limitations, UW-IT could today develop a dashboard with data from the following sources (a minimal aggregation sketch follows the list):
- Canvas API (number of videos uploaded to Canvas, and whether they include subtitles)
- Captioning and audio description vendors (data about the services ordered)
- Ally (number of embedded YouTube videos on Canvas pages without captions)
- UW YouTube Caption Report (data on videos in UW YouTube channels, including whether they are captioned)
- Panopto (data on number of videos with captions from various sources)
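As a rough illustration, a dashboard could merge per-source counts into one view of caption coverage. The data structure and numbers below are hypothetical; real values would come from the Canvas API, vendor reports, Ally, the UW YouTube Caption Report, and Panopto.

```python
# Minimal sketch of aggregating caption coverage from several sources
# into one dashboard view. All numbers below are made up for illustration.
from dataclasses import dataclass

@dataclass
class SourceStats:
    source: str
    total_videos: int
    captioned_videos: int

    @property
    def percent_captioned(self) -> float:
        if self.total_videos == 0:
            return 0.0
        return 100 * self.captioned_videos / self.total_videos

def print_dashboard(stats: list[SourceStats]) -> None:
    for s in stats:
        print(f"{s.source}: {s.captioned_videos}/{s.total_videos} "
              f"captioned ({s.percent_captioned:.1f}%)")

print_dashboard([
    SourceStats("Canvas", 1200, 430),
    SourceStats("Panopto", 8500, 5100),
    SourceStats("UW YouTube channels", 950, 700),
])
```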
As the technology evolves, UW’s reporting needs will likely evolve as well. For example, ASR captions today are generally held to be insufficiently accurate and are therefore not included in caption counts unless they are edited or replaced. The “percent accuracy” rating that 3Play Media now generates with each of its ASR caption files could allow for much more specific tracking and reporting of our captioning efforts, based on accuracy level.
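If per-file accuracy ratings become broadly available, reports could bucket captions by accuracy rather than treating ASR as a single category. A hypothetical sketch, with thresholds chosen purely for illustration:

```python
# Hypothetical sketch: bucket caption files by a per-file accuracy rating
# instead of a binary ASR/edited split. Thresholds are illustrative only.

def accuracy_bucket(accuracy: float) -> str:
    if accuracy >= 0.99:
        return "count as captioned"
    if accuracy >= 0.95:
        return "count as captioned after light review"
    return "exclude until edited or replaced"

# Made-up example ratings for two recordings.
ratings = {"PHYS 121 Lecture 3": 0.992, "CHEM 142 Lab Intro": 0.948}
for title, accuracy in ratings.items():
    print(f"{title}: {accuracy:.1%} -> {accuracy_bucket(accuracy)}")
```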
Conclusions
There is no single elegant solution to ensuring course recordings are accessible. All the policies, tools, and approaches explored by this committee came with tradeoffs and caveats, and none of them provided easy and affordable answers.
As the Course Content Action Team considers how to synthesize the elements mentioned, we advise leveraging the same focus that we drew on for our own analysis: investing in actionable, efficient approaches that minimize the burden on faculty and don’t prohibit effective teaching strategies. Whatever combination of solutions the university implements, we recommend communicating explicitly about this intent and how each piece of this solution will fit with the whole.
While this report may not present easy answers, the Working Group came away optimistic about the potential for meaningful improvement in the next year. We believe that with a strategic combination of tools, policies, support structures, and compassionate communication, the university will be able to dramatically improve the accessibility of course recordings.
Acknowledgements
Special thanks to:
- Everyone in the multimedia working group for providing input.
- Mary-Colleen Jenkins, Program Operations Specialist, UWIT: IT Accessibility Team, for providing input on in-person trainings.
- Pen Moon, Director, Center for Teaching and Learning; Katie Malcolm, Associate Director, Center for Teaching and Learning; and Beth Somerfield, Program Operations Specialist, Compliance Services – ADA, for providing input on training and setting realistic expectations.
- Brenda Nunez and Carole Ockerman for providing information on DRS workflows, captioning, and challenges.
- Marcus Hirsch and Priya Keefe for coordinating the program and answering our questions.
- El Schofield, Shannon Garcia, and Nasrin Nazemi for brainstorming ideas and exchanging information.
- The Course Content Working Group on PDF Accessibility for sharing their document and allowing us to re-use some of its content.
- Faye Christenberry, Collection Strategy and Licensing Librarian, UW Libraries, for providing information about contracts and the process for captioning recordings in the library collection.
- Internet2 staff for facilitating the evaluation of Echo Labs.