Tech Tips: Do You See What I Hear? Video and Captions on the Web

DO-IT Scholar Jenny works on her computer.

Every day the World Wide Web becomes a little more like television on steroids. Sites like www.youtube.com/, video.google.com, video.yahoo.com, video.msn.com, www.aol.com/video, and www.myspace.com provide thousands of videos in dozens of categories. NBC Universal and News Corp recently announced that they, too, plan to enter the online video arena by showing free online versions of popular television programs.

Want to upload your own movie to the web? Just follow the "Upload Video" link on www.myspace.com and your home video will soon be available to millions of viewers around the world.

So, what's the difference between viewing video online and viewing it on television? Well, one big difference is that most television programming includes closed captions for viewers who are deaf or hard of hearing, thanks to the Telecommunications Act of 1996, which requires captions on most U.S. television programming. In contrast, few videos on the Internet are captioned. This means that individuals who can't hear the audio are missing a large portion of the message. Unfortunately, this is not an easy problem to solve.

Here's how captioning works: First, one must create or obtain a transcript of what's said in the video. Second, one must time-stamp groups of words in the transcript so that these words will appear as they're spoken in the video. Finally, the time-stamped transcript must be converted to the appropriate format so it can be displayed as closed captions along with the video. The first step, creating a transcript of the words that are spoken in a video, requires a lot of time and expense, and is the greatest barrier to online video becoming captioned.
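
To make the last two steps concrete, here is a minimal sketch in Python. It assumes the transcript has already been written and time-stamped by hand, and it converts the result to SubRip (.srt), a common plain-text caption format we picked only for illustration; the format your video player needs may differ. The names Cue, format_timestamp, and to_srt are our own, not part of any captioning tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One group of words plus the times (in seconds) it should be on screen."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used by SubRip files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Convert time-stamped transcript groups into SubRip caption text."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n"
            f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    # A blank line separates one numbered caption from the next.
    return "\n".join(blocks)


# Steps 1 and 2 (the transcript and its time stamps) are done by hand here.
cues = [
    Cue(0.0, 2.5, "Welcome to the program."),
    Cue(2.5, 6.0, "Today we are talking about captions."),
]
srt_text = to_srt(cues)
print(srt_text)
```

Notice that the conversion itself is only a few lines of code. The real work is in producing the list of cues in the first place, which is why the transcript remains the hard part.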

We at DO-IT have produced over 30 videos. In each of our videos, we addressed the transcript problem by starting with a script prior to filming. The script guides production, and after the video is finished it can easily be converted to captions, saving us the time and effort of creating a transcript. Someday, computers will be able to generate transcripts automatically using speech recognition. Current technology isn't accurate enough to do this reliably, although it's getting closer. You can see positive progress at www.podzinger.com, where speech recognition is used to extract text from podcasts so that users can search them. It's accurate enough for search, but not quite accurate enough for captions. Until computers can generate transcripts for us, we're stuck doing it ourselves.

What about video that was originally broadcast on television? Since that had to be captioned for television, at least that online programming will be captioned, right? Unfortunately, no! Believe it or not, programming that is captioned on television is typically shown on the Internet without captions. This isn't because of technical problems - there are now tools available that convert television-style captions to the format needed for delivering them online. The greater barrier is that online video providers have a very difficult time locating and obtaining rights to the closed captions. Consider the following example. Company A wants to show an old TV series to viewers over its website. It secures the rights to this material from Company B, which produced the original media and owns the rights to it. However, Company B didn't caption its media. That happened much further downstream, by the distributor (Company D) or broadcaster (Company E), which outsourced the work to a captioning company (Company F). Company A now has to contact Company D or E in order to locate Company F, which hopefully will still have the captions for the TV series. There's little consistency in how this has all transpired, which makes the process much more difficult than it probably should be.

Despite these problems, efforts are under way to caption at least some of the videos that are available online. Google encourages users to upload captions with their video content, and has developed its own simple technique for marking up transcripts as captions. Google is interested in having captioned video not just for accessibility, but for the same reason www.podzinger.com is interested: Captions make it possible for users to search the full text of videos.

Google has made a very small, hand-selected sample of its captioned videos available at video.google.com/videocaptioned. Unfortunately, Google does not currently provide a means for users to filter video search results to include only videos that are captioned, but including the phrase "closed captioned" in the search field will sometimes yield positive results.

Similarly, AOL currently offers a portal for viewing its captioned CNN News updates (ability.aol.com/ability/index.php?ID=45&catID=6). However, for other captioned content on AOL, users just have to browse the entire collection looking for that rare video that claims to be "Closed Captioned".

A couple of websites have emerged to address the problem of online closed captioned video being hard to find. One is http://harkle.com/, which features a searchable database of captioned videos, and currently includes links to several hundred videos from all over the web. Another website is www.projectreadon.com, which displays videos on their original web pages, but displays captions for each video in a custom standalone caption player which floats over the video. The folks at www.projectreadon.com do the caption work themselves, and take requests from users.

To summarize, video is everywhere. But not everyone can hear it. How can you help to correct this problem? There are many ways.

If you upload videos to the web, be sure to include captions. Once you have a transcript, converting it to captions is easy. There are several software tools available for this, ranging from free to affordable. Contact DO-IT for recommendations.

If you know others who upload videos to the web, encourage them to include captions.

Consider pursuing a career that will help to make the process automatic. Speech recognition is a hot area in computer science these days, and some important work is taking place in the Signal, Speech and Language Interpretation Lab right here at the University of Washington. See their website for more information: ssli.ee.washington.edu.