Define Project Category: Prototype-Driven Project
Frame and Define Research Question:
I want to study and make: An automatic audio description generation system for self-made videos online
because I want to find out:
(i) The need of people for audio description on unofficial online video platforms such as YouTube and Instagram
(ii) The best way to help them understand those videos, if (i) is true
by working with: people with vision impairments who want to understand casual video content online
in order to: improve accessibility and their experience when enjoying video content
Secondary Research:
Several quick facts about online video (Source: https://biteable.com/blog/tips/video-marketing-statistics/):
By 2022, online videos will make up more than 82% of all consumer internet traffic — 15 times higher than it was in 2017. (Cisco)
78% of people watch online videos every week, and 55% view online videos every day. (HubSpot)
By 2020 there will be close to 1 million minutes of video crossing the internet per second. (Cisco)
Several quick facts (State-of-the-art) about audio description:
YouTube and Vimeo, commonly regarded as the two largest online video content providers, do NOT officially support audio description. This means there is no option for enabling an additional soundtrack; even if you pay for a copyrighted film, there is no audio description.
There are crowdsourced volunteer projects such as:
YouDescribe (https://youdescribe.org/) uses volunteers to describe videos one by one. However, it is really hard to cover the volume of videos with human effort alone.
Descriptive Video Exchange (http://www.vdrdc.org/research/dvx) prompts volunteers to record their comments while watching a DVD and to send the DVD with the recordings to people who have trouble watching.
However, some of these projects are dying out because of the media they chose, and it is really hard to keep up with the rate at which new videos are produced.
There are commercial projects (www.3playmedia.com) providing services for video content; prices are usually quoted only as part of a business arrangement.
There are medical academic institutions (http://www.vdrdc.org/) specializing in video accessibility, and there are people working on “Algorithmic Automated Description”. However, according to their own description, their research is “Preliminary” and the current goal is “automated announcement of scene changes, or the use of text-to-speech for the reading of on-screen text”. So it looks quite far from our goal. Still, the direction they are aiming at is very desirable to me.
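To make "automated announcement of scene changes" concrete, here is a minimal sketch using plain frame differencing. This is an assumed, simplistic technique chosen for illustration and is not the method used by the researchers cited above. The `Frame` type, the toy frames, and the threshold value are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical: one flattened grayscale frame, values 0-255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_scene_changes(frames: List[Frame], threshold: float = 40.0) -> List[int]:
    """Return indices i where frame i differs strongly from frame i - 1.

    The threshold is an assumed value chosen for illustration, not a tuned one.
    """
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


# Toy input: three dark frames followed by two bright frames.
dark = [10, 12, 11, 9]
bright = [200, 210, 205, 198]
toy_frames = [dark, dark, dark, bright, bright]
cuts = detect_scene_changes(toy_frames)
print(cuts)
```

A real system would decode frames from the video file and would likely need something more robust than a fixed threshold, but the sketch shows where a "scene changed" announcement could be triggered.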
Assistive technology for blindness and low vision - Chapter 16: Descriptive Video Services
DV: Descriptive video: the art of enhancing audio-visual content by inserting verbal descriptions where circumstances permit, for example, between dialogues.
WHY: 87% of blind and low vision people regularly listened to TV and videos or DVDs (Douglas et al., 2006).
State of the art in the US: In 2011, Congress passed the Twenty-First Century Communications and Video Accessibility Act, enabling the FCC to reinstate DV ratios for broadcasters; this should be implemented by July 1, 2012. This is a major step forward for blind and low vision people, since it could amount to at least 50 h of DV broadcast coverage per week by all the major networks.
Production: delimit with time codes the gaps where DV could be inserted -> quality control -> recording by a voice talent, synchronized with the soundtrack
Workload: Generally speaking, it is estimated that it takes an average of one working week to produce 1.5–2 h worth of DV (Foucher et al., 2007).
Rules: render the description in the gap available
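The first production step (delimiting gaps with time codes) and the rule of rendering the description within the available gap can be sketched as follows. This is a minimal sketch under stated assumptions: the dialogue time codes are given as (start, end) pairs in seconds, and the `min_gap` and `words_per_second` values are hypothetical defaults, not figures from the chapter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gap:
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_gaps(dialogue: List[Tuple[float, float]], video_length: float,
              min_gap: float = 1.5) -> List[Gap]:
    """Return the intervals not covered by any dialogue segment.

    dialogue holds (start, end) time codes in seconds; min_gap is an assumed
    minimum length below which a gap is ignored.
    """
    gaps = []
    cursor = 0.0  # end of the latest dialogue seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            gaps.append(Gap(cursor, start))
        cursor = max(cursor, end)
    if video_length - cursor >= min_gap:
        gaps.append(Gap(cursor, video_length))
    return gaps


def fits(description: str, gap: Gap, words_per_second: float = 2.5) -> bool:
    """Check whether a description can be spoken within the gap.

    words_per_second is an assumed speaking rate, not a measured one.
    """
    return len(description.split()) / words_per_second <= gap.duration


dialogue = [(2.0, 6.0), (6.5, 10.0), (14.0, 20.0)]
gaps = find_gaps(dialogue, video_length=25.0)
sentence = "A woman walks into the kitchen"
```

With these toy values, `fits(sentence, gaps[0])` is false because the opening gap is too short for six words at the assumed rate, while `fits(sentence, gaps[1])` is true. A describer or an automatic system would then shorten the text or move it to a longer gap.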
Research Done by 2012: Review
DV requires a distinctive language usage that has its own form and function (Piety, 2004).
Turner proposed a DV typology to enrich film indexing and, potentially, to automate DV production (Turner and Mathieu, 2008). His approach offers a classification of seven types of information: appearance, action, position, reading, indexical (indication of who is speaking), viewpoint, and state.
Benecke (2007) found the linguistic aspect of DV so peculiar that he suggested the need for specialized terms such as “character fixation” in order to avoid describing a character by his role in one instance and then by his name in another
Peli et al. (1996), Pettit et al. (1996), Schmeidler and Kirchner (2001), and Ely et al. (2006) reported on evaluations done with blind and low vision people on the value and importance of DV.
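The seven information types in Turner's typology, listed above, map naturally onto an enumeration. The sketch below is only an illustration of how such a typology could tag and index description items; the `DescriptionItem` record and the sample sentences are hypothetical and do not come from Turner and Mathieu (2008).

```python
from dataclasses import dataclass
from enum import Enum


class InfoType(Enum):
    APPEARANCE = "appearance"
    ACTION = "action"
    POSITION = "position"
    READING = "reading"
    INDEXICAL = "indexical"  # indication of who is speaking
    VIEWPOINT = "viewpoint"
    STATE = "state"


@dataclass
class DescriptionItem:
    """Hypothetical record for one piece of description tagged with its type."""
    start: float  # time code in seconds
    info_type: InfoType
    text: str


items = [
    DescriptionItem(3.0, InfoType.APPEARANCE, "A tall man in a red coat."),
    DescriptionItem(5.5, InfoType.ACTION, "He opens the door."),
    DescriptionItem(9.0, InfoType.READING, "A sign reads: Closed."),
]

# Group items by type, the kind of lookup a typology-based index might allow.
by_type = {}
for item in items:
    by_type.setdefault(item.info_type, []).append(item)
```

Tagging each generated sentence this way would also make it possible to filter or rank descriptions by type later on.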
Available tools (by 2012):
no standard exists to support the development of software tools currently used by describers in the postproduction industry
In the research community, some tools are being designed and tested to assist the describers more efficiently
Two objectives were identified: (1) to provide a tool to assist DV production based on some automatic algorithms of image and speech processing to reduce production time and (2) to develop an accessible player design based on the results of usability studies to offer an enriched DV experience to blind and low vision people.
Consultations with End-Users (Gagnon et al., 2009):
Priority should be given to identifying the principal characters as soon as possible, then presenting the action and the place.
Participants disliked getting interpretation instead of description, or added information not present in the image.
Quantity of DV needed could vary a lot from one individual to another.
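The end-user findings above suggest an ordering rule (characters first, then action, then place) and a per-user limit on how much description is delivered. A minimal sketch of that idea, assuming descriptions arrive as (kind, text) pairs; the `PRIORITY` table, the `max_items` setting, and the sample sentences are hypothetical and are not taken from Gagnon et al. (2009).

```python
from typing import List, Tuple

# Assumed ranking derived from the notes: characters first, then action, then place.
PRIORITY = {"character": 0, "action": 1, "place": 2}


def order_descriptions(items: List[Tuple[str, str]], max_items: int) -> List[str]:
    """Sort (kind, text) pairs by priority and keep at most max_items of them.

    max_items is a hypothetical per-user setting reflecting that the wanted
    quantity of DV varies between individuals. Unknown kinds sort last.
    """
    ranked = sorted(items, key=lambda item: PRIORITY.get(item[0], len(PRIORITY)))
    return [text for _, text in ranked[:max_items]]


candidates = [
    ("place", "A small kitchen."),
    ("action", "She pours coffee."),
    ("character", "Anna, in her thirties."),
]
brief = order_descriptions(candidates, max_items=2)
full = order_descriptions(candidates, max_items=3)
```

Here `brief` keeps only the character and the action, while `full` also includes the place. The sample sentences stick to what is visible, in line with the participants' dislike of interpretation.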
Description of the VDManager (Skip, I am not building tools)
Comments