General Framework for image captioning (Task 1):
1. Take a screenshot every 30 seconds.
2. Pause the video and run image captioning on the screenshot.
3. Insert the caption (as audio) before/after each captured frame (a sketch follows this list).
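
A minimal sketch of steps 1-2, assuming OpenCV for frame grabbing; caption_image is a hypothetical stand-in for whatever captioning model ends up being used, not a specified part of the plan:

import cv2

def caption_image(frame):
    """Hypothetical stand-in for the actual image-captioning model."""
    raise NotImplementedError

def captions_every_30s(path, interval_s=30):
    """Grab one frame every interval_s seconds and caption it."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = int(fps * interval_s)      # frames between screenshots
    results = []                      # (timestamp_s, caption) pairs
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            results.append((idx / fps, caption_image(frame)))
        idx += 1
    cap.release()
    return results
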
General Framework for video captioning:
1. Camera shot segmentation using optical flow (first sketch after this list).
2. Verify that total caption length = video length - conversation length (Task 4).
3. (Task 2) Do video-to-text translation with a Convolutional LSTM (second sketch below).
4. (Task 3) Do length-constrained video-to-text translation with a Convolutional LSTM + a special loss function (second sketch below).
5. (Task 2) Pause the video and insert the caption TTS before/after each video shot.
6. (Tasks 3-4) Sync the TTS / emotional TTS with the video.
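
A minimal sketch of step 1, assuming OpenCV's Farneback dense optical flow; a shot boundary is flagged when the mean flow magnitude between consecutive frames spikes above a threshold (the threshold value is an assumption to be tuned):

import cv2

def shot_boundaries(path, thresh=8.0, size=(320, 180)):
    """Return timestamps (s) where mean optical-flow magnitude spikes,
    a crude proxy for a camera-shot cut."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    ok, prev = cap.read()
    if not ok:
        return []
    prev = cv2.cvtColor(cv2.resize(prev, size), cv2.COLOR_BGR2GRAY)
    cuts, idx = [], 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        if mag.mean() > thresh:   # large global motion => likely a cut
            cuts.append(idx / fps)
        prev, idx = gray, idx + 1
    cap.release()
    return cuts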
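
For steps 3-4, a minimal PyTorch sketch of a ConvLSTM cell (the standard convolutional variant of the LSTM update) plus a hypothetical length-penalized loss; the penalty weight lam and the quadratic form of the "special loss" are assumptions, since the notes do not specify them:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTMCell(nn.Module):
    """One ConvLSTM step: all four gates come from a single conv
    over the concatenated [input, hidden] feature maps."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.conv(torch.cat([x, h], 1)), 4, 1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def length_constrained_loss(logits, targets, pred_len, target_len, lam=0.1):
    """Cross-entropy over caption tokens plus an assumed quadratic
    penalty that pushes the caption duration toward the available gap."""
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1))
    return ce + lam * (pred_len - target_len) ** 2

Here target_len would be the gap computed in step 2 (video length minus conversation length).
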
Baseline 0: No audio description, dry video
Test 1: Async image-captioning-based description every 30 seconds; pause and insert.
Every 30 seconds, capture the current video frame, run image captioning, and put the caption back in as audio, pausing playback while it plays.
Test 2: Async video-captioning-based audio description; pause and insert.
Test 2.1: Cut the video at every scene boundary, run video captioning, and insert the caption before the scene plays; pause playback meanwhile (sketch below).
Test 2.2: Cut the video at every scene boundary, run video captioning, and insert the caption after the scene has played; pause playback meanwhile.
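
A minimal sketch of the pause-and-insert behavior in Tests 1-2, assuming moviepy 1.x; the file paths and the pre-generated TTS clip are placeholders:

from moviepy.editor import (VideoFileClip, AudioFileClip,
                            concatenate_videoclips)

def insert_caption_before(video_path, cut_s, tts_path, out_path):
    """Freeze the frame at cut_s for the duration of the TTS caption,
    playing the caption audio over the frozen frame (Test 2.1 style)."""
    clip = VideoFileClip(video_path)
    tts = AudioFileClip(tts_path)
    frozen = (clip.to_ImageClip(cut_s)   # still frame at the boundary
                  .set_duration(tts.duration)
                  .set_audio(tts))
    result = concatenate_videoclips([clip.subclip(0, cut_s),
                                     frozen,
                                     clip.subclip(cut_s)])
    result.write_videofile(out_path)

Swapping the order of the frozen segment and the scene gives the Test 2.2 variant (caption after the scene).
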
Test 3: Sync video-captioning-based audio description on a separate sound channel (sketch below).
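
For Test 3, a minimal sketch using pydub (an assumed library choice; any audio toolkit would do): program audio on the left channel, the description track on the right, so a player can route them separately:

from pydub import AudioSegment

def two_channel_mix(program_path, description_path, out_path):
    """Left channel = original program audio, right channel = AD track."""
    program = AudioSegment.from_file(program_path).set_channels(1).set_frame_rate(44100)
    ad = AudioSegment.from_file(description_path).set_channels(1).set_frame_rate(44100)
    # Pad the shorter track with silence so both channels align.
    n = max(len(program), len(ad))
    program += AudioSegment.silent(duration=n - len(program))
    ad += AudioSegment.silent(duration=n - len(ad))
    stereo = AudioSegment.from_mono_audiosegments(program, ad)
    stereo.export(out_path, format="wav")
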
Test 4: Sync video-captioning-based audio description that avoids significant music and conversation, merged with the original audio (gap-finding sketch after Test 4.2).
Test 4.1: Neutral TTS
Test 4.2: Emotional TTS (not sure where to find one?)
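
For Test 4, a minimal sketch of finding low-energy gaps (no loud music or speech) where descriptions can be merged in; the RMS threshold and window size are assumptions to be tuned, and a proper voice-activity or music detector would be a stronger choice. Input is assumed to be a float sample array in [-1, 1]:

import numpy as np

def find_quiet_gaps(samples, sr, win_s=0.5, rms_thresh=0.02, min_gap_s=2.0):
    """Return (start_s, end_s) spans whose windowed RMS stays below
    rms_thresh for at least min_gap_s seconds."""
    win = int(sr * win_s)
    n_win = len(samples) // win
    rms = np.sqrt(np.mean(
        samples[:n_win * win].reshape(n_win, win) ** 2, axis=1))
    quiet = rms < rms_thresh
    gaps, start = [], None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i                     # gap opens
        elif not q and start is not None:
            if (i - start) * win_s >= min_gap_s:
                gaps.append((start * win_s, i * win_s))
            start = None                  # gap closes
    if start is not None and (n_win - start) * win_s >= min_gap_s:
        gaps.append((start * win_s, n_win * win_s))
    return gaps
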
Comments