Topic Title: Promotional Short Video Generation


Technical Area: Computer Vision, Multimedia



Promotional short videos are critical for a wide range of applications. For example, e-commerce platforms are embracing video to raise their transaction conversion rates. Moreover, short video content is becoming the first choice for consumers, especially the post-90s generation (people born in the 1990s), thanks to the smooth mobile streaming experience enabled by the popularity of mobile devices and access to low-cost 4G wireless networks.

However, producing an effective short video is challenging in two respects. First, video editing and production are complex, requiring a full set of skills with a steep learning curve for most sellers. Second, the traditional production pipeline is time-consuming and expensive. As a result, the potential benefits promised by short videos are hard to realize.

Alibaba is currently developing an automatic promotional short video generation system to drastically reduce advertisers’ video editing and production costs. To achieve high-quality video generation, we need to understand how to organize video content appropriately while supporting a variety of product categories and video styles. This involves understanding how existing promotional videos are produced and building relationships between video scripts and product attributes. With that understanding, it becomes possible to build an AI system that automatically generates good videos for new advertising requirements.
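One simple way to picture "building relationships between video scripts and product attributes" is to count how often each shot concept appears in labelled videos for products with a given attribute, then rank shot concepts for a new product by those counts. The sketch below is a toy illustration under that assumption only: the example data, attribute keys, shot concept names, and the functions attribute_shot_counts and suggest_shots are all hypothetical and not part of the described system.

```python
from collections import Counter, defaultdict

# Hypothetical labelled examples: each pairs a product's attributes with the
# ordered sequence of shot concepts used in its promotional video.
EXAMPLES = [
    ({"category": "dress", "feature": "breathable"}, ["model_walk", "fabric_closeup", "price_card"]),
    ({"category": "dress", "feature": "slim_fit"}, ["model_walk", "side_view", "price_card"]),
    ({"category": "blender", "feature": "quiet"}, ["unboxing", "demo_use", "price_card"]),
]


def attribute_shot_counts(examples):
    """Count how often each shot concept appears in videos for each (attribute, value) pair."""
    counts = defaultdict(Counter)
    for attrs, shots in examples:
        for key, value in attrs.items():
            counts[(key, value)].update(shots)
    return counts


def suggest_shots(attrs, counts, top_k=3):
    """Rank shot concepts by their summed co-occurrence with a new product's attributes."""
    score = Counter()
    for key, value in attrs.items():
        # Unseen attribute values contribute nothing.
        score.update(counts.get((key, value), Counter()))
    return [shot for shot, _ in score.most_common(top_k)]


if __name__ == "__main__":
    counts = attribute_shot_counts(EXAMPLES)
    print(suggest_shots({"category": "dress", "feature": "breathable"}, counts))
```

A real system would need far richer signals (learned embeddings, shot ordering, style), but co-occurrence counting makes the idea of mining attribute-to-script patterns from labelled data concrete.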



To generate good promotional short videos for various e-commerce products, it is critical to first generate video scripts that follow human cognitive patterns, arouse viewers' interest, and effectively emphasize key product features. There are numerous ways to compile a video manually, but we expect to discover patterns in real data so that, given a specific product and its advertising materials, a computer can generate a complete video script including 1) the gist of the video, 2) the overall structure of the video, 3) the detailed order of the video contents, 4) the key concept of each video shot together with a visually meaningful way to express it, and 5) the style to maintain when compiling video effects.
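The five script components listed above map naturally onto a small data structure. The sketch below assumes one possible representation (a VideoScript holding the gist, a section-level structure, an ordered list of Shot objects carrying each shot's key concept and visual expression, and a style label) plus a toy template-based drafting function. All class names, fields, and the draft_script helper are hypothetical illustrations; the actual goal described here is to learn such scripts from real data rather than from a fixed template.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Shot:
    """One video shot: its key concept and how it is expressed visually (item 4)."""
    concept: str
    visual: str


@dataclass
class VideoScript:
    gist: str              # 1) the gist of the video
    structure: List[str]   # 2) overall structure, e.g. hook -> features -> call to action
    shots: List[Shot]      # 3) detailed order of contents; each shot carries item 4
    style: str             # 5) style to maintain when compiling video effects


def draft_script(product: str,
                 features: List[Tuple[str, float]],
                 materials: Dict[str, str],
                 style: str = "energetic") -> VideoScript:
    """Toy template-based drafting (hypothetical, for illustration only).

    features: (feature, importance) pairs; more important features are shown first.
    materials: maps a feature to a description of an available advertising asset.
    """
    ranked = [f for f, _ in sorted(features, key=lambda fi: fi[1], reverse=True)]
    shots = [Shot("hook", f"{product} hero shot")]
    for feat in ranked:
        # Fall back to a text overlay when no visual material exists for a feature.
        shots.append(Shot(feat, materials.get(feat, f"text overlay: {feat}")))
    shots.append(Shot("call_to_action", "price card and buy button"))
    return VideoScript(
        gist=f"{product}: {ranked[0]}" if ranked else product,
        structure=["hook", "features", "call_to_action"],
        shots=shots,
        style=style,
    )
```

For example, draft_script("thermos", [("keeps_warm_12h", 0.9), ("leak_proof", 0.6)], {"leak_proof": "bottle shaken upside down"}) yields a script whose shots run hook, keeps_warm_12h, leak_proof, call_to_action. Keeping the shot-level concept separate from its visual expression mirrors items 3 and 4, so a learned model could fill either part independently.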

Data preparation and analysis will be a crucial part of this project. A complete labelled dataset of advertising videos is the foundation for many potential exploring