A large number of videos are captured and shared by the audience from musical concerts. However, such recordings are typically perceived as boring mainly because of their limited view, poor visual quality and incomplete coverage. It is our objective to enrich the viewing experience of these recordings by exploiting the abundance of content from multiple sources. In this paper, we propose a novel Virtual Director system that automatically combines the most desirable segments from different recordings resulting in a single video stream, called mashup. We start by eliciting requirements from focus groups, interviewing professional video editors and consulting film grammar literature. We design a formal model for automatic mashup generation based on maximizing the degree of fulfillment of the requirements. Various audio-visual content analysis techniques are used to determine how well the requirements are satisfied by a recording. To validate the system, we compare our mashups with two other mashups: manually created by a professional video editor and machine generated by random segment selection. The mashups are evaluated in terms of visual quality, content diversity and pleasantness by 40 subjects. The results show that our mashups and the manual mashups are perceived as comparable, while both of them are significantly higher than the random mashups in all three terms.
Shrestha, P., With de, P. H. N., Weda, J., Barbieri, M., & Aarts, E. H. L. (2010). Automatic mashup generation from multiple-camera concert recordings. Association for Computing Machinery, Inc. https://doi.org/10.1145/1873951.1874023