Orchestrating Digital Micromovies

Glorianna Davenport Ryan Evans Mark Halliday

Glorianna Davenport (Assistant Professor of Media Technology) MIT Media Lab, 20 Ames Street, Cambridge, MA, 02139, U.S.A.

Ryan Evans (researcher, digital filmmaker) MIT Media Lab, 20 Ames Street, Cambridge, MA, 02139, U.S.A.

Mark Halliday (researcher, digital filmmaker) MIT Media Lab, 20 Ames Street, Cambridge, MA, 02139, U.S.A.



The authors describe how computers can be used to build narrative structures that create simple cinematic sequences from a large database of shots. The Digital Micromovie Orchestrator (DMO) does this by allowing the maker to attach sketchy descriptions to video clips in the database and to build narrative abstractions in the form of a layered filter structure that orchestrates the shots in real-time to create a flowing sequence of shots on-screen. The DMO points to a new direction for using digital technology to change the way cinematic narratives are structured. The system is outlined by analyzing in detail the first movie produced using the DMO and by discussing the process of actually making interactive movies using the system.


Thomas: "Hi, what's your name?"

David: "David Kung"

Thomas: "I like that"

David: "What are you doing here?"

Thomas: "I'm a graduate student in the Interactive Cinema Group."

David: "That's true"

Thomas: "What's this light here?"

David: "Well this light fixture actually comes from the first movie-house in Beijing and my grandfather took it as a souvenir when the theater closed down"

Thomas: "Don't listen to this guy. He's an undergraduate. He doesn't know anything."

The interchange above is taken from an unusual movie called An Endless Conversation[1]. The movie is a conversation between two characters. They are talking about the Interactive Cinema Group at MIT's Media Lab where this movie was made. Although the dialog may appear to be a normal, if slightly wooden transcript from a linear movie, An Endless Conversation is no ordinary film. It is a simple example of what we call a personalizable movie and a first step toward new kinds of experiences made possible by digital video technologies.

Video can now be stored and accessed digitally on a computer just like any other type of data. This has profound implications for how narratives can be structured. Instead of one long, unchanging linear strip, a movie can now be made up of hundreds of separate shots that can be orchestrated together in a variety of different ways. A fundamental problem is how to get the computer to assist in this orchestration. Two ingredients are necessary for orchestration to build up to story structures: descriptions need to be attached to the video and narrative abstractions need to make use of these descriptions. The Digital Micromovie Orchestrator (DMO) successfully tackles this problem by using a simple keyword-based description strategy based on the filmmaker's intent and by using an extensible layered filter structure that adapts to the current state of the story to suggest an appropriate next shot.

Many interactive experiences are created as a pre-computed structure that requires input (interaction) from the user at specific pre-defined narrative junctions. The reverie of the experience so important to any successful cinematic narrative is lost in the imperative of such interactions. A viewer should be able to sit back and enjoy the show without frequent interruptions caused by mandatory interactions. The Digital Micromovie Orchestrator (DMO) creates personalizable movies which allow a more fluid kind of interaction. The viewer can sit back and watch the movie but he can also subtly change the direction of what appears on-screen at any time. With An Endless Conversation for example, the viewer can personalize the movie by changing the pacing from fast to slow or the rating from PG to R while the film is running.

Such fluid interaction is made possible because the computer is filtering out each next appropriate shot in real-time. Using the narrative framework created by the DMO, instead of pre-computing the entire story, a decision is made each time a shot is needed by successively filtering the clips from the entire database down to an appropriate shot for the current point in the movie. These filters are layered to build a narrative structure that mutates as the movie progresses, depending on user interaction and on the current point in the story.

Real time orchestration of what appears on-screen allows the creation of new kinds of cinematic experiences that were not possible before. Currently, we can create simple cinematic sequences such as a conversation or an action sequence of a character climbing some stairs. In future creations, we can imagine allowing the user to specify: "I have thirty minutes to see this movie. I'm really interested in what Max has to say, but I don't want too much violence" and then halfway into the movie decide to lower the violence level even more through fluid interaction. Thus, a movie can be different each time it is watched and it can be personalized and changed while it is running.

A Digital Environment

Digital video has opened up a whole new domain within which to explore alternative ways of telling stories with moving images. In the digital domain, video becomes just another piece of data that can be described, accessed and displayed. The DMO takes advantage of this by making use of micromovies. A micromovie [2] is a short piece of video with descriptive information attached to it which represents a unit of meaning determined by the filmmaker’s intent. These micromovies are filtered by the computer based on the machine-readable descriptions that form part of a micromovie. Even though each individual filter might be quite simple (one for pacing, one for continuity, etc.), it is the layering of multiple filters that can be changed in real-time (through interaction) that gives the system its power and flexibility.

The DMO can be thought of as a simple machine implementation of an editor. A film or video editor takes a collection of raw footage, logs it and then uses an idea or script as a basis for editing together sequences of shots that form a video narrative. Similarly, the DMO has a template for a story in the form of a succession of layered filters. A group of filters can be tailored to a particular scene: we have developed groups of filters for dialog (two characters talking), action (climbing stairs) and interaction (pacing, rating). Each time an edit is required, the filters select an appropriate shot for the current point in the story. The most important aspect of the system is that the template can be layered and changed in real-time, allowing interaction to take place at any time during the story.

Using this approach, the DMO allows us to explore a new mode of interaction called fluid interaction. Imagine a narrative that progresses much like a linear film, but when the viewer is interested in changing the flow or feel of the experience he is free to manipulate it along certain dimensions [3]. This type of interaction provides an experience that lies somewhere between the constant clicking and decision making of current interactive narratives (e.g. computer adventure games or interactive training programs) and the lack of interaction inherent in linear films. The viewer can get caught up in the reverie of the experience without the interruption of interaction until he desires to change the film experience. Changing the experience might be as subtle as speeding up the pacing a bit or it might be as drastic as changing the focus of the narrative from David's character to Thomas'.

Design and Implementation

The DMO consists of a video logging module and a filter based shot selector. The logger allows the author to bind "sketchy" descriptions to fragments of video, creating digital micromovies. A sketchy description contains the minimum amount of information for use in a particular movie. The selector module makes use of the descriptions attached to micromovies to sift out unwanted shots using a set of filters designed by the filmmaker and manipulated by the viewer. Several different types of filters are chained together to orchestrate a sophisticated personalizable movie. Current filters include template filters, continuity filters, stylistic filters, and interaction filters, but the selector mechanism is designed to easily incorporate new types of filters.

The Logger Module

The DMO logging component guides the author through a process which is very similar to creating logs for use in editing a traditional linear film. It allows the creator to associate information in the form of keywords with fragments of digital video. The logger encourages sketchy logs: their depth and sophistication would not be sufficient for general-purpose use, but they allow the author to record descriptions quickly. The logger and the entire DMO system describe video and construct movies at the shot level. The filmmaker chooses in and out points in the video and describes the content of each shot with keywords in a manner which is appropriate to the experience she is trying to build. Thus the filmmaker decides what constitutes a good shot while the system decides (with guidance from the filmmaker) what constitutes a good sequence.

The log is structured around slots and values using keywords as descriptive elements. Each shot has several slots attached to it and each slot contains one or more values represented by keywords (see Fig. 1). Typical slots used in An Endless Conversation include character, type of utterance, subject of utterance, rating, and pacing. Keyword classes and slot groups provide default values within the logging tool to help maintain consistency across logs [4]. The DMO system has no required slots for descriptions. Instead, the author is able to create and delete slots at will and design filters which depend only on those slots. This free-form description structure provides the creator with the freedom to log values which are important to the movie while ignoring other values.

Fig. 1 A micromovie with descriptive slots [5].
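A slot-and-value log of this kind might be represented as follows. This is a minimal sketch in Python rather than the Macintosh Common Lisp of the actual system; the class name `Micromovie`, the slot names, the file name and the keyword values are illustrative assumptions, not the DMO's own data format.

```python
from dataclasses import dataclass, field


@dataclass
class Micromovie:
    """A shot plus its sketchy description (hypothetical layout)."""
    clip: str            # name of the video file (assumed)
    in_point: float      # seconds, chosen by the filmmaker
    out_point: float
    slots: dict = field(default_factory=dict)  # slot name -> set of keywords

    def set_slot(self, name, *keywords):
        # Slots are free-form: the author may create any slot at will.
        self.slots[name] = set(keywords)

    def delete_slot(self, name):
        # Deleting a slot that does not exist is silently ignored.
        self.slots.pop(name, None)

    def has(self, name, keyword):
        # True if the slot exists and contains the keyword.
        return keyword in self.slots.get(name, set())


shot = Micromovie("conversation.mov", in_point=12.0, out_point=14.5)
shot.set_slot("character", "Thomas")
shot.set_slot("utterance-type", "question")
shot.set_slot("subject", "name")
shot.set_slot("rating", "PG")
shot.set_slot("pacing", "fast")
```

Because no slot is required, a filter written against such a record can only rely on the slots the author chose to log, which is why `has` treats a missing slot as a non-match rather than an error.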

The Shot Selection Module

Just as the logger module mirrors the process of creating logs for a traditional editing job, the shot selection module makes use of those logs to mirror the process of actual editing. An editor chooses possible shots based on knowledge of available video, story structure and cinematic style. The selection module chooses shots based on a simple layered filter model which mimics some of this knowledge. The module takes as its input the entire video database along with a set of specifications from the filmmaker and returns all of the video clips which meet those specifications. The shot selection process is accomplished through layered filters which progressively weed out shots which do not qualify. The filters are structurally very simple. Each contains a set of descriptions which are matched against micromovie descriptions. Since the output of each filter is simply a subset of the original database, the filters are easily chained by applying one filter to the output of another. The filters which select shots for a complete movie can be added, deleted and mutated over the course of the presentation, either programmatically or through viewer interaction.
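The chaining described above can be sketched in a few lines. The following Python fragment is an illustration only: the helper names `make_filter` and `apply_layers`, the dictionary layout of a shot and the sample database are assumptions made for this sketch, not part of the DMO itself.

```python
def make_filter(slot, allowed):
    """Return a filter keeping shots whose `slot` shares a keyword with `allowed`."""
    allowed = set(allowed)

    def keep(shots):
        return [s for s in shots if allowed & set(s.get(slot, ()))]
    return keep


def apply_layers(shots, filters):
    """Chain filters: each one sees only the output of the one before it."""
    for f in filters:
        shots = f(shots)
    return shots


# A toy database of logged shots (hypothetical values).
database = [
    {"id": 1, "character": ["Thomas"], "type": ["question"], "rating": ["PG"]},
    {"id": 2, "character": ["David"], "type": ["response"], "rating": ["PG"]},
    {"id": 3, "character": ["Thomas"], "type": ["question"], "rating": ["R"]},
]

layers = [
    make_filter("character", ["Thomas"]),
    make_filter("type", ["question"]),
    make_filter("rating", ["PG"]),
]

candidates = apply_layers(database, layers)
```

Since every filter returns a subset of its input, the list `layers` can be lengthened, shortened or rebuilt between shots without touching the database, which is the property that makes mutation during the presentation cheap.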

To create An Endless Conversation (the movie from which the example at the beginning of the paper comes) the filmmaker designs four filter layers that are used by the shot selector to create the movie. The first filter layer is based on a template structure that parallels the structure of the conversation between Thomas and David: a question, a response, a reaction, another question, etc. The second filter layer matches continuity across the movie. In An Endless Conversation the topic of each response must match the previous question. The third filter layer is a stylistic filter. This filter is created from a rule base that embodies some of the filmmaker's ideas about cinematic conventions. During the course of the dialogue this layer specifies when to use a close-up, a medium shot or a wide shot of the character based on the framing of previous shots. The fourth and final layer is the interaction filter. In An Endless Conversation this filter is mutated based on the user's preferences concerning rating and pacing.

Template Filter

The template filter is perhaps the simplest filter layer in the DMO system, but it is also the most powerful authoring tool. The author outlines a template which describes the progression of the movie as a collection of filters arranged along a time line. A driver built into the system applies these filters (layered along with other types of filters) to choose each successive shot. The filters are activated and deactivated on a shot by shot basis according to the time line. In An Endless Conversation the template is specified such that the first shot must be a question from one character, the second shot must include the other character responding to the question and the third shot is the first character again with a reaction. This template structure is easily translated into a list of filters which are matched against micromovie descriptions.
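As an illustration, the question, response, reaction template could be expressed as a generator that emits one required description per shot. This is a sketch under stated assumptions: the names `TEMPLATE` and `template_filters` are hypothetical, and the rule that the two characters swap roles after each round is read off the sample dialog at the start of the paper rather than taken from the DMO's actual template.

```python
from itertools import cycle

# One template step per shot: (speaker role, utterance type).
# "A" asks, "B" responds, "A" reacts; the roles are placeholders.
TEMPLATE = [("A", "question"), ("B", "response"), ("A", "reaction")]


def template_filters(first_speaker, second_speaker, swap_each_round=True):
    """Yield one required (character, type) description per shot, forever."""
    a, b = first_speaker, second_speaker
    for i, (role, utterance) in enumerate(cycle(TEMPLATE)):
        yield {"character": a if role == "A" else b, "type": utterance}
        if swap_each_round and i % len(TEMPLATE) == len(TEMPLATE) - 1:
            a, b = b, a  # assumed: the other character asks the next question


steps = template_filters("Thomas", "David")
first_six = [next(steps) for _ in range(6)]
```

Each yielded dictionary plays the part of one template filter on the time line; a driver would match it against the micromovie descriptions, together with the other layers, to pick the next shot.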

Continuity Filters

The continuity filter layer is changed over time based on the course of the movie up to that point. This layer tries to control consistency of image, plot and character. A program embodying some simple continuity rules looks at descriptions of previously presented micromovies and adds filters to the continuity layer which govern the content of the next shot. Using the script at the beginning of the paper as an example, Thomas asks a question in the first shot. The Orchestrator must match David's response to Thomas' question. In this case a continuity filter makes sure that the topic of the response matches the topic of the question. The filmmaker could design this layer to monitor simple continuity points such as setting, costume or time of day. Again, the author only needs to log continuity descriptions which have some bearing on the final appearance of the movie.
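A topic-matching continuity rule of this kind might look as follows. The function name `continuity_filter`, the slot names `type` and `subject`, and the sample shots are assumptions made for this sketch; the actual continuity program is not described in more detail here.

```python
def continuity_filter(history):
    """Build a filter for the next shot from the shots already shown."""
    if history and history[-1]["type"] == "question":
        # A response must address the topic of the question just asked.
        topic = history[-1]["subject"]
        return lambda shots: [
            s for s in shots
            if s["type"] != "response" or s["subject"] == topic
        ]
    # Nothing to be consistent with: let every shot through.
    return lambda shots: list(shots)


shots = [
    {"id": 1, "type": "response", "subject": "name"},
    {"id": 2, "type": "response", "subject": "light"},
    {"id": 3, "type": "reaction", "subject": None},
]

history = [{"id": 0, "type": "question", "subject": "light"}]
allowed_next = continuity_filter(history)(shots)
```

The rule only constrains responses and leaves other utterance types alone, on the assumption that the template layer decides which type of utterance comes next.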

Stylistic Filters

The stylistic filter layer controls cinematic conventions during movie presentation. This layer differs from the continuity filter in that it does not select shots based on the content, but instead looks at the style of the shot (e.g. framing, camera angle, lighting). The filters in this layer are constructed from a stylistic rule base which uses previous shots as input (typically the past one or two shots) and outputs shot selection filters. Two simple stylistic rules might read:

• If this is the first shot of the movie then the framing should be extreme wide (i.e. an establishing shot for the space).

• If the character was seen in the last shot in extreme close up then do not show her in extreme close up in the next shot.

The filmmaker can exercise quite close control over the look and feel of the movie using such stylistic filters.
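The two example rules can be written as a small rule base that turns the shots shown so far into filters for the next shot. This is a hedged sketch: the function names, the `framing` and `character` slot names and the keyword spellings are assumptions for illustration, not the rule language used in the DMO.

```python
def stylistic_filters(history):
    """Turn the two example rules into filters, given the shots shown so far."""
    filters = []
    if not history:
        # Rule 1: open on an establishing shot.
        filters.append(lambda shots: [
            s for s in shots if s["framing"] == "extreme-wide"])
    else:
        last = history[-1]
        if last["framing"] == "extreme-close-up":
            # Rule 2: do not repeat an extreme close-up of the same character.
            who = last["character"]
            filters.append(lambda shots: [
                s for s in shots
                if not (s["character"] == who
                        and s["framing"] == "extreme-close-up")])
    return filters


def apply_all(shots, filters):
    """Apply each filter to the output of the previous one."""
    for f in filters:
        shots = f(shots)
    return shots


shots = [
    {"id": 1, "character": "Thomas", "framing": "extreme-wide"},
    {"id": 2, "character": "Thomas", "framing": "extreme-close-up"},
    {"id": 3, "character": "David", "framing": "extreme-close-up"},
    {"id": 4, "character": "Thomas", "framing": "medium"},
]

opening = apply_all(shots, stylistic_filters([]))
after_close_up = apply_all(shots, stylistic_filters([shots[1]]))
```

Note that the rules never inspect what is said in a shot, only how it is framed, which mirrors the distinction drawn above between stylistic and continuity filters.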

Interaction Filters

The final filter layer consists of filters which are directly or indirectly manipulated by the viewer. The viewer is presented with various controls before and during the movie presentation. Manipulation of these interface objects changes parameters that modify shot selection filters in the interaction filter layer. An Endless Conversation currently allows the viewer to manipulate the rating (PG, R) and the pacing (fast, slow) while the film is unfolding. This type of interaction is accomplished by translating the user's action on the interface into changes in the filter layer. To change the rating, each shot is logged as R or PG and the rating filter simply weeds out the risqué possibilities when the rating is set to PG. The pacing control creates a similar filter which weeds out lengthy shots if the user wants fast pacing and weeds out short shots if the user prefers slow pacing. More sophisticated interaction modes could be implemented using this filter layer such as a "More" button. This button would allow the user to go deeper into a particular scene at any point in the movie by modifying a template filter layer to spend more time with the current idea.
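The rating and pacing controls could be modeled as a single filter object whose parameters the interface changes while the movie runs. In this sketch the class name `InteractionLayer`, the slot names and, in particular, the four-second boundary between "short" and "lengthy" shots are assumptions; the paper does not say how shot length is thresholded.

```python
class InteractionLayer:
    """Viewer-controlled filter; its settings may change between shots."""

    def __init__(self, rating="R", pacing=None, cutoff_seconds=4.0):
        self.rating = rating          # "PG" or "R"
        self.pacing = pacing          # "fast", "slow" or None (no preference)
        self.cutoff = cutoff_seconds  # assumed short/long boundary

    def __call__(self, shots):
        kept = list(shots)
        if self.rating == "PG":
            # Weed out the risque possibilities.
            kept = [s for s in kept if s["rating"] == "PG"]
        if self.pacing == "fast":
            kept = [s for s in kept if s["length"] <= self.cutoff]
        elif self.pacing == "slow":
            kept = [s for s in kept if s["length"] > self.cutoff]
        return kept


shots = [
    {"id": 1, "rating": "PG", "length": 2.0},
    {"id": 2, "rating": "R", "length": 3.0},
    {"id": 3, "rating": "PG", "length": 9.0},
]

layer = InteractionLayer()
everything = layer(shots)

# The viewer turns the rating down and the pacing up mid-movie.
layer.rating = "PG"
layer.pacing = "fast"
pg_fast = layer(shots)
```

Because the object is consulted afresh for every shot, a change to `rating` or `pacing` takes effect at the very next edit, without recomputing anything that has already been shown.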

Fig. 2 Layered Filters

A New Approach to Filmmaking

Making a movie within the constraints of the DMO system is very different from making traditional linear movies. The whole process of making a movie has to be open-ended in the sense that micromovies have to be scripted and shot in such a manner that the filtering system has a variety of shots to choose from for each part of the movie. If there are several appropriate shots, the exact response to a question by one character may not be known in advance. It may be long and involved. It may be short and to the point. The response will be decided by the computer and will depend on the interactions made by the viewer. To filmmakers accustomed to controlling every aspect of a production, this apparent loss of control may be slightly disconcerting. A new type of filmmaker is needed who is willing to relinquish some of the control to the computer during the viewing of the film. The reality of the situation is that rich multi-threaded narratives will require that some of the authoring be done by the computer and that the filmmaker create algorithms to control the computer. The filter-based system we are proposing allows the filmmaker to shape the structure of what appears on-screen through the creation of filters. The important point to remember is that both the filters and the filming are controlled by the filmmaker, which means that ultimate control still resides with the filmmaker.

The DMO has evolved out of a long tradition of alternative narrative structures created within the MIT Media Lab that explore new content suited to interactive environments. The real challenge for interactive filmmakers is to come up with content that will create a compelling and entertaining experience within the framework of an interactive environment. An early pioneering application was the Aspen project which explored the idea of surrogate travel using videodisc technology. In that project, the experience was built upon a geographic environment around which the viewer could navigate [6]. New Orleans Interactive is a documentary which began to explore the import of attaching computer readable content descriptions to video clips. While watching a linear movie about New Orleans as the city prepares for, holds and concludes the 1984 Louisiana World Exposition, it is possible, using keywords attached to people, places and themes, to get more information relating to what is currently being watched in the linear movie [7]. Ben Rubin explored the use of layered filters for building a personalized movie. His constraint based editing system selects shots based on limited input from the viewer and builds an edit list to create a linear piece for subsequent viewing, although the filters in the system are not extensible, nor can they be mutated during the presentation [8].

The DMO extends this line of research and looks at how the computer can use descriptions attached to video clips within a framework where the filtering is done in real time. There are two significant advances in the DMO approach: first, because the filtering is done in real-time, the story can be re-directed through interaction at any point. Second, because the layered filters create a simple, yet flexible story structure into which micromovie clips can be "dropped", the DMO takes us away from the need to create a large, fixed structure where every possible strand of a narrative has been pre-scripted. The rate of growth of a fixed structure multi-threaded interactive narrative can sometimes be exponential [9]. It is possible to start thinking about scripting, shooting and editing a film in a much more flexible and open way. A look at the methodology used in the creation of An Endless Conversation will demonstrate the unique approach required for making a personalizable movie.


The first step for the author is to create an appropriate filter structure. The filters have to be created at the outset so that the appropriate video clips can be scripted to fit those filters. In the case of An Endless Conversation, we decided that the template filter would be a question, followed by a response and finally a reaction. This template is repeated ad infinitum, resulting in a limited but coherent interchange between the two characters that goes on forever (hence the title: An Endless Conversation). The other filters will also affect the scripting of a movie. For example, the interaction filters for the conversation require that pacing and rating be taken into account. Clips of varying length and rudeness will be needed.


Once the filters have been defined, the next step is to create a shooting script defining shots which are appropriate for each possible combination of filter layers through the course of the presentation. In An Endless Conversation this involves many different types of statements from each character. For the "question" filter in the template layer, we have utterances such as: "What's your name?", "What are you doing here?", "What is this light?". The response template filter requires clear responses to each of these questions from both characters. A question will have more than one possible response. By scripting several possibilities for each type of shot, the movie can be personalized along the exact lines required by the filmmaker. For example, a question will have long and short responses, allowing the pacing to be controlled. Only short responses will be chosen when the pacing knob is on "Fast". The reaction template filter is a much more generic slot. It includes possible rebuttals of varying length and rudeness by each character to what he just heard: "I agree", "I disagree", "Don't listen to this guy", and so on.


After the script has been laid out, shooting can begin. In a traditional film, the director captures a variety of performances from the actors to allow the editor a choice of the best "take" at one particular point in the story. Likewise, the more takes of a particular shot the DMO has to work with, the more possibilities the layered filters have at any point to personalize the presentation based on user manipulation. In the case of An Endless Conversation, the shooting process can be thought of as very similar to shooting an interview. The main difference is that all possible character utterances are filmed: questions, responses, reactions. The most important aspect of shooting for a personalizable movie is to get a large variety of different takes (long, short, animated, subdued, wild, controlled, etc.). The director must capture a variety of utterances that are also generic enough to be used in different situations. For example, many of the reactions in An Endless Conversation can be used after any character's response.


Once the shooting is over, just like in a traditional film, the footage has to be logged by the filmmaker. Each shot has to have descriptions attached to it so that the program can choose the correct shot using the layered filters. As the raw video is broken into coherent shots by the filmmaker, each one is logged into the database, creating a collection of micromovies that can be accessed by the orchestration program. The descriptions used by the DMO system for An Endless Conversation are actually quite simple. They are: character name, shot length, type of utterance, subject of utterance, rating, and pacing. Based on only these descriptions, the computer can put together a sequence of shots that recreates a conversation between two characters.


The DMO is currently implemented on a Macintosh IIfx computer using Macintosh Common Lisp as a development language and Apple's QuickTime software for digital video delivery. The system resides completely in software and can run on any Apple Macintosh computer.

In addition to An Endless Conversation, a second film, Stairwell to Nowhere, has been completed with the DMO system. The movie presents a short action sequence with a man running up a stairwell and exiting through a doorway. It explores the automatic generation of an action sequence template from a physical model of a space. Before the movie begins, the viewer can specify which stairwell landing the character should begin on and which landing the character should end on. A simple algorithm uses a physical model of the stairwell to generate a template filter specific to the viewer's preferences. The algorithm makes sure that both the space and the action played out within it are clearly communicated to the viewer through cinematic conventions. With a system like this the DMO can help filmmakers plan and shoot simple action sequences while making sure that spatial and action continuity are understandable to the viewer.
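One way to picture the template generation is shown below. The sketch makes strong simplifying assumptions that are not taken from the paper: the physical model is reduced to numbered landings joined by single flights of stairs, and the function name `stairwell_template` and the action keywords `establish`, `climb` and `exit-door` are hypothetical.

```python
def stairwell_template(start_landing, end_landing):
    """Return one shot description per template step, from start to exit."""
    if end_landing < start_landing:
        raise ValueError("this sketch only handles running up the stairs")
    # Open by showing where the character is (assumed convention).
    steps = [{"action": "establish", "landing": start_landing}]
    # One step per flight of stairs between the two landings.
    for landing in range(start_landing, end_landing):
        steps.append({"action": "climb", "from": landing, "to": landing + 1})
    # Finish with the exit through the doorway.
    steps.append({"action": "exit-door", "landing": end_landing})
    return steps


template = stairwell_template(1, 3)
```

Each dictionary in `template` would then serve as a template filter for one shot, so that the same database of stairwell shots yields a different sequence for every choice of start and end landing.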

Another presentation currently being created with the DMO system is CyberCritics. This movie presents two movie critics who review movie clips on-screen and have exchanges about their reactions to the movie. The viewer will be able to choose which movies she wants reviewed and if she wants to hear more from a particular critic. The movie critics project expands the creative options open to the filmmaker in the digital realm. Two of the more exciting ideas are positional editing and automatic compositing. Positional editing refers to placing the characters within the screen space so that they can refer to each other and other objects within the space by looking, gesturing or pointing. This involves not only logging each micromovie with directional information, but also creating a rule base which will allocate screen space to micromovies as they are presented. Automatic compositing explores how two or more digital video streams might be combined in real time. Traditional compositing is the process whereby a character is filmed against a blue background and seamlessly inserted into a background that represents a location. The CyberCritics movie will create a space (location) where the characters can be automatically positioned and talk to each other as in a television show. This kind of system would rely on shot selection filters which not only know the layout of the screen space, but also know what action is required of a character for her to refer to other on screen objects such as looking at another character.

Currently, work is continuing on a fourth interactive DMO movie called Train of Thought. In this movie we are beginning to explore more sophisticated cinematic structures in two ways: first by involving a mode of interaction which is essential to telling the story and second by using a larger granularity in the video database. The movie is centered around a romance between an amateur filmmaker, Jack, and a woman who is moving overseas in a few days to take a good job. The story can be viewed by following either a "dream" thread or a "reality" thread. The dream thread explores Jack’s viewpoint of the situation as a filmmaker, while the reality thread is a more "objective" telling of the story. The viewer can change threads at any point during the story. Depending on the viewer’s path through the experience, different parts of the narrative will be played out. The video database for Train of Thought is based on short cinematic sequences ranging up to 3 minutes in length which are often made up of several separate shots. By using a larger granularity we have been able to focus on higher level constructs such as character development and chronological story telling.

As more interactive experiences are created with the DMO system it becomes apparent that research into filter based interactive movies must move from the realm of cinematic sequences (e.g. a dialog between two characters or an action sequence) to the realm of cinematic structure (e.g. parallel action, conflict resolution, flashback, character development). This becomes possible as authors of interactive movies become more familiar with narrative abstraction and building sketchy video descriptions. Creating narrative abstractions involves picking out and formalizing the essential constructs of a particular story and its intended modes of interaction. This becomes more difficult as cinematic structures become more involved. One possibility for simplifying this process is the use of "metafilters". Metafilters orchestrate high level story constructs by selecting filter layers from a descriptive database. Essentially metafilters are filters for filters. This story structure paradigm allows the filmmaker to build low level abstractions for cinematic sequences separately from abstractions related to higher level story structures.
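Since metafilters are proposed rather than implemented, the following is only a speculative sketch of the idea of "filters for filters". It assumes that filter layers are themselves logged with keywords in a database; the names `layer_database` and `metafilter`, the `construct` slot and all of the entries are invented for illustration.

```python
# A hypothetical database describing filter layers rather than shots.
layer_database = [
    {"name": "dialog-template", "kind": "template",
     "construct": {"conversation"}},
    {"name": "stairs-template", "kind": "template",
     "construct": {"action"}},
    {"name": "topic-continuity", "kind": "continuity",
     "construct": {"conversation"}},
    {"name": "framing-rules", "kind": "stylistic",
     "construct": {"conversation", "action"}},
]


def metafilter(construct):
    """Return a function that picks the filter layers suited to a story construct."""
    def select(layers):
        return [layer for layer in layers if construct in layer["construct"]]
    return select


dialog_layers = metafilter("conversation")(layer_database)
action_layers = metafilter("action")(layer_database)
```

The point of the sketch is that a metafilter has the same shape as an ordinary shot filter (a function from a collection to a subset), so the chaining mechanism used for shots could in principle be reused one level up.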


Only very recently have we been able to think about using the computer to control the structure of movies. The Digital Micromovie Orchestrator is a system that attempts to give the computer some control over the content appearing on screen. A filmmaker breaks the video down into coherent shots, attaches descriptions to these shots to form micromovies and creates a filtering system that controls the sequence of shots that make up a personalizable movie experience. The significance of the system is that the filtering is done on-the-fly, thereby allowing real-time control over the direction of the narrative. With real-time shot selection, personalizable movies and fluid interaction modes can be created.

Unlike existing interactive systems which put the onus for forward navigation on the viewer, the DMO allows the viewer to interact in a fluid manner with the story. It is possible to sit back and enjoy the show or to change the direction of what appears on-screen at any time. Fluid interaction maintains the reverie of the story-telling experience, something often lacking in traditional interactive programs.

An Endless Conversation, Stairwell to Nowhere and the other examples outlined in the paper point to new ways of thinking about building films. Interactive narratives have required very complicated branching structures that have until now been pre-determined by an author or filmmaker. The DMO supports a simple layered filter structure that is designed by the filmmaker during the making but controlled by the computer during viewing. This provides a testbed for complicated interactive narratives while reducing the amount of planning and testing required by authors. As we establish extensive video databases and gain more experience with the structure of layered filters, even more sophisticated interactive movies than An Endless Conversation will become possible.



The work described herein was realized at the Interactive Cinema Group of the MIT Media Laboratory and was supported in part by directed research contracts from Kansa Corporation, Bellcore and British Telecom.

References and Notes

1. An Endless Conversation is included on the CD-ROM QuickTime™: The CD published 1992 by Sumeria, Inc., San Francisco.

2. The term "micromovie" is thought to have been used informally in the Architecture Machine Group in the late 1970s and early 1980s. It was used to refer to the type of short movies which are appropriate to the granularity of interactive video applications. We are extending the definition to include attached descriptions.

3. T. Galyean, "Continuous Variables for Interactive Film", Computer Graphics and Animation Group: Working Paper (MIT Media Lab, 1992).

4. T. Aguierre Smith, If You Could See What I Mean... Descriptions of Video in an Anthropologist's Video Notebook, Master’s thesis (MIT, August 1992).

5. The inspiration for the "videobox" representation of a micromovie comes from E. Elliott, Lots o' Video, Master’s thesis (MIT, 1992).

6. A. Lippman, "Movie-maps: An Application of the optical videodisc to computer graphics", Proc. SIGGRAPH (Seattle: 1980).

7. G. Davenport, "New Orleans In Transition, 1983-1986: The Interactive Delivery of a Cinematic Case Study", transcript of remarks (Boston: The International Congress for Design Planning and Theory, August 1987).

8. B. Rubin, Constraint-Based Cinematic Editing, Master’s thesis (MIT, June 1989).

9. A. Bruckman, "The Combinatorics of Storytelling: Mystery Train Interactive", Interactive Cinema Group: Working Paper (MIT Media Lab, 1990).