The Grammar of Film Editing

© 2003 Pamela Cole

Georgia State University


In the hundred-year history of film there has been ample debate about the ways that the human mind is able to interpret 24 flickering images per second into a narrative nearly indiscernible from reality. The question has been addressed by theorists from fields ranging from linguistics to psychoanalysis to Marxism. Philosophers from various schools of thought have each tried to claim a piece of the great mystery of filmic perception, and the debate continues to the present day. While film displays a remarkable similarity to reality, it is the interpretation that comes from the differences between film and reality that leads us to perceive film as reality. Thus, the primary instrument for creating Coleridge’s “willing suspension of disbelief for the moment” is film editing (Coleridge, 1907).

This paper will examine the literature from different schools of thought, including classical film theory, semiotics, audience theory, and communication theory, to prove the existence and the effectiveness of the grammar of the language of film, and attempt to answer the question, “What is the grammar of film?”

History of Editing

Early filmmakers quickly discovered the powerful potential of film once it was removed from the static position of being merely a “recording instrument.” In the mid-1890s, the two Lumière brothers, Auguste and Louis, would set up the camera and simply record whatever moved in front of it. This in itself was miraculous. There was no subsequent editing of the film or movement of the camera during the shot. Another Frenchman, Georges Méliès, began to make the first forays into primitive editing techniques, inserting the first titles, developing the first dissolves of one scene into another, and devising double exposure and other “tricks” of film. This was the beginning of true editing, or producing events on film as they would never occur in reality (Millar & Reisz, 1968).

Filmmakers continued to refine editing techniques, realizing the freedom that editing provided for true creativity and narrative. Film did not have to be simply a recorder of moving events as they were; it could be used to create whole new events and present stories in ways that could only occur on film. Time, space, and geography could be wholly manipulated to suit the mind of the director. The crowning event in film editing, and one that has defined film construction and narrative ever since, was the arrival of D.W. Griffith’s film, Birth of a Nation (1915).

If Griffith was the artist responsible for this leap in film editing, German psychologist Hugo Munsterberg was the theorist. Munsterberg, who is arguably the father of applied psychology, saw only one film in his life: Birth of a Nation. He was so excited by the exhibit that he immediately wrote the treatise now regarded as one of the classical works in film theory, The Photoplay: A Psychological Study (Munsterberg, 1916). This treatise was published the year after Griffith’s film premiered. In The Photoplay, Munsterberg identified the following film “devices,” as he called them, used in Birth of a Nation:

·        Continuity editing – editing that directs the flow of attention of the viewer, providing a smooth flow of consciousness, not jarring or interrupting the viewer with the notion that they are watching a film.

·        Close-up – camera placed near the subject, which fills the frame and focuses the attention of the viewer.

·        Flashback – a film sequence that recalls an event that happened in the past, but shows it in the present. Simulates memory.

·        Cross-cutting – cutting between two lines of action to show each one happening in real time and then converging.

·        Special effects – reference to visions and fantasies or dreams possible by manipulating the light or film (Braudy & Cohen, 1999, p. 400).

Clearly, these are still primary elements in film construction or editing. Munsterberg was able to isolate these devices through his one observation of Griffith’s film. In his zeal to define the qualities of the edited film, Munsterberg says, “The photoplay tells us the human story by overcoming the forms of the outer world, namely, space, time, and causality, and by adjusting the events to the forms of the inner world, namely, attention, memory, imagination, and emotion” (Braudy & Cohen, 1999, p. 402).

At about the same time, in 1916, the Russian filmmaker and teacher Lev Kuleshov documented the differences between Soviet and American film, saying, “Russian film was constructed of several very lengthy shots photographed from a single position.” He went on to compare that to an American film that “consisted of a large number of short shots filmed from various positions” (Kuleshov, 1916, p. 46). This, in essence, describes the evolution of film editing. Kuleshov became a major proponent of film editing and formulated the famous “Kuleshov Effect,” which proved that the juxtaposition of separate images would influence the comprehension of their meaning and, in effect, create an entirely new meaning. Kuleshov believed that this “interrelationship of shots” was “the basic means that produces the impact of cinematography on the viewer” (Ibid, 1916, p. 46). Kuleshov went on to say, “The content of the shots in itself is not so important as is the joining of two shots of different content and the method of their connection and their alternation” (1916, p. 47). This “method of their connection and their alternation,” which is the essence of film editing, was Kuleshov’s primary interest.

This early montage theory was absorbed by Kuleshov’s student, Sergei Eisenstein, who is now revered for his contributions to film, specifically the theory of montage. Eisenstein’s montage was more violent than Kuleshov’s; it was a collision of images that gave birth to a new idea in the mind of the viewer. The collision was carefully constructed by the director to manipulate the mind of the viewer, which, in turn, illustrated the importance of the editing process that took place after the film was shot (Eisenstein, 1949).

Montage is the first and central building block of film editing. Today we refer to montage simply as “cuts,” and while it is the most basic instrument in the toolbox of the modern film editor, it is only one of an unlimited number of ways in which the editor, in Kuleshov’s words:

“takes the viewer as if by the scruff of his neck, and let’s say, thrusts him under a locomotive and forces him to see from that point of view: thrusts him into an airplane and forces him to see the landscape from the air, makes him whirl with the propeller and see the landscape through the whirling propeller” (Kuleshov, 1916, p. 59).

Is Film Language?

If we are to consider the question of grammar in film, we must first ask the question that contemporary film theorist Christian Metz is so familiar with: Is film language? To answer this question, we must traverse into the land of semiotics fathered by Ferdinand de Saussure (coincidentally around the same time that Munsterberg and Kuleshov were formulating their film theories). Saussure, who had no interest in narrative but was more concerned with mechanics and structure (not unlike the formalist Lumière filmmakers), would have had no problem pronouncing that film was neither language nor langue (a sign system).

Nonetheless, the term “film language,” according to Metz, “gradually assumed a place in the special vocabulary of film theoreticians and aestheticians” (Metz, 1992, p. 168). For example, Kuleshov said, “The shot should act as a sign, as a letter of the alphabet, so that you can instantly read it, and so that for the viewer what is expressed in the given shot will be utterly clear” (Kuleshov, 1929, p. 63). He went on to say that, “Each separate shot must act as each letter in a word” (Ibid, 1929, p. 63). Thus, Kuleshov introduced the concept of semiotics and film as language long before semiotics was a subject for discussion in film theory.

Other theorists have also used the analogy. Eisenstein compared the structure of Griffith’s film Birth of a Nation to the writings of John Milton, Charles Dickens, and Walt Whitman in his essay “Dickens, Griffith, and the Film Today” (Millar & Reisz, 1968, p. 27). Pipolo describes the “cinematic vocabulary” as:

“long shots and long takes over close-ups and editing; slow, exploratory pans and tracks; intertitles throughout to precisely fix the dates and places of the action; black leader interspersed between scenes often concealing extraordinary elisions in the progress of events” (Pipolo, 2000).

Modern film editor Wohl bluntly states that “The metaphor works for film because film is a language, and the editor—even more than the director or the cinematographer—must be a poet” (2002, p. 31).

So despite Saussure’s protests, film as a language has found a place of its own, in the editing world at least. Saussure’s argument against film as a language was that it had no unit that could be broken down to the level of a word. He believed that every shot, which was the smallest unit of meaning or “syntagma” of a film, existed at the level of a sentence. Semiotics was a system devised for the study of language, since Saussure believed that language constituted the best and most sophisticated use of signs as meaning. However, by Saussure’s own definition, language is a “system of signs that express ideas only in relation to each other” (Silverman, 1983, p. 6). This sounds remarkably like Eisenstein’s definition of montage, which says “meaning is not inherent in any one shot but is created by the juxtaposition of shots” (Sobchak, 1980, p. 105).

Metz later concludes that film can be considered to be a language “to the extent that it orders signifying elements within ordered arrangements different from those of spoken idioms” (1992, p. 177). Given that definition, let’s progress to the grammar of the language of film.

The Grammar of Film

According to the American Heritage Dictionary of the English Language (2000), grammar is:

·        The study of how words and their component parts combine to form sentences.

·        The study of structural relationships in language or in a language, sometimes including pronunciation, meaning, and linguistic history.

·        The system of inflections, syntax, and word formation of a language.

·        The system of rules implicit in a language, viewed as a mechanism for generating all sentences possible in that language.

Armed with this concrete, if broad, definition of grammar, we can now ask what the grammar of the language of film is.

Since film is not a language of words, but rather a nonverbal language of images, we must make an appropriate substitution for “words” and “sentences” in the above definitions. Saussure and Metz both agree that there is no minimum filmic unit that corresponds to a “word.” Therefore, each considers that the smallest unit in film may be the shot. But Metz adds:

“One cannot conclude, however, that every minimum filmic segment is a shot. Besides shots, there are other minimum segments, optical devices—various dissolves, wipes, and so on—that can be defined as visual but not photographic elements.” (Metz, 1992, p. 177)

If we assume that the shot begins at the level of a sentence, then the shot must consist of a number of items roughly analogous to phrases, punctuation, syntax, and inflections, so that perhaps a shot can be diagrammed as precisely as a sentence. (Indeed, anyone who has ever seen a shooting script knows that a shot must be diagrammed as precisely as a sentence.) Wohl says, “In cinematic language, sentences are constructed in a noun-verb-noun-verb arrangement and are punctuated with fades, freeze frames, and other graphical elements” (2002, p. 31).

Metz seems to agree with Wohl, saying that the optical devices he observed function as a kind of punctuation:

“The expression ‘filmic punctuation,’ which use has ratified, must not make us forget that optical procedures separate large, complex statements and thus correspond to the articulations of the literary narrative (with its pages and paragraphs, for example), whereas actual punctuation—that is to say, typographical punctuation—separates sentences (period, exclamation mark, question mark, semicolon), and clauses (comma, semicolon, dash), possibly even ‘verbal bases,’ with or without characteristics (apostrophe, or dash, between two ‘words,’ and so on)” (Metz, 1992, pp. 177-178).

For example, if you consider some basic film grammar elements (or optical devices, as Metz refers to them) to be shots, camera angles, camera movements, and transitions, then each element could have the following modifiers (Wohl, 2002, pp. 31-79):

·        Shots - close-up, medium, medium close-up, full, master, single, two-shot, reverse, point-of-view (POV), over-the-shoulder (OTS)

·        Camera movement - static, fluid, hand-held, dolly, pan, tilt, tracking, zoom, or crane

·        Camera angle - high-angle, low-angle, Dutch angle, or overhead

·        Transition – dissolve, fade, cut, jump cut, match action cut

Perhaps our biggest clue to the existence of grammar in film lies in the acknowledgement of film as narrative. Metz points to the work of early filmmakers including Méliès and Porter in “inventing” the optical devices he refers to above, but he credits Griffith with their use in creating the first film narrative. Referring to Birth of a Nation, Metz says, “Thus, it was in a single motion that the cinema became narrative and took over some of the attributes of a language” (Metz, 1992, p. 170). He went on to say, “Had the cinema not become thoroughly narrative, its grammar would undoubtedly be entirely different (and would perhaps not even exist)” (Ibid, p. 88).

Bazin identified potential grammatical elements of the language of film when he discussed the following film techniques as useful for recording reality (the primary purpose of film, in his opinion):

  • Deep focus – depth-of-field where everything in the frame is in focus
  • Long take – as opposed to short takes, which are edited together
  • Camera movement – motion of the camera during the shot
  • Mise-en-scene – French term that literally means “to put in the scene”; everything that is required to communicate the shot

Bazin called “deep focus,” or the judicious use of depth-of-field, “a dialectical step forward in the history of film language” (Braudy & Cohen, 1999, p. 48).

Wohl identifies what he calls the “rules of film grammar.” He says, “While the language [of film] is still growing and evolving, quite a few of the basic rules have remained surprisingly consistent over the years” (Wohl, 2002, p. 69). He identifies the rules as follows:

  • New Shot = New Information – Don’t use a shot unless it provides new information to the story. “Pudovkin held that if a film narrative was to be kept continually effective, each shot must make a new and specific point. He is scornful of directors who tell their stories in long-lasting shots of an actor playing a scene, and merely punctuate them by occasional close shots of details” (Millar & Reisz, 1968, p. 30).
  • Screen direction – Don’t cross the eye line and make sure characters enter and exit from appropriate sides of the frame.
  • Cut on Action – To hide an edit, cut on a movement or action in the shot.
  • Match Your Shots – Try to cut between shots that match in focal length and frame size.
  • Cut Moving to Moving, Still to Still – When making cuts, avoid cutting from a moving shot to a still shot.
  • Find a Compositional Link – Find an element that provides visual continuity between shots.
  • Manipulating Time – Shots can be set up to condense time quite differently from time in reality.
  • Respect Silence – Use silence where it is most powerful.
  • Set the Pace – Set the pace of the film with your cuts and transitions.

(Wohl, 2002, pp. 69-81).

Of course, like most rules, these rules can occasionally be broken, sometimes intentionally to display the effect of editing. Other film editors would undoubtedly add to or subtract from this list with their own rules. But this is a good example of the types of film grammar rules that have developed over the last hundred years along with the language of film.

The Effects of Film Grammar

If film is language, and grammar is the link that allows us to read or perceive the story being communicated by a series of separate shots, then how is it that viewers have learned to interpret the grammar of film? There are no classes in school that correspond to “Film Grammar 101,” or “Understanding the Grammar of Film.” Katz said, “What we know least about is the psychology and sociology of the viewing experience itself” (Katz, 1996, p. 9). Katz inquires how much effort is required by viewers to interpret the images presented on a television or film screen. The field of cognitive psychology has inserted itself into the debate loudly in recent years, insisting that visual perception is “a product of an array of high-order cognitive activities” (Naaman, 2002).

If we consider the narrative distance between each separate shot as a gap, then the grammar of film becomes the bridge across those gaps that comprehension requires. Logue & Miller state that “All forms of communication, even face-to-face dialogue, are marked by such gaps; and message-senders and message-receivers alike must concern themselves with ‘negotiating’ or crossing them” (Logue & Miller, 1996, p. 366). While they argue that most “gaps” in communication are bridged by “signs or semiotics,” I would argue that in film, gaps are bridged by signs that constitute film grammar, and by the skilled use of that grammar by the film editor. Logue & Miller appear to agree when they say, “What makes communication, we would argue, are the media that bridge these gaps, namely, signs of various types that transport meanings across spatio-temporal distances and serve also to reduce or close interpretative distances” (Ibid, p. 368).

Branigan (cited by Naaman) says that viewers watching a film are really watching four different films at the same time: 1) a celluloid strip of material; 2) a projected image with recorded sound; 3) a coherent event in three-dimensional space; 4) a story we remember as we reconstruct the shots we have seen (Naaman, 2002). This fascinating description gives us a hint of the complexity of the viewing experience and makes us wonder again at the ease with which comprehension occurs. Naaman calls this “the phenomenon of ignoring editing with a scene—which implies spatial and often temporal skips—in favor of accepting the dramatic unity the scene conveys” (Naaman, 2002). The mind, through repeated exposure to films, or through the high-order cognitive process, is able to assimilate the edits implicit in the grammar of film and form its own narrative. This holds, of course, only if the edits have been applied by a skilled editor in compliance with the rules of film grammar, which have evolved through trial and error throughout the history of film editing. One misplaced jump cut can confuse a viewer and displace the “suspension of disbelief” necessary for film narrative comprehension, just as one misplaced modifier or incorrect pronoun in speech or text will alert the listener or reader that perhaps what they are hearing or reading is nonsense.

Metz said:

“The cinematic institution is not just the cinema industry…it is also the mental machinery—another industry—which spectators ‘accustomed to the cinema’ have internalized historically and which has adapted them to the consumption of films” (Metz, 1975, p. 2).

Here he is saying that viewers have learned to interpret films simply by the process of watching films over time. We have learned the grammar of film, through repeated viewing of good grammar, and similarly, we recognize “bad film grammar” when we become disoriented or distracted in a film. “What happened?” we may ask when a crucial grammatical element has been left out. The mind strives to make sense of what it is presented with on screen.

In a study of how capable children are of interpreting montage, researchers “counted 311 cuts, 56 pans, 10 zooms, and 4 fades” in a single television episode, for an average of 8.1 “cinematic techniques per minute” (Smith, Anderson & Fischer, 1985). In another episode, they counted 556 cuts, 118 pans, 38 zooms, 6 freeze frames, and 63 dolly shots: 781 cinematic techniques in all, or 16.7 per minute. Viewers have to absorb all this in addition to the dialogue, setting, sound, and other elements that make up a film. According to Smith, Anderson & Fischer, “There is a common belief that such ‘film literacy’ is gradually acquired” (1985). While studies disagree, this 1985 study found that “substantial comprehension of montage is already established in the preschool years.” They concluded that,

“There should be little doubt that as children’s knowledge of the world increases, their comprehension of montage continues to develop, eventually encompassing appreciation of sophisticated symbolism and aesthetic devices” (Smith et al., 1985).


This paper has investigated the visual elements that can be said to comprise the grammar of film editing. Since the beginning of filmmaking, certain devices, such as cross-cutting, montage, mise-en-scene, and flashback, have been recognized as having definable properties analogous to grammar. The visual elements that make up film grammar are what give film the power to communicate effectively. Audiences unconsciously recognize and interpret these elements in making sense of a film narrative.

Conclusion

If one considers that film, as a form of communication, is analogous to a language, then it must follow that such language has a grammar and syntax of its own making, as does every language. Bordwell tells us, “As film studies integrated itself into the modern liberal arts, it allied itself not with departments of art history or music history, but with departments of literature” (Bordwell, 1998). Since film evolved into a narrative device for storytelling, it makes sense that film theory would investigate the nature of film as language and syntax. The grammar of film, however, is constantly changing, limited only by the imaginations of filmmakers and editors. As such, it is exactly like other living languages, which also constantly change and evolve.


