Filmmaking with Midjourney &
Generative AI
For the past few weeks, I've been experimenting with GPT-4 and Midjourney v5.
Last weekend I had a thought — what if I combined the two tools to make a short film, with GPT as the writer and Midjourney as the cinematographer?
Along the way, I discovered AI-generated voice and music composition tools, which I added to the production as well.
As for the story, I decided to create a film about Tiong Bahru — an iconic residential neighborhood in Singapore — a place special to me — where I've made many friends and memories.
I created the film in full collaboration with generative AI — GPT-4 as the writer, Midjourney as the cinematographer, AIVA as the composer, and MetaVoice as the voice actor.
I'm capturing my thoughts and experiences from this experiment to share my enthusiasm about these new generative tools.
Midjourney as Cast and Crew
Midjourney as an Actor
One unexpected feature of Midjourney was the performances it produced. Based on my prompts, Midjourney created "actors" with performances that matched my instructions.
Like any film shoot, Midjourney had its collection of bad "takes" — poor acting, poor composition, and hiccups along the way. But with feedback and better direction, Midjourney was able to improve its performance — like any good actor.
Even more surprisingly, I quickly felt emotionally connected to the fictional characters Midjourney was producing. As I selected suitable performances and actors for the story, I grew attached to these characters and felt motivated to bring their stories to life.


In many ways, the experience didn't feel so foreign to me. When I shoot movies as a cinematographer, I seldom have control over the performance. In filmmaking, so many things are serendipitous, and working with Midjourney felt like taking serendipity to an extreme.
Midjourney as a Cinematographer
I found Midjourney's composition, color, lighting, and blocking highly sophisticated. The more I interacted with Midjourney, the more I appreciated each frame's beauty. In many instances, I felt that Midjourney produced more expressive images than I could have shot myself. Midjourney gave me new perspectives, similar to how Grammarly offers me new ideas to improve my writing.

Midjourney as a Production Designer
One of my key goals in Whispers of Tiong Bahru was to capture the essence of the neighborhood. I didn't want to create just any depiction of Tiong Bahru, but one that reflects how I see it. For me, this meant the greenery, the iconic curved buildings, the connection to the outdoors, the sunlight, and the people in the community.
With Midjourney's help, I created images aligned with the essence of Tiong Bahru I wanted to convey.


These images feel like synonyms for reality — like parallel universes. In fact, through Midjourney, I was discovering a new side of the neighborhood.
Midjourney as a Storyboard Artist
Without a doubt, generative AI opens up new ways to do storyboarding and pre-visualization, and new ways for creative professionals to communicate intent and brainstorm ideas.

By quickly creating frames, I was able to build out the visual narrative with greater fidelity and speed than I've ever been able to before.
Working with Midjourney
On Steerability
Steerability was something I struggled with. Keeping the characters consistent was a significant challenge. Specifically, I spent much time attempting to keep faces, bodies, and clothing consistent. For example, I struggled to keep even the color of the shirts consistent, let alone the characters' faces and bodies. People with good visual memory will quickly notice that the characters in Whispers of Tiong Bahru are different in every scene.

At some point, I accepted that this was a quirk of working with Midjourney and decided to work around it.
My Prompting Journey
Prompting with Midjourney felt like learning how to use a camera all over again. When I started taking photos, my first images were terrible, and they improved with practice. It was the same with Midjourney — just that the learning curve was much shorter.

I also learned that Midjourney has a couple of parameters for better control over its outputs — chaos and stylize. I found both indispensable in getting the frames I needed. For anyone considering Midjourney as a professional tool, I suggest learning how to use these parameters appropriately. They are the equivalent of learning a camera's aperture and shutter speed.
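For illustration — this is a made-up prompt, not one of the actual frames from the film — a typical command might look something like:

/imagine an elderly man reading a newspaper outside a curved Art Deco block, morning sunlight, lush greenery --ar 16:9 --chaos 10 --stylize 250

Lower chaos values keep the initial grid of variations close together, while higher values push them further apart; stylize controls how strongly Midjourney layers its own aesthetic on top of the prompt.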
Thoughts on Prompts
Not once in the creation of Whispers did I feel that I would become obsolete as a creative professional.
I realized that what made this film possible was my database of personal images — my collection of photos from the neighborhood. Initially, I started prompting using pictures of the area from Google Images. It quickly became clear that none of those images captured the neighborhood the way I saw it, and as a result, Midjourney was also unable to produce the frames I needed.
Luckily, I had an archive of photos of my favorite parts of the neighborhood, taken during its best moments. The final frames in Whispers could not have been produced without these images as part of my prompts — or at least, so I like to believe.
Outtakes
Many frames didn't make the final cut, but they were beautiful nonetheless. I would love to share them here.

My Role
My ultimate role in this production was directing and editing. GPT-4 and Midjourney each produced both fantastic and nonsensical material; I was the one deciding what stayed and what went. I gave both tools feedback to tighten the story and the frames, orchestrated the pieces so they fit together, and decided what to keep and what to cut. I am the editor, the remixer, and the one setting the intention behind the piece.

Despite working with a wildly creative and expressive tool, I still felt in control of my work, and ultimately it feels like mine, as much as a director might feel a film is theirs, even with the hundreds of crew members who worked to put the movie together.
With that said, I do believe that an AI could replace the role I played. I can imagine a future where AI creates perfect story arcs and better edits than 99% of humanity.
Will another person be able to generate the images I have created with Midjourney without the experiences I've had as a person?
Would an AI be able to create the story I created, fully organically?
I remain optimistic: I still think humans will always have a role to play, even as the degree of abstraction increases.
Ideas and Opportunities
My experiment revealed several interactions and products that would be highly beneficial for filmmakers.
Variety and Fidelity of Prompts
How might we increase the vocabulary creators can use to interact with AI systems?
When creating movie frames, it would often be easier to describe the frame through a sketch than through words. I often wanted to sketch the framing and composition I had in mind so that Midjourney could use it as a reference.
In addition, for the film's score, since I don't have a background in musical composition, I wanted to hum into the system so that it could build out compositions based on my humming.

Generative Foley
What if generative AI could add soundscapes to visual images?
During this film's production, I wanted to add a soundscape to the short. If a foley AI could interpret the images and build a soundscape behind them, it would bring the picture to another level.
Layered Generation
What if Midjourney could create images with individually controllable objects and entities?
If I could control the environment and characters separately, it would be incredibly powerful and would help me keep characters and locations consistent throughout the frames.

This thought comes from AIVA, the AI composer I used for the score. AIVA created pieces with separate tracks and let me edit each track individually after the score was generated. For example, I could change a piano into strings after it produced a piece of music, giving me more control over the final polish.
An Educational Tool
How might we use generative AI to educate the next generation of artists?
Midjourney felt like an excellent tool for filmmakers to get a good sense of the core elements of the craft, letting them experiment with storyboarding and visual composition with greater fidelity and speed.
In the same way AI has helped chess players raise their overall level of play, filmmakers will be able to learn faster and go further than ever before.
Closing Note
Working on this project, I couldn't help but notice that I felt as excited as when I first saw Reverie by Vincent Laforet — the first short film shot on a DSLR, marking the moment DSLRs began to shoot moving pictures. I was stunned at the time and felt it signaled the beginning of a new era in filmmaking: it made the "cinema look" accessible to everyone at a much lower cost. Midjourney feels very much like that to me. It's a game-changing tool that can bring visual artists to another level.
I can only imagine how much more can be done with generative AI in creative work, and I'm very excited to see this space moving so quickly.