This month, OpenAI announced a new generative AI system named Sora, which produces short videos from text prompts.

How does Sora work?

Sora combines features of text and image generating tools in what is called a diffusion transformer model.

What is Sora? A new generative AI tool could transform video production — and amplify risks


Transformers are best known for their use in large language models such as ChatGPT and Google Gemini.

Diffusion models, on the other hand, are the foundation of many AI image generators.

[Image: A series of images showing a picture of a castle emerging from static.]

They work by starting with random noise and iterating towards a clean image that fits an input prompt.
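The idea of iterating from noise towards a clean image can be illustrated with a toy sketch. This is not how Sora or any real image generator is implemented: the function name `toy_denoise`, the five-value "image" and the stand-in denoiser are all assumptions made for illustration. In particular, a real diffusion model does not know the clean image in advance; it uses a trained neural network, guided by the prompt, to estimate the noise at each step.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and step towards `target`.

    `target` stands in for "the clean image that fits the prompt".
    A real model would estimate the noise with a trained network
    instead of reading it off the known answer as done here.
    """
    rng = random.Random(seed)
    # Step 0: pure random noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [list(image)]
    for step in range(steps):
        # Hypothetical denoiser: the estimated noise is the gap to the target.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Remove a growing fraction of that noise; the final step removes all of it.
        fraction = 1.0 / (steps - step)
        image = [x - fraction * n for x, n in zip(image, predicted_noise)]
        history.append(list(image))
    return image, history


def error(image, target):
    """Largest per-pixel distance between an image and the target."""
    return max(abs(x - t) for x, t in zip(image, target))


target = [0.0, 0.5, 1.0, 0.5, 0.0]
final, history = toy_denoise(target)
```

Here `history` holds the image after every step, so comparing `error(history[0], target)` with `error(history[-1], target)` shows the static gradually giving way to the target pattern.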

A video can be made from a sequence of such images.

However, in a video, coherence and consistency between frames are essential.

Sora uses the transformer architecture to handle how frames relate to one another.
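As a rough illustration of how a transformer lets frames "look at" one another, the sketch below applies plain scaled dot-product self-attention to a handful of frames. It is a generic textbook mechanism, not Sora's actual implementation: each frame is reduced to a tiny made-up vector, there are no learned weights, and the name `attend_across_frames` is hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_across_frames(frames):
    """Self-attention in which each frame is treated as one token.

    Assumption for brevity: queries, keys and values are the frame
    vectors themselves, with no learned projections.
    """
    dim = len(frames[0])
    mixed_frames, all_weights = [], []
    for query in frames:
        # Similarity of this frame to every frame, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        # New representation: a weighted blend of all frames.
        mixed = [
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)
        ]
        all_weights.append(weights)
        mixed_frames.append(mixed)
    return mixed_frames, all_weights


# Three toy "frames": the first two are similar, the third is different.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed, weights = attend_across_frames(frames)
```

In this toy case the first frame gives more weight to the similar second frame than to the dissimilar third one, which is the kind of cross-frame relationship that helps keep a video consistent.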

Leading the pack

Sora is not the first text-to-video model.

Earlier models include Emu by Meta, Gen-2 by Runway, Stable Video Diffusion by Stability AI, and recently Lumiere by Google.

Lumiere, released just a few weeks ago, claimed to produce better video than its predecessors.

But Sora comes off as more powerful than Lumiere in at least some respects.

Lumiere's videos are around 5 seconds long, while Sora makes videos up to 60 seconds.

Lumiere cannot make videos composed of multiple shots, while Sora can.

Both models generate broadly realistic videos, but may suffer from hallucinations.

Lumiere's videos may be more easily recognised as AI-generated.

Sora's videos look more dynamic, having more interactions between elements.

However, in many of the example videos inconsistencies become apparent on close inspection.

OpenAI's technical paper about Sora is titled "Video generation models as world simulators".

A complete simulator would need to calculate physical and chemical reactions at the most detailed levels of the universe.

In a world already plagued by disinformation, tools like Sora may make things worse.

Video generators may also enable direct threats to targeted individuals, via deepfakes, particularly pornographic ones.

These may have terrible repercussions on the lives of the affected individuals and their families.

Beyond these concerns, there are also questions of copyright and intellectual property.

Large language models and image generators have also been criticised for this reason.

In the United States, a group of famous authors have sued OpenAI over a potential misuse of their materials.

It is not the first time in recent memory that technology has run ahead of the law.
