It’s impossible to scroll through social media or attend any technology conference without encountering the dramatic shift happening in video production. Text-to-video AI has arrived, and the titans of tech are racing to bring their versions to market. At the forefront of this revolution are two powerhouse tools: OpenAI’s Sora (released in the UK and EU just this Friday) and Google’s Veo 2, each representing a vastly different vision for the future of digital content creation. The implications for industries from fashion to gaming, advertising to independent filmmaking, are profound and immediate.
Sora vs Veo 2: Two Visions for AI-Generated Video
Since both tools are relatively new to the market, particularly for UK and EU audiences, I spoke to three expert users who have had early access for several months about their experiences and how the two platforms compare. My key takeaway is that the battle between Sora and Veo 2 isn’t just about technical specs: it’s a clash of philosophies. One aims to replicate reality, the other to transcend it. These tools represent a pivotal moment where the barriers between imagination and execution are dissolving at an unprecedented rate.
The contrast between Sora and Veo 2 represents more than just competing products—it embodies divergent philosophies about what matters most in creative tools. OpenAI has prioritized user interface and control, while Google has focused on output quality and physics simulation.
“Sora has a huge advantage, because they put a lot of work into the interface and the user interface,” explains David Sheldrick, founder at PS Productions and Sheldrick.ai, who is an early tester of both platforms. “Veo 2, even though the rendering output quality is obviously incredible…Sora itself, when you go on the website, feels way more like a real, sort of refined product.”
This distinction becomes immediately apparent to users encountering both platforms. Sora offers a comprehensive suite of creator-friendly features—timelines, keyframing, and editing capabilities that feel familiar to anyone with video production experience. It prioritizes creative control and workflow integration over raw technical performance.
Leo Kadieff, Gen AI Lead Artist at Wolf Games, a studio pioneering AI-driven gaming experiences, has also had early access to both platforms and describes Veo 2 as “phenomenal, with web access, and API access which enables much more experimental stuff. It’s really the number one tool”. His enthusiasm for Veo 2’s capabilities stems from its exceptional output quality and physics modeling, even if the interface isn’t as polished as Sora’s.
This reflects a key question for creative tools: is it better to provide a familiar, robust interface or to focus on generating the highest quality outputs possible? The answer, as is often the case with emerging technologies, depends entirely on what you’re trying to create.
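Kadieff’s point about API access is worth unpacking, because it is what separates clicking through a web UI from scripting generation into a pipeline. Below is a minimal sketch of what that can look like, assuming the google-genai Python SDK’s long-running video-generation interface as shown in Google’s published examples; the model identifier, config fields, and download helpers are taken from those examples and may differ between releases.

```python
import time

from google import genai
from google.genai import types

# Assumes GOOGLE_API_KEY is set in the environment.
client = genai.Client()

# Kick off generation; Veo jobs run asynchronously as long-running operations.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # model ID from Google's docs; may change
    prompt="A low-angle tracking shot of a cyclist crossing a rain-soaked "
           "Tokyo intersection at dusk, neon reflections on the asphalt.",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",
        number_of_videos=1,
        duration_seconds=8,
    ),
)

# Poll until the job completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated clip to disk.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo2_clip.mp4")
print("Saved veo2_clip.mp4")
```

It is this kind of scriptability, rather than any single feature, that enables the “much more experimental stuff” Kadieff describes: batching, automated retries, and integration into game or VFX pipelines.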
Technical Strengths: Physics, Consistency and Hallucinations
The real-world performance of these tools reveals their distinct technical approaches. Sora impresses with its cinematic quality and extended duration capabilities, while Veo 2 excels at physics simulation and consistency.
“The image quality is pretty damn good,” notes Sheldrick about Veo 2, while adding that “Sora already has nailed photo realism. It’s got this image fidelity, which is super, super high.” Both platforms are clearly pushing the boundaries of what’s possible, but they handle technical challenges differently.
One particularly revealing area is how each platform deals with the “hallucinations” inherent to AI generation—those moments when the physics or continuity breaks down in unexpected ways.
Kadieff explains the difference vividly: “When Veo 2 hallucinates, it just clips to kind of like a similar set that it has in its memory, but you might lose, like, consistency, or you might get a whole different, weird angle. So, for example, if you make a drone shot flying over a location, and it’s like 10 seconds, it will do five seconds perfectly, and then it’s going to clip to some rainforest”.
Bilawal Sidhu, a creative technologist and AI/VFX creator on YouTube and other platforms, with over a decade of experience, doesn’t mince words about Sora’s limitations: “the physics are completely borked, like, absolutely horrendous”. He explains that while Sora offers longer duration videos (10-15 seconds), its physical simulation often falls short, particularly with human movement and interactions.
Speaking on his YouTube channel, Sidhu declares, “Nothing comes close to what Google Deep Mind has dropped… Veo 2 now speaks cinematographer. You can ask for a low angle tracking shot 18 mm lens and put a bunch of detail in there and it will understand what you mean. You just ask it with terms you already know… I feel like Sora doesn’t really follow your instructions. Sora definitely does pretty well at times, but in general it tends to be really bad at physics.”
Behind every AI video generator lie mountains of training data that shape what each tool excels at creating. Hypothesising why Veo 2’s physics outputs are superior, he states, “Google owns YouTube, and so even if you pull out a bunch of the copyrighted stuff, that still leaves a massive corpus compared to what anyone else has to train on.”
The battle for training data supremacy extends beyond quantity to quality and diversity. OpenAI has remained relatively secretive about Sora’s full training dataset, raising questions about potential biases and limitations.
For commercial applications where physical accuracy is non-negotiable, this distinction matters enormously. Video quality and physical realism are essential for products that need to be represented accurately, highlighting why industries with strict visual requirements might lean toward Veo 2 despite its more limited interface.
Sora vs Veo 2: Prompt Control and Generation Quality
By coming out first, Sora had a first-mover advantage of sorts, but it also set the bar for other models to work towards—and then transcend. Sidhu was very impressed when he first saw the outputs: “watching the first Sora video, the underwater diver discovering like a crashed spaceship underwater, if you remember that video, that blew my mind, because I feel like Sora showed us that you could cross this chasm of quality with video content that we just hadn’t seen.”
Explaining more of the positives for Sora, Sidhu adds, “Sora is very powerful. Their user experience is far better than their actual quality. They’ve got this like storyboard editor view, where you can basically lay out prompts on a timeline—you can outline, hey, I want a character to enter, the scene from the left, walk down and sit down on this table over here, and then at this point in time, I want somebody else to walk up and suddenly get their attention.”
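Sora exposes this storyboard view only through its web interface, but the mental model is easy to sketch: a sequence of prompt “beats” pinned to points on a timeline, with the model filling in the motion between them. The structure below is purely hypothetical, written to illustrate the concept Sidhu describes rather than any actual Sora data format.

```python
from dataclasses import dataclass

@dataclass
class Beat:
    """One storyboard card: a prompt pinned to a point on the timeline."""
    at_seconds: float
    prompt: str

# A timeline mirroring Sidhu's example; the model interpolates
# the action between the pinned beats.
storyboard = [
    Beat(0.0, "A character enters the scene from the left and walks toward a table."),
    Beat(4.0, "The character sits down at the table."),
    Beat(8.0, "A second person walks up and gets the seated character's attention."),
]

for beat in storyboard:
    print(f"{beat.at_seconds:>5.1f}s  {beat.prompt}")
```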
Prompt adherence, the ability to faithfully translate textual descriptions into corresponding visuals, varies significantly between platforms, and Veo 2 appears to be winning that battle.
“Veo 2 is very good at prompt adherence, you can give very long prompts, and it’ll kind of condition the generation to encapsulate all the things that you asked for,” Sidhu explains, expressing genuine surprise at Veo 2’s capabilities. “Like Runway and Luma, and pretty much anything that you’ve used out there, the hit rate is very bad… for Veo 2, it is by far the best. It’s like, kind of insane, how good it is”.
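Sidhu’s earlier point that Veo 2 “speaks cinematographer” suggests what such a long, detailed prompt looks like in practice. The example below is invented for illustration; it simply shows the camera, lens, lighting, and movement vocabulary that the model is reported to condition on, and would be passed as the `prompt` argument in the API sketch above.

```python
# An illustrative "cinematographer-speak" prompt of the kind Sidhu describes.
prompt = (
    "Low-angle tracking shot, 18mm lens, shallow depth of field. "
    "A street vendor lights paper lanterns at blue hour; warm tungsten "
    "practicals against a cool dusk sky. Slow dolly-in, handheld sway, "
    "film grain, anamorphic lens flare. The camera holds on the vendor's "
    "hands as the first lantern rises out of frame."
)
print(prompt)
```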
This predictability and control fundamentally changes the user experience. Rather than treating AI video generation as a slot machine where creators must roll repeatedly hoping for a usable result, Veo 2 provides more consistent, controlled outputs—particularly valuable for commercial applications with specific requirements.
Consistency extends beyond single clips as well. Sidhu notes that “the four clips you get [as an output from Veo 2], you put in text prompts, as long as you want them to be, and with a very detailed text prompt, you get very close to character consistency too”, allowing for multi-clip productions featuring the same characters and settings without dramatic variations.
Kadieff is also a huge fan of Veo 2’s generation quality: “Veo 2 has generally been trained on very good, cinematic content. So almost like all the shots you do with it feel super cinematic, and the animation quality is phenomenal.”
Beyond this, the resolution quality of Veo 2’s outputs is also a cause for celebration, as Sidhu states, “this model can natively output 4K. If you used any other video generation tool, Sora, Luma, whatever it is, you end up exporting your clips into some other upscaling tool whether that’s Krea or Topaz, what have you — this model can do 4K natively, that’s amazing.”
Industry Applications: From Fashion to Gaming
Different industries are discovering unique applications for these tools, with their specific requirements guiding platform selection. Fashion brands prize consistency and physical accuracy, while gaming and entertainment often value creative flexibility and surrealism.
“What I’m really excited about is not just that indies are going to be able to rival the outputs of studios, but that studios are going to set whole new standards,” says Sidhu. “But then also, these tools are changing the nature of content itself, like we’re moving into this era of just-in-time disposable content.”
For fashion and retail, the ability to quickly generate variations of a single concept represents enormous value. Creating multiple versions of product videos tailored to different markets is now possible without the expense of multiple production shoots.
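In practice, that can be as simple as templating one approved concept across markets and submitting each variant as a separate generation job. The sketch below is hypothetical, with an invented template and market list, to show the shape of the workflow.

```python
# A single approved concept, templated across markets.
base_prompt = (
    "Studio product video of the {product}, slow 360-degree turntable, "
    "soft key light, seamless {backdrop} backdrop, on-model close-ups "
    "styled for the {market} market."
)

variants = [
    {"product": "linen trench coat", "backdrop": "sand", "market": "UK"},
    {"product": "linen trench coat", "backdrop": "ivory", "market": "Japan"},
    {"product": "linen trench coat", "backdrop": "slate", "market": "Germany"},
]

for v in variants:
    prompt = base_prompt.format(**v)
    print(prompt)  # each prompt would be submitted as its own generation job
```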
Meanwhile, gaming and entertainment applications embrace different capabilities. Kadieff describes how AI is transforming creative approaches: “The intersection of art, games and films is not just about games and films anymore – it’s about hybrid experiences”. This represents a fundamental shift in how interactive media can be conceived and produced.
Sheldrick predicts significant industry adoption this year: “I think this is the year that AI video and AI imagery in general will kind of break into the advertising market and a bit more into commercial space.” He warns that “the companies that have got on board with it will start to reap the rewards, and the companies that have neglected to take this seriously will suffer in this year.”
The Human-AI Collaboration Model
Despite these tools’ remarkable capabilities, the most successful implementations combine AI generation with human creativity and oversight. The emerging workflow models suggest letting AI handle repetitive elements while humans focus on the aspects requiring artistic judgment.
As these platforms continue to develop, creative teams are adapting how they work, with new hybrid roles emerging at the intersection of traditional creativity and technical AI expertise.
The learning curve remains steep, but the productivity gains can be substantial once teams develop effective workflows. Kadieff notes how transformative these tools have been: “when I saw transformer-based art, like three, four years ago, I mean, it changed my life. I knew instantly that this is the biggest media transformation of my lifetime”.
Looking Forward: AI Video in 2026 and Beyond
As these platforms continue evolving at breakneck speed, our experts envision transformative developments over the next few years. Specialized models tailored to specific industries, greater customization capabilities, and integration with spatial computing all feature prominently in their predictions.
As Sidhu suggested earlier, independent creators will increasingly rival the outputs of studios. But this democratization of high-quality content creation tools doesn’t mean the end of major studios; rather, it raises the bar across the entire creative landscape.
Sheldrick remains enthusiastic about the competitive landscape driving innovation: “I’m just most excited to watch these massive, sort of frontier labs just going at it. I’ve enjoyed watching this sort of AI arms race for years now, and it hasn’t got old. It’s still super exciting.”
Image: David Sheldrick has used OpenAI’s Sora tool to create fashion videos.
Perhaps the most transformative potential lies in how these tools will reshape our understanding of content itself. As Sidhu explains, “I think content authoring will look almost like a world model, one of the characteristics or attributes of it is like, here’s a scene graph, here are the three scenes that I have. Here are the characters that are within it. Here are the props. Here’s the time of day”. This structured approach would allow content to be personalized and localized at unprecedented scales.
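Sidhu’s “world model” framing maps naturally onto a structured document. The sketch below is a hypothetical illustration of that idea, not a format either platform actually exposes: once the scene is data rather than baked-in pixels, swapping a character, prop, or time of day becomes a field edit rather than a reshoot.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    """One node in a hypothetical scene graph for generative video."""
    name: str
    time_of_day: str
    characters: list[str]
    props: list[str]
    action: str

# A three-scene graph in the spirit of Sidhu's description.
scenes = [
    Scene("cafe_interior", "morning", ["Maya"], ["espresso cup", "laptop"],
          "Maya reads a message and pushes her chair back."),
    Scene("street", "midday", ["Maya", "courier"], ["bicycle"],
          "Maya flags down the courier outside the cafe."),
    Scene("rooftop", "dusk", ["Maya"], ["paper lantern"],
          "Maya releases the lantern over the skyline."),
]

# Localization becomes a data edit, not a reshoot.
scenes[0].props[0] = "masala chai glass"
```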
The Democratization of Visual Storytelling
As we look toward the future of AI-generated video, it’s clear that neither Sora nor Veo 2 represents a definitive solution for all creative needs. The choice depends on specific requirements, risk tolerance, and creative objectives.
What’s undeniable is the democratizing effect these tools are having on visual storytelling. “Now we’re coming to a place where anybody with an incredible imagination and access to these tools, whether they’re in India, China, Pakistan, South Africa or anywhere else, can tell incredible stories,” Kadieff observes.
Sidhu agrees, noting that “YouTube creators are punching way above their weight class already. And so I think that trend is going to continue, where we’ll see like the Netflixes of the world look a lot more like YouTube, where more content is going to get greenlit”.
These tools are enabling a new generation of creators to produce content that would have been prohibitively expensive just a few years ago. The traditional barriers to high-quality video production are falling rapidly.
As AI video tools like Sora and Veo 2 continue to evolve and become increasingly accessible, we stand at the beginning of a fundamental shift in how visual stories are told, who gets to tell them, and how they reach their audiences. The tools may be artificial, but the imagination they unlock is profoundly human.