Google announced a private preview of its video generation model, Veo, for Google Cloud customers using Vertex AI. The preview gives companies such as Quora and Mondelez International access to the model for creative and marketing work.
Veo, unveiled in April, can generate high-definition video, including 1080p clips up to six seconds long at various frame rates. Users can supply an image alongside a text prompt to guide generation, and the model can capture a range of visual and cinematic styles. Warren Barkley, senior director of product management at Google Cloud, said the long wait for API access was spent making the model ready for enterprise use.
The model is good at generating specific effects such as explosions and shows a grasp of basic physics. It can also perform masked editing, letting users modify specific regions within a video. Despite these capabilities, Veo shows the inconsistencies typical of current generative AI, such as objects that disappear and physically unrealistic motion, which may limit where it can be used.
Veo and Imagen 3: What they offer
Google introduced Veo alongside Imagen 3, a text-to-image model that Google describes as producing its highest-quality images yet. Both models will be available to Vertex AI customers, with Imagen 3 set to become widely accessible next week. Early users include Oreo and Cadbury, underscoring the models' commercial focus.
Prompt for the video below: A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.
(Video: Google)
Prompt for the video below: An aerial shot of a lighthouse standing tall on a rocky cliff, its beacon cutting through the early dawn, waves crash against the rocks below
(Video: Google)
Barkley said generative AI is driving business transformation, citing a figure that 86% of enterprises report revenue growth from it. He added that Google remains committed to advancing generative AI and pointed to Veo and Imagen 3 as evidence.
To address potential misuse, both models will include safeguards against generating harmful content. All outputs will also carry digital watermarks, including Google DeepMind's SynthID, which embeds invisible markers to help combat misinformation and misattribution. Google says it does not train its models on customer data, relying instead on publicly available datasets.
Improving enterprise readiness has been a priority since Veo's initial announcement. The model was trained on a large collection of high-quality video, though Google has not disclosed the specific data sources. Barkley acknowledged that some of the training data may include YouTube content, and said its use is covered by Google's agreements with content creators.
Veo will offer prompt-level filters to block violent and explicit content. Google is also addressing the intellectual property risks of generative AI: Barkley indicated that Veo outputs will be covered by an indemnity policy protecting users against copyright infringement claims.
Veo has been gradually making its way into more Google products. It entered limited testing through Google Labs in May, and in September Google announced it would come to YouTube Shorts. Google nonetheless faces competition in generative video from companies such as OpenAI and Adobe, which have moved quickly to secure partnerships with studios and creative agencies.
Featured image credit: Google DeepMind