We all know the story of the first YouTube video, a grainy 19-second clip of co-founder Jawed Karim at the zoo, remarking on the elephants behind him. That video was a pivotal moment in the digital space, and in many ways, it is a reflection, or at least an inverted mirror image, of today as we digest the arrival of Veo 3.
Part of Google Gemini, Veo 3 was unveiled at Google I/O 2025 and is the first generative video platform that can, with a single prompt, generate a video with synced dialogue, sound effects, and background noises. Most of these 8-second clips arrive in under 5 minutes after you enter the prompt.
I’ve been playing with Veo 3 for a few days, and for my latest challenge, I tried to go back to the beginning of social video and that YouTube “Me at the Zoo” moment. Specifically, I wondered if Veo 3 could recreate that video.
As I’ve written, the key to a good Veo 3 result is the prompt. Without detail and structure, Veo 3 tends to make the choices for you, and you usually don’t end up with what you want. For this experiment, I wondered how I could possibly describe all the details I wanted to draw from that short video and deliver them to Veo 3 in the form of a prompt. So, naturally, I turned to another AI.
Google Gemini 2.5 Pro isn’t currently capable of analyzing a URL, but Google AI Mode, the brand-new form of search that is quickly spreading across the United States, is.
Here’s the prompt I dropped into Google’s AI Mode:
Google AI Mode almost immediately returned a detailed description, which I took and dropped into the Gemini Veo 3 prompt field.
I did do a little editing, mostly removing phrases like “The video appears…” and the overall analysis at the end, but otherwise, I left most of it and added this at the top of the prompt:
“Let’s make a video based on these details. The output should be 4:3 ratio and look like it was shot on 8MM videotape.”
It took a while for Veo 3 to generate the video (I think the service is getting hammered right now), and, because it only creates 8-second chunks at a time, the result was incomplete, cutting off the dialogue mid-sentence.
Still, the result is impressive. I wouldn’t say that the main character looks anything like Karim. To be fair, the prompt doesn’t describe, for instance, Karim’s haircut, the shape of his face, or his deep-set eyes. Google AI Mode’s description of his outfit was also probably insufficient. I’m sure it would have done a better job if I had fed it a screenshot of the original video.
Note to self: You can never offer enough detail in a generative prompt.
8 seconds at a time
The Veo 3 video zoo is nicer than the one Karim visited, and the elephants are much farther away, though they are in motion back there.
Veo 3 got the film quality right, giving it a nice 2005 look, but not the 4:3 aspect ratio. It also added archaic and unnecessary labels at the top that thankfully disappear quickly. I realize now I should have removed the “Title” bit from my prompt.
The audio is particularly good. Dialogue syncs well with my main character and, if you listen closely, you can hear the background noises, as well.
The biggest issue is that this was only half of the brief YouTube video. I wanted a full recreation, so I decided to go back in with a much shorter prompt:
Continue with the same video and add him looking back at the elephants and then looking at the camera as he’s saying this dialogue:
“fronts and that’s that’s cool.” “And that’s pretty much all there is to say.”
Veo 3 complied with the setting and main character but lost some of the plot, dropping the old-school grainy video of the first generated clip. This means that when I present them together (as I do above), we lose considerable continuity. It’s like a film crew time jump, where they suddenly got a much better camera.
I’m also a little frustrated that all my Veo 3 videos have nonsensical captions. I need to remember to ask Veo 3 to remove, hide, or put them outside the video frame.
I think about how hard it probably was for Karim to film, edit, and upload that first short video and how I just made essentially the same clip without the need for people, lighting, microphones, cameras, or elephants. I didn’t have to transfer footage from tape or even from an iPhone. I just conjured it out of an algorithm. We have truly stepped through the looking glass, my friends.
I did learn one more thing through this project. As a Google AI Pro member, I have two Veo 3 video generations per day. That means I can do this again tomorrow. Let me know in the comments what you’d like me to create.