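You have 30 seconds: show us today's OOTD.
Apple CEO Tim Cook, famously mild-mannered and always dressed in basics, appears in his "personal ID video" wearing an oversized down jacket and diamond-studded grills, striking the hardest gangsta pose he can for the camera.
The masterstroke: he pulls out, as if drawing a gun, a... Texas Instruments calculator.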
![]()
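▲ Video: https://x.com/ReflctWillie/status/1997819640874205685
Many viewers could not stop watching. The one-take presentation is so satisfying that people replay it again and again. The creator applied Hollywood-blockbuster camera language to absurd content. The contrast between the polished form and the comical content keeps this AI video free of the cheap look common to the genre, and it quickly took off on social media.
An Elon Musk version followed almost immediately.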
![]()
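▲ Video: https://x.com/VibeMarketer_/status/1999227084250448083
The author helpfully shared the full production workflow: use contact sheet prompting to obtain a set of 6 images in which the background, the subject's expression, and the clothing stay consistent while the poses differ.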
![]()
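▲ A 3×2 film contact sheet
A contact sheet dates back to the film era, when photographers used a page of thumbnail prints as an index of their shots. Applying the idea to Nano Banana Pro means leaning on its consistency: generate, in one pass, a series of video stills in different styles and from different angles, then turn them into video using first and last frames.
Nano Banana Pro can generate a complete contact sheet containing 9 or more keyframes in a single pass, with every frame keeping strong character, detail, and narrative consistency. Even when frames are generated separately, Nano Banana Pro can fill in image content based on the uploaded reference images and keep the narrative consistent.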

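▲ First/last-frame video generation. Prompt: A one-take shot. The camera pushes in steadily and slowly, focusing on the subject's glasses while keeping the subject in frame throughout. The subject's movements are minimal and deliberate.
With the images in hand, we can stitch them together using first/last-frame-to-video generation. Video models and tools such as Kling, Veo 3.1, Hailuo, and Jianying can all do this easily.
Note that Sora 2 currently does not accept uploads of images with real human faces, and Musk's Grok Imagine only supports first-frame-to-video. All things considered, we recommend Google Veo 3.1, Jimeng inside Jianying, or Kuaishou's Kling.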

▲ Grok image-to-video: the default output makes little sense
In his guide, the creator used Nano Banana Pro and Kling, and he also built a complete tool that lets us freely swap in different people.
![]()
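▲ Video: https://x.com/ReflctWillie/status/1998720751806066916
According to the workflow he shared, this video is essentially the same as the Cook one, so he only needs to change the three input images and make a few small adjustments. For example, the object pulled from the pocket becomes a GAME BOY console, and each character gets elements that suit them: Cook has gold teeth inlaid with Apple's stock ticker AAPL, while Federal Reserve Chair Powell wears a gold ring that reads FED.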
![]()
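▲ Project: https://github.com/shrimbly/node-banana
He has published the project on GitHub. If you like tinkering, download the project locally, enter your own Gemini API key, and you can apply the workflow directly.
We tried this automated project ourselves and generated a few images. Compared with generating in the Gemini web page or app, it is noticeably more convenient: instead of uploading images over and over, we can pick the images to use and edit the prompt directly, turning the whole operation into a pipeline.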
![]()
![]()
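That said, it is fine if you do not have API access. Follow our detailed steps below and the Gemini web version works just as well.
Find a photo of yourself, a streetwear piece you like, and a pair of cool glasses. Here we use Lin Daiyu as the example, supremely talented, proud and aloof, sentimental and melancholy, to see what her OOTD fashion film would look like.
We generated a photo of Lin Daiyu directly with Nano Banana Pro.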
![]()
▲ Prompt: Subject: A hyper-realistic high-fashion portrait of Lin Daiyu from Dream of the Red Chamber. She has a fragile, melancholic beauty, pale skin, and her signature “knitted eyebrows” (frowning slightly). She looks distinctively sorrowful and intellectual. Attire: Wearing exquisite, high-end traditional Qing Dynasty couture (Hanfu style). The fabric is layered translucent silk and organza in pale bamboo-green and moon-white. Intricate embroidery of falling petals. She wears a jade hairpin. Setting: Inside a modern, minimalist professional photography studio. A solid dark grey or textured canvas backdrop. Lighting & Camera: Cinematic studio lighting, Rembrandt lighting to accentuate her cheekbones and mood. Softbox lighting, sharp focus, shot on Hasselblad X2D, 85mm lens. Deep depth of field. Style: Vogue China editorial, ethereal, elegant, sorrowful, oriental aesthetics, avant-garde fashion photography, ultra-detailed texture. 16:9, 4K.
Once you have the character photo, the glasses and jacket images are optional. If you do not upload them, Nano Banana Pro generates a matching streetwear jacket and glasses on its own.
![]()
We found a streetwear jacket online for her to wear, then extended the default prompt with hairstyle control, makeup direction, and a disdainful expression that looks down on such worldly things.
Default prompt: Show me a high fashion photoshoot image of the model wearing the oversized jacket and glasses, the image should show the a full body shot of the subject. The model is looking past the camera slightly bored expression and eyebrows raised. They have one hand raised with two fingers tapping the side of the glasses. The setting is a studio environment with a blue background. The model is wearing fashionable, dark grey baggy cotton pants. The jacket is extremely, almost comically oversized on the model. The image is from a low angle looking up at the subject. The image is shot on fuji velvia film on a 55mm prime lens with a hard flash, the light is concentrated on the subject and fades slightly toward the edges of the frame. The image is over exposed showing significant film grain and is oversaturated. The skin appears shiny (almost oily), and there are harsh white reflections on the glasses frames.
![]()
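The next step is to generate the contact sheet itself. Upload the jacket-and-glasses photo produced in the previous step, enter the prompt below, and you get a multi-angle storyboard with a consistent character.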
Prompt: Analyze the input image and silently inventory all fashion-critical details: the subject(s), exact wardrobe pieces, materials, colors, textures, accessories, hair, makeup, body proportions, environment, set geometry, light direction, and shadow quality. All wardrobe, styling, hair, makeup, lighting, environment, and color grade must remain 100% unchanged across all frames. Do not add or remove anything. Do not reinterpret materials or colors. Do not output any reasoning. Your visible output must be: One 2×3 contact sheet image (6 frames). Then a keyframe breakdown for each frame. Each frame must represent a resting point after a dramatic camera move — only describe the final camera position and what the subject is doing, never the motion itself. The six frames must be spatially dynamic, non-linear, and visually distinct.
Required 6-frame Shot List
1. High-Fashion Beauty Portrait (Close, Editorial, Intimate) Camera positioned very close to the subject’s face, slightly above or slightly below eye level, using an elegant offset angle that enhances bone structure and highlights key wardrobe elements near the neckline. Shallow depth of field, flawless texture rendering, and a sculptural fashion-forward composition.
2. High-Angle Three-Quarter frame Camera positioned overhead but off-center, capturing the subject from a diagonal downward angle. This frame should create strong shape abstraction and reveal wardrobe details from above.
3. Low-Angle Oblique Full-Body frame Camera positioned low to the ground and angled obliquely toward the subject. This elongates the silhouette, emphasizes footwear, and creates a dramatic perspective distinct from frames 1 and 2.
4. Side-On Compression frame (Long Lens) Camera placed far to one side of the subject, using a tighter focal length to compress space. The subject appears in clean profile or near-profile, showcasing garment structure in a flattened, editorial manner.
5. Intimate Close Portrait From an Unexpected Height Camera positioned very close to the subject’s face (or upper torso) but slightly above or below eye level. The angle should feel fashion-editorial, not conventional — offset, elegant, and expressive.
6. Extreme Detail frame From a Non-Intuitive Angle Camera positioned extremely close to a wardrobe detail, accessory, or texture, but from an unusual spatial direction (e.g., from below, from behind, from the side of a neckline). This must be a striking, abstract, editorial detail frame.
Continuity & Technical Requirements
Maintain perfect wardrobe fidelity in every frame: exact garment type, silhouette, material, color, texture, stitching, accessories, closures, jewelry, shoes, hair, and makeup. Environment, textures, and lighting must remain consistent. Depth of field shifts naturally with focal length (deep for distant shots, shallow for close/detail shots). Photoreal textures and physically plausible light behavior required. frames must feel like different camera placements within the same scene, not different scenes. All keyframes must be the exact same aspect ratio, and exactly 6 keyframes should be output. Maintain the exact visual style in all keyframes, where the image is shot on fuji velvia film with a hard flash, the light is concentrated on the subject and fades slightly toward the edges of the frame. The image is over exposed showing significant film grain and is oversaturated. The skin appears shiny (almost oily), and there are harsh white reflections on the glasses frames.
Output Format
A) 2×3 Contact Sheet Image (Mandatory)
Once you have the six-panel image, use the prompt below to extract the six images one at a time.
Prompt: Review the grid of six images. I want you to isolate and upscale the image in the first/second/third column of the first/second row of images. Do not change the pose or any details of the model. only output the single image from the six image grid.
![]()
![]()
![]()
![]()
![]()
![]()
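The extraction prompt above has to be sent six times, once per cell, with the column and row words swapped each time. A minimal sketch of generating the six variants, assuming the 2-row by 3-column layout described in the text (the function name `extraction_prompts` is our own illustration, not part of the creator's tool):

```python
# Ordinal words used by the extraction prompt for columns and rows.
ORDINALS = ["first", "second", "third"]

def extraction_prompts():
    """Return the six extraction prompts, ordered row by row, left to right."""
    prompts = []
    for row in range(2):        # the contact sheet has two rows
        for col in range(3):    # and three columns
            prompts.append(
                "Review the grid of six images. I want you to isolate and upscale "
                f"the image in the {ORDINALS[col]} column of the {ORDINALS[row]} "
                "row of images. Do not change the pose or any details of the model. "
                "only output the single image from the six image grid."
            )
    return prompts

if __name__ == "__main__":
    for prompt in extraction_prompts():
        print(prompt)
        print()
```

Each printed prompt can be pasted into Gemini together with the contact sheet image.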
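Nano Banana Pro can in fact generate a nine-panel grid directly, but to keep a fixed 3:2 aspect ratio, a six-panel grid separates into individual images more cleanly. We use 16:9 and 4K quality throughout.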
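For readers who want to inspect or crop the cells locally, here is a rough sketch of the grid arithmetic. This is not the article's method, which asks the model to isolate and upscale each cell; a plain crop does no upscaling. It also assumes the sheet is an evenly divided grid with no borders or gutters, which a generated image may not satisfy. The function name `grid_boxes` is hypothetical.

```python
def grid_boxes(width, height, rows=2, cols=3):
    """Return (left, top, right, bottom) pixel boxes, row by row.

    Assumes the contact sheet is split into equal cells with no gutters.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes

if __name__ == "__main__":
    # Example: a 3840x2160 sheet split into 2 rows and 3 columns.
    for box in grid_boxes(3840, 2160):
        print(box)
```

Each tuple follows the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping.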
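With these 6 images, we can also get creative and generate more keyframes, such as the moments in the original video where Cook shows off his gold teeth and pulls a vintage device out of his pocket.
For example, we found a picture of a bangle online and had Lin Daiyu show off her jade bangle instead of a big gold watch.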
![]()
▲ Figure 7 | Input: Figure 3 + Figure 5 + jade bangle photo, with the prompt: Show me a wide angle close up of the model. The model is holding one wrist vertically in front of her, The opposite hand is gently pulling down the voluminous sleeve of her clothes robe to display a translucent emerald jade bangle. The hand that is pulling down the sleeve has a silver fashion ring shaped like a fallen flower petal on the last two digits of her hand encrusted into the front face.
If you want to keep the street gangsta style, you can use the default prompt as-is: find a picture of a big gold watch, then enter the following.
Default prompt: Show me a wide angle close up of the model. The model is holding one wrist vertically in front of him, the opposite hand is pulling down the sleeve of the hoodie to display the watch. The hand that is pulling down the sleeve has a two finger ring on the last two digits of his hand with the letters ‘LOVE’ encrusted into the front face.
We also swapped the shoes for embroidered streetwear high-tops, which combine the satin and floral embroidery of traditional embroidered shoes with a thick, serrated black rubber sole.
![]()
▲ Figure 8 | Input: Figure 7 + Figure 3 + shoe photo. Prompt: Show me a wide angle worms eye view of the model standing, her right foot is extended in front of her, showing she is wearing the shoes in the reference image. Maintain the setting perfectly, include the finger ring on the models hand, and have her foot angled slightly to the side to highlight the detailing of the shoes
Finally, she pulls a box of Renshen Yangrong Wan (ginseng nourishing pills) out of her pocket: a cyberpunk girl kept alive by medication.
![]()
▲ Figure 9 | Input: Figure 7 + Figure 8 + pill box photo. Prompt: Tight shot of the model reaching into the side of the kangaroo pouch of the hoodie and partially showing the box of pills.
Here you only need to change "showing the box of pills": replace whatever follows "showing" with the item you want pulled out of the pocket.
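The substitution described above amounts to a one-slot template. A minimal sketch, where `POCKET_PROMPT` and `pocket_prompt` are illustrative names of our own and the wording is the Figure 9 prompt quoted above:

```python
# The Figure 9 prompt with the object after "showing" turned into a slot.
POCKET_PROMPT = (
    "Tight shot of the model reaching into the side of the kangaroo pouch "
    "of the hoodie and partially showing {item}."
)

def pocket_prompt(item):
    """Fill the slot with the item to be pulled from the pocket."""
    return POCKET_PROMPT.format(item=item)

if __name__ == "__main__":
    print(pocket_prompt("the box of pills"))
    print(pocket_prompt("a GAME BOY console"))
```

Remember to upload a reference photo of the new item alongside the prompt, as was done with the pill box.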
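With all the keyframes in hand, the next step is to string them together into a slick video that looks like a single take. Image-to-video still needs prompts: to get the pacing control of the original video, asking for fluid motion and minimal movement from the model is the key instruction for reducing rerolls.
The creator suggests prompt text along these lines: "The camera slowly and steadily orbits the glasses while zooming. The subject stays almost motionless, with extremely steady and deliberate movements."
For the transition between Figure 8 and Figure 9, for instance, we added text to the prompt saying that the leg slowly lowers and the camera rises vertically.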

▲ Generated with Google Veo 3.1 | Prompt: Camera Movement (Vertical Scan):
A continuous, seamless vertical crane shot moving upwards. The camera starts low, focused tightly on the embroidered high-top sneakers, then smoothly tilts up and glides along the texture of the grey cargo pants. As the camera rises to waist level, it pushes in (dolly in) towards the green satin jacket.
Subject Action (The Flow):
Start: The subject’s leg (showing the shoe) slowly lowers to a standing position as the camera moves up.
Transition: The subject stands confidently. The hand wearing the butterfly ring moves naturally into the pocket.
End: The hand pulls out a yellow and white medicine box (“Renshen Yangrong Wan”). The focus racks sharply onto the text on the box.
Atmosphere & Consistency:
High-fashion streetwear aesthetic. Hard flash lighting with a blue studio background. Maintain strict consistency of the green sukajan jacket embroidery and the jade bangle. The transition is liquid-smooth, feeling like a single, planned camera move.
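Stringing keyframes into a one-take video means each clip starts on the frame where the previous clip ended. A small sketch of that pairing, under the assumption that the keyframes are kept in playback order (the file names are placeholders, not files from the article):

```python
def frame_pairs(keyframes):
    """Pair consecutive keyframes as (first_frame, last_frame) segments."""
    return list(zip(keyframes, keyframes[1:]))

if __name__ == "__main__":
    # Placeholder file names in the intended playback order.
    keyframes = ["figure_7.png", "figure_8.png", "figure_9.png"]
    for first, last in frame_pairs(keyframes):
        print(f"clip: {first} -> {last}")
```

N keyframes yield N-1 clips, and each pair is uploaded to the video model as first and last frame together with a motion prompt like the one above.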
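You may wonder why the prompt asks for slow motion when the final preview video feels crisp and snappy. The answer is another tool from the same creator. It is hard not to admire today's AI video creators: they have good ideas and they also build useful tools.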
![]()
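▲ URL: https://easypeasyease.vercel.app/. The tool can stitch multiple videos together, apply easing curves, and add audio. It is currently free to use.
With EasyPeaseEase, each clip can be compressed to a length between 0.5 s and 6 s. The slow motion produced by the video model is passed through an easing curve, so acceleration and deceleration from start to finish become smoother and more natural, closer to real-world physics. The sped-up video then looks livelier and more polished instead of moving at a stiff constant speed.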
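To make the idea of an easing curve concrete, here is a sketch of time remapping with a standard cubic ease-in-out. It is not the tool's actual implementation, which the article does not describe; the function names, the 30 fps default, and the choice of a cubic curve are all our assumptions.

```python
def ease_in_out_cubic(t):
    """Standard cubic ease-in-out: slow start, fast middle, slow end (t in [0, 1])."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - (-2 * t + 2) ** 3 / 2

def remap_times(source_duration, target_duration, fps=30):
    """For each output frame, return the source timestamp (seconds) to sample.

    The clip is compressed from source_duration to target_duration, and the
    eased progress replaces a constant playback speed.
    """
    n = max(2, round(target_duration * fps))
    return [source_duration * ease_in_out_cubic(i / (n - 1)) for i in range(n)]

if __name__ == "__main__":
    # Compress an 8-second generated clip into 2 seconds.
    times = remap_times(8.0, 2.0)
    print(len(times), times[0], times[-1])
```

Sampling the source at these timestamps makes the motion ramp up, peak in the middle, and settle at the end, which is the effect the paragraph above describes.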
Finally, stitch all the clips together and you have Lin Daiyu's OOTD video for today.
![]()
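As for the first/last-frame video prompts: if you are worried about rerolling too often, uploading the first and last frame images and asking Gemini to write the prompt is an effective approach.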
![]()
Contact sheet prompting is one of the more interesting ways to use Nano Banana Pro. First, use its strong image generation and world-knowledge understanding to generate a nine-panel set of video keyframes, then extract the keyframes row by row and column by column.
![]()
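▲ Video: https://x.com/techhalla/status/1996650389228355819
To close, here is a roundup of the official ways to use Nano Banana Pro.
ai.studio: Google's official AI studio. It requires a linked payment method, lets you pick resolution and image size from drop-down menus without controlling them through the prompt, and is billed per generation.
gemini.google.com: the Gemini web version and mobile app. Generation is free with a usage cap. Once the cap is reached it automatically falls back to the Nano Banana model. The main catch is that the aspect ratio of generated images can no longer be controlled.
flow.google: Google's video generation platform. You can choose to generate images there, which consumes no credits and is free.
The videos in this article can be viewed at: https://mp.weixin.qq.com/s/s_EIYB0qqcWv29zMM1g-7Q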




