Google’s Gemini 2.5 Flash Image model, popularly known as Nano Banana, is all the rage on the internet these days thanks to its 3D-model trend. Users on social media are using Gemini to turn their real-life photos into 3D-style models, reminiscent of the time a few months back when ChatGPT’s GPT-4o model set off the Ghibli-style craze.
Gemini Nano Banana is very powerful on its own, with the ability not only to create images from natural-language text but also to make complex edits like changing lighting, background, style, and texture while maintaining visual coherence. However, if you use the model for an extended period, you’ll notice that it can sometimes lower the resolution of the original image or alter other details, and I personally feel it isn’t great at understanding my requests.
This is where you can use other AI chatbots like ChatGPT and Qwen to not only fix the issues in the output created by Gemini but also improve the resolution of the image and remove that pesky Gemini watermark. But before we get to that, let’s take a look at how one would go about making these 3D models of themselves.
You should now have a ready 3D model of yourself; you can ask for further edits from the chatbot if needed.
The most obvious way to use either ChatGPT or Qwen is to upscale the image provided by Gemini and remove the Gemini watermark at the bottom.
To do that, navigate to the Qwen app and tap on Image Edit.
Now give the chatbot this prompt: “Remove the Gemini logo at the bottom right and enhance the resolution while keeping all other elements exactly the same.”
You can also improve the resolution of the Gemini output by giving ChatGPT a similar prompt, but it will refuse to remove the Gemini watermark.
The area where ChatGPT really shines, however, is in creating detailed prompts for Gemini. There were times in the past few days when I was frustrated with the output generated by Gemini, and I instead turned to ChatGPT for help. With the power of GPT-5, I feel that the chatbot is much better at understanding instructions and gives an easy fix to the problem by providing me with a detailed prompt that I can pass on to Gemini.
For instance, here’s a list of some creative prompts that ChatGPT gave me to generate images with Gemini.
“Turn the photo into a cinematic golden-hour portrait on a quiet city street. Warm orange sunlight rim-lighting the hair, subtle bokeh in the background, gentle lens flare.”
“Transform the person into a monumental sandstone statue half-buried in desert dunes, with dramatic golden-hour lighting and wind-blown sand trails.”
“Turn the uploaded image into a photorealistic oil painting on canvas, with visible brushstrokes, rich warm tones, and soft candlelight shadows, framed in an ornate gold frame.”
“Using the uploaded photo for likeness, create a 1/7-scale collectible action figure reimagined in a cyberpunk skin: matte black jacket with discreet glowing circuit trim, soft metallic accents on zips and buckle. Figurine stands on a clear round acrylic base (no text) set on a modern desk. The computer screen behind displays the 3D asset in progress — split view: one pane wireframe, one pane material editor with emissive nodes active, and a render preview. Include a premium collector box with flat, neon-lined 2D illustrations and a stylised cityscape background printed on the panel; box shows printed fold lines and texture. Lighting: cool blue rim, magenta fill, and a warm desk lamp for balance; camera 50 mm, f/2.8, cinematic shallow DOF, crisp specular highlights on metallic paint, subtle film grain, photoreal finish.”
“Using the provided image for face and stance, produce a 1/7-scale collectible that blends modern streetwear with subtle samurai armour elements (textile jacket panels with lacquered arm guards, sheathed katana). Place on a transparent round acrylic base (no text) set on a home workspace. The computer screen displays the modelling process (skeleton/rig view + sculpt passes), and a packaging box features elegant flat 2D character illustrations in three views printed with a soft-touch cardboard look. Lighting: moody city evening via window + soft desk lamp; camera 35 mm, f/2.0, emphasise fabric micro-details and lacquer sheen. Photoreal, editorial composition suited to social thumbnails.”
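The two figurine prompts above follow roughly the same order: likeness source, scale, outfit, base, screen contents, packaging, lighting, and camera. If you want to produce your own variations without going back to ChatGPT each time, that pattern can be sketched as a small template. The class and field names below are made up for illustration and are not part of any Gemini or ChatGPT tool; the script only assembles text that you would then paste into Gemini yourself.

```python
from dataclasses import dataclass


@dataclass
class FigurinePrompt:
    """Hypothetical container for the parts the figurine prompts share."""
    theme: str     # outfit or style of the figure
    setting: str   # surface the acrylic base sits on
    screen: str    # what the computer screen shows
    box: str       # packaging description
    lighting: str  # light sources
    camera: str    # lens and aperture

    def render(self) -> str:
        # Join the parts in the same order the article's prompts use.
        return (
            "Using the uploaded photo for likeness, create a 1/7-scale "
            f"collectible figure with {self.theme}. "
            "Figurine stands on a clear round acrylic base (no text) "
            f"set on {self.setting}. "
            f"The computer screen behind displays {self.screen}. "
            f"Include a packaging box with {self.box}. "
            f"Lighting: {self.lighting}; camera {self.camera}. "
            "Photoreal finish."
        )


# Example values borrowed from the cyberpunk prompt above.
prompt = FigurinePrompt(
    theme="a cyberpunk skin: matte black jacket with glowing circuit trim",
    setting="a modern desk",
    screen="the 3D asset in progress",
    box="flat, neon-lined 2D illustrations",
    lighting="cool blue rim, magenta fill, and a warm desk lamp",
    camera="50 mm, f/2.8",
)
print(prompt.render())
```

Swapping the theme, lighting, and camera values is usually enough to get a noticeably different figure while keeping the base, screen, and box elements that define the trend.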
