Introducing the Alibaba AI Duo
The era of using artificial intelligence to generate a disjointed string of text or a solitary, generic image is rapidly coming to a close. Today, the focus has shifted toward building complete, functional prototypes almost instantly.
With the dual release of Qwen3.5-Omni and Wan2.7-Image, Alibaba has introduced an all-encompassing creative toolkit designed to dramatically accelerate how developers and creators bring their ideas to life.
Coding by Demonstration With Qwen3.5-Omni
At the heart of this prototyping revolution is Qwen3.5-Omni, Alibaba’s most advanced omni-modal large language model to date. While its ability to process text, audio, images, and video is impressive, its most transformative feature for developers is its “Audio-Visual Vibe Coding” capability.
By leveraging native multimodal scaling, Qwen3.5-Omni allows users to essentially code by demonstration. Instead of writing boilerplate code from scratch, a developer or designer can simply showcase a handwritten product sketch via their camera and verbally describe its functionality.
In response, the model autonomously generates a functional user interface (UI) for websites, apps, and mini-games. This intuitive “show-and-tell” process completely transforms the early stages of software creation, dramatically lowering the barrier to prototyping and development.
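To make the "show-and-tell" flow concrete, here is a minimal sketch of how such a multimodal request might be assembled. The field names and model identifier below are assumptions modelled on common OpenAI-style multimodal message schemas, not Alibaba's documented API:

```python
# Illustrative only: a hypothetical payload for a "show-and-tell" request.
# The model id and message structure are assumptions, not the documented
# Qwen API; they mirror common multimodal chat schemas.

def build_show_and_tell_request(sketch_path: str, narration: str) -> dict:
    """Bundle a camera-captured sketch and a spoken description into one
    multimodal prompt asking the model to emit runnable UI code."""
    return {
        "model": "qwen3.5-omni",  # assumed model identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "image": sketch_path},
                {"type": "text",
                 "text": f"{narration} Generate a functional HTML/CSS/JS "
                         "prototype of the sketched interface."},
            ],
        }],
    }

request = build_show_and_tell_request(
    "sketch.jpg",
    "A login card with two input fields and a blue submit button.",
)
```

The key idea is simply that the sketch image and the verbal description travel together in a single turn, so the model sees both the layout and the intended behaviour at once.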
Populating the Prototype With Wan2.7-Image
Of course, functional UI code is only half the battle; it needs stunning, brand-consistent visual assets to truly become a prototype. This is where Wan2.7-Image steps in to handle the visual design process.
Historically, AI-generated images have suffered from a generic, standardised appearance and highly unpredictable colour rendering, leading to a frustrating trial-and-error process for professional designers.
Wan2.7-Image transcends these shortcomings by giving creators unprecedented control over their final outputs. Through a new “colour palette” feature, users can input specific colour codes and proportions directly into their prompts to replicate complex artistic styles or lock in exact corporate brand colours.
This ensures that every image perfectly matches a company’s strict visual guidelines.
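As a rough illustration, a palette-constrained prompt might be composed like this. The inline "colour palette" syntax shown is a guess at how codes and proportions could be expressed; the actual prompt format may differ:

```python
# Sketch of composing a palette-constrained prompt. The inline syntax
# ("colour palette: #HEX NN%, ...") is an assumption for illustration,
# not a documented Wan2.7-Image prompt format.

def palette_prompt(subject: str, palette: dict) -> str:
    """Embed hex colour codes and their target proportions in a prompt.
    `palette` maps hex codes to fractions that should sum to 1.0."""
    total = sum(palette.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"palette proportions sum to {total}, expected 1.0")
    spec = ", ".join(f"{code} {share:.0%}" for code, share in palette.items())
    return f"{subject} -- colour palette: {spec}"

prompt = palette_prompt(
    "Minimalist product banner for a running shoe",
    {"#1A73E8": 0.6, "#FFFFFF": 0.3, "#FBBC04": 0.1},
)
```

Validating that the proportions sum to one catches the most common authoring mistake before an image-generation credit is spent.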
Furthermore, the model allows creators to fine-tune specific physical attributes, such as eye shape and bone structure, to generate truly unique and lifelike characters tailored to a specific project.

Because Wan2.7-Image can process multiple images and generate up to 12 visual assets at once, creators can effortlessly build cohesive e-commerce campaigns, architectural renderings, or storyboards to populate their newly coded applications.
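For a larger campaign, the stated 12-assets-per-call ceiling suggests a simple batching step. The chunking logic below is generic Python; the limit is the only detail taken from the capability described above:

```python
# A minimal batching helper, assuming the 12-asset-per-call ceiling
# described above. Function and constant names are illustrative.

MAX_ASSETS_PER_CALL = 12

def chunk_asset_requests(prompts: list) -> list:
    """Split a campaign's asset prompts into batches that fit within a
    single generation call."""
    return [prompts[i:i + MAX_ASSETS_PER_CALL]
            for i in range(0, len(prompts), MAX_ASSETS_PER_CALL)]

campaign = [f"Product shot {n}, studio lighting" for n in range(30)]
batches = chunk_asset_requests(campaign)  # 30 prompts -> batches of 12, 12, 6
```

Keeping related prompts in the same batch also plays to the model's strength at producing visually cohesive sets.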
Revolutionising the Creative Workflow
By pairing the rapid, multi-modal coding abilities of Qwen3.5-Omni with the precise, professional-grade visual generation of Wan2.7-Image, Alibaba is offering a true “prompt-to-prototype” ecosystem. Creators no longer have to stitch together mismatched AI outputs; instead, they have a streamlined path from a simple handwritten sketch directly to a polished, fully functional digital product.