Every e-commerce seller knows the problem. You have a great product and a mediocre photo. Getting that photo marketplace-ready — clean background, professional styling, lifestyle context — used to take a designer, a studio, or hours in Photoshop. We built SwiftList to eliminate that gap entirely. This is the story of how we did it, and why Google’s AI infrastructure was central to making it work.
The Problem We Were Solving
Marketplace sellers on Etsy, eBay, Amazon, and Poshmark live and die by their product photography. A listing with a clean white background converts better. A lifestyle shot — product placed in context — converts even better. But for a solo jewelry maker or a fashion reseller running 200 SKUs, professional photography isn’t scalable.
We needed AI that could handle the full spectrum: remove backgrounds from complex items like gemstone rings and sheer fabric, and then generate contextually accurate lifestyle scenes that look like they were shot in a real studio. That’s a hard problem. The AI has to understand what a product is before it can treat it correctly.
Why Google Gemini
We evaluated multiple approaches before settling on Google Gemini as the core intelligence layer. Two models do the heavy lifting.
Gemini 2.5 Flash handles vision analysis — the “understanding” step. When a user uploads a product photo, Gemini Flash analyzes it multimodally: classifying the product type, extracting style DNA from reference images, scoring image quality, and determining which specialized processing pipeline to route it through. At roughly $0.001 per call, it’s fast and cost-effective enough to run on every single job.
Gemini 3 Imagen handles scene generation — the “creation” step. Once the background is removed and the product is isolated, Imagen generates the lifestyle scenes: flat lays, studio setups, in-context product shots placed in environments that match the seller’s brand aesthetic. At ~$0.004 per image, the economics work cleanly within our credit model.
The combination — Flash for reasoning, Imagen for generation — gives us an AI stack that actually understands context before acting on it. That distinction matters more than it might sound.
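To make the "understand before acting" step concrete, here is a minimal sketch of the routing decision that follows the Flash analysis. The response shape and engine names are illustrative assumptions, not SwiftList's actual schema — the point is that classification output from the model drives a pure, testable routing function:

```typescript
// Hypothetical shape of the classification JSON we might ask
// Gemini 2.5 Flash to return (field names are illustrative).
interface ProductClassification {
  productType: "jewelry" | "fabric" | "other";
  qualityScore: number; // 0..1, the model's estimate of input photo quality
}

type PipelineRoute = "gem-engine" | "thread-engine" | "generic-engine";

// Route a classified product to the specialist pipeline that should
// process it downstream. Keeping this a pure function makes it easy
// to unit-test independently of the model call.
function routeProduct(c: ProductClassification): PipelineRoute {
  switch (c.productType) {
    case "jewelry":
      return "gem-engine";
    case "fabric":
      return "thread-engine";
    default:
      return "generic-engine";
  }
}
```

In a design like this, the model call and the routing logic stay decoupled: if the classification schema changes, only the parsing layer moves.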
The 6-Agent Pipeline
Background removal sounds simple. It isn’t — not when you’re handling jewelry with light-refracting gemstones, sheer fabrics, and reflective metal surfaces.
We built a LangGraph-inspired 6-agent DAG (directed acyclic graph) pipeline:
- Preprocess — Format detection, HEIC conversion, metadata extraction
- Segment — Background removal via Replicate RMBG or fal.ai Bria
- Specialist — Product-type routing: jewelry engine (gemstone/metal detection) or fabric engine (8-agent texture and print preservation)
- Enhance — Edge refinement via CleanEdge™, color correction, shadow generation
- Validate — Multi-metric quality scoring across edge accuracy, segmentation quality, and artifact detection
- Postprocess — Format conversion, watermarking, ZIP packaging
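The six stages above can be sketched as a staged transform over a shared job context. This is a simplified walk over the DAG's happy path with stub implementations — the real pipeline has conditional branches (the Specialist step) and external API calls:

```typescript
// Each stage transforms a job context and passes it on.
// Stage bodies here are stubs that just record their names.
interface JobContext {
  imageId: string;
  notes: string[];
}

type Stage = (ctx: JobContext) => JobContext;

const stages: Record<string, Stage> = {
  preprocess: (ctx) => ({ ...ctx, notes: [...ctx.notes, "preprocess"] }),
  segment: (ctx) => ({ ...ctx, notes: [...ctx.notes, "segment"] }),
  specialist: (ctx) => ({ ...ctx, notes: [...ctx.notes, "specialist"] }),
  enhance: (ctx) => ({ ...ctx, notes: [...ctx.notes, "enhance"] }),
  validate: (ctx) => ({ ...ctx, notes: [...ctx.notes, "validate"] }),
  postprocess: (ctx) => ({ ...ctx, notes: [...ctx.notes, "postprocess"] }),
};

// Run stages in topological order for the linear path.
function runPipeline(ctx: JobContext, order: string[]): JobContext {
  return order.reduce((acc, name) => stages[name](acc), ctx);
}
```

Expressing the pipeline as data (an ordered list of named stages) rather than hard-coded calls is what makes conditional routing and per-stage retries straightforward to bolt on.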
The quality threshold is 85%. If a result scores below that, the pipeline retries with adjusted parameters before returning to the user. This is what separates a production-grade AI pipeline from a simple API call — the system knows when its own output isn’t good enough.
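The quality gate described above can be sketched as a retry loop around a scoring step. The parameter being adjusted and the retry count are hypothetical stand-ins for whatever the validation agent actually tunes:

```typescript
// One processing attempt with its multi-metric quality score (0..1).
interface Attempt {
  score: number;
  output: string;
}

// Retry a processing step until its score clears the quality gate,
// adjusting a (hypothetical) strength parameter between attempts,
// and keep the best result seen.
function withQualityGate(
  process: (strength: number) => Attempt,
  threshold = 0.85,
  maxRetries = 2,
): Attempt {
  let strength = 1.0;
  let best = process(strength);
  for (let i = 0; i < maxRetries && best.score < threshold; i++) {
    strength += 0.25; // illustrative parameter adjustment per retry
    const next = process(strength);
    if (next.score > best.score) best = next;
  }
  return best;
}
```

Note that the loop returns the best attempt even if nothing clears the threshold — in production you would likely also flag such results for review rather than silently shipping them.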
For jewelry specifically — our GemPerfect™ engine — the specialist agent performs gemstone and metal surface detection before the enhance step, preserving reflective properties that generic background removal destroys. For fabric and fashion, our ThreadLogic™ engine runs an 8-agent texture and print analysis to maintain pattern integrity and handle invisible mannequin scenarios.
The Infrastructure Stack
Beyond the AI models, the infrastructure is intentionally lean. We run on Railway for the application and BullMQ workers, Supabase for PostgreSQL and storage, and Cloudflare for CDN, WAF, and DDoS protection. Total monthly infrastructure cost: $10–75 at current scale.
The job queue architecture means image processing is asynchronous — users submit a job, it’s queued in BullMQ backed by Redis, processed by TypeScript workers that call the AI APIs directly, and results are stored in Supabase. No middleware layer, no orchestration tax. Average margin across all job types: 93.2%.
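For a sense of where a margin figure like that comes from, here is the unit-economics arithmetic for a single hypothetical job. The credit price ($0.05) and per-call AI prices come from this post; the credit count per job (5) and the one-Flash-plus-four-Imagen call mix are illustrative assumptions:

```typescript
const CREDIT_PRICE_USD = 0.05; // 1 credit = $0.05, per the pricing model

// Gross margin for one job: (revenue - direct AI cost) / revenue.
function jobMargin(credits: number, aiCostUsd: number): number {
  const revenue = credits * CREDIT_PRICE_USD;
  return (revenue - aiCostUsd) / revenue;
}

// Hypothetical job: one Flash analysis call plus four Imagen scenes.
const aiCost = 0.001 + 4 * 0.004; // = $0.017
const margin = jobMargin(5, aiCost); // (0.25 - 0.017) / 0.25 = 0.932
```

Under these assumed numbers a 5-credit job lands at a 93.2% gross margin, which matches the average the post reports — though the real figure is an average across job types with different call mixes.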
What We Learned Building This
Model specialization beats model generalization. We use Claude for job classification and monitoring, Gemini Flash for vision analysis, Gemini Imagen for generation, and Replicate/fal.ai for background segmentation. Each model does what it’s best at. Trying to route everything through one model would have cost more and performed worse.
The routing logic is as important as the models. Getting Gemini Flash to correctly classify a product type on the first call — so the right specialist agent fires downstream — is where the real engineering happens. The models are capable. Orchestrating them intelligently is the work.
Quality gates change the user experience. The 85% quality threshold with conditional retry is invisible to users but dramatically changes what they receive. It’s the difference between an AI tool and an AI product.
What’s Live Now
SwiftList launched at swiftlist.app in March 2026. The platform supports six marketplace export formats (Etsy, Shopify, Amazon, eBay, Poshmark, Facebook), a preset marketplace where sellers can create and monetize processing presets, and a credit economy where 1 credit = $0.05 with a free Explorer tier to start.
The Google AI infrastructure is what makes the product work at the quality level sellers actually need. We’re continuing to push the pipeline — more specialist engines, better scene generation, and tighter feedback loops between the validation agent and the generation step.
If you’re building an AI product in a vertical where the subject matter is genuinely complex, the lesson we’d offer is this: invest in understanding the domain before you invest in the model. Google Gemini gave us the vision capability. Understanding jewelry, fabric, and marketplace photography gave us the product.