Google AI – Gemini Live Agent Challenge

Fill out any Google Form with your voice. Paste a form URL, and a Gemini-powered voice agent walks through every question conversationally — then submits answers back to the original Google Form.
Four steps. Zero typing.
Paste a Form URL
Drop any Google Form link. Cauliform fetches and parses the HTML to extract every question, option, and required flag.
Gemini Builds a Prompt
The system auto-generates a Gemini Live prompt that understands your form’s structure — questions, types, validation rules, everything.
Talk Through It
A real-time voice agent interviews you question by question — in the browser or over a phone call via Twilio.
Auto-Submit
When done, an AI browser agent fills out and submits the original Google Form. The form owner’s workflow stays untouched.
Built on Google Cloud. A multi-service system: Next.js + Gemini Live + Twilio + Cloud Run + Artifact Registry.
AI & Voice
Cloud Infrastructure
Frontend
Telephony
Browser UI (Landing + Console)
→ Cloud Run (Next.js API Routes)
→ Gemini Live (WebSocket Audio)
→ Twilio (Voice Calls)
→ Google Forms (Parse & Submit)
Two students. One sprint. A two-person studio that shipped a real multi-service system on Google Cloud.
Chinat Yu
Stanford University
Backend · Cloud Run · Twilio · IAM
Preston
Diablo Valley College
Frontend · UX · Gemini Live Integration
IAM & Service Accounts
Had to grant Storage Admin, Artifact Registry Admin, and Logs Writer to Cloud Build’s compute service account.
WebSocket Auth with Gemini
Agent kept closing with code 1008 (‘unregistered callers’) — fixed by wiring /api/gemini-token to fetch the key before opening the WebSocket.
Region & Image Confusion
Pushed to gcr.io in us-central1 while deploying in us-west1. Moving to Artifact Registry in us-west1 fixed ‘image not found’ errors.
Twilio + Cloud Run URLs
Removed all localhost refs, computed correct base URLs, and ensured TwiML responses had the right Content-Type: text/xml.
What's Next
Production Reliability
Retries, per-form throttling, dashboards for failed calls
Analytics & Transcripts
Completion rates, drop-off questions, average call length — stored in Firestore or BigQuery
Multi-Language
Run the same form across different languages for educators and global teams
One-Click Deploy
Paste form → connect Twilio → deploy in two clicks. Designed for non-technical users.
Category: Live Agents — Real-time voice interaction using Gemini Live API
Built for the Gemini Live Agent Challenge 2026, focusing on breaking the “text box” paradigm with immersive, real-time voice experiences.