Google AI – Gemini Live Agent Challenge

Cauliform AI


Fill out any Google Form with your voice. Paste a form URL, and a Gemini-powered voice agent walks through every question conversationally — then submits answers back to the original Google Form.


How It Works

Four steps. Zero typing.

01

Paste a Form URL

Drop any Google Form link. Cauliform fetches and parses the HTML to extract every question, option, and required flag.
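As a rough illustration of the parsing step, the sketch below pulls a form definition out of page HTML. It assumes the page embeds its definition in a script variable named FB_PUBLIC_LOAD_DATA_ and that the array positions used here match that layout. Google does not document this format, so both are assumptions that may break. The FormQuestion type and the parseFormHtml name are hypothetical, not Cauliform's actual code.

```typescript
// Hypothetical shape for one parsed question (not Cauliform's actual type).
interface FormQuestion {
  entryId: number;
  title: string;
  options: string[];
  required: boolean;
}

// ASSUMPTION: the page contains `var FB_PUBLIC_LOAD_DATA_ = [...];</script>`
// and the index positions below match that undocumented layout.
function parseFormHtml(html: string): FormQuestion[] {
  const match = html.match(
    /FB_PUBLIC_LOAD_DATA_\s*=\s*(\[[\s\S]*?\]);\s*<\/script>/
  );
  if (!match) throw new Error("Form definition not found in page HTML");

  const data = JSON.parse(match[1]) as any[];
  const items: any[] = data?.[1]?.[1] ?? [];

  const questions: FormQuestion[] = [];
  for (const item of items) {
    const field = item?.[4]?.[0]; // first input attached to this item, if any
    if (!field) continue; // items without an input (e.g. section text) are skipped
    questions.push({
      entryId: field[0],
      title: String(item[1] ?? ""),
      // Each option is assumed to be an array whose first element is the label.
      options: (field[1] ?? []).map((opt: any[]) => String(opt[0])),
      required: field[2] === 1,
    });
  }
  return questions;
}
```

On the server, the HTML would come from a plain fetch of the form URL; the parsed list is what later steps consume.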

02

Gemini Builds a Prompt

The system auto-generates a Gemini Live prompt that understands your form’s structure — questions, types, validation rules, everything.
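One way to picture the prompt-building step is a pure function that turns parsed questions into system instructions. This is a minimal sketch, not the project's actual prompt: the PromptQuestion type, the buildSystemPrompt name, and the wording of the instructions are all assumptions made for illustration.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface PromptQuestion {
  title: string;
  options: string[];
  required: boolean;
}

// Builds a plain-text system prompt that lists each question with its rules.
function buildSystemPrompt(formTitle: string, questions: PromptQuestion[]): string {
  const lines = questions.map((q, i) => {
    const flag = q.required ? "required" : "optional";
    const choices =
      q.options.length > 0 ? ` Allowed answers: ${q.options.join(", ")}.` : "";
    return `${i + 1}. ${q.title} (${flag}).${choices}`;
  });

  return [
    `You are a voice assistant helping a user fill out the form "${formTitle}".`,
    "Ask the questions below one at a time, in order.",
    "Do not skip required questions, and accept only the allowed answers when they are listed.",
    "",
    ...lines,
  ].join("\n");
}
```

Keeping this step free of I/O makes it easy to unit test the prompt for a given form before any audio session starts.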

03

Talk Through It

A real-time voice agent interviews you question by question — in the browser or over a phone call via Twilio.

04

Auto-Submit

When done, an AI browser agent fills out and submits the original Google Form. The form owner’s workflow stays untouched.

Technology

Built on Google Cloud. A multi-service system: Next.js + Gemini Live + Twilio + Cloud Run + Artifact Registry.

AI & Voice

  • Gemini Live API
  • gemini-2.5-flash-native-audio
  • WebSocket streaming
  • Real-time transcripts

Cloud Infrastructure

  • Google Cloud Run
  • Artifact Registry
  • Cloud Build
  • IAM & Service Accounts

Frontend

  • Next.js (App Router)
  • React + TypeScript
  • Tailwind CSS
  • Custom voice console

Telephony

  • Twilio Programmable Voice
  • TwiML webhooks
  • Outbound calling
  • Status callbacks

Architecture Flow

Browser UI (Landing + Console)

Cloud Run (Next.js API Routes)

Gemini Live (WebSocket Audio)

Twilio (Voice Calls)

Google Forms (Parse & Submit)

The Team

Two students. One sprint. A two-person studio that shipped a real multi-service system on Google Cloud.

Chinat Yu

Stanford University

Backend · Cloud Run · Twilio · IAM

  • Google Cloud infrastructure & IAM
  • Twilio voice integration
  • Docker & Artifact Registry
  • Cloud Build CI/CD pipeline

Preston

Diablo Valley College

Frontend · UX · Gemini Live Integration

  • Landing page & dashboard UI
  • Voice console (/test) experience
  • Gemini Live WebSocket hook
  • Cauliform brand & design system

Challenges We Hit

IAM & Service Accounts

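We had to grant three roles to the compute service account that Cloud Build runs as: Storage Admin (roles/storage.admin), Artifact Registry Admin (roles/artifactregistry.admin), and Logs Writer (roles/logging.logWriter).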

WebSocket Auth with Gemini

The agent's WebSocket kept closing with code 1008, the policy-violation close code, and the reason ‘unregistered callers’. We fixed it by fetching the key from /api/gemini-token before opening the WebSocket.
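The ordering behind that fix can be sketched as follows: get the key first, then build the socket URL, and refuse to connect without a key. The response shape { key: string }, the function names, and the exact endpoint string are assumptions; check the current Live API reference before relying on the URL.

```typescript
// Minimal structural type so the global fetch or a test double can be passed in.
type FetchLike = (
  url: string
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: Live API WebSocket endpoint; verify against the current docs.
const LIVE_ENDPOINT =
  "wss://generativelanguage.googleapis.com/ws/" +
  "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent";

// Builds the socket URL and fails fast if the key is missing.
function buildLiveUrl(key: string): string {
  if (!key) throw new Error("Refusing to open a Live socket without a key");
  return `${LIVE_ENDPOINT}?key=${encodeURIComponent(key)}`;
}

// Fetches the key from the app's own route, then opens the socket.
// ASSUMPTION: /api/gemini-token responds with JSON shaped like { "key": "..." }.
async function connectToGeminiLive<T>(
  fetchImpl: FetchLike,
  openSocket: (url: string) => T
): Promise<T> {
  const res = await fetchImpl("/api/gemini-token");
  if (!res.ok) throw new Error(`Token route failed with status ${res.status}`);
  const body = (await res.json()) as { key?: string };
  return openSocket(buildLiveUrl(body.key ?? ""));
}
```

In the browser this might be called as connectToGeminiLive(fetch, (url) => new WebSocket(url)), so the socket is never created until the key has arrived.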

Region & Image Confusion

We pushed images to gcr.io in us-central1 while deploying to Cloud Run in us-west1. Moving the images to an Artifact Registry repository in us-west1 fixed the ‘image not found’ errors.
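A small helper can keep the build and deploy steps pointed at the same image path. The sketch below follows the Artifact Registry Docker naming pattern REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG. The project, repository, and image names in the usage note are hypothetical placeholders.

```typescript
// Parts of an Artifact Registry Docker image reference.
interface ImageRef {
  region: string;
  project: string;
  repository: string;
  image: string;
  tag: string;
}

// Builds the full image path once so push and deploy cannot drift apart.
function artifactRegistryUri(ref: ImageRef): string {
  return `${ref.region}-docker.pkg.dev/${ref.project}/${ref.repository}/${ref.image}:${ref.tag}`;
}
```

For example, artifactRegistryUri({ region: "us-west1", project: "my-project", repository: "cauliform", image: "web", tag: "latest" }) yields a single string that can be passed both to the image push and to the Cloud Run deploy.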

Twilio + Cloud Run URLs

We removed all localhost references, computed the correct public base URLs, and made sure TwiML responses were sent with Content-Type: text/xml.
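Those two fixes can be sketched with standard web APIs: derive the public base URL from the request headers instead of hard-coding localhost, and return TwiML with an explicit Content-Type. The /api/twilio/next path, the spoken text, and the function names are hypothetical. Whether the deployed service reads these exact headers is also an assumption.

```typescript
// Derives the public base URL from proxy headers rather than localhost.
function publicBaseUrl(headers: Headers): string {
  // The header may hold a comma-separated list; the first entry is used.
  const proto = (headers.get("x-forwarded-proto") ?? "https").split(",")[0].trim();
  const host = headers.get("x-forwarded-host") ?? headers.get("host");
  if (!host) throw new Error("Cannot determine the public host");
  return `${proto}://${host}`;
}

// Returns a TwiML document with the Content-Type Twilio expects.
function twimlResponse(baseUrl: string): Response {
  const twiml =
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response>` +
    `<Say>Connecting you to the form assistant.</Say>` +
    // Hypothetical follow-up webhook on the same public host.
    `<Redirect method="POST">${baseUrl}/api/twilio/next</Redirect>` +
    `</Response>`;
  return new Response(twiml, {
    status: 200,
    headers: { "Content-Type": "text/xml" },
  });
}
```

Inside a Next.js App Router route handler, a POST function could return twimlResponse(publicBaseUrl(request.headers)) so that every URL Twilio receives points at the deployed service.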

What We're Proud Of

  • Shipped a real multi-service system as a 2-person team
  • Test Console with form parsing, live transcript, debug logs, and AI form submission
  • UI that feels like a product: clean landing page, dashboard with smooth animations
  • End-to-end: voice → parse → converse → submit, all on Google Cloud

What We Learned

  • IAM, service accounts, and regions matter as much as application code
  • Designing for Gemini Live means thinking in streams, events, and WebSockets, not REST request/response
  • With two people, ruthless prioritization beats feature completeness

Roadmap

What's Next

Production Reliability

Retries, per-form throttling, dashboards for failed calls

Analytics & Transcripts

Completion rates, drop-off questions, average call length — stored in Firestore or BigQuery

Multi-Language

Run the same form across different languages for educators and global teams

One-Click Deploy

Paste form → connect Twilio → deploy in two clicks. Designed for non-technical users.

Hackathon

Category: Live Agents — Real-time voice interaction using Gemini Live API

Built for the Gemini Live Agent Challenge 2026, focusing on breaking the “text box” paradigm with immersive, real-time voice experiences.

Cauliform AI — Google AI Gemini Live Agent Challenge 2026