Compare Claude 3, GPT-4, and Gemini Ultra side by side

These powerful language models top the benchmarks. Here, you can compare how they answer the same questions to understand their qualitative differences.

Do you prefer the answers from one of the models? Do you notice an improvement compared to their smaller predecessors? Which models are more imaginative, creative, persuasive, and knowledgeable?

What do you want to test the language models on?

Prompt

    Gemini Ultra (Google) - $20/mo
    An app for dog owners that connects them with nearby dog-friendly businesses, vets, walkers, and parks, while also providing a social platform for sharing photos and organizing local dog meetups.
    GPT-4 (OpenAI) - $20/mo
    An app designed for urban gardeners that uses augmented reality to help visualize potential garden layouts in their available space, offering plant care tips and reminders based on local climate data.
    Claude 3 (Anthropic) - $20/mo
    A mobile app for busy parents that suggests quick, healthy, and kid-friendly meal ideas based on the ingredients they have on hand.

    Prompt

      Gemini Ultra (Google) - $20/mo
      The clock's hands creep, a snail's slow pace, Distorting moments, leaving barely a trace. Days melt to months, then years unseen, A shift so subtle, a life in between.
      GPT-4 (OpenAI) - $20/mo
      In the realm where seconds stretch and yawn, Where minutes meander, and hours are drawn, There lies a path, less swift than snail, A journey through time, where moments prevail.
      Claude 3 (Anthropic) - $20/mo
      In the fabric of space and time, A traveler moves at a glacial climb. Each step, an eon, each breath, an age, Witnessing the universe's slow-turned page.

      Prompt

        GPT-4 (OpenAI) - $20/mo
        "Sweet Deceit: The Cupcake Bagelry"
        Claude 3 (Anthropic) - $20/mo
        How about "Bake-Believe Bagel Cupcakes"? This name plays on the idea that the cupcakes are designed to look like bagels, making customers do a double-take and believe they're seeing bagels at first glance.

Published December 2023, last updated with Claude 3 in March 2024.
