# Google Docs Can't Do This: Multi-Agent Document Collaboration

March 2026

Google Docs is one of the best pieces of software ever built. Real-time collaboration, commenting, suggestion mode, version history — it fundamentally changed how teams write together. We use it all the time.

But try to get two AI agents working in the same Google Doc, and you hit a wall immediately.

## The problem isn't Google Docs

The problem is that every document editor was designed in a world where all participants were human. That assumption is baked into everything: the APIs (or lack thereof), the collaboration model, the way edits are tracked and attributed.

Here's what you actually need for agent-to-agent collaboration in a document:

- **A structured editing API.** Agents can't use a cursor. They need to say "find this text and replace it with that text" — and the editor needs to handle it cleanly, even when multiple agents are editing simultaneously.

- **Programmatic comments and suggestions.** An agent reviewing a document needs to leave feedback anchored to specific passages — not dump everything into a chat sidebar.

- **Identity and authorship tracking.** When three agents and two humans are all working in a document, you need to know who wrote what. Not "AI-generated content" as a generic label, but `ai:security-reviewer` vs. `ai:copy-editor` vs. `human:alice`.

- **Real-time event streams.** Agents need to react to changes as they happen — new comments, accepted suggestions, edits from other participants. Polling isn't good enough.

Google Docs has none of this. The Google Docs API is designed for batch operations — importing, exporting, template filling. It's not built for live collaboration between software agents.

## A concrete scenario

Let's make this tangible. You're writing a technical specification for a new microservice. Here's what you want to happen:

- You create the document and write the overview section yourself.

- Your coding agent drafts the architecture section based on your existing codebase.

- A security-focused agent reviews the architecture for potential vulnerabilities.

- A documentation agent checks for consistency with your team's existing specs.

- Your teammate reads along, leaving comments and resolving conflicts.

All of this happens concurrently, in the same document, in real time.

Here's what that looks like with Comment.io's API:

**Step 1: Create the document**

```
POST /docs
{
  "markdown": "# Payment Service v2\n\n## Overview\n...",
  "title": "Payment Service v2 Spec",
  "by": "human:you"
}
```

You get back an `id`, an `owner_secret`, and an `access_token`. Share the access token with your agents.
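In code, Step 1 is a single unauthenticated POST. Here's a minimal Python sketch using only the standard library — the `https://comment.io/api` base URL is an assumption for illustration, not a documented endpoint:

```python
import json
import urllib.request

API_BASE = "https://comment.io/api"  # assumed base URL, for illustration only

def build_create_doc(markdown: str, title: str, by: str) -> urllib.request.Request:
    """Build the POST /docs request from Step 1; pass the result to
    urllib.request.urlopen() to actually create the document."""
    body = json.dumps({"markdown": markdown, "title": title, "by": by}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/docs",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = build_create_doc(
    "# Payment Service v2\n\n## Overview\n...",
    "Payment Service v2 Spec",
    "human:you",
)
# json.load(urllib.request.urlopen(req)) returns the id, owner_secret,
# and access_token described above.
```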

**Step 2: Your coding agent drafts the architecture**

```
PATCH /docs/{id}
Authorization: Bearer {access_token}
{
  "edits": [{
    "old_string": "## Architecture\n\nTBD",
    "new_string": "## Architecture\n\nThe service uses an event-driven..."
  }],
  "by": "arch-agent"
}
```

**Step 3: The security agent reviews and comments**

```
POST /docs/{id}/comments
Authorization: Bearer {access_token}
{
  "anchor_text": "stores card tokens in the session",
  "body": "This should use a vault service rather than session storage. PCI-DSS requires...",
  "by": "security-reviewer"
}
```

**Step 4: The docs agent suggests a consistency fix**

```
POST /docs/{id}/suggestions
Authorization: Bearer {access_token}
{
  "old_string": "returns a 200 status",
  "new_string": "returns a 201 Created status",
  "reason": "Your API style guide specifies 201 for resource creation endpoints",
  "by": "docs-checker"
}
```
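Steps 2 through 4 all share one shape: an authorized mutation carrying a bearer token and a required `by` field. A Python sketch of that common pattern — the base URL, document id, and token below are placeholders, not real values:

```python
import json
import urllib.request

API_BASE = "https://comment.io/api"  # assumed base URL, for illustration only

def build_mutation(method: str, path: str, payload: dict, access_token: str) -> urllib.request.Request:
    """Build one of the authorized mutations from Steps 2-4. Every payload
    carries the required `by` field; every request carries the bearer token."""
    assert "by" in payload, "the editor rejects mutations without authorship"
    return urllib.request.Request(
        url=API_BASE + path,
        data=json.dumps(payload).encode(),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )

# Step 3 as code: the security agent anchors a comment to a passage.
# "doc-123" and "tok-abc" are placeholder values.
req = build_mutation("POST", "/docs/doc-123/comments", {
    "anchor_text": "stores card tokens in the session",
    "body": "This should use a vault service rather than session storage.",
    "by": "ai:security-reviewer",
}, access_token="tok-abc")
```

The same builder covers the `PATCH /docs/{id}` edit in Step 2 and the suggestion in Step 4; only the method, path, and payload change.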

Meanwhile, everyone — humans and agents — is connected to the document's Server-Sent Events (SSE) stream and sees every change as it happens.

## Why this matters beyond the demo

This isn't just a fun technical trick. Multi-agent document collaboration solves real problems:

**Better review coverage.** A single human reviewer might miss a security issue or a style inconsistency. Multiple specialized agents can each focus on what they're good at — security, style, accuracy, completeness — and surface issues in parallel.

**Faster iteration cycles.** Instead of sequential review rounds (write → review → revise → review again), agents can review and suggest changes as soon as content appears. The document converges faster.

**Transparent attribution.** Every edit, comment, and suggestion carries an author identity. When you're reading the final document, you can see exactly which agent suggested each change and why. You can decide which agents you trust for which types of feedback.

**Human-in-the-loop by default.** The suggestion model means agents propose changes, but humans (or other agents) accept or reject them. Nobody's overwriting anyone's work. The human stays in control.
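That trust decision can itself be automated. A sketch of one possible triage policy — the `triage` helper and the auto-trust list are illustrative, not part of any API:

```python
def triage(suggestions, auto_trust):
    """Split incoming suggestions: auto-accept those from trusted agents,
    hold the rest for human review. `auto_trust` is a set of author ids."""
    auto, review = [], []
    for s in suggestions:
        (auto if s["by"] in auto_trust else review).append(s)
    return auto, review

suggestions = [
    {"by": "ai:docs-checker", "old_string": "returns a 200 status",
     "new_string": "returns a 201 Created status"},
    {"by": "ai:security-reviewer", "old_string": "stores card tokens in the session",
     "new_string": "references tokens held in a vault service"},
]
# Style fixes from the docs checker go straight through; security changes
# wait for a human.
auto, review = triage(suggestions, auto_trust={"ai:docs-checker"})
```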

## What this requires from an editor

Building this required rethinking the editor from the ground up. Some of the design decisions that make multi-agent collaboration work:

**String-based editing over operational transforms.** OT is great for syncing keystrokes between humans. But agents think in chunks of text, not individual characters. The `old_string`/`new_string` model lets agents make precise, non-overlapping edits without understanding the editor's internal state.
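One way to pin down "precise, non-overlapping": an edit applies only when `old_string` matches exactly one place in the document, so an agent working from stale context fails loudly instead of corrupting text. A sketch of those semantics — the exactly-once rule as written here is an assumption about the model, not documented server behavior:

```python
def apply_edit(doc: str, old_string: str, new_string: str) -> str:
    """Apply one old_string/new_string edit. Rejects the edit if the target
    text is missing or ambiguous, instead of guessing."""
    count = doc.count(old_string)
    if count == 0:
        raise ValueError("old_string not found; the document may have changed")
    if count > 1:
        raise ValueError("old_string is ambiguous; include more context")
    return doc.replace(old_string, new_string, 1)

doc = "## Architecture\n\nTBD"
doc = apply_edit(doc, "TBD", "The service uses an event-driven design.")
```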

**First-class authorship.** Not just "who's online" presence indicators, but durable, per-edit attribution that persists in the document's history. The `by` field is required on every mutation — the editor enforces accountability.

**Server-Sent Events for real-time.** WebSockets are great for bidirectional communication, but agents mostly need to listen. SSE is simpler, more reliable through proxies, and trivial to consume from any programming language.
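"Trivial to consume" because SSE is a plain line protocol: `event:` and `data:` fields, with a blank line dispatching each event. A minimal parser sketch in Python — the `comment.created` event name is invented for illustration, since the post doesn't specify the event schema:

```python
def parse_sse(lines):
    """Parse Server-Sent Events from an iterable of text lines.
    Yields (event_name, data) pairs; a blank line ends each event.
    Simplification: strips all leading spaces from field values, where
    the spec strips at most one."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":                      # blank line: dispatch the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].lstrip(" ")
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))

stream = [
    "event: comment.created\n",
    'data: {"by": "ai:security-reviewer"}\n',
    "\n",
]
events = list(parse_sse(stream))
```

In practice `lines` would be the response body of a long-lived GET to the document's event stream; the parsing is the same either way.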

**Token-based access.** No OAuth flows, no account creation. Create a document, get a token, share the token with your agents. Security without ceremony.

## The future of knowledge work is multi-agent

We're at an inflection point. The first wave of AI tools was about generation — give me a prompt, get back text. The second wave was about chat — have a conversation with an AI about your work. The third wave, the one we're building for, is about collaboration — multiple agents and humans working together in shared artifacts.

Documents are the most natural place for this to start. Every team writes specs, reports, proposals, and analyses. The tools they use for this were designed for humans typing on keyboards. That's no longer the only way work gets done.

Comment.io is an editor that was built for this new reality. Not an editor with AI bolted on, but an editor where agents are first-class participants — with the APIs, the authorship model, and the real-time infrastructure to make multi-agent collaboration actually work.

Google Docs can't do this. Not because Google couldn't build it, but because the architecture wasn't designed for it. We had the luxury of starting from scratch with agents in mind from day one.

That's what "agent-native" means. The editor doesn't just tolerate agents — it was built for them.

## Keep reading

- [Get started: your first doc in 3 curl commands](/docs/quickstart)

- [What is agent-native editing?](/what-is-agent-native-editing)

- [Comment.io vs Google Docs](/vs/google-docs)

[Try Comment.io →](https://comment.io/)

[Install the agent skill](https://comment.io/install)