The forgery rate
Large language models (LLMs) fabricate citations. Not occasionally. Routinely.
A 2023 study in Scientific Reports checked 300 ChatGPT-generated references. Of those, 32% pointed to papers that never existed. Real author names, plausible titles, correctly formatted DOIs. But the publications themselves had no physical or digital presence anywhere in the scientific record.
GPT-4o performs worse, at 56%. The mechanism: author names, journal titles, and volume numbers follow statistical patterns in training data. The model generates text matching those patterns without verifying whether the specific combination refers to a real publication. A fabricated citation can circulate through draft manuscripts, slide decks, and literature reviews for months before anyone checks the DOI.
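A short sketch of why "correctly formatted" is not evidence of existence. The pattern below is adapted from a regular expression CrossRef has published for matching modern DOIs; the helper name `looks_like_doi` and the invented DOI are assumptions made for this example only.

```python
import re

# Shape check only: "10." prefix, a 4-9 digit registrant code, a slash,
# and a suffix drawn from a permissive character set.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(candidate: str) -> bool:
    """Return True if the string has the shape of a DOI. Says nothing about existence."""
    return DOI_SHAPE.match(candidate.strip()) is not None


real = "10.1038/187493a0"      # the DOI used as the test case at the end of this post
invented = "10.1038/999999z9"  # made up for this example; never checked against a registry

print(looks_like_doi(real))           # True
print(looks_like_doi(invented))       # True
print(looks_like_doi("doi pending"))  # False
```

Both the real and the invented string pass, which is the point: a format check cannot tell them apart. Only a lookup against a registry can.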
The standard mitigation: ask the LLM to search the web for each reference. Each web search consumes a tool call (tokens for the request, tokens to parse the HTML response), hits paywalls, and returns unstructured content that the model must interpret. For a 20-reference paper, that amounts to 20 separate searches, each slow, each unreliable, each burning context window.
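The cost of the search-per-reference approach grows linearly with the reference list. A rough sketch of that arithmetic follows; the token figures are placeholders chosen for illustration, not measurements, and `search_cost` is a hypothetical helper.

```python
def search_cost(n_refs: int, request_tokens: int, response_tokens: int) -> int:
    """Tokens spent when every reference needs its own web-search tool call."""
    return n_refs * (request_tokens + response_tokens)


# Placeholder figures for illustration only; they are not measurements.
N_REFS = 20
REQUEST_TOKENS = 50       # assumed size of one search request
RESPONSE_TOKENS = 3_000   # assumed size of one parsed HTML page

total = search_cost(N_REFS, REQUEST_TOKENS, RESPONSE_TOKENS)
print(f"{N_REFS} tool calls, {total:,} tokens")  # 20 tool calls, 61,000 tokens
```

Whatever the real per-call figures are, the multiplier is the number of references.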
One structured call
We released OokCite MCP, a Model Context Protocol (MCP) server that connects language models to our own citation search index, backed by CrossRef. Instead of scraping the web, the LLM makes a single API call and receives structured metadata (title, authors, year, journal, DOI) in milliseconds.
One command installs and configures it:
    npx @turtletech/ookcite-mcp setup

This detects your MCP clients (Claude Desktop, Claude Code, Cursor, Codex, Windsurf, OpenCode) and writes the configuration. No API key required for basic usage.
What the tools do
OokCite exposes 29 tools. The core workflow:
validate_doi checks whether a DOI resolves to a real publication. If the model tries to cite a paper that doesn’t exist, this catches it before the reference reaches your manuscript.
reverse_lookup accepts messy citation text (“Henkelman 2000 climbing image NEB”) and returns the correct paper with its DOI.
format_citation takes a DOI and a style name (APA, IEEE, Chicago, Nature, or any of 2,900+ CSL styles) and returns both the in-text marker and the bibliography entry.
batch_format and verify_references handle entire reference lists in parallel. A 20-citation bibliography resolves in one call, not twenty sequential web searches.
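Under MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such requests for two of the tools named above. The tool names come from this post; the argument keys (`doi`, `style`) are guesses, since the tool schemas are not shown here, and `tool_call` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session needs its own id


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument keys below are assumptions, not the server's documented schema.
check = tool_call("validate_doi", {"doi": "10.1038/187493a0"})
cite = tool_call("format_citation", {"doi": "10.1038/187493a0", "style": "apa"})

print(json.dumps(check, indent=2))
```

In practice the MCP client constructs these messages itself; the real parameter names are available from the server's `tools/list` response.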
Collections let you save citations to named groups. Papers in a collection cost nothing to re-lookup. Export as BibTeX with Better BibTeX citation keys. Import .bib and .ris files.
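To show what a BibTeX export of structured metadata can look like, here is a minimal sketch that renders one `@article` entry. It is not OokCite's exporter: `bibtex_entry` is a hypothetical helper, and the key `maiman1960` is a plain author-year key rather than one produced by Better BibTeX's key pattern.

```python
def bibtex_entry(key: str, meta: dict) -> str:
    """Render an @article entry from title/authors/year/journal/DOI metadata."""
    fields = {
        "author": " and ".join(meta["authors"]),
        "title": "{" + meta["title"] + "}",  # extra braces protect capitalization
        "journal": meta["journal"],
        "year": str(meta["year"]),
        "doi": meta["doi"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


# Metadata for the paper used as the test case at the end of this post.
maiman = {
    "authors": ["Maiman, T. H."],
    "title": "Stimulated Optical Radiation in Ruby",
    "journal": "Nature",
    "year": 1960,
    "doi": "10.1038/187493a0",
}

print(bibtex_entry("maiman1960", maiman))
```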
The economics
The free tier handles 30 lookups per day with one collection of 100 entries. For most individual researchers writing one paper at a time, this covers the workflow.
Batch operations, larger collections, and higher daily limits require a paid plan:
| Plan | Price | Lookups/day | Collections | Entries/collection |
|---|---|---|---|---|
| Free | $0 | 30 | 1 | 100 |
| Academic | $4/mo | 10,000 | 5 | 500 |
| Business | $12/mo | 10,000 | 10 | 2,000 |
Academic pricing applies to students, researchers, and educators at accredited institutions.
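The table can be read as a simple lookup: pick the cheapest plan whose limits cover your workload. A sketch under that reading follows. The limits are copied from the table; `Plan` and `cheapest_plan` are hypothetical names, and the function ignores academic eligibility and the rule that batch operations need a paid plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    usd_per_month: int
    lookups_per_day: int
    collections: int
    entries_per_collection: int


# Limits copied from the pricing table, cheapest first.
PLANS = [
    Plan("Free", 0, 30, 1, 100),
    Plan("Academic", 4, 10_000, 5, 500),
    Plan("Business", 12, 10_000, 10, 2_000),
]


def cheapest_plan(lookups_per_day: int, collections: int, entries_per_collection: int):
    """Return the first (cheapest) plan whose limits cover the workload, else None."""
    for plan in PLANS:
        if (lookups_per_day <= plan.lookups_per_day
                and collections <= plan.collections
                and entries_per_collection <= plan.entries_per_collection):
            return plan
    return None


print(cheapest_plan(20, 1, 20).name)      # Free
print(cheapest_plan(200, 3, 150).name)    # Academic
print(cheapest_plan(200, 8, 1_000).name)  # Business
```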
Install
    npx @turtletech/ookcite-mcp setup

Or configure manually:

    {
      "mcpServers": {
        "ookcite": {
          "command": "npx",
          "args": ["-y", "@turtletech/ookcite-mcp"]
        }
      }
    }

Test with: “Validate this DOI: 10.1038/187493a0”
The response identifies Maiman’s 1960 paper on stimulated optical radiation in ruby – the first working laser.
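If you configure manually and your client already has other MCP servers registered, the `ookcite` entry has to be merged into the existing `mcpServers` object rather than replacing it. A minimal sketch of that merge follows, operating on an already-parsed config. The `other-server` entry is a hypothetical placeholder, and the location of the config file depends on the client, so it is left out.

```python
import json

# The server entry from the manual configuration above.
OOKCITE_ENTRY = {"command": "npx", "args": ["-y", "@turtletech/ookcite-mcp"]}


def add_ookcite(config: dict) -> dict:
    """Return a copy of an MCP client config with the ookcite server added."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["ookcite"] = OOKCITE_ENTRY
    return {**config, "mcpServers": servers}


# "other-server" stands in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_ookcite(existing)

print(json.dumps(merged, indent=2))
```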