
SakanaAI/AI-Scientist-v2

AI-Scientist-v2 is the first system to get a fully AI-generated scientific paper accepted at a peer-reviewed ML workshop with zero human intervention. Not a writing assistant. Not a research helper. An end-to-end agent that generates hypotheses, designs experiments, runs them on GPUs, analyzes results, writes manuscripts with citations, and iterates based on reviewer feedback, all autonomously.

The core innovation in v2 is progressive agentic tree search. Instead of the linear pipeline in v1 (which required human-authored templates for each research domain), v2 uses best-first tree search (BFTS) with an experiment manager agent that explores multiple research directions in parallel. Think of it like a chess engine, but for science: it evaluates which experimental paths are most promising, allocates compute to the best branches, and prunes dead ends automatically.

A Vision-Language Model feedback loop handles something researchers usually do manually: looking at figures and deciding whether they actually communicate the findings. The VLM iterates on chart aesthetics, label placement, and visual clarity until the figures meet publication standards. That's the kind of detail that separates this from a GPT wrapper that generates papers.

The system runs experiments using real GPU compute with PyTorch and CUDA. It integrates Semantic Scholar for literature search, supports GPT, Gemini, and Claude (via AWS Bedrock) as the reasoning backbone, and costs approximately $15-$20 per experimental run. That cost-per-experiment figure is remarkable: a human researcher's time on a single experiment often exceeds it by an order of magnitude.

Sakana AI, the Tokyo-based company behind the project, explicitly acknowledges v2's tradeoffs. It doesn't necessarily produce better papers than v1 when a strong template exists. v1 excels at well-defined tasks within narrow domains.
v2 trades that reliability for generalization: it can tackle open-ended research questions across ML domains without templates. Lower success rate, but dramatically broader capability.

The security warning is worth taking seriously: the system executes LLM-generated code on your hardware, and Sakana strongly recommends Docker containers for sandboxing. Any generated paper must also include a disclosure that it was created by AI; the repo provides mandatory wording for this.
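The sandboxing recommendation can be followed with a thin launcher that wraps the run in a container. This is a minimal sketch, not the repo's actual tooling: the image name, entry script, mount path, and resource cap below are all illustrative placeholders.

```python
import shlex

def sandboxed_command(script="launch_experiment.py",
                      image="ai-scientist-v2",
                      workdir="/workspace",
                      gpus=True):
    """Build a `docker run` invocation that isolates LLM-generated code.

    The image, script, and mount path are hypothetical placeholders,
    not defaults shipped by the repository.
    """
    cmd = ["docker", "run", "--rm",
           "--memory", "16g",                # cap RAM used by generated code
           "-v", f"{workdir}:{workdir}:rw",  # mount only the working directory
           "-w", workdir]
    if gpus:
        cmd += ["--gpus", "all"]             # expose GPUs for PyTorch/CUDA runs
    cmd += [image, "python", script]
    return cmd

print(shlex.join(sandboxed_command()))
```

The point of the wrapper is that generated code only ever sees the mounted working directory and a bounded slice of host resources, rather than the full machine.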


Why It Matters

Every researcher who has spent months on a paper that got rejected knows the pain. AI-Scientist-v2 doesn't eliminate that pain, but it runs the entire hypothesis-to-manuscript pipeline at $15-$20 per attempt, making it economically feasible to explore dozens of research directions simultaneously. The peer-reviewed acceptance proves this isn't vaporware: a real academic review committee evaluated the output and said yes.

For ML research teams, the practical value is in exploration speed. You can use v2 as a hypothesis generation machine that runs overnight, evaluates 10 experimental branches, and presents the most promising ones in the morning. It's not replacing researchers; it's giving them a 24/7 lab assistant that writes up its findings.

The open-source release (Apache 2.0 compatible elements with research-specific licensing) means any university lab or independent researcher can run this. Given that compute access is the primary bottleneck, not software, this democratizes the experimental pipeline in a way that matters.
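The overnight branch exploration described above rests on the best-first tree search idea: keep a frontier of experiment variants, always expand the most promising one, and prune weak children. A minimal sketch, where `score` and `expand` are hypothetical stand-ins for the repo's actual node scoring and experiment-mutation logic:

```python
import heapq
import itertools

def best_first_search(root, score, expand, budget=20, beam=3):
    """Explore experiment variants, always expanding the best-scoring node.

    score(node)  -> float, higher is more promising (e.g. a validation metric)
    expand(node) -> list of child nodes (mutated experiment configs)
    budget       -> total number of expansions (the compute budget)
    beam         -> children kept per expansion; the rest are pruned
    """
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = [(-score(root), next(counter), root)]
    best = root
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, node = heapq.heappop(frontier)
        if -neg_score > score(best):
            best = node
        # Keep only the top `beam` children, pruning dead ends early.
        children = sorted(expand(node), key=score, reverse=True)[:beam]
        for child in children:
            heapq.heappush(frontier, (-score(child), next(counter), child))
    return best
```

In the real system each node is a full experiment (code, config, results) and scoring involves running it on a GPU; the search skeleton, though, is just a priority queue over branches, which is why the budget translates so directly into dollars per run.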

Repository Stats

Stars: 2.9k
Forks: 476
Last Commit: 12/19/2025
