Agentic Research Workflows: AI and Validation

Natural-language specifications, CLI agents, and a four-layer validation stack

This lesson introduces researchers to Agentic Research Workflows, an approach that changes how we write and maintain research software. With AI coding agents, the work shifts from writing every line of syntax to defining intent and verifying outcomes.

The focus is not on replacing research thinking, but on reducing the cost of turning research intent into runnable code while maintaining standards for correctness, validation, and reproducibility.

From ‘Vibes’ to Research Orchestration

The term “vibe coding” (coined by Andrej Karpathy in February 2025) describes the shift, under way since roughly 2023, toward guiding AI with natural-language prompts. Intuition is a reasonable place to start, but research demands more. This lesson teaches you how to turn conversational speed into Spec-Driven Research Orchestration: a disciplined, validated workflow.

What this shift supports

Code orchestrator over author: Your primary role shifts from writing syntax to auditing logic and specifying constraints in a Living Spec (GEMINI.md).
Specification-driven development: You spend more time defining what the data should look like and why, and less time debugging boilerplate.
Cognitive load management: By offloading syntax to an agent, you free up mental resources for research decisions.
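As a minimal sketch of what "defining what the data should look like" can mean in practice, the Python snippet below writes a small data specification down as code and checks rows against it. The column names, types, and ranges in `SPEC`, and the helper `find_violations`, are hypothetical and chosen only for illustration; they are not taken from the lesson's GEMINI.md.

```python
from typing import Any

# Hypothetical spec (an assumption for illustration):
# column name -> (expected type, minimum, maximum); None means "no bound".
SPEC = {
    "participant_id": (int, 1, None),
    "age": (int, 18, 99),
    "score": (float, 0.0, 1.0),
}


def find_violations(rows: list[dict[str, Any]], spec: dict) -> list[str]:
    """Return one message per value that breaks the spec (empty list = all good)."""
    problems = []
    for index, row in enumerate(rows):
        for column, (expected_type, low, high) in spec.items():
            if column not in row:
                problems.append(f"row {index}: missing column '{column}'")
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                problems.append(
                    f"row {index}: '{column}' should be {expected_type.__name__}"
                )
                continue
            if low is not None and value < low:
                problems.append(f"row {index}: '{column}'={value} is below {low}")
            if high is not None and value > high:
                problems.append(f"row {index}: '{column}'={value} is above {high}")
    return problems


rows = [
    {"participant_id": 1, "age": 34, "score": 0.72},
    {"participant_id": 2, "age": 17, "score": 1.4},  # two deliberate violations
]
for message in find_violations(rows, SPEC):
    print(message)
```

Running it reports the two problems in the second row. The design choice worth noting is that the spec lives in one place as data, so a person can audit it without reading the checking logic.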

What it does not do

Validate your research question or choose your methods.
Detect subtle statistical or causal errors by default.
Reduce your responsibility for the final output.

Using AI to generate code is research-adjacent labor. It becomes research only when embedded in a disciplined process of validation, documentation, and justification.

Prerequisites

Basic familiarity with the command line.
Fundamental understanding of Python (variables, functions, scripts).
No prior experience with AI coding agents is required.

Learning Objectives

By the end of this lesson, participants will be able to:

Manage changes across a project using CLI-based AI agents.
Orchestrate agents using a Living Spec (GEMINI.md).
Apply a four-layer validation stack (requirements, tests, metamorphic checks, domain plausibility).
Implement approval gates to limit reviewer fatigue and spec drift.
Track the provenance and reproducibility of agent-generated results using Git.
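To make the four layers named above concrete, here is a small sketch in Python that applies one check per layer to a toy function. Everything in it is an assumption for illustration: the function under validation (`standardize`), the helper names, and the plausible range of 100 to 3000 (imagined here as reaction times in milliseconds). It is not the lesson's own validation code.

```python
import math
import statistics


def standardize(values):
    """Code under validation: convert raw values to z-scores.

    Assumes the values are not all identical (the spread would be zero).
    """
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / spread for v in values]


def all_close(a, b, tol=1e-9):
    """True if two sequences have equal length and element-wise close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )


def validate(values, plausible_range=(100.0, 3000.0)):
    """Run one illustrative check per layer and report pass/fail for each."""
    result = standardize(values)
    low, high = plausible_range
    return {
        # Layer 1, requirements: one z-score per input, centred on zero.
        "requirements": len(result) == len(values)
        and math.isclose(statistics.fmean(result), 0.0, abs_tol=1e-9),
        # Layer 2, tests: a hand-computed case.
        # [1, 2, 3] has mean 2 and population SD sqrt(2/3), so z = -sqrt(1.5), 0, sqrt(1.5).
        "tests": all_close(
            standardize([1.0, 2.0, 3.0]),
            [-math.sqrt(1.5), 0.0, math.sqrt(1.5)],
        ),
        # Layer 3, metamorphic: shifting or rescaling inputs must not change z-scores.
        "metamorphic": all_close(standardize([v + 100.0 for v in values]), result)
        and all_close(standardize([v * 3.0 for v in values]), result),
        # Layer 4, domain plausibility: raw values fall in a believable range
        # (the range itself is a hypothetical choice, not a standard).
        "domain_plausibility": all(low <= v <= high for v in values),
    }


for layer, passed in validate([320.0, 450.0, 510.0, 610.0]).items():
    print(f"{layer}: {'pass' if passed else 'FAIL'}")
```

The metamorphic layer is the one that needs no known correct answer: it only asserts a relationship between outputs for related inputs, which is useful when the agent-generated code computes something you cannot easily verify by hand.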

Acknowledgements

This lesson is adapted from the workshop “Vibe Coding for Research” developed by Bruno Smaniotto and Tom van Nuenen at the UC Berkeley D-Lab.

The original materials can be found at dlab-berkeley/Vibe-Coding-for-Research.

To follow this lesson, you will need Node.js, the Gemini CLI, and Python installed on your machine. Node.js is required because the CLI is installed through npm, which ships with Node.js. The steps below get you set up with a direct local install, which is the approach used throughout the lesson.

1. Install Node.js and Python

Node.js: Download the LTS version from nodejs.org or use a package manager (brew install node on macOS).
Python: Ensure you have Python 3.9+ installed. Check with python --version (on systems where the command is named python3, use python3 --version instead).

2. Install the Gemini CLI

BASH

npm install -g @google/gemini-cli

3. Authenticate

Run this command and sign in with your Google account when the browser opens:

BASH

gemini auth login

The CLI stores your credentials locally, so you should not need to repeat this step on the same machine.

4. Verify the install

BASH

gemini --version

If you see a version number, you are ready. If the command is not found, restart your terminal and try again.
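If you prefer to check the whole setup in one go, the following Python sketch looks for the tools from the steps above on your PATH and confirms the interpreter version. It is only a convenience under stated assumptions: the helper names (`python_is_recent`, `find_tools`, `report`) are hypothetical, and finding a command on the PATH does not prove that it is authenticated or working.

```python
import shutil
import sys


def python_is_recent(minimum=(3, 9)):
    """True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= tuple(minimum)


def find_tools(names=("node", "npm", "gemini")):
    """Map each command name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def report():
    """Build one status line for Python and one per command-line tool."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    status = "ok" if python_is_recent() else "too old, need 3.9+"
    lines = [f"Python {version}: {status}"]
    for name, path in find_tools().items():
        lines.append(f"{name}: {path or 'not found on PATH'}")
    return lines


print("\n".join(report()))
```

A tool reported as "not found on PATH" right after installation usually points to the same fix as above: restart the terminal so the updated PATH is picked up.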

Caution

Security and working directory

The Gemini CLI runs in your terminal and has direct access to the files in your current folder. A few habits to keep in mind:

Always start the CLI from a dedicated project folder, not your home directory.
Keep files under version control (Git) so you can revert unwanted changes.
Never start the CLI in folders with sensitive system files, credentials, or private data.
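The habits above can be turned into a quick pre-flight check to run before starting the agent. The Python sketch below is one possible version, not part of the Gemini CLI: the function names are hypothetical, and `SENSITIVE_NAMES` is a short illustrative list that only looks at the top level of the folder, so a clean result does not guarantee the folder is free of private data.

```python
from pathlib import Path

# Illustrative (hypothetical) file names that often hold secrets; extend for your setup.
SENSITIVE_NAMES = (".env", "id_rsa", "credentials.json")


def is_home_directory(folder: Path) -> bool:
    """True if the folder is the current user's home directory."""
    return folder.resolve() == Path.home().resolve()


def is_under_git(folder: Path) -> bool:
    """True if the folder or any of its parents contains a .git entry."""
    folder = folder.resolve()
    return any(
        (candidate / ".git").exists() for candidate in (folder, *folder.parents)
    )


def preflight_warnings(folder: Path) -> list[str]:
    """Collect warnings that correspond to the three habits listed above."""
    warnings = []
    if is_home_directory(folder):
        warnings.append("this is your home directory; use a dedicated project folder")
    if not is_under_git(folder):
        warnings.append("no Git repository found; changes cannot be reverted easily")
    found = [name for name in SENSITIVE_NAMES if (folder / name).exists()]
    if found:
        warnings.append("possible sensitive files present: " + ", ".join(found))
    return warnings


for warning in preflight_warnings(Path.cwd()):
    print("WARNING:", warning)
```

No output means none of the three checks fired for the current folder.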

Running in a sandbox (optional)

Some researchers prefer to isolate the CLI from their personal files entirely. Two options worth knowing about:

Docker: The lesson repository includes a Dockerfile that builds a container with the Gemini CLI pre-installed. The agent can only see files you explicitly mount into it. See the Docker documentation for setup, or Docker AI Sandboxes for a purpose-built option.
Agent Safehouse: agent-safehouse.dev is a dedicated environment for running AI agents with built-in isolation controls.

Both approaches require more setup and are not needed for this workshop, but are worth exploring if you plan to use these tools regularly with sensitive data.

Institutional context and access

Many campuses (like the University of California) have enterprise AI agreements. CLI access is often a separate feature that requires IT provisioning. If your institution’s license does not cover the CLI, you may need to use a personal Google account for this workshop. Always follow your institution’s data privacy policies.