Instructor Notes

Teaching Philosophy: The Shift from Writer to Editor

This lesson is designed to help researchers navigate the transition from imperative coding (writing every line) to agentic orchestration (guiding an AI to write code and then validating it). The core challenge for learners isn’t learning syntax—it’s managing Cognitive Load and maintaining a rigorous Editor’s Mindset.

Key Concepts to Emphasize:

Verification Load: It is often harder to verify code you didn’t write than to write it yourself. Normalize this “friction” as a sign of high-quality research.
Evidence Mantra: “I do not approve changes; I approve evidence.” This should be the recurring theme of the workshop.
Sandboxing: Always emphasize the security implications of giving an AI direct access to the filesystem.

Episode 1: Understanding CLI-Based AI

Auth Check: Ask learners to run gemini --version. If it returns a version number, they are ready. If not, have them run gemini auth login and sign in with a Google account before continuing.
The Browser vs. CLI distinction: Use the analogy of a “consultant” (Browser) vs. a “research assistant with keys to the lab” (CLI).
Discussion: The prompt about “ChatGPT writing code that looks correct but fails” is a great way to bond over shared frustration and set the stage for why we need the CLI (to run and test immediately).

Episode 2: Best Practices for Prompting

CO-STAR vs. CLEAR: Don’t get bogged down in the acronyms. The goal is intentionality.
Live Demo Tip: Show a “Bad Prompt” vs. a “Good Prompt” live. Purposely run a vague command and show how it fails or produces messy output before using the refined version.
Self-Correction: This is the “lightbulb” moment. Demonstrate asking the AI, “Are you sure? Review your code for edge cases.”

Episode 3: Data Cleaning (Live Demo)

High Intensity: This is the most technically demanding episode.
The “Safety Net”: If a learner’s AI fails to generate working code after two attempts, have them copy the pre-written script from instructors/files/backup_clean_and_merge.py. This prevents them from falling behind.
Stop and Read: Literally tell the class “hands off keyboards” for 2 minutes so they read the generated script before they run it.

Episode 4: Validation Best Practices

Three-Layer Framework:
1. Human Rules (Ground Truth)
2. AI Tests (Supervised)
3. Metamorphic Tests (Relationships)
Activity: Encourage learners to try the “Metamorphic Sanity Check” (Challenge). It’s a powerful way to show how “reproducibility” can be subtly broken by AI randomness.
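
A metamorphic check tests a relationship between two runs rather than one expected answer. The sketch below assumes a hypothetical clean function standing in for whatever cleaning step the learner's generated script performs; the field names site and value are illustrative only. The relation tested is that reordering the input rows must not change the cleaned result.

```python
import random


def clean(rows):
    """Hypothetical stand-in for an AI-generated cleaning step:
    drop rows with a missing value and normalise site names."""
    cleaned = []
    for row in rows:
        if row["value"] is None:
            continue
        cleaned.append({"site": row["site"].strip().lower(), "value": row["value"]})
    return cleaned


def canonical(rows):
    """Order-independent view of the output so two runs can be compared."""
    return sorted((r["site"], r["value"]) for r in rows)


def shuffle_invariant(rows, seed=0):
    """Metamorphic relation: shuffling the input must not change the cleaned result."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return canonical(clean(rows)) == canonical(clean(shuffled))


raw = [
    {"site": " A ", "value": 1.0},
    {"site": "b", "value": None},
    {"site": "B", "value": 2.5},
    {"site": "a", "value": 0.5},
]
print(shuffle_invariant(raw))  # True
```

If a generated script fails a check like this (for example, because it keeps only the first duplicate it happens to see), the failure points at an order dependence that no single expected-output test would reveal.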

Episode 5: Limitations and Cautions

Silent Semantic Drift: This is the most “dangerous” failure. Use the example of a filtering threshold (e.g., > 0.5 vs. >= 0.5) that might not crash the code but changes the science.
Environmental Cost: This is often a new topic for researchers. It grounds the “Spec-Driven Research Orchestration” hype in physical reality.
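
The threshold example under Silent Semantic Drift can be shown in a few lines. This is a minimal sketch with made-up scores; the function names are hypothetical. Both filters run without error, yet they keep different numbers of observations whenever a value sits exactly on the threshold.

```python
def keep_strict(scores, threshold=0.5):
    """Keep values strictly above the threshold (> 0.5)."""
    return [s for s in scores if s > threshold]


def keep_inclusive(scores, threshold=0.5):
    """Keep values at or above the threshold (>= 0.5)."""
    return [s for s in scores if s >= threshold]


# Illustrative data only: two values sit exactly on the boundary.
scores = [0.2, 0.5, 0.5, 0.7, 0.9]

print(len(keep_strict(scores)), len(keep_inclusive(scores)))  # 2 4
```

A human-written rule that pins the expected row count for a small fixture like this one is one way to catch the change before it reaches the analysis.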

Episode 6: Resources and Next Steps

The Toolscape: Acknowledge that tools (Aider, Cursor, Claude Code) change weekly. Focus on the principles (CLI, validation, provenance) rather than specific software.
Attribution: Remind learners that while AI can’t be an author, transparency about its use is a core tenet of Open Science.

Troubleshooting & Common Issues

API Quotas & Limits

If learners hit “Resource Exhausted” errors, they have likely exceeded their free-tier quota or are prompting too rapidly. Suggest they wait 60 seconds or send a smaller context (for example, not every file in the folder).

“Gemini Not Found”

Ensure the global npm install worked and that their PATH is updated. In Docker, this is handled automatically. On host machines, they might need to restart their terminal.
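
A quick way to confirm whether the executable is visible on PATH, sketched in Python using the standard library's shutil.which. The helper name cli_location is hypothetical, and the check says nothing about whether the install itself is healthy, only whether the command can be found.

```python
import shutil


def cli_location(name="gemini"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = cli_location()
if path is None:
    print("gemini not found on PATH: re-check the npm install, then restart the terminal")
else:
    print(f"gemini found at {path}")
```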

Hallucinations

If a learner gets a script that imports a non-existent library (e.g., import science_cleaner), use it as a “teachable moment” for the whole class about verification.
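
One way to make this verification concrete is to check a generated script's imports before running it. The sketch below uses only the standard library (ast and importlib.util); the function name unresolved_imports is hypothetical, and the check only covers top-level absolute imports in the current environment.

```python
import ast
import importlib.util


def unresolved_imports(source):
    """Return the top-level module names imported by `source`
    that cannot be found in the current Python environment."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


# Illustrative snippet of a generated script, including the made-up library.
generated = "import csv\nimport science_cleaner\nfrom pathlib import Path\n"
print(unresolved_imports(generated))
```

Assuming no package named science_cleaner is installed, the list contains only that name. A missing module can also be a real package that simply is not installed yet, so the result is a prompt to investigate, not proof of a hallucination.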

CLI-based AI

Setup check

Ask all learners to run:

BASH

gemini --version

If it returns a version number, they are ready. If the command is not found, they need to complete the install and run gemini auth login before continuing.

Discussion prompt

Ask learners: “Have you ever used ChatGPT to write code that looked correct but failed when you ran it?” This is a good time to introduce the concept of orchestration. The goal is not just to “fix” code, but to ensure the AI’s intent (the spec) is correct.

Best practices for prompting

Teaching tip: Visual aids

Write CLEAR vertically on a whiteboard. As you explain each letter, add the keyword (Concise, Logical, Explicit, Adaptive, Reflective). This helps students remember the framework.

The introspection concept

Emphasize this section. Most learners treat AI output as final. The idea that they can ask the AI to fix its own work is often a new concept. It is like asking a student, “Are you sure you checked your work?”—they often find their own mistakes when asked.

Data cleaning with AI

Live coding

This episode uses live coding. Learners should follow along by running commands on their own machines.

Prerequisites

Ensure you are authenticated (gemini auth login) and have a Gemini CLI session running in your project folder. Generating scripts can take 10-30 seconds.

Backup scripts

If a learner’s AI fails to generate working code, provide the pre-written versions from instructors/files/:
- backup_make_messy_data.py
- backup_inspect_data.py
- backup_clean_and_merge.py

Challenge tip

This challenge requires modifying existing code. If learners are stuck, suggest they ask the AI to read clean_and_merge.py before asking for modifications.

Validation strategies: the approval gate

Teaching tip: approval fatigue

Warn learners about approval fatigue: the tendency to accept AI suggestions without reading them. The four-layer stack is designed to make the AI prove it is correct before you review the code.

Limitations and cautions

Managing expectations

Models are becoming less likely to hallucinate.
* If it refuses: Acknowledge that the model correctly identified its own limitations.
* Backup: Have a screenshot of a known hallucination ready to show if the AI performs perfectly during the session.