From Teaching Workshops to Building Research Infrastructure
By 2018, I was teaching constantly.
We built UCLA Carpentries. We trained instructors. We ran cohorts. We worked closely with departments and tailored instruction to their needs. Library Carpentry events were well attended. It felt like we were hitting our stride.
Then two things happened at the same time.
First, we inherited a social science data archive that was already in Dataverse. My colleagues and I made a decision: instead of treating that as a legacy system, we would build UCLA Dataverse as a campus publishing platform.
We stood it up ourselves.
That meant infrastructure decisions, policy decisions, curation practices. It shifted my time away from pure instruction and toward data publishing and governance. We were no longer just helping people analyze data. We were helping them publish and preserve it.
Then COVID hit.
Instruction went online almost overnight. That could have fragmented everything. Instead, it pushed us toward system-wide collaboration.
We began teaching across the UC system. UC Carpentries emerged as the first real iteration of that collaboration. We helped launch UC Love Data Week. We focused on improving online instruction and scaling it across campuses.
Workshops became networked.
At the same time, we began investing in infrastructure. With support from gifts, including from the Powell family, we purchased multi-tier computing infrastructure, GPUs, and deep learning machines. The library was no longer just a teaching space. It was becoming a place where research computing capacity lived.
As consulting demand grew, we needed more capacity.
We adopted Paula Lackie’s DataSquad model from Carleton College and built our own version at UCLA. Undergraduates began consulting on real research projects. This multiplied what we could do. In some cases, students brought deeper statistical or programming expertise than we had internally. Researchers gained support. Students gained experience. It became a core part of our structure starting around 2021.
Around the same time, researchers began bringing us more complex and sensitive data. Some datasets were massive. Some contained personally identifiable information. Local solutions were not viable.
We licensed and implemented Redivis to handle restricted and computationally intensive datasets. Researchers used it to publish articles, including work connecting health data and voting behavior. That platform filled a gap between compliance and productivity.
We also pushed the campus to lower barriers to tools like RStudio Pro and RStudio Online. Expanding access dramatically increased the number of scholars who could use those tools without managing local installations.
In 2023 and 2024, the work expanded again.
I helped write a Sloan Foundation grant that secured $1.8 million to build a UC-wide Open Source Program Office. That effort recognizes something we see daily: research depends on open source software, from Jupyter notebooks to small Unix packages that everything else relies on. Sustainability, governance, documentation, and citation practices matter.
In parallel, we secured an IMLS grant to develop a structured open science curriculum for librarians. Fourteen new lessons are funded, piloted, and moving toward integration into broader training ecosystems. This is about training the trainers.
Along the way, I joined and helped lead governance work in Library Carpentry. We improved processes for lesson intake, adoption, and sustainability.
All of this happened after 2018.
What began as intensive teaching evolved into building durable systems:
- Data infrastructure
- Compute infrastructure
- Student workforce capacity
- Secure data environments
- System-wide instruction networks
- Open source governance
- Open science curriculum
I still teach.
But increasingly, my work is about building the conditions that make teaching and research sustainable.