Library Carpentry sprint, 2 Oct, 2017 - LiASA

The Library Carpentry sprint is part of the Library and Information Association of South Africa 2017 Conference, which will take place on 2 October, 2017 in Johannesburg. For those new to the field, a sprint is like a hackathon - it is a way to get people working together to create, update or extend open projects like Library Carpentry. We’ll use this etherpad to organize the sprint.

Carpentries South Africa Trip

I’m taking off for South Africa later this eve and will be teaching and participating in several events while there. I’m super excited about joining my hosts, @anelda and @zjsteyn, for 2 weeks in Joburg.

Schedule of What is Happening


Chapter 2: Why It's Better Than It Seems

Why It’s Scary

“It’s hard to be shown up by a nineteen year old.”

In Teaching What You Don’t Know, Therese Huston interviews 28 faculty and administrators about their experiences teaching outside their expertise. There are clear reasons why you would not want to do this. In general, the author found that teaching what you don’t know can be very stressful. An instructor doesn’t want to be outsmarted by students or asked questions she or he can’t answer. Also, when teaching outside of your expertise, you typically spend more time preparing for the course.

Why Teach Outside Your Expertise

Considering the downsides (lack of sleep, extra preparation, anxiety, and so on), why would you do this to yourself? Why would you teach what you do not know? Huston found that her interview cohort saw a number of benefits in teaching outside of their expertise:

  1. An opportunity to learn something new, which most academics love to do. A corollary benefit is that teaching is a solid way to focus the mind on learning something.
  2. An opportunity to connect with faculty outside of your department. If you are teaching a subject in which someone in another department on campus has content expertise, asking for that person’s help with the class is a great way to make new acquaintances.
  3. It broadens your CV and makes you more attractive to potential employers because you become more versatile.
  4. It can lead you to develop a new area of research.

Chapter 1: Teaching What You Don't Know

Teaching What You Don't Know

Huston, Therese. (2009). Teaching What You Don’t Know. Cambridge, Mass.: Harvard University Press.

In the first chapter, the author lays out some of the underpinnings for why teaching what you don’t know, though often not talked about, is prevalent in academia. She interviewed 28 faculty and administrators about teaching outside of their expertise and found that most teach what they don’t know because of three broad factors: where they teach, what they teach, and the way higher education works.

Where they teach

Faculty and instructors who teach at smaller institutions are more likely to pick up courses on topics they didn’t study in graduate school. With fewer instructors at smaller schools, the simple numbers dictate that each one will have to cover more areas.

What they teach

Faculty who teach as part of a general education program or who are responsible for cross-disciplinary seminars are often teaching beyond their expertise. Additionally, many departments offer courses that are so broad that instructors can’t be experts in all of the represented topics. The author gives the example of a law professor who teaches property law, noting that property law can cover material grounded in a thousand years of jurisprudence. Consequently, a professor who specializes in a sub-sub-sub part of that history will routinely be asked to teach outside of their area to contribute to the curricular offerings of their school.


Stata estout: UCSD Epidemiology Group

On July 28, 2016, I presented to a UCSD epidemiology group at the Medical Teaching Facility on using estout to autogenerate regression tables for publication. estout is a Stata package that makes it easy to produce publication-quality regression tables in Stata. It also provides various output formats, including CSV, RTF, HTML, and LaTeX.


Using AWK to Filter Rows

After attending a bash class I taught for Software Carpentry, a student contacted me because she was having trouble working with a large data file in R. She wanted to filter out rows based on a condition in two columns. This is an easy task in R, but because of the size of the file and because R objects are memory bound, reading the whole file in was too much for my student’s computer to handle. She sent me a sample file and described how she wanted to filter it. I chose AWK because it is designed for this type of task. It parses data line by line and doesn’t need to read the whole file into memory to process it. Further, if we wanted to speed up our AWK even more, we could investigate AWK ports, such as MAWK, that are built for speed.
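To make the line-by-line idea concrete, here is a minimal sketch of a two-column filter. The student's actual file and conditions are not reproduced here, so the comma-separated layout, the column names (id, score, group) and the thresholds are all hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical sample data: NOT the student's file, just a small stand-in.
set -e
data=$(mktemp)
cat > "$data" <<'EOF'
id,score,group
1,12,A
2,55,B
3,70,A
4,8,B
EOF

# -F',' splits each line on commas. A pattern with no action prints the line.
# NR == 1 keeps the header; the rest keeps rows where column 2 is above 50
# AND column 3 is "A". AWK handles one line at a time, so memory use stays
# small no matter how large the file is.
result=$(awk -F',' 'NR == 1 || ($2 > 50 && $3 == "A")' "$data")
echo "$result"
```

With this stand-in data, only the header and the row `3,70,A` are printed. For a real file, the field separator and the two conditions would need to match the actual columns.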


UCSF Software Carpentry

UCSF Software Carpentry 8/4-8/5 2016

I’m teaching bash shell and Git at a two-day workshop at UCSF from 8/4-5. Ted Hart @emhart is teaching R. Lots of fun so far!


Contributing to carpentry lessons with GitHub


Set up your remote upstream and merge updates from there

  1. Fork the repo you want to work on.
  2. Clone that repo down, e.g., git clone
  3. So you can fetch changes from the originating repo, add a remote reference to it: git remote add upstream (you can see your remotes by running git remote -v)
  4. Get changes from upstream: git fetch upstream
  5. Merge those changes locally: git merge upstream/gh-pages
  6. Repeat steps 4 and 5 above before you begin a new unit of work below to ensure you have the latest base version of the lesson
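The fetch and merge cycle above can be tried out safely without touching GitHub. The sketch below is only an illustration: it uses two throwaway local repositories in a temporary directory, and the names upstream-lesson, my-clone, and lesson.md are hypothetical stand-ins for the originating repo, your clone, and a lesson file.

```shell
#!/bin/sh
# Throwaway demo of "git fetch upstream" + "git merge upstream/gh-pages".
# All repository and file names below are hypothetical stand-ins.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the originating lesson repo, with gh-pages as its default branch.
git init --quiet upstream-lesson
cd upstream-lesson
git symbolic-ref HEAD refs/heads/gh-pages   # name the first branch gh-pages
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "episode 1" > lesson.md
git add lesson.md
git commit --quiet -m "Add first episode"

# Stand-in for cloning your fork, then adding the upstream remote (step 3).
cd "$work"
git clone --quiet upstream-lesson my-clone
cd my-clone
git remote add upstream "$work/upstream-lesson"
git remote -v

# Meanwhile, the originating repo gains a new commit.
cd "$work/upstream-lesson"
echo "episode 2" >> lesson.md
git commit --quiet -am "Add second episode"

# Steps 4 and 5: fetch the upstream changes, then merge them locally.
cd "$work/my-clone"
git fetch --quiet upstream
git merge --quiet upstream/gh-pages
cat lesson.md
```

After the merge, the local copy of lesson.md contains both episodes, which is the state you want before starting a new unit of work.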

Do work in a local branch and submit changes to the lesson repo

  1. Start with the latest version of the upstream lesson (see above)
  2. Create a branch for your improvements: git checkout -b new-lesson-improvement
  3. Install Jekyll if you want to preview your changes locally
  4. Run make serve to preview locally, typically at
  5. Once you are done with your work, run git add, git commit, and then git push origin new-lesson-improvement to push the branch up to your forked repository.
  6. Make a pull request from your repo in GitHub (this tells the upstream maintainers: hey, pull my improvement into the upstream repo)

Getting set up to improve lessons

After we taught Library Carpentry here at UCSD, we sat down and worked through the workflow for contributing to Carpentry lessons. Matt Critchlow, our IT Dev Manager, walked us through the document and I worked up the steps below from our meeting.

One of the confusing aspects of translating the common fork/pull-request development workflows is that most of the documentation found on the web is spelled out for master branches. This is because, by convention, the default branch when you initialize a repository in GitHub (or locally) is named master. However, with the Software/Data/Library Carpentry lessons the default branch is set to gh-pages. This is mainly for ease, because the web version of the lessons lives in this branch and this is where Software Carpentry wants contributions to go. Also, on GitHub, commits to this branch are processed by Jekyll, a static site generator, which produces the nice lesson webpages we use in class. The main thing to know is that in Software Carpentry lesson land, when you see master in Git help or online documentation, you can mentally substitute gh-pages for it. Hopefully, this will help folks new to git contribute more to the lessons.

Update 2016-08-28: Corrected the Getting changes from the upstream default branch gh-pages section to remove using git status to check changes in the upstream remote as this won’t work! git fetch upstream is the right command to pull down any changes that may have been made.

Set up your fork and local clone

  1. Fork a lesson you want to contribute to, for instance, data-lessons/library-shell. Forking will create a linked copy of the repository in your own GitHub account.

  2. Clone the library-shell project to your local machine (USERNAME - your GitHub user account name). Having a local copy allows us to edit locally using our favorite tool, create branches for discrete work, and keep the local repository in sync with data-lessons/library-shell:

    $ git clone

    clone grabs the repository and makes a local copy. It creates a directory (named after the repository by default) and sets up the link between your clone and the remote repository (called origin). Let’s confirm this by running git remote -v.

    $ cd library-shell
    $ git remote -v
    origin (fetch)
    origin (push)