Living Without the Bunnies (old posts, page 30)

Tempo Game

2010-08-06 07:51

As you know, I've been working on educational games to help music students practice rhythms and intonation. Well, I'm finally ready to begin tests of aspects of these programs; the first test is to detect the tempo of a student-performed rhythm.

I've written an online flash game for rhythms. The game presents you with a rhythm and metronome, and you're supposed to tap the rhythm on your keyboard.

http://percival-music.ca/tempo/

Of course, since this game is supposed to be useful for students, you could try tapping the wrong rhythm – ignore the metronome, add an extra note, hold a few notes for twice as long as you should... make whatever mistakes you think a student might make! Or, if you're not a professional musician, you could just try to perform the rhythms correctly, and let the mistakes happen naturally. :)

Information about your gameplay will be automatically sent to me and used to improve the tempo detection. No names or personal information is recorded; I have no idea who is playing the game. Details of the information gathered is available on the website.

Graphing commits per developer

2010-08-05 08:01

I gave a talk at Rencontres Mondiales du Logiciel Libre, or the Libre Software Meeting 2010, over three weeks ago. This post has the scripts that I used to create the graphs of commits per developer. These scripts aren't particularly inspired, but there's been some interest in them, so in the spirit of open source and just plain good science, here there are. :)

Here's the ultimate goal: a graph showing commits per developer.

More context is in the previous post about the talk, Sustainable Development in F/OSS

First, we need a program to extract statistics. I used gitstats, and heartily recommend it to others.

Second, we need to know the range of dates to use. I used 6-month intervals. I wimped out here -- I'm sure that it's possible to make git tell you this automatically, but instead I just skimmed through the logs in gitk and manually picked out the first commit in each date range. The result is this:

2005-1  cccee40bb3a13b6c230fa98a8ca61f7c526d5f66
2005-7  b28d02e3cbb81b050cf4c6bba3260a331d4f7d3f
2006-1  16b713b9b9616d70c4a1b12506952098c01c5706
2006-7  eaed6aa8f16502d8159530eaf3ed4d56dbf8fef8
2007-1  ad33cc4b37628e047e4c8c3b3d42d93cb7525d0f
2007-7  699feb0fbe8448a94ca5d9de43dd126d34fd5341
2008-1  57221c9ee7a758169c9e2fe4805f0ed3598f50d5
2008-7  b57384106e4e54f4a22a93af88204f86f37d078b
2009-1  015ac40ff340255ea4ac76fbe5b5eab3c2a40adc
2009-7  bee6f18b78c002187d9197b6a334ff61c13a4ef9
2010-1  9429c53e9e4a1f88b32f4897fdc261ed9bd6684f
2010-7  ec376e079a0dc586ba3fe17113993525f2be69c2

(I used a tab between the columns, but that won't come out in the html)

Third, I ran the below script to run gitstats for each range of dates/commits:

# make-gitstats.py
# public domain if that matters; this is utterly trivial

#!/usr/bin/env python
import sys, os

GITDIR = "$HOME/src/lilypond"
COMMITS_FILENAME = "lily-dates.txt"

commits = open(COMMITS_FILENAME).readlines()

def getRange(begin, end):
        out_filename = begin.split('\t')[0] + '_' + end.split('\t')[0]
        commit_begin = begin.split('\t')[1].rstrip()
        commit_end   = end.split('\t')[1].rstrip()
        cmd = "gitstats"
        cmd += ' -c commit_begin='+commit_begin
        cmd += ' -c commit_end='+commit_end
        cmd += ' ' + GITDIR
        cmd += ' ' + out_filename
        print cmd
        os.system(cmd)

# whole range
getRange(commits[0], commits[-1])

for i in range(len(commits)-1):
        getRange( commits[i], commits[i+1] )

Fourth, I wimped out again. The previous step created a bunch of directories containing info about each date range nicely formatted in html. Instead of grabbing the info I wanted directly with python, I copied the table of authors (from the web page) into openoffice. I split up the commit field (which contains both the raw number, and the percentage of the total like "3656(26.20%)") with:

=VALUE(LEFT(B2;FIND("(";B2)-2))

Then I copied the columns around and exported it as a csv, ending up with something like this:

"2005-1 to 2005-7"   "2005-7 to 2006-1"   ...
626                  517                  ...
237                  77                   ...
123                  67                   ...

(again, there might be whitespace tab/space issues in this display)

Fifth, it's gnuplot time! Again, I could automate more of this file, but it worked for my purposes.

# combined.plot
set term png enhanced font
'/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf' 12

set title "Git commits to LilyPond, 2005 - 2010 in 6 month
intervals"
set xlabel "Rank of developer"
set ylabel "Number of commits"

set key autotitle columnhead
set yrange [6:]
#set xrange [:20]

set style line 1  lt 1  lw 2 lc rgb "#0000ff"
set style line 2  lt 2  lw 2 lc rgb "#0033cc"
set style line 3  lt 3  lw 2 lc rgb "#006699"
set style line 4  lt 4  lw 2 lc rgb "#009966"
set style line 5  lt 5  lw 2 lc rgb "#00cc33"
set style line 6  lt 6  lw 2 lc rgb "#00ff00"
set style line 7  lt 7  lw 2 lc rgb "#33cc00"
set style line 8  lt 8  lw 2 lc rgb "#669900"
set style line 9  lt 9  lw 2 lc rgb "#996600"
set style line 10 lt 10 lw 2 lc rgb "#cc3300"
set style line 11 lt 11 lw 2 lc rgb "#ff0000"

set output "combined-normal.png"
plot "combined.csv" using 1 ls 1 w lines, \
        "combined.csv" using 2 ls 2 w lines, \
        "combined.csv" using 3 ls 3 w lines, \
        "combined.csv" using 4 ls 4 w lines, \
        "combined.csv" using 5 ls 5 w lines, \
        "combined.csv" using 6 ls 6 w lines, \
        "combined.csv" using 7 ls 7 w lines, \
        "combined.csv" using 8 ls 8 w lines, \
        "combined.csv" using 9 ls 9 w lines, \
        "combined.csv" using 10 ls 10 w lines, \
        "combined.csv" using 11 ls 11 w lines

set output "combined-log.png"
set log y
plot "combined.csv" using 1 ls 1 w lines, \
        "combined.csv" using 2 ls 2 w lines, \
        "combined.csv" using 3 ls 3 w lines, \
        "combined.csv" using 4 ls 4 w lines, \
        "combined.csv" using 5 ls 5 w lines, \
        "combined.csv" using 6 ls 6 w lines, \
        "combined.csv" using 7 ls 7 w lines, \
        "combined.csv" using 8 ls 8 w lines, \
        "combined.csv" using 9 ls 9 w lines, \
        "combined.csv" using 10 ls 10 w lines, \
        "combined.csv" using 11 ls 11 w lines

If you want to use any of this, go ahead! I declare it to be public domain.

Caution: as I mentioned in my talk, the raw number of commits per developer is a vague measure. It doesn't include emails (either organizing a project, reviewing patches, etc), amount of work behind each patch, etc. For example, in the range of Jan 2010 - July 2010, I'd say that the two programmers who did the most work (both in terms of the bugfixes and new features they wrote, but also in terms of reviewing other people's patches) were #5 and #6 on the "commits per developer" list.

I've considered starting up a project to give a better measure of a project's "health" -- taking into account emails, bugs fixed, reviewing activity... in the case of lilypond, all that info wouldn't be too hard to gather, and it would certainly be a fun task to develop algorithms to deal with that data.

However, if I want my PhD to be finished ASAP -- and I definitely do want this done so that I can start on my postdoc life -- then I really can't afford to branch off onto that type of work. At least, not alone. If anybody else is interested in working on this project, I'd definitely considering on that team. :)

Sustainable Development in F/OSS

2010-08-01 01:13

I gave a talk at Rencontres Mondiales du Logiciel Libre, or the Libre Software Meeting 2010, over three weeks ago. I promised to put slides online, but travel, a music camp, and a publication deadline conspired to delay them. I also wanted to clarify a few points in the slides, which then turned into a more substantial rewrite as I discovered more and more things which didn't make much sense without me explaining them verbally. And then I showed them to my brother, an experienced BSD conference presenter, who suggested a bunch more changes. But I'm happy to announce that the (rewritten) slides are done!

Talk abstract:

The time and energy which developers spend on open-source projects is not an infinite resource. Developer effort can stall due to external demands on their time (such as family, career, or health), but also due to internal factors (such as a loss of motivation or interest). Long-term projects (5+ years old) should try to engage in sustainable development practices. How can we retain developer interest? How can we prepare for the inevitable loss of developers? How can we train the next generation of developers? This talk draws upon experiences from GNU/LilyPond (a 14-year old sheet music typesetter), but makes general suggestions (and warnings!) for users, developers, and project leaders.

Slides: sustainable.pdf

Talk source (released under CC BY-SA 3.0): sustainable.tar.bz2

Requires latex-beamer, includes raw data for git commit counts, but not the scripts which generated that data. I left those on my Glasgow computer by accident, but I could send them to anybody after Aug 3 if they really want.

Later edit: my scripts have been uploaded to Graphing commits per developer.