The online tempo experiment is now over; I am no longer collecting information. You are welcome to continue playing the game if you find it helpful, though! More information below.
Thanks so much for participating! We collected a total of 1041 games. In 882 of them, the player agreed with the calculated tempo; in 89, the player disagreed with the calculated tempo and gave their own tempo estimate; and in the remaining 70, the player disagreed without giving an estimate.
My task now is to examine all of these games, with the goal of "3-agent agreement". For the "agreed with tempo" games, I need to check that I also consider the tempo correct, so that the player, the computer, and I all agree on it. For both "disagreed with tempo" categories, I need to work out what the real tempo should be, then tweak my algorithm so that the computer produces that tempo -- without breaking any of the detected tempos in the "agreed" games. If the player and I disagree, I'll show those specific games to our resident professor emeritus of music so that he can cast a tie-breaking vote. And if he disagrees with both of us... well, there will probably be so few of those games that I can discuss them individually in a special section of the paper. (I doubt there'll be any, though.)
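A minimal Python sketch of that triage might look like the following. It assumes tempos are compared in BPM and that two estimates count as "the same tempo" when they are within a hypothetical 1 BPM tolerance; the function names and the tolerance are illustrative only, not the experiment's actual tooling.

```python
from typing import Optional

# Illustrative sketch only: names and the 1 BPM tolerance are assumptions.

def same_tempo(a: float, b: float, tol_bpm: float = 1.0) -> bool:
    """Treat two tempo estimates as equal if they differ by at most tol_bpm."""
    return abs(a - b) <= tol_bpm

def triage(computed_bpm: float, player_agreed: bool,
           player_bpm: Optional[float], my_bpm: float) -> str:
    """Sort one game into a bucket of the 3-agent agreement scheme."""
    if player_agreed:
        # Agreeing means the player accepted the computed tempo.
        player_bpm = computed_bpm
    if player_bpm is None:
        # Player rejected the computed tempo but gave no tempo of their own.
        if same_tempo(my_bpm, computed_bpm):
            return "ask professor"    # I side with the computer, the player doesn't
        return "tweak algorithm"      # target tempo is my_bpm
    if not same_tempo(my_bpm, player_bpm):
        return "ask professor"        # player and I disagree
    if same_tempo(my_bpm, computed_bpm):
        return "3-agent agreement"
    return "tweak algorithm"          # player and I agree; the computer is off

def tie_break(player_bpm: Optional[float], my_bpm: float,
              professor_bpm: float) -> Optional[float]:
    """Return the tempo the professor sides with, or None if he sides with neither."""
    if same_tempo(professor_bpm, my_bpm):
        return my_bpm
    if player_bpm is not None and same_tempo(professor_bpm, player_bpm):
        return player_bpm
    return None  # disagrees with both: discuss individually in the paper
```

Every "tweak algorithm" game then becomes a target for the algorithm, with the "3-agent agreement" games acting as a regression check that the tweak doesn't break.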
A few people asked me privately about getting 100%, so I took a quick glance at the games. The very best game had an average error of 1.25 milliseconds (this was on level 5), while the best game on level 1 had an average error of 12.6 milliseconds. Mathematicians everywhere just winced at my use of the word "average", so let me clarify: those figures are the root-mean-square error (RMSE), i.e. the square root of the mean squared error. Squaring them, the mean squared errors come to about 1.6*10^-6 and 1.6*10^-4 seconds squared, respectively.
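For anyone curious how these error measures differ, here is a minimal Python sketch. It assumes each tap has already been paired with exactly one target beat and that all times are in seconds; the helper names are hypothetical and not taken from the game's code. It also computes the mean absolute error (MAE), which is closer to what "average error" means colloquially.

```python
import math

# Hypothetical helpers, not the game's actual code.
# Assumes taps are already paired 1:1 with target beats, times in seconds.

def timing_errors(tap_times, beat_times):
    """Signed error (tap minus beat) for each paired tap, in seconds."""
    if len(tap_times) != len(beat_times) or not tap_times:
        raise ValueError("need one tap per beat, and at least one of each")
    return [tap - beat for tap, beat in zip(tap_times, beat_times)]

def mean_absolute_error(errors):
    """The plain 'average' size of a miss, in seconds."""
    return sum(abs(e) for e in errors) / len(errors)

def mean_squared_error(errors):
    """Average of the squared errors, in seconds squared."""
    return sum(e * e for e in errors) / len(errors)

def root_mean_squared_error(errors):
    """Square root of the MSE, back in seconds; big misses weigh more than in MAE."""
    return math.sqrt(mean_squared_error(errors))

if __name__ == "__main__":
    errors = timing_errors([0.502, 1.001, 1.497], [0.5, 1.0, 1.5])
    print(f"MAE  {mean_absolute_error(errors) * 1000:.2f} ms")
    print(f"RMSE {root_mean_squared_error(errors) * 1000:.2f} ms")
    # Squaring the RMSE values quoted above gives their MSE in s^2.
    for rmse_ms in (1.25, 12.6):
        print(f"RMSE {rmse_ms} ms -> MSE {(rmse_ms / 1000) ** 2:.2e} s^2")
```

With three taps off by 2, 1, and 3 ms, the MAE is 2.00 ms while the RMSE is about 2.16 ms: squaring before averaging makes the 3 ms miss count for more.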
Those games were very much the exception, however -- most "great" exercises had an "average" error (again, RMSE) of 30-40 milliseconds. That's just from skimming through a list of hundreds of numbers, though... a detailed (and non-subjective :) analysis will be coming in the next few days.