Artifastring 1.0
I'm happy to announce the release of Artifastring 1.0, a highly optimized physical simulation of a violin for sound synthesis. http://percivalmusic.ca/artifastring/
This is an implementation of modal bowed string physical modeling as described by Matthias Demoucron; references are given on the project's main page. The code is available under the GPL v3 or higher.
I had a blast optimising this stuff; actually, I had far too much fun working on tiny speed improvements. Surprisingly, making the code multithreaded didn't help: the extra overhead of the mutexes and condition variables nuked the benefit of modeling each violin string separately for any realistic length of discretized input.
I won't get into everything I did, but here are the four main points:

As suggested in a conference paper, if a violin string wasn't vibrating and had no input, I didn't bother calculating new values for it; I just returned 0.0. (This model doesn't include sympathetic vibrations, so there was absolutely no change in the output from doing this.) I also detected if the string was barely vibrating (i.e. producing a force of less than 1e-6 on the bridge, with each modal velocity below 1e-4), and then "turned off" that string. This step did change the output, but only very slightly. I picked those values experimentally: I plucked the string, then looked for the time when I really couldn't hear the sound any more. Admittedly, I was using a pair of £10 headphones... if I had a more expensive sound setup (better headphones or speakers, a larger room, a room with less background noise, etc.), these values might be too low. If that were the case, it's easy enough to change those constants.

Caching any values more complicated than an addition or a multiplication. In particular, the results of any trig functions were cached; the amount of extra memory required is trivial (something like 20 kB?).
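The caching boils down to a lookup table filled once at setup time. A sketch (hypothetical names, not the shipped code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const int NUM_MODES = 56;

// Precompute sin(w*i) for every mode once; sin() is never called
// again in the audio loop.  NUM_MODES+1 doubles is under 500 bytes
// per table, so even many tables stay far below 20 kB.
std::vector<double> make_sin_table(double w) {
    std::vector<double> table(NUM_MODES + 1);
    for (int i = 0; i <= NUM_MODES; i++) {
        table[i] = std::sin(w * i);
    }
    return table;
}
```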

The code calculates the eigenvectors of the string for each mode. This is a series of calculations of the form
y_i = sin(x*i), for i = 0, 1, 2, ... N (where N = 56 in my case)
There's a well-known optimization for this: to calculate y_n = sin(n*W + B), use y_n = 2*cos(W)*y_(n-1) - y_(n-2).
I could improve on that, though. In my case, there's no B, and I know that sin(0) = 0. So I only need to calculate sin(W) and cos(W).
But wait, it gets better! The quoted webpage includes this code:
double y[3]; double keep[N];
for (int i = 0; i < N; i++) {
    y[2] = p*y[1] - y[0];
    keep[i] = sqrt_two_div_l * y[2];
    y[0] = y[1];
    y[1] = y[2];
}
but this involves unnecessary copying. We can avoid that by using a ring buffer of size 2:
double y[2]; double keep[N]; unsigned int latest = 0;
for (int i = 0; i < N; i++) {
    y[latest] = p*y[!latest] - y[latest];
    keep[i] = sqrt_two_div_l * y[latest];
    latest = !latest;
}
Hmm... it just occurred to me that I could avoid using y entirely if I postponed the multiplication by sqrt_two_div_l until after I'd finished calculating the entire array: go through the array once calculating the sin() values, then go through it again multiplying everything by sqrt_two_div_l. I could also get rid of the "latest" variable and just use i. Hmm... :)
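Putting those last two ideas together (fill keep[] directly with the recurrence, then apply the constant factor in a second sweep), the loop might look like this. This is a sketch of that refinement, not the shipped code, and it assumes y_0 = sin(0) = 0, storing sin(w*(i+1)) at keep[i]:

```cpp
#include <cassert>
#include <cmath>

const int N = 56;

// Fills keep[i] with sqrt_two_div_l * sin(w*(i+1)) for i = 0..N-1,
// using the recurrence y_n = 2*cos(w)*y_(n-1) - y_(n-2).
// Only one sin() and one cos() call in total.
void eigenvectors(double w, double sqrt_two_div_l, double keep[N]) {
    const double p = 2.0 * std::cos(w);
    double prev2 = 0.0;          // sin(0)
    double prev1 = std::sin(w);  // sin(w)
    keep[0] = prev1;
    // First pass: plain sin values via the recurrence, no temporaries.
    for (int i = 1; i < N; i++) {
        double next = p * prev1 - prev2;
        keep[i] = next;
        prev2 = prev1;
        prev1 = next;
    }
    // Second pass: apply the constant factor in one sweep.
    for (int i = 0; i < N; i++) {
        keep[i] *= sqrt_two_div_l;
    }
}
```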

Fast ring buffer pointer calculation: the traditional way is to use modulo:
i = (i + 1) % BUFFER_SIZE;
But modulo is a division, and division is slow. Another way is to use a comparison:
i++; if (i == BUFFER_SIZE) { i = 0; }
That's better, but it still costs a compare and a possible branch on every sample. To improve it even more (courtesy of my brother): if you always use a buffer size which is a power of 2, you can just use an AND!
i++; i &= BUFFER_SIZE_MINUS_ONE;
Of course, BUFFER_SIZE and BUFFER_SIZE_MINUS_ONE are defined at compile time.
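The trick works because for a power-of-two size, i % BUFFER_SIZE and i & (BUFFER_SIZE - 1) give the same result for any non-negative i. A self-contained sketch (hypothetical names):

```cpp
#include <cassert>

// Must be a power of two for the AND trick to be valid.
const unsigned int BUFFER_SIZE = 8;
const unsigned int BUFFER_SIZE_MINUS_ONE = BUFFER_SIZE - 1;

// Advance a ring-buffer index with no division and no branch.
unsigned int advance(unsigned int i) {
    i++;
    i &= BUFFER_SIZE_MINUS_ONE;  // same as i % BUFFER_SIZE, but just an AND
    return i;
}
```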
Sadly, none of this optimization is actually useful. Ok, the "inactive string" trick cut the processing time by more than half, but the other tricks only gave 2-5% improvements in speed. One of the first lessons for people doing computer science is that constant factors in running time don't matter: the real improvements are made at the level of algorithms. If your algorithm does asymptotically fewer operations than the next person's (say, O(log n) instead of O(n)), then it doesn't matter if you wrote your code in Perl and he used hand-crafted assembly; your code is guaranteed to be faster for any sufficiently large problem.