Playing Soundfonts with the Web Audio API

strudel

March 19, 2023

printing letters
by Andrea.C.Fi, CC BY-SA 3.0

In this post, I want to implement a way to load and play soundfonts with the Web Audio API.

Notes from the future: This post is originally from July 2022, but I never released it.

About Soundfonts and General MIDI

To get the whole story check out Soundfonts on Wikipedia. Essentially, soundfonts are for sound what normal fonts are for typography: A collection of interchangeable sounds that can be used to play different types of music.

The so-called General MIDI specification (GM) is a standard set of sounds that a soundfont should have. You might know those from any old keyboard with 128 (or more) programs.

Besides General MIDI, the sf2 file format can also contain any collection of sounds, unrelated to GM. Soundfonts were quite popular in computer games when file sizes were limited.

Because soundfonts are not huge, and because there are many of them floating around, I think they are a good fit for the web.

Loading Soundfonts

Luckily, other people have already figured out how to parse sf2 files on the web. After some digging, I found the soundfont2 library to be the best option:

import { SoundFont2 } from 'soundfont2';

async function loadSoundfont(url) {
  // load some sf2 file into an array buffer:
  const buffer = await fetch(url).then((res) => res.arrayBuffer());
  // convert buffer to Uint8Array:
  const data = new Uint8Array(buffer);
  // parse the sf2 file:
  return SoundFont2.from(data);
}
// let's try it out:
loadSoundfont(
  'https://raw.githubusercontent.com/felixroos/felixroos.github.io/main/public/Earthbound_NEW.sf2'
).then((font) => {
  console.log('font loaded', font);
});

What comes back is a huuuuuge object containing all the soundfont data! The top level properties are:

  • banks
  • chunk
  • instruments
  • metaData
  • presetData
  • presets
  • sampleData
  • samples
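To get a first feel for such an object, it can help to print a short summary. This is only a minimal sketch: `summarizeFont` is a hypothetical helper (not part of soundfont2), and `fakeFont` is stand-in data that assumes the `header` fields shown later in this post (`name`, `bank`, `preset`).

```javascript
// Hypothetical helper: summarize a parsed soundfont object.
// Assumes font.presets[i].header has { name, bank, preset }
// and font.samples[i].header has { name }.
function summarizeFont(font) {
  return {
    presets: font.presets.map(
      ({ header }) => `${header.bank}:${header.preset} ${header.name}`
    ),
    samples: font.samples.map(({ header }) => header.name),
  };
}

// Stand-in data shaped like the parser output (not a real file):
const fakeFont = {
  presets: [{ header: { name: 'FM Carillion', bank: 0, preset: 1 } }],
  samples: [{ header: { name: 'accordian 2' } }],
};

const summary = summarizeFont(fakeFont);
console.log(summary.presets);
console.log(summary.samples);
```

With a real file, the same helper could be called inside the `.then((font) => ...)` callback instead of logging the whole object.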

The SF2 Onion

So there is a whole lot of different data types in there. So far, I am too lazy to read the spec, but the general structure looks like this:

sf2 structure

I think the best way to understand all of this is to work through it layer by layer.

Playing Samples

In the diagram, we can see the samples are the smallest entity of the system. Why not list all available samples and make them playable first?

The Sample Object

A sample object returned from the parser looks like this:

{
  "data": [
    /* Lots of values between -32768 and 32767 */
  ], // Int16Array
  "header": {
    "name": "accordian 2",
    "start": 6142, // first sample
    "end": 9758, // last sample
    "startLoop": 9566, // loop begin
    "endLoop": 9758, // loop end
    "sampleRate": 32000, // samples / second
    "originalPitch": 64, // midi number
    "pitchCorrection": -19, // cents
    "link": 0, // ?
    "type": 1 // ?
  }
}

Here we see an important feature of soundfonts: loop points! These allow notes to sustain endlessly while the file size can remain small.
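To see how small the looped region is, the header values can be converted from sample frames to seconds. A minimal sketch, using the numbers from the example object above; `sampleTimings` is a hypothetical helper name.

```javascript
// Header values copied from the example sample object above.
const exampleHeader = {
  start: 6142,
  end: 9758,
  startLoop: 9566,
  endLoop: 9758,
  sampleRate: 32000,
};

// Hypothetical helper: convert frame positions to durations in seconds.
function sampleTimings({ start, end, startLoop, endLoop, sampleRate }) {
  return {
    total: (end - start) / sampleRate, // length of the whole sample
    loopOffset: (startLoop - start) / sampleRate, // where the loop begins
    loop: (endLoop - startLoop) / sampleRate, // length of the looped part
  };
}

const timings = sampleTimings(exampleHeader);
console.log(timings);
```

For this sample, the whole recording is about 113 ms long and only the last 6 ms are looped, which is why a sustained note costs so little data.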

Getting the Buffer Source

To get a playable AudioBufferSourceNode from a sample object, I did the following:

export function getBufferSourceFromSample(ctx, sample, pitch) {
  const { header, data: int16 } = sample;
  // convert Int16Array to Float32Array:
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768; // convert to [-1, 1]
  }
  const buffer = ctx.createBuffer(1, float32.length, header.sampleRate);
  const channelData = buffer.getChannelData(0);
  channelData.set(float32);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  // calculate playbackRate
  // (zone-level tuning such as fineTune is not applied here, as no zone is passed in)
  const baseDetune = header.originalPitch - header.pitchCorrection / 100.0;
  const playbackRate =
    1.0 * Math.pow(2, (100.0 * (pitch - baseDetune)) / 1200.0);
  source.playbackRate.value = playbackRate;
  // set loop!
  if (header.endLoop > header.startLoop) {
    const loopStart = header.startLoop - header.start;
    source.loopStart = loopStart / header.sampleRate;
    source.loopEnd = (header.endLoop - header.start) / header.sampleRate;
    source.loop = true;
  }
  return source;
}
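The pitch calculation in the function above can be looked at in isolation: every 12 semitones above the sample's root pitch double the playback rate. The following sketch pulls that formula into a pure function. `getPlaybackRate` is a name made up for this example, and it keeps the same sign convention for `pitchCorrection` as the code above.

```javascript
// Playback rate needed to play a sample at a target MIDI pitch.
// originalPitch is a MIDI number, pitchCorrection is given in cents.
function getPlaybackRate(header, pitch) {
  const rootPitch = header.originalPitch - header.pitchCorrection / 100;
  // (100 * semitones) / 1200 simplifies to semitones / 12
  return Math.pow(2, (pitch - rootPitch) / 12);
}

// Made-up header with root pitch C4 (midi 60) and no correction:
const testHeader = { originalPitch: 60, pitchCorrection: 0 };
const sameNote = getPlaybackRate(testHeader, 60); // unchanged speed
const octaveUp = getPlaybackRate(testHeader, 72); // double speed
const octaveDown = getPlaybackRate(testHeader, 48); // half speed
console.log(sameNote, octaveUp, octaveDown);
```

Note that changing the playback rate also shortens or stretches the sample, which is one reason why instruments usually spread several samples across the keyboard.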

Playing the Sample

To conveniently start and stop the sample without pops, I used this function:

export const startSample = (ctx, sample, pitch, time = ctx.currentTime) => {
  let source = getBufferSourceFromSample(ctx, sample, pitch);
  let gain = ctx.createGain();
  gain.connect(ctx.destination);
  source.connect(gain);
  source.start(time);
  // return stop handle
  return () => {
    if (!gain || !source) {
      // already stopped / not started
      return;
    }
    const end = ctx.currentTime + 0.1; // fade out
    gain.gain.linearRampToValueAtTime(0, end);
    source.stop(end);
    source = undefined;
    gain = undefined;
  };
};

Result

Throwing all of that together, we can listen to all samples of the sf2 file!

And there we go! The core entity of the soundfont seems to work. What's left are all the onion layers around it.

Instruments

Let's peel the next layer: Instruments are a collection of samples.

Properties

Here is an example of an instrument object:

{
  "header": {
    "bagIndex": 0,
    "name": "Boombox Kit"
  },
  "zones": [
    {
      "keyRange": { "lo": 48, "hi": 51 }, // can also be undefined
      "generators": {
        "43": { "id": 43, "range": { "lo": 48, "hi": 51 } },
        "53": { "id": 53, "amount": 32 },
        "58": { "id": 58, "amount": 65 }
      },
      "modulators": {}, // I have yet to find an instrument with something in here
      "sample": {
        /* sample object */
      }
    }
    /* more zones */
  ]
}

So basically, each instrument has one or more zones, each of which contains a sample for a specific key range. Each zone can also define generators and modulators, whatever those are...
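Picking the zones that respond to a given key could look roughly like this. It is a sketch under one assumption: a zone without a `keyRange` is treated as covering all keys. `getZonesForKey` and the `kit` object are made up for illustration.

```javascript
// Hypothetical helper: which zones of an instrument respond to a MIDI key?
// Assumption: a zone with an undefined keyRange matches every key.
function getZonesForKey(instrument, key) {
  return instrument.zones.filter(
    ({ keyRange }) => !keyRange || (key >= keyRange.lo && key <= keyRange.hi)
  );
}

// Stand-in instrument, shaped like the object above:
const kit = {
  header: { name: 'Boombox Kit' },
  zones: [
    { keyRange: { lo: 48, hi: 51 }, sample: { header: { name: 'a' } } },
    { keyRange: { lo: 52, hi: 55 }, sample: { header: { name: 'b' } } },
    { keyRange: undefined, sample: { header: { name: 'c' } } },
  ],
};

const hits = getZonesForKey(kit, 50).map((zone) => zone.sample.header.name);
console.log(hits); // zones 'a' and 'c' match key 50
```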

Generators

At this point, I have to find out what those generators are by looking at the spec. The meaning of those numbers is listed in section 8.1.2, "Generator Enumerators Defined":

const generators = {
  // sample control
  0: 'startAddrsOffset', // moves sample start point
  1: 'endAddrsOffset', // moves sample end point
  // loop control
  2: 'startloopAddrsOffset', // moves loop start point
  3: 'endloopAddrsOffset', // moves loop end point
  4: 'startAddrsCoarseOffset', // ?
  // pitch modulation
  5: 'modLfoToPitch', // modulation lfo pitch modulation in cents
  6: 'vibLfoToPitch', // vibrato lfo pitch modulation in cents
  7: 'modEnvToPitch', // modulation envelope pitch modulation in cents
  // filter
  8: 'initialFilterFc', // lowpass filter cutoff in cents
  9: 'initialFilterQ', // lowpass filter resonance
  // filter modulation
  10: 'modLfoToFilterFc', // modulation lfo lowpass filter cutoff in cents
  11: 'modEnvToFilterFc', // modulation envelope lowpass filter cutoff in cents
  //
  12: 'endAddrsCoarseOffset', // ?
  13: 'modLfoToVolume', // modulation lfo volume (tremolo), where 100 = 10dB
  14: 'unused1',
  15: 'chorusEffectsSend', // how much is sent to chorus 0 - 1000
  16: 'reverbEffectsSend', // how much is sent to reverb 0 - 1000
  17: 'pan', // panning, where -500 = left, 0 = center, 500 = right
  18: 'unused2',
  19: 'unused3',
  20: 'unused4',
  // mod lfo
  21: 'delayModLFO', // delay for mod lfo to start from zero (weird scale)
  22: 'freqModLFO', // frequency of mod lfo, 0 = 8.176Hz, unit: f => 1200log2(f/8.176)
  // vib lfo
  23: 'delayVibLFO', // delay for vibrato lfo to start from zero (weird scale)
  24: 'freqVibLFO', // frequency of vibrato lfo, 0 = 8.176Hz, unit: f => 1200log2(f/8.176)
  // mod env
  25: 'delayModEnv', // 0 = 1s delay till mod env starts
  26: 'attackModEnv', // attack of mod env
  27: 'holdModEnv', // hold of mod env
  28: 'decayModEnv', // decay of mod env
  29: 'sustainModEnv', // sustain of mod env
  30: 'releaseModEnv', // release of mod env
  31: 'keyNumToModEnvHold', // also modulating mod envelope hold with key number
  32: 'keyNumToModEnvDecay', // also modulating mod envelope decay with key number
  // vol env
  33: 'delayVolEnv', // delay of envelope from zero (weird scale)
  34: 'attackVolEnv', // attack of envelope
  35: 'holdVolEnv', // hold of envelope
  36: 'decayVolEnv', // decay of envelope
  37: 'sustainVolEnv', // sustain of envelope
  38: 'releaseVolEnv', // release of envelope
  39: 'keyNumToVolEnvHold',
  40: 'keyNumToVolEnvDecay',
  // zone
  41: 'instrument', // instrument index to use for preset zone
  42: 'reserved1',
  43: 'keyRange', // key range for which preset / instrument zone is active
  44: 'velRange', // velocity range for which preset / instrument zone is active
  45: 'startloopAddrsCoarseOffset', // ?
  46: 'keyNum', // instrument only: always use this midi number (ignore what's pressed)
  // gain
  47: 'velocity', // instrument only: always use this velocity (ignore what's pressed)
  48: 'initialAttenuation', // allows turning down the volume, 10 = -1dB
  49: 'reserved2',
  50: 'endloopAddrsCoarseOffset', // ?
  // tune
  51: 'coarseTune', // pitch offset in semitones
  52: 'fineTune', // pitch offset in cents
  // sample
  53: 'sampleID', // instrument zone only: which sample to use
  54: 'sampleModes', // 0 = no loop, 1 = loop, 2 = reserved, 3 = loop and play till end in release phase
  55: 'reserved3',
  56: 'scaleTuning', // the degree to which MIDI key number influences pitch, 100 = default
  57: 'exclusiveClass', // = cut = choke group
  58: 'overridingRootKey', // can override the sample's originalPitch
  59: 'unused5',
  60: 'endOper',
};

This is quite a list, though I am not sure if it makes sense to implement all of those.
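To make the raw generator objects easier to read, the ids can be swapped for names, and the "weird scale" values can be converted to everyday units. The sketch below assumes my reading of the spec: times are given in timecents (0 = 1 second), frequencies in absolute cents (0 = 8.176Hz) and attenuation in centibels (10 = 1dB). All function names here are made up, and `generatorNames` is only a subset of the table above.

```javascript
// Subset of the generator table above, enough for this example:
const generatorNames = {
  43: 'keyRange',
  53: 'sampleID',
  58: 'overridingRootKey',
};

// Hypothetical helper: { id: { id, amount | range } } => { name: value }
function nameGenerators(generators, names = generatorNames) {
  const named = {};
  for (const { id, amount, range } of Object.values(generators)) {
    const name = names[id] !== undefined ? names[id] : `unknown${id}`;
    named[name] = range !== undefined ? range : amount;
  }
  return named;
}

// Unit conversions (assumptions based on the units named in the spec):
const timecentsToSeconds = (tc) => Math.pow(2, tc / 1200); // 0 => 1s
const absoluteCentsToHz = (cents) => 8.176 * Math.pow(2, cents / 1200); // 0 => 8.176Hz
const centibelsToGain = (cb) => Math.pow(10, -cb / 200); // attenuation, 0 => gain 1

// Generators of the example instrument zone from above:
const named = nameGenerators({
  43: { id: 43, range: { lo: 48, hi: 51 } },
  53: { id: 53, amount: 32 },
  58: { id: 58, amount: 65 },
});
console.log(named);
console.log(timecentsToSeconds(1200)); // one octave of timecents doubles the time
```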

Presets

So far, we've looked at samples, instruments and instrument zones. Let's look at the last 2 layers of the onion: presets and preset zones.

Here we see that each preset can have one or multiple zones containing an instrument. Like with instrument zones, each preset zone can define generators to set certain properties:

{
  "header": {
    "name": "FM Carillion",
    "preset": 1,
    "bank": 0,
    "bagIndex": 594,
    "library": 0,
    "genre": 0,
    "morphology": 0
  },
  "zones": [
    {
      "generators": {
        "41": {
          "id": 41,
          "amount": 236
        }
      },
      "instrument": {
        /* instrument object */
      },
      // "keyRange": undefined,
      "modulators": {}
    }
  ]
}
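Since each preset header carries a `bank` and a `preset` number, looking up a preset for a MIDI program change could be sketched like this. `findPreset` and `tinyFont` are made up for this example; the only thing assumed about the parsed font is the `presets` array with headers like the one above.

```javascript
// Hypothetical helper: find a preset by bank and program number.
// Returns undefined if the soundfont has no such preset.
function findPreset(font, bank, program) {
  return font.presets.find(
    ({ header }) => header.bank === bank && header.preset === program
  );
}

// Stand-in data, shaped like the preset object above:
const tinyFont = {
  presets: [
    { header: { name: 'Some Piano', preset: 0, bank: 0 }, zones: [] },
    { header: { name: 'FM Carillion', preset: 1, bank: 0 }, zones: [] },
  ],
};

const found = findPreset(tinyFont, 0, 1);
console.log(found && found.header.name);
```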

From Note on to Sound

After knowing the structure of a soundfont, let's think about what happens when we play a note.

  1. Select a preset
  2. Press a key (defines midi number + velocity)
  3. Use all preset zones with matching keyRange and velRange
  4. For each active preset zone, decide which instrument zones to use, based on keyRange and velRange
  5. Play the sample of each active instrument zone
  6. Shape the sound using the generators of the preset zone, as well as the instrument zone

I made these steps up, without checking the spec, but I think this is the way it should work. We will find out later...
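Steps 3 to 5 from the list above can be sketched as a pure function that collects the samples a note would trigger. This is only my guess at the logic, not a spec-checked implementation: it assumes zones expose a `velRange` shaped like `keyRange`, and that a missing range matches everything. `inRange`, `getActiveZones` and `demoPreset` are names made up for this example.

```javascript
// A missing range is assumed to match any value.
const inRange = (range, value) =>
  !range || (value >= range.lo && value <= range.hi);

const zoneMatches = (zone, key, velocity) =>
  inRange(zone.keyRange, key) && inRange(zone.velRange, velocity);

// Hypothetical helper: all zone pairs that are active for a note.
function getActiveZones(preset, key, velocity) {
  const active = [];
  for (const presetZone of preset.zones) {
    // step 3: preset zones with matching keyRange and velRange
    if (!presetZone.instrument || !zoneMatches(presetZone, key, velocity)) continue;
    for (const instrumentZone of presetZone.instrument.zones) {
      // step 4: instrument zones with matching keyRange and velRange
      if (!instrumentZone.sample || !zoneMatches(instrumentZone, key, velocity)) continue;
      // step 5: this sample should be played
      active.push({ presetZone, instrumentZone, sample: instrumentZone.sample });
    }
  }
  return active;
}

// Stand-in preset with a low layer and a velocity-split high layer:
const high = { lo: 60, hi: 127 };
const demoPreset = {
  zones: [
    { keyRange: { lo: 0, hi: 59 }, instrument: { zones: [{ sample: 'low' }] } },
    {
      instrument: {
        zones: [
          { keyRange: high, velRange: { lo: 0, hi: 63 }, sample: 'soft' },
          { keyRange: high, velRange: { lo: 64, hi: 127 }, sample: 'loud' },
        ],
      },
    },
  ],
};

console.log(getActiveZones(demoPreset, 64, 100).map((z) => z.sample));
```

Each returned entry keeps both zones around, so that step 6 could later combine the generators of the preset zone and the instrument zone.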

Let's now implement each of those steps!

Demo

In the following demo, you can press a key on the piano and the soundfont player will play the appropriate sound, using the steps outlined above:

Because this got a little more involved, I started a library called sfumato. You can check out the source code there, and it also has a separate demo. I will probably write another post about soundfonts in the future.

Felix Roos 2023