Technical: Music and Sound

It’s time for a new technical post!

As promised, today we want to share with you how we implemented music and sound in CrossCode. This includes how we used the WebAudio API to increase sound quality and make perfect loops possible. Any code you will encounter is not taken directly from the game and might be pseudo-code to some extent.

So… grab some fish’n’chips and tea, this is going to be a long post!

The WebAudio API

For starters, the WebAudio API is a more or less high-level API for processing sound via JavaScript. That means we can not only play sounds via the API, but also manipulate the sound in any way we want. To make this magic happen, the API uses a routing graph. You have a master node (called destination in the API) and attach a number of different nodes to it. The figure below shows what this looks like:
[Figure: a simple audio routing graph — source nodes connected through a GainNode to the destination]

An AudioNode represents a single sound that will be played. As you can see, there aren’t only AudioNodes though. The WebAudio API features lots of nodes that make it possible to manipulate the actual sound. In the graph you can see a GainNode, which is used to change the volume of a sound. Like in any graph structure, it is also possible to attach multiple nodes to another node. This is really useful if you want to pass every sound through a single GainNode to change the volume of all sounds at once.
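For example, here is a minimal sketch of such a shared GainNode, assuming an AudioContext named context already exists (we show how to create one below):

// two sources share one GainNode, so a single value controls both volumes
var masterGain = context.createGain();
masterGain.connect(context.destination);

var nodeA = context.createBufferSource();
var nodeB = context.createBufferSource();
nodeA.connect(masterGain);
nodeB.connect(masterGain);

masterGain.gain.value = 0.5; // halves the volume of both sounds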
Other nodes make it possible to apply filters to a sound, or you can cross-fade multiple sounds based on your position in the game world. We don’t want to cover all the possibilities here since there already is a great article over at HTML5Rocks. Check out this link:

Developing Game Audio with the Web Audio API

This article features some very nice effects you can use and is a great start for learning how to use the API.

The API’s starting point is a single object that has all the functions to create different nodes:

window["AudioContext"] = window["AudioContext"] || window["webkitAudioContext"];
this.context = new window["AudioContext"]();

Goals for CrossCode

For CrossCode we made a small list of things we wanted to have for both music and sound.

  • loop music without stutters
  • play sounds in a 3D environment
  • loop tracks at any point, without the audio files having to be perfectly loop-able
  • fade music in and out
  • pause and resume music

Lots of these points are already covered by the audio element, but the first two are hard to achieve with it. Perfect timing for music isn’t easy to get right, and you might have noticed that while playing CrossCode. The same goes for positioning sound in 3D. We might also come up with other ideas along the way that aren’t possible with the audio element, so we started working on integrating the WebAudio API.

So, let’s dive into the implementations! We’re going to begin with sound to explain some basic aspects of the API. We will also talk about the default implementation, which is mostly covered by the music and sound classes impact.js provides.

Sound

As said, impact.js already covers a lot of ground here. But as usual, we changed a lot to make it work for us. One of these changes was to separate the default implementation, which uses the audio element, from the WebAudio implementation. Normally we load a sound roughly like this:

var sound = new Sound("path");

And we wanted to keep it that way. So we have two sound implementations sharing the same interface, SoundWebAudio and SoundDefault:

if(window["AudioContext"]) {
    Sound = SoundWebAudio;
} else {
    Sound = SoundDefault;
}

When the game loads, it detects your browser and switches to the WebAudio API if it is supported. For the default implementation we simply load an audio element. Since a single audio element can’t play its sound multiple times in parallel, we need to load several audio elements of the same sound for parallel playback. This is all handled nicely by impact.js, so we don’t have to worry about it.
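impact.js does this for us, but a rough, hypothetical sketch of the idea could look like this:

// hypothetical sketch: keep several clones of one sound and
// pick a channel that is currently not playing
function AudioPool(path, size) {
    this.channels = [];
    for (var i = 0; i < size; i++) {
        this.channels.push(new Audio(path));
    }
}

AudioPool.prototype.play = function() {
    for (var i = 0; i < this.channels.length; i++) {
        var channel = this.channels[i];
        if (channel.paused || channel.ended) {
            channel.currentTime = 0;
            channel.play();
            return;
        }
    }
    // all channels are busy, so this playback is simply skipped
};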

However, for WebAudio we had to come up with something on our own. To load a sound via the API you mainly use an XMLHttpRequest. But simply loading isn’t enough: we also need to decode the data we get from the request. This is done via the AudioContext mentioned earlier. Let’s simply assume that we have a global context created and ready to use. The code to load a sound looks like this:

var request = new XMLHttpRequest();
request.open('GET', path, true);
request.responseType = 'arraybuffer';
request.onload = function () {
    sound.context.decodeAudioData(request.response, function (buffer) {
        // cache the decoded buffer so we never load this file twice
        sound.buffers[path] = buffer;
    }, someFancyErrorHandler); // pass the handler, don't call it
};
request.send();

As you can see, there isn’t much effort involved in loading and decoding the sound file. After decoding, we cache the buffer in some global state so we don’t need to load it again, which also helps when the same sound is repeated in quick succession. The buffer is then referenced in the WebAudio implementation of the sound object.
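As a small, hypothetical sketch of that cache check, wrapped around the request from above:

// hypothetical wrapper: only request the file if it isn't cached yet
function loadSound(path, onLoad) {
    if (sound.buffers[path]) {
        onLoad(sound.buffers[path]);
        return;
    }
    // ...issue the XMLHttpRequest from above and cache the result...
}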

You’ll need to create a node each time you play the sound. That’s right: in contrast to the audio element, we can’t simply reset the playback position to zero and play again:

play: function() {
    // creates an AudioBufferSourceNode, a special kind of AudioNode
    var node = sound.context.createBufferSource();
    node.buffer = this.buffer;

    node.connect(sound.context.destination);
    node.start(0); // or node.noteOn(0);
}

As you can see, we assign the buffer of the sound to the node. The node here is a so-called AudioBufferSourceNode; as the name implies, it is the basic node for playing back any sound. We then need to connect the node to the destination node of the context. As explained at the beginning of this post, this is how the sound gets routed through the audio graph. To play it we use the start/noteOn method. Don’t get confused by the parameter: it is mandatory and specifies the time at which playback should start, so 0 means right away.

But of course this is not all we do. We also want to position sounds, right? For this we simply use another node that WebAudio already provides to set the position of a sound in 3D: the PannerNode. We now need to extend the method above to include a possible position:

play: function(pos) {
    var node = sound.context.createBufferSource();
    node.buffer = this.buffer;

    var position = sound.context.createPanner();
    position.setPosition(pos.x, pos.y, 0); // setPosition expects x, y and z

    node.connect(position);
    position.connect(sound.context.destination);
    node.start(0); // or node.noteOn(0);

    return {pos: position, source: node};
}

For this method, we simply assume that every sound has a position. As you can see, we create a new PannerNode here and connect the AudioBufferSourceNode to it. We then connect the panner node to the destination and start playing the sound like before. In CrossCode we have a little extra code here which adjusts the position to be in world space. There is also a bit of code that makes sure a sound plays without panning while you are within a certain radius of its source, to create a smoother experience. Otherwise it would be very distracting if every sound suddenly jumped from the right to the left speaker.
Notice that we also return an object here? In this pseudo-code it contains the panner node and the source node. We do this to be able to adjust the position in real time and to stop playback manually.
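A hypothetical use of that handle, keeping a sound attached to a moving entity, could look like this:

// hypothetical usage: keep the panner in sync with a moving entity
var handle = sound.play({x: entity.pos.x, y: entity.pos.y});

// later, e.g. once per frame:
handle.pos.setPosition(entity.pos.x, entity.pos.y, 0);

// and to stop the sound manually:
handle.source.stop(0); // or handle.source.noteOff(0);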

This is basically what we do with WebAudio here. Of course, not all browsers currently support WebAudio, so you might wonder what we do about positioned sound for those players. We can’t achieve the same effect with the audio element, so we approximate it via the volume of the sound: the further away you are from the sound, the softer it is. We hope that eventually all browsers will support the API so everyone can have positioned sound in their favorite browser.
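A minimal sketch of that fallback, with a hypothetical maximum hearing distance:

// hypothetical fallback: fade the volume linearly with distance
function volumeForDistance(distance, maxDistance) {
    var volume = 1 - distance / maxDistance;
    return Math.max(0, Math.min(1, volume));
}

// the audio element then simply gets a distance-based volume
audioElement.volume = volumeForDistance(distanceToPlayer, maxHearingDistance);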

And this is it for sound! Let’s summarize everything in a small list:

  • create different implementations for WebAudio and audio element
  • use impact.js sound loading for the default implementation
  • load, decode and cache sounds
  • play a sound with a position via PannerNode
  • simulate positioned sound via volume in default implementation
  • return a handle to manipulate the position of a sound in real-time

Music

In Weekly Update #9 we already told you that we basically rewrote the whole music player impact.js provides. We created a new interface that fits our use cases better. We also separated the music player from the actual track. Each track has a minimal interface:

Track = {

    loopEnd: 0, // playback position at which the track loops

    play: function() {},
    pause: function() {},
    stop: function() {},
    setVolume: function() {},
    getVolume: function() {}
}

Just like with sound, we now have two alternative track implementations: one for the audio element and one for WebAudio. Again, for the default we use the basic loading that impact.js provides. When creating a track we keep a reference to the audio element and play it when we need it. The advantage of audio elements here is that they support streaming, which means we don’t have to wait until the whole track is loaded to start the game.

For WebAudio… well, here is the bummer: we have to load the whole piece. The API does support connecting to audio elements (via the MediaElementAudioSourceNode) to allow streaming. However, with an audio element as the source we again lose all the timing advantages we get from WebAudio. Thus, we’re stuck with an increased loading time when using WebAudio. Since the browser caches the files, restarting the game will make it load faster again, but we still decided it’s better to provide music via WebAudio as an option the player can choose.
As you can see in the code, we also have a property called loopEnd. It marks the playback position at which the track should loop, so our tracks don’t necessarily loop at the end of the file. Instead, we created a system for each track implementation that starts playing a second audio element or AudioNode when a timer hits the loopEnd mark. This makes it possible to loop tracks seamlessly. For the audio element we use the timeupdate event the element provides:

this.track.addEventListener('timeupdate', function() {
    if(this.track.currentTime >= this.loopEnd) {
        // loop track
    }
}.bind(this));
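The “// loop track” part could roughly work like this, assuming a second, preloaded element called nextTrack:

// hypothetical swap: the preloaded clone takes over immediately
var finished = this.track;
this.track = this.nextTrack;
this.track.play();

finished.pause();
finished.currentTime = 0; // rewind it so it can serve as the clone for the next loop
this.nextTrack = finished;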

For WebAudio we use the current time of the WebAudio context to precisely time the next loop. We also keep a loop count, which increases with every loop. With this we time each new loop relative to the first playback of the track, which makes sure that the next loop is always started before the previous one has ended. This can be done easily via WebAudio since every node takes a start time parameter. We simply calculate the offset from the startTime multiplied by the loop count and get the same timing for every loop. Here is some code that roughly explains how we did this:

// called periodically
loopCheck: function() {
    var currentTime = sound.context.currentTime;
    var nextOffset = track.duration - loopEnd;
    if (currentTime > (startTime + loopEnd * (loopCount + 1) + nextOffset)) {
        // prepare the node that will play the loop after the next one
        var tempNode = sound.context.createBufferSource();
        tempNode.buffer = soundBuffer;

        // the current node is done by now, the next one is already playing
        currentNode.stop(0);

        currentNode = nextNode;
        nextNode = tempNode;

        loopCount++;
        // schedule relative to the very first start time to avoid drift
        var offset = startTime + (loopCount + 1) * loopEnd;

        nextNode.connect(sound.context.destination);
        nextNode.start(offset);
    }
},

This might seem a little confusing, but it does exactly what we want. As you can see, we do not start the next node right away, but rather the node after the next one. This also means that when starting the playback we have to start two nodes. However, thanks to WebAudio we can simply shift the start time of the playback and thus time it the way we want.
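To illustrate, a sketch of that initial start, using the same variables as in the snippet above, might look like this:

// start the first node right away and schedule the second loop ahead of time
startTime = sound.context.currentTime;
loopCount = 0;

currentNode = sound.context.createBufferSource();
currentNode.buffer = soundBuffer;
currentNode.connect(sound.context.destination);
currentNode.start(startTime);

nextNode = sound.context.createBufferSource();
nextNode.buffer = soundBuffer;
nextNode.connect(sound.context.destination);
nextNode.start(startTime + loopEnd); // queued before the first loop even ends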

All of this means that we have some meta-data for each track. Next to the loopEnd property, a track might also have an intro that is played before the first loop starts. The intro has an introEnd property which works just like loopEnd. We shift the loop by the duration of introEnd and voilà, we have a custom intro for our tracks!
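As a sketch, the scheduling with an intro might look like this (introNode and firstLoopNode being hypothetical source nodes):

// the intro plays exactly once, then the loop region takes over
introNode.start(startTime);
firstLoopNode.start(startTime + introEnd);
// every later loop is then scheduled at
// startTime + introEnd + loopCount * loopEnd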

Now back to the interface we use to play the tracks via our music player. Our player has several methods:

  • play
  • push
  • pop
  • inbetween
  • pause / resume

Play simply starts a track. It also stops the currently playing track. A track is played by name, which means each track is preloaded along with its meta-data and name.
Push allows us to start a new track while the old one is paused.
Pop simply stops the track at the top of the stack and starts the paused track below it, continuing from where we left off.
With the push and pop commands we introduce a stack in our music player that controls which piece is currently playing. This makes sure we can play tracks in between other tracks. Think of a regular background theme: a battle starts and the battle music begins to play; after the fight is over, the battle music stops and the regular background music resumes at the position where it stopped.
InBetween is a mix of push and pop. You could call it a push with an automated pop. This even works for tracks that would normally loop. We use it for short tracks like an item-get jingle (shameless reference to the Zelda series here).
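Here is a rough, hypothetical sketch of how that stack behaves:

// hypothetical sketch of the player's stack behavior
MusicPlayer = {
    stack: [],

    push: function(track) {
        var current = this.stack[this.stack.length - 1];
        if (current) {
            current.pause();
        }
        this.stack.push(track);
        track.play();
    },

    pop: function() {
        var top = this.stack.pop();
        top.stop();
        var below = this.stack[this.stack.length - 1];
        if (below) {
            below.play(); // resumes from the paused position
        }
    }
};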

All four of these methods also take parameters to fade tracks in and out. When pushing a new track, we can fade out the current one and fade in the pushed one. This is especially nice when a battle ends and, while the battle theme is still fading out, a new battle starts: we simply fade the battle theme back in without even restarting the track. It creates a much better feeling for the player, too.
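For the WebAudio tracks, such a fade boils down to a ramp on a GainNode. A minimal sketch, assuming each track is routed through its own gain node:

// hypothetical fade helper: ramp a track's GainNode to a target volume
function fadeTo(gainNode, target, duration) {
    var now = sound.context.currentTime;
    gainNode.gain.setValueAtTime(gainNode.gain.value, now);
    gainNode.gain.linearRampToValueAtTime(target, now + duration);
}

fadeTo(battleThemeGain, 0, 1.5); // fade the battle theme out over 1.5 seconds...
fadeTo(battleThemeGain, 1, 1.5); // ...or simply back in when a new battle starts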
Pause and Resume simply pause/resume the currently playing track. We use this mainly when you switch tabs: the music stops until you return to the tab the game is running in.

Okay, let’s summarize again what we do for music in CrossCode:

  • created different track implementations for WebAudio and audio element
  • use custom loop mechanic to perfectly loop tracks
  • music player loads tracks with meta-data
  • can push/pop tracks at any time
  • fade tracks when changing tracks
  • play a single track in between another track

Conclusion

All in all, creating such a dynamic system sure wasn’t easy. But it was worth the effort, especially for WebAudio. Not only can we position sound in 3D, we can also loop our tracks perfectly, even with our own looping mechanic. There is the small issue of having to completely pre-load each track with WebAudio, but we hope that somewhere along the line of web technology a solution will emerge. And if not, we will make sure the player gets an option, since no one likes long loading times!

Pheeew~ This is all we have to say on how we do music and sound in CrossCode! If you have any questions, leave a comment and we will make sure to answer them!
We hope you liked this technical rant. It was really a lot to cover and a lot to read (let’s hope we did not forget anything along the way).
If you wish to read about another technical feature we use in CrossCode, you can ask us too!

Until next time!

7 Comments

  • Grenzen der Zeit on August 23, 2013 at 8:29 pm said:

    Did you use https://github.com/goldfire/howler.js as a base? It does almost everything you want!

    • Yes, we know of Howler.js. But as you said it only does “almost” everything we want.
      It does not support cross-fades or looping at a certain position. This is very important for us, as you can read in the post :)
      There are also some little things missing, like the stack behavior.

  • Kudos for the implementation, such technical articles & explanations for a real example are great to learn. If you got some time in the future, I would appreciate if you could talk about the AI which handles enemy/boss behavior and adapting their states :)

    [btw: In case you didn’t already read this article, perhaps you could consider including the mentioned monkey patch to support the standardized AudioContext API in Firefox as well…]

    • Hey there!

      Thanks for the suggestion. As you can read in the post, we already have some sort of patch for this.
      Since Firefox is going to implement the WebAudio API as described in the spec, everything should work normally as long as we make sure the prefix is correct.
      Of course, we make sure that methods are checked too.

      Making a post about our AI is possible. We currently have no boss in the TechDemo, so we might wait until we release the real demo.

  • Is connectMusic() a typo? I can’t find any info on that method.

  • This ability to play sound in a 3D environment is awesome! Just wondering, can you add a feature to turn off the music when the tab is not open? I mean, if the visitor opens another tab, the music/sound turns off and comes back on when they return to this tab. It won’t be annoying if this can be done. Thanks
