Improving Audio / Video Synchronization with the "Multi Sync"

Earlier this month v1.2.3 was released, introducing a major change to our audio / video synchronization that I'm excited to talk about today.

To set the stage, synchronization in this context refers to adjusting how fast the emulator runs so that audio and video are both generated and output at the correct rate. If the original console output 60 video frames and 44,100 audio samples per second, the emulator needs to generate the same amounts and output them to the local machine's audio / display devices at a rate that matches the original console as closely as possible.
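To put numbers on that, here is a small sketch of how much audio has to be generated alongside each video frame. The function name is my own for illustration, and the 50 Hz case is just a second example using the same sample rate.

```python
from fractions import Fraction

def samples_per_frame(sample_rate_hz: int, frame_rate_hz: int) -> Fraction:
    """Audio samples the emulator must generate for each emulated video frame."""
    return Fraction(sample_rate_hz, frame_rate_hz)

print(samples_per_frame(44_100, 60))  # 735
print(samples_per_frame(44_100, 50))  # 882
```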

When the emulator doesn't output the audio and video at the correct rate:

  • If audio is output too slowly, too little will buffer and the buffer will underrun, producing audible crackling.
  • If audio is output too quickly, too much will buffer and the buffer will overflow, causing unexpected latency.
  • If video isn't output in sync with the display's refresh rate, frames will be skipped or torn.
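The two audio failure modes can be sketched with a toy buffer model. This is only an illustration: the rates, the 50 ms starting fill and the function names are assumptions I picked, not values from the emulator.

```python
def buffered_samples(produce_hz: int, consume_hz: int, seconds: int, start: int) -> int:
    """Samples left in the buffer after `seconds`, clamped at zero (an underrun)."""
    fill = start
    for _ in range(seconds):
        # Each second the emulator adds produce_hz samples and the device plays consume_hz.
        fill = max(0, fill + produce_hz - consume_hz)
    return fill

def latency_ms(fill: int, consume_hz: int) -> float:
    """How long a newly written sample waits behind `fill` buffered samples."""
    return 1000.0 * fill / consume_hz

# Start with 50 ms of audio (2205 samples) buffered on a 44,100 Hz device.
slow = buffered_samples(44_000, 44_100, seconds=30, start=2205)
fast = buffered_samples(44_200, 44_100, seconds=30, start=2205)
print(slow)                             # 0 (the buffer ran dry: crackling)
print(round(latency_ms(fast, 44_100)))  # 118 (latency has more than doubled)
```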

Given that garbled audio and torn frames provide a less-than-desirable user experience, it's imperative that the audio and video are generated and output at the correct rate. To help understand the new changes, let's first take a look at the previous implementations to get a feel for the problem domain.

The First Solution

During early development, real time was used to drive the emulation speed. When n milliseconds of real time had elapsed, n milliseconds of emulated time was run. This solution was great for getting up and running, but it's prone to clock drift over time.
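A minimal sketch of that kind of loop might look like the following. The Emulator class is a hypothetical stand-in that only counts emulated time, and the clock and sleep parameters exist purely so the loop can be exercised without waiting on a real clock.

```python
import time

class Emulator:
    """Hypothetical stand-in: only tracks how much emulated time has been run."""
    def __init__(self):
        self.emulated_ms = 0.0

    def run_for(self, ms: float) -> None:
        self.emulated_ms += ms

def run_realtime(emulator, duration_s, clock=time.monotonic, sleep=time.sleep):
    """Run n ms of emulated time for every n ms of real time that elapses."""
    start = last = clock()
    while last - start < duration_s:
        sleep(0.001)                             # let some real time pass
        now = clock()
        emulator.run_for((now - last) * 1000.0)  # emulate exactly what elapsed
        last = now
```

Note that nothing in this loop looks at the audio or display devices, which is exactly why it can't notice when their clocks disagree with the system clock.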

In short, the clock providing the real time and the audio / display clocks will drift apart, resulting in the audio and video generation becoming desynchronized from their output. This is somewhat amusingly described by Segal's law:

A man with a watch knows what time it is. A man with two watches is never sure.

Having both become desynchronized results in a completely unusable experience, so this was scrapped once the emulator started actually running games.

The Next Generation

The next solution improved on this by removing the real time clock from the equation - it instead locked the emulation speed to either the video or audio output rate. Doing so ensures that the chosen output stays synchronized, which is markedly better than neither output staying synchronized.

Synchronizing to video

Synchronizing to the video output is done by enabling vsync. With vsync enabled, after a frame has rendered, the video driver will wait until the display's next refresh cycle has completed before letting the emulator resume. Doing this will run the emulator at the local display's refresh rate, and fix any frame tearing issues.

Unfortunately though, the emulator shouldn't run at the local display's refresh rate; it should run at the emulated console's refresh rate. If the local display's refresh rate is lower, the emulator will run too slowly; if it's higher, the emulator will run too fast. When the emulation runs at the wrong rate, audio and video are generated at the wrong rate, even if the video is ultimately presented tear-free.

For the majority of displays out there, which have a fixed refresh rate, this solution doesn't cut it as their refresh rate likely won't match the emulated console's. To help visualize what happens when they don't match, take a look at a simulation, synchronized to video, with the local display refreshing at 30 Hz and the emulated console refreshing at 60 Hz:

While video frames never tear with this solution:

  • The emulation can run at the wrong speed. In this case, half speed.
  • The audio is prone to constantly underrunning and overrunning.
  • It only works well if the refresh rates match.
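The half speed figure falls straight out of the two refresh rates. As a rough sketch, assuming exactly one emulated frame is run per display refresh (the function names are mine):

```python
from fractions import Fraction

def vsync_speed(display_hz: int, console_hz: int) -> Fraction:
    """Emulation speed relative to the console when one emulated frame runs per refresh."""
    return Fraction(display_hz, console_hz)

def generated_audio_rate(sample_rate_hz: int, display_hz: int, console_hz: int) -> Fraction:
    """Samples per real second the emulator ends up generating at that speed."""
    return sample_rate_hz * vsync_speed(display_hz, console_hz)

print(vsync_speed(30, 60))                   # 1/2
print(generated_audio_rate(44_100, 30, 60))  # 22050
```

With the audio device still consuming 44,100 samples per second, generating only 22,050 leaves the buffer permanently starved.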

Synchronizing to audio

Synchronizing to the audio output is done by constantly monitoring how much audio is buffered. When the amount buffered is too low, run the emulator some to fill it up. When the amount buffered is too high, sit around and wait for it to again become low. This will lock the emulation speed to the local audio clock, fixing issues with crackles and overbuffering.
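Here is a toy version of that loop to show the idea. Everything in it is an assumption for illustration: FakeAudioDevice is a plain counter rather than a real audio API, the low-water mark is 50 ms of audio, and "waiting" is modeled as a 10 ms tick during which the device plays 441 samples.

```python
class FakeAudioDevice:
    """Hypothetical device: a sample counter that playback drains over time."""
    def __init__(self):
        self.buffered = 0

    def write(self, samples: int) -> None:
        self.buffered += samples

    def play(self, samples: int) -> None:
        self.buffered = max(0, self.buffered - samples)

def fill_to_low_water(device, run_frame, low_water, samples_per_frame):
    """Run emulated frames until at least `low_water` samples are buffered."""
    frames = 0
    while device.buffered < low_water:
        run_frame()                      # emulate one frame of the console
        device.write(samples_per_frame)  # queue the audio that frame produced
        frames += 1
    return frames

def simulate(ticks, samples_per_tick=441, low_water=2205, samples_per_frame=882):
    """Emulate a 50 Hz console (882 samples per frame) against a 44,100 Hz device."""
    device = FakeAudioDevice()
    frames = 0
    for _ in range(ticks):
        frames += fill_to_low_water(device, lambda: None, low_water, samples_per_frame)
        device.play(samples_per_tick)    # the wait: 10 ms of playback drains the buffer
    return frames, device.buffered
```

After the initial prefill, the simulated console settles at 50 frames per simulated second without the loop ever consulting a clock; the audio device's consumption rate alone sets the pace.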

This works great because, unlike with the video sync where the display's refresh rate likely can't be altered, it's almost always possible to configure the audio device to use the same sample rate as the emulated console. Because of this, by synchronizing to the audio clock the emulator will always run at the correct speed.

The only downside is that frames will tear as the video isn't being synchronized. To help visualize, let's look at a 50 Hz console being emulated on a 60 Hz display, synchronized to audio:

While frames may tear (look for the red swap blocks above) with this solution:

  • The emulation always runs at the correct speed.
  • The audio doesn't underrun or overrun.
  • It works across different refresh rates.

The Multi Sync

While the audio sync has proven to be a workhorse, it was still tearing frames, which needed to be fixed. To do so, I wanted to see if we could do as many modern games do, and decouple the frame rendering from the frame presentation. In this decoupled design:

  • The main thread would be synchronized to audio in order to drive the emulation speed correctly. When rendering frames, instead of rendering directly to the back buffer, this thread would render to a small swap chain of framebuffers. Once rendering is complete, it would submit the framebuffer to a new presentation thread.
  • A new, secondary presentation thread is created with its own shared GL context, and is synchronized to video. This thread sits in a loop waiting for framebuffers from the main thread. Each time the number of available framebuffers is non-zero, it blits the latest one to the back buffer and then pushes it to the display device.
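The hand-off between the two threads can be sketched without any GL at all. This is a simplified illustration, not the emulator's code: a single-slot mailbox stands in for the small swap chain, frames are plain Python values, and the present callback stands in for the blit plus the vsync'd buffer swap. All names are hypothetical.

```python
import threading

class FrameMailbox:
    """Holds only the newest rendered frame; older unpresented frames are dropped."""
    def __init__(self):
        self._cond = threading.Condition()
        self._latest = None
        self._closed = False

    def submit(self, frame) -> None:
        """Called by the main (audio-synced) thread after rendering a frame."""
        with self._cond:
            self._latest = frame
            self._cond.notify()

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify()

    def take_latest(self):
        """Block until a frame is available; return None once closed and drained."""
        with self._cond:
            while self._latest is None and not self._closed:
                self._cond.wait()
            frame, self._latest = self._latest, None
            return frame

def presentation_loop(mailbox: FrameMailbox, present) -> None:
    """Body of the presentation thread: wait for a frame, then present it."""
    while True:
        frame = mailbox.take_latest()
        if frame is None:
            break
        present(frame)  # stand-in for blitting to the back buffer and swapping

presented = []
mailbox = FrameMailbox()
presenter = threading.Thread(target=presentation_loop, args=(mailbox, presented.append))
presenter.start()
for frame_number in range(1, 6):
    mailbox.submit(frame_number)  # the main thread never waits on the presenter
mailbox.close()
presenter.join()
print(presented)  # an increasing subset of 1..5 that always ends with 5
```

The important property is that submit never blocks on the display: if the presenter falls behind, it simply skips to the newest frame.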

After a few nights of work, it was a success. Not only did it work, it surprisingly wasn't too difficult to get working across most vendors.

To help visualize how this works, take a look at the previous simulation from the audio sync section using the new "multi sync":

While this solution does involve some added code complexity / memory bandwidth to handle the extra blits, it solves all of the issues thus far:

  • The emulation always runs at the correct speed thanks to the audio sync.
  • The audio doesn't under or overrun thanks to the audio sync.
  • It works across different refresh rates thanks to the audio sync.
  • Frames do not tear thanks to the video sync.

Beyond those issues, it also has a few more subtle benefits:

  • Lower spec machines have more time to do actual emulation work as the main thread is never blocked by vsync.
  • If the user has vsync force-disabled in their control panel, emulation still runs at the correct speed since it's being driven by the audio sync.
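To see what the presentation thread ends up showing when the rates differ, here is an idealized model. It assumes perfectly steady clocks and instantaneous rendering, and the function name is made up. For each display refresh it counts how many emulated frames have completed, which identifies the newest frame available to blit.

```python
def frames_completed(console_hz: int, display_hz: int, refreshes: int) -> list:
    """Number of emulated frames finished by each display refresh (integer math only)."""
    return [(k * console_hz) // display_hz for k in range(refreshes)]

# 50 Hz console on a 60 Hz display: a repeated number means a frame is shown twice.
print(frames_completed(50, 60, 12))  # [0, 0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9]

# 60 Hz console on a 30 Hz display: every other frame is never shown.
print(frames_completed(60, 30, 6))   # [0, 2, 4, 6, 8, 10]
```

Either way the display only ever receives whole frames: mismatched rates turn into an occasional repeated or dropped frame rather than a tear.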

In Closing

For users - this is available with the latest stable and development releases, give it a try!

For developers - if you're working with entire framebuffers at a time, the multi sync has solved all of the major issues we've encountered thus far with presenting smooth audio and video across a wide variety of hardware setups. Additionally, it's fairly straightforward to implement, and has so far been compatible with even the oldest of the OpenGL 3.1 era hardware from Intel / AMD / NVIDIA that we focus on supporting.

I'd like to give a big thanks to @Sonicadvance1 for his help sharing his experiences and reviewing this article.

Bonus Material

If you'd like to see more simulations than the fixed ones above, tweak the drop downs here and go buckwild: