Progress Report August 2018

Last week we released our newest stable build, version 1.2.6, marking 6 months since our last progress report. During this time, many quality of life features have been implemented to make the emulator easier and more enjoyable to use, graphics have taken another step forward, and our audio emulation has finally reached a mature point: there are features yet to be implemented, but what's implemented now works great.

Additionally, over 200 GitLab issues have been closed and another 10% of the Dreamcast's library has become playable during this time.

Let's get into looking at some of the more notable changes since our last report.

New User Interface

This new interface is platform-agnostic and still completely controller navigable. It's been rewritten to support being accessed while the emulator is running and contains many new quality of life features.

New library browser

In the previous interface, the library browser used disc-shaped art, loaded from each game's disc. On the Dreamcast, most games shipped with this disc-shaped art to be used by the system's CD player functionality. However, not all games shipped with it, and for those that did the art wasn't always descriptive. Now, instead of relying on the on-disc art that may or may not exist, each game's box art is downloaded at runtime, enabling us to provide art for all of the games in your library.

 
 

Custom controller binds

New support for customizable controller binds has landed. While most controllers are automatically configured, this enables support for less common controllers.

 
 

Improved arcade stick support

Support has also been added for binding the keyboard to multiple Dreamcast controllers. This isn't because using a real keyboard is practical for multiple controllers, but because some multi-player arcade sticks present themselves as a single keyboard, with each player's inputs bound to unique keys.

With this feature, these sticks can now be used to control multiple players in all the great fighting games on the Dreamcast.

Graphics improvements

Optimized texture decoding

Multiple waves of optimizations were made over the last 6 months to speed up our texture decoding. The results have been dramatic: frame rates in fighting games such as Street Fighter 3 - Double Impact have doubled, and Mars Matrix is now fully playable.

Per-pixel blending

In most graphics APIs, when blending transparent polygons, the polygons must be submitted in sorted order based on their depth in order to blend correctly. On the Dreamcast, however, transparent polygons could be submitted in any order and the hardware itself would sort and blend the results at the per-pixel level. The Dreamcast's support for this is often referred to as Order-Independent Transparency, and its PowerVR GPU is one of the few GPUs to have ever shipped with explicit support for this.

Blending basics

As a quick primer, the reason the polygons must be sorted is that many blending operations are not commutative. Take this example:

 

In this case, there are three colors being blended: the background, the blue polygon and the green polygon. As you can see, the final color depends on the order in which they are blended. This is because the blending equation being used, out = dst.rgb + src.a * (src.rgb - dst.rgb), isn't commutative (in the same way that 5 - 3 - 1 is not the same as 1 - 3 - 5), so the blend order is extremely important. Surfaces must be drawn from back (furthest away) to front (closest to camera) in order to blend correctly.
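To make the order dependence concrete, here is a minimal Python sketch of that blending equation. The specific colors, the white background and the 50% alpha are assumptions chosen for illustration, not values taken from the image above.

```python
def blend(dst, src_rgb, src_a):
    """Apply out = dst.rgb + src.a * (src.rgb - dst.rgb) to each channel."""
    return tuple(d + src_a * (s - d) for d, s in zip(dst, src_rgb))

# Illustrative values only (assumptions, not taken from the original figure).
background = (1.0, 1.0, 1.0)  # white
blue = (0.0, 0.0, 1.0)
green = (0.0, 1.0, 0.0)
alpha = 0.5

# Blend the same two polygons over the same background in both orders.
blue_then_green = blend(blend(background, blue, alpha), green, alpha)
green_then_blue = blend(blend(background, green, alpha), blue, alpha)

print(blue_then_green)  # (0.25, 0.75, 0.5)
print(green_then_blue)  # (0.25, 0.5, 0.75)
```

The same inputs produce two different colors, and the polygon blended last dominates the result.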

The above example covers the common case, where the two transparent surfaces do not intersect. Unfortunately, things get much more complicated when they intersect:

In this case, the two polygons are intersecting, so neither is strictly in front of or behind the other. If they were to be blended into the framebuffer one at a time, what would be the correct order? The short answer is that there isn't one. The first image above shows what would happen if the blue polygon was blended first, the second image shows what would happen if the green polygon was blended first, and the final image shows what's actually expected.

Depending on the situation, there are a few workarounds. If the intersecting polygons are part of a model, the artist could split the blue polygon in half where it intersects with the green polygon ahead of time. If the intersecting polygons are the result of some particles colliding at runtime, some expensive code could be used to try and perform a similar subdivision at runtime. Ultimately though, this problem is a fundamental limitation of blending primitives one by one as is done with most graphics APIs.

Blending on the PowerVR

As mentioned earlier, the Dreamcast's PowerVR supported a feature commonly referred to as Order-Independent Transparency. To support this, the PowerVR didn't blend each primitive into the framebuffer one by one as I've described so far. Instead, the PowerVR used additional data structures to keep track of which primitives covered what pixels, and then iterated over each pixel, sorting the polygons based on their depth and blending them into the final framebuffer correctly.

While modern graphics APIs don't have a straightforward way to blend transparent polygons in an order-independent fashion as the Dreamcast's PowerVR did, many techniques have been developed that achieve the same results with some additional work. For our purpose, per-pixel linked lists were used to implement this. Let's take a look at what this technique would produce in this situation, as well as the intermediate per-pixel information it would need to get there:

The image on the left is the actual result that's expected when blending the two polygons in an order-independent fashion. The middle image contains labels for a few pixels of interest. Finally, the image on the right shows the information recorded for each pixel in order to correctly blend the result. This information is a list of:

  1. The color of each polygon that covers the pixel.
  2. The depth of each polygon that covers the pixel. In this case, larger depth values are farther away.

Let's walk through the process of blending each of the pixels of interest using this information:

  1. Only the green polygon covers this pixel. Nothing to sort, it's just blended with the background.
  2. The green and blue polygon both cover this pixel. After sorting the list by depth, the blue color would be blended first as it's farthest away.
  3. The green and blue polygon both cover this pixel. After sorting the list by depth, the green color would be blended first as it's farthest away.
  4. Only the blue polygon covers this pixel. Nothing to sort, it's just blended with the background.

After this process has completed for every pixel, the final result should be blended correctly regardless of the order of the original primitive data. This per-pixel blending let Dreamcast developers create scenes full of transparent polygons that would have otherwise taken a prohibitive amount of effort to ensure there were no blending errors due to incorrect sorting or intersections.
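The per-pixel walk-through above can be sketched in a few lines of Python. This is a CPU-side illustration only: an ordinary list stands in for the per-pixel linked list, and the function names, depths, colors and alpha values are all assumptions rather than anything taken from redream's renderer.

```python
def blend(dst, src_rgb, src_a):
    """Apply out = dst.rgb + src.a * (src.rgb - dst.rgb) to each channel."""
    return tuple(d + src_a * (s - d) for d, s in zip(dst, src_rgb))

def resolve_pixel(background, fragments):
    """Blend one pixel's fragments, given as (depth, rgb, alpha) tuples.

    Larger depth values are farther away, so the list is sorted from
    farthest to nearest before blending.
    """
    color = background
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = blend(color, rgb, alpha)
    return color

# Hypothetical data for a pixel where blue is behind green (like pixel 2).
background = (1.0, 1.0, 1.0)
blue = (0.7, (0.0, 0.0, 1.0), 0.5)
green = (0.3, (0.0, 1.0, 0.0), 0.5)

# The submission order differs, but the resolved color does not.
submitted_blue_first = resolve_pixel(background, [blue, green])
submitted_green_first = resolve_pixel(background, [green, blue])
print(submitted_blue_first)  # (0.25, 0.75, 0.5)
```

Because the sort happens per pixel, intersecting polygons need no special handling: each pixel simply sees its own depth ordering.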

Take a look at the blending differences in Sonic Adventure when Chaos is blended per-triangle vs with the new per-pixel implementation:

Audio improvements

Amplitude envelope support

The Dreamcast's Yamaha AICA sound chip supported mixing up to 64 simultaneous channels of audio. Each of these 64 channels could have various effects applied to them, such as amplitude envelopes, filter envelopes or low-frequency oscillation.

The most commonly used (and most important to emulate) of these effects are the amplitude envelopes. These envelopes not only control the overall volume of each channel, but also, for channels which repeat in a loop, how long the channel will play for. This is because looping channels don't have a prescribed end time; they instead end once the amplitude envelope reaches a certain level.
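As a rough illustration of why the envelope decides when a looping channel ends, here is a minimal Python sketch. It is not how the AICA or redream implements envelopes: the linear decay, the `release_step` parameter and the `end_level` threshold are all hypothetical simplifications.

```python
def play_looping_channel(loop_samples, release_step, end_level=0.0):
    """Render a looping sample until the envelope level falls to end_level."""
    level = 1.0      # envelope starts at full volume (assumption)
    position = 0
    output = []
    while level > end_level:
        output.append(loop_samples[position] * level)
        # Wrap around: the loop itself never says when to stop.
        position = (position + 1) % len(loop_samples)
        # Hypothetical linear decay of the envelope.
        level -= release_step
    return output

rendered = play_looping_channel([1.0, -1.0], release_step=0.25)
print(rendered)  # [1.0, -0.75, 0.5, -0.25]
```

In this sketch, a `release_step` of 0 would never terminate, which mirrors a game waiting forever for a channel to stop when the envelope isn't emulated.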

Many games are programmed in such a way that events or actions happen only once a channel has stopped playing. If the envelopes aren't emulated correctly, some of these events won't trigger, and in some cases the game will hang completely while waiting for the channel to end. Fortunately, support for emulating them is finally complete, greatly improving compatibility and the quality of the generated audio. Check out the before and after from Marvel vs. Capcom 2:

Compact Disc Digital Audio support

After mixing the 64 channels of audio, the AICA could optionally mix in raw CDDA audio supplied directly from the GD-ROM. Not having this didn't cause games to break, but it did cause many games to not have background music.

Panning support

In addition to the per-channel amplitude envelopes, support for the per-channel panning settings has been added. The Neversoft intro from the Tony Hawk games now sounds correct:

General improvements

Improved Audio / Video Synchronization

Up until this past month, redream has used what's commonly known as an "audio sync" to control how fast the emulator runs. With this, vsync is disabled and the amount of buffered audio is used to drive the emulation speed. This avoids audio artifacts, at the expense of frame tearing.

We've now implemented what we refer to as "multi sync", which synchronizes to both audio and video in a way that eliminates audio artifacts and frame tearing. A detailed write-up of how this works can be found here.

Asynchronous DMA transfers

Direct memory access is a mechanism used by hardware subsystems to access parts of the main system's memory. In the case of the Dreamcast, DMA transfers are commonly used to load data from the GD-ROM into system memory, and to write audio and texture data to the audio and graphics hardware.

On the real hardware, these DMA transfers took some amount of time based on how much data was transferred. The process looked like:

  1. The CPU would request a DMA transfer, specifying the source and destination addresses, and the amount of data to be transferred.
  2. The DMA controller would see the request, and start running the transfer in parallel to the CPU continuing to run more code.
  3. Once the transfer was complete, the DMA controller would generate an interrupt informing the CPU that the transfer was complete.

The main takeaway is that the transfer took some amount of time and ran in parallel. Up until a few months ago, this wasn't emulated and instead each transfer was performed instantly. Performing the transfers instantly made for simpler code and had some seemingly cool properties such as faster GD-ROM load times. However, it caused multiple games to be broken and caused hard-to-find bugs in many others.
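The three steps above can be sketched with a small event scheduler in Python. This is only an illustration of the idea, not redream's implementation: `Scheduler`, `start_dma` and the `CYCLES_PER_BYTE` cost are hypothetical, and copying all the data at completion time is a simplification.

```python
import heapq

CYCLES_PER_BYTE = 2  # hypothetical transfer cost, for illustration only

class Scheduler:
    """Minimal event queue keyed on emulated CPU cycles."""

    def __init__(self):
        self.now = 0
        self._events = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, cycles):
        """Run the CPU for `cycles`, then fire any events that came due."""
        self.now += cycles
        while self._events and self._events[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._events)
            callback()

def start_dma(sched, memory, src, dst, length, raise_interrupt):
    """Step 1: request a transfer. Steps 2 and 3 happen later, in complete()."""
    def complete():
        memory[dst:dst + length] = memory[src:src + length]
        raise_interrupt()
    sched.schedule(length * CYCLES_PER_BYTE, complete)

memory = bytearray(64)
memory[0:4] = b"\x01\x02\x03\x04"
interrupts = []
sched = Scheduler()

start_dma(sched, memory, src=0, dst=32, length=4,
          raise_interrupt=lambda: interrupts.append("dma_done"))

sched.advance(4)                      # CPU keeps running; transfer still pending
pending_interrupts = list(interrupts)
sched.advance(4)                      # 8 cycles elapsed; transfer completes
```

An instant transfer is the special case where the copy and the interrupt happen inside `start_dma` itself, before the CPU has executed anything else. Games that rely on running code between the request and the interrupt can misbehave in that case.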

As of v1.2.0, these DMA transfers are emulated to run in parallel, resolving a slew of nuanced issues, such as the PAL edition of Shenmue 1 corrupting VMU saves only when running in 60 Hz mode and the Skies of Arcadia intro hanging in the middle.

ARM emulation improvements

In May there was an effort to iron out a few lingering issues with the emulation of the ARM CPU used by the Dreamcast's AICA chip. The most important issue was related to unaligned memory accesses, which caused audio to go off the rails in Chu Chu Rocket and resulted in horrible distortion on the 2K Sports games.

For a bit of context, an unaligned access refers to a memory read or write where the address being accessed isn't a multiple of a particular number. The particular number here is determined by the CPU architecture, and is often either 4 or 8 bytes. For example, if the alignment is 4 bytes and the address being accessed is 0x4, the access is aligned. However, if the alignment is 4 bytes and the address being accessed is 0x6, the access is unaligned.

Every architecture deals with unaligned accesses differently. Some:

  • Refuse to perform the unaligned access, resulting in an error.
  • Perform the unaligned access as expected, at a hit to performance.
  • Perform the unaligned access, returning different data than what's expected.

As it turns out, the ARM CPU on the Dreamcast falls into the last category - the access is performed, but the data isn't quite as expected. To illustrate, imagine that the ARM's memory is laid out like so:

Address   Data
0x8004    100
0x8005    101
0x8006    102
0x8007    103
0x8008    104
0x8009    105

If 4 bytes were read from 0x8004, it'd seem that 100, 101, 102 and 103 would be read. Since 0x8004 is 4 byte aligned (because it's evenly divisible by 4), this is correct.

However, if 4 bytes were read from 0x8006, while it may seem that 102, 103, 104 and 105 would be returned - that's not the case with the Dreamcast's ARM7DI CPU. Instead, it will internally 4 byte align the read to 0x8004 reading 100, 101, 102 and 103 - but then rotate the result such that the data at 0x8006 comes first, producing a final result of 102, 103, 100 and 101.
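The rotated read described above can be reproduced with a short Python sketch. The function name `read32_arm7` and the `base` parameter are hypothetical, and the sketch assumes little-endian byte order so that rotating the 32-bit word right by 8 bits per byte of misalignment matches the example.

```python
def read32_arm7(memory, base, address):
    """Emulate a 4-byte read that aligns the address, then rotates the word.

    `memory` holds the bytes starting at address `base` (both hypothetical).
    Returns the four bytes in the order the CPU would see them.
    """
    aligned = address & ~3                    # force 4-byte alignment
    offset = aligned - base
    word = int.from_bytes(memory[offset:offset + 4], "little")
    shift = (address & 3) * 8                 # 8 bits per byte of misalignment
    rotated = ((word >> shift) | (word << (32 - shift))) & 0xFFFFFFFF
    return list(rotated.to_bytes(4, "little"))

# The memory layout from the table above, starting at 0x8004.
memory = bytes([100, 101, 102, 103, 104, 105])

print(read32_arm7(memory, 0x8004, 0x8004))  # [100, 101, 102, 103]
print(read32_arm7(memory, 0x8004, 0x8006))  # [102, 103, 100, 101]
```

Note that the unaligned read never touches 0x8008 or 0x8009, which is why treating it like an ordinary read returns the wrong data.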

As it turns out, redream was treating unaligned accesses the same as aligned accesses, which resulted in reading incorrect data from memory. Once this was fixed, the otherwise inexplicable audio issues in several games were resolved.

SH4 emulation improvements

In February, support for the MACL, MACW and TRAPA instructions landed. This fixed one of my personal favorite games, Propeller Arena, as well as Legacy of Kain: Soul Reaver, POD 2 and F1 Racing Championship.

In Closing

Given that it's been more than 6 months since our last progress report, there are unfortunately many features and bug fixes that have slipped through the cracks in this update. We'll try to do these more frequently to prevent this in the future.

In summary though, the emulator is now easier than ever to use, our graphics and audio emulation have matured significantly, and over 80% of the Dreamcast's library can now be not just played, but truly enjoyed.

Work has recently resumed on Android after sitting on the back burner for a long time, so expect to see more from that soon. Additionally, thanks to the Android work, we have a slew of new optimizations piling up that will continue to drive down our system requirements.

As always, if you'd like to follow along with the development or suggest improvements, join us in our Discord server.