July 5, 2012

Edited on 7/8/2012 to include WASAPI. Thanks to Dan Moxon for asking about it!

In Windows (and Mac, and Linux) you want to plug something in, launch a program, and have it “just work”. Most of the time this happens. When it doesn’t, you find yourself unplugging, replugging, rebooting, and screaming at the Internet until you find a solution.

Assuming your equipment is working, did you know that it’s possible that it could be working better?

Windows actually includes more than one “sound system”, and each system has different performance and quality trade-offs. With all operating systems, sound gets in and out of the computer through driver software. These drivers sit in between the hardware and the Operating System. The Operating System, in turn, has another layer in between the driver and your software, which is called an interface, or API (Application Programming Interface).

Windows currently ships with four audio interfaces: MME, DirectSound, WDM/KS*, and WASAPI. A fifth, ASIO, can be installed to give better performance and flexibility for advanced audio hardware. There are others, like GSIF, but they are far less common than the five above so they won’t be covered here.

For practical purposes, you should only use one of these systems for all of your audio production, though they will all work simultaneously.

Which One Should I Use?
For the best performance, you should use: ASIO if available, then WDM/KS, then DirectSound, and only use MME if there is no other option. WASAPI isn’t offered in most audio software and doesn’t bring much that WDM/KS doesn’t, so I can’t recommend using it.

What’s The Difference?
The main difference is what sort of technologies each interface uses, or at least prefers. Some rely on your main processor, some off-load work to the chip on your sound card, and some take advantage of features found only on certain hardware.

MME (MultiMedia Extensions) is also referred to as “wave in”, “wave out”, or “mmsystem” and is the oldest and “dumbest” of the contenders. It was introduced way back with Windows 3.0! It is still around as a fallback, because all sound cards and applications can talk to it, which is a huge benefit. The primary downside is that it is very inefficient, having the most latency of any of these technologies. Another issue is that it is limited to 16-bit depth and a 44.1 kHz sample rate. Sample rate and bit depth are explained in my post about getting a great sounding podcast episode in the smallest file size.
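To put those numbers in perspective, here’s a quick back-of-the-envelope calculation (plain Python, nothing Windows-specific) of what 16-bit / 44.1 kHz stereo actually costs in raw, uncompressed data:

```python
# Data rate of uncompressed stereo PCM at MME's ceiling:
# 16-bit depth, 44.1 kHz sample rate.
sample_rate = 44_100   # samples per second, per channel
bit_depth = 16         # bits per sample
channels = 2           # stereo

bytes_per_second = sample_rate * (bit_depth // 8) * channels
print(bytes_per_second)                  # 176400 bytes per second
print(bytes_per_second * 60 / 1e6)       # roughly 10.6 MB per minute
```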

Latency is the time it takes for a signal to go from end to end. In multiplayer video games, the higher the latency, the longer it takes for your action to reach the server, which can mean the difference between dodging a bullet and being fragged.

In audio it is a measure of the delay from the time you hit a key on a piano; strum a guitar; or talk into a microphone and hear the result out of your speakers or into your headphones. If the latency is too high, you will hear echoing, or find it impossible to line up a new recording with a track that is playing back.

Ten milliseconds (10 ms) is the magic number. Anything under 10 ms is imperceptible to most humans. Some say they notice anything above 5 ms, but they are either kidding themselves or just have super-sensitive ears. Many people will not perceive latency up to 20 ms at all, though between 10 and 15 ms most people will notice either an echo or a dulling of what they hear.
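In practice, most of that latency comes from buffering: the driver collects a block of samples before handing it over, and the bigger the block, the longer the wait. The relationship is simple enough to sketch (plain Python; the formula is generic and the buffer sizes are just illustrative, not tied to any particular interface):

```python
def buffer_latency_ms(buffer_samples: int, sample_rate: int) -> float:
    """One-way latency contributed by a single audio buffer, in milliseconds."""
    return buffer_samples / sample_rate * 1000

# A 512-sample buffer at 44.1 kHz already sits above the 10 ms magic number;
# halving the buffer halves the latency (at the cost of more CPU interrupts).
print(buffer_latency_ms(512, 44_100))   # about 11.6 ms
print(buffer_latency_ms(256, 44_100))   # about 5.8 ms
```

Real-world round-trip latency stacks several of these buffers (input, processing, output), which is why shaving each stage matters.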

The Haas effect tells us that our ears (or at least our brains) will combine anything coming from a single sound source that arrives within about 30 ms. While true, that doesn’t mean the audio isn’t diminished (muddied) or doesn’t feel “echoey”.

DirectSound (part of DirectX) was introduced with Windows 95 and improves hugely upon MME. It offers much lower latency and supports increased bit depth and sampling rates. One of the reasons latency is decreased is that audio processing can be off-loaded from the main CPU to the audio card’s chip, but internally it has complicated caching and mappings that inherently add latency.

WDM/KS (Windows Driver Model / Kernel Streaming) has been around since Windows 98. WDM is a universal driver structure and behavior, making it possible to use a single binary driver from Windows 98 through Win7. Kernel Streaming is just what it sounds like: audio and video stream directly through the Windows Kernel, offering extremely low latency. The kernel, if you’re wondering, is really the heart of an Operating System – being the lowest level software that everything else is built upon.

Since Windows 2000, MME and DirectSound are actually built on top of WDM/KS – so obviously you want to go WDM/KS native to remove the extra layer between your software and equipment.

WASAPI (Windows Audio Session API) was introduced with Windows Vista, but hasn’t seen wide adoption because it doesn’t really provide anything that Kernel Streaming doesn’t and generally doesn’t provide better performance than ASIO.

So, WASAPI is in a weird midpoint where it overlaps, but doesn’t bring much new to the table. In addition, it doesn’t provide sample rate conversion, so it requires all audio streams to use the same sample rate as the audio hardware (same as Ardour and JACK under Linux), which can either be a confusing pain in the ass, or cause a hit in audio quality when an application has poor conversion code.
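To see what an application is on the hook for when the interface won’t convert for it, here’s a toy resampler (plain Python, deliberately naive – real converters use proper filtering, which is exactly why “poor conversion code” audibly hurts quality):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation sample rate converter.
    Good enough to show the mechanics; real converters use
    windowed-sinc filters to avoid the aliasing this produces."""
    ratio = src_rate / dst_rate
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * ratio               # fractional position in the source
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)  # blend the two nearest samples
    return out

# One second of 48 kHz audio converted to hardware running at 44.1 kHz
src = [0.0] * 48_000
print(len(resample_linear(src, 48_000, 44_100)))  # 44100
```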

ASIO (Audio Stream Input/Output) is a standard created by Steinberg (now owned by Yamaha). ASIO is designed for extreme performance and precision. Another benefit is more flexibility when it comes to multiple channels of audio. For instance, recording the individual channels of a mixer, rather than the combined stereo “master” channel.
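That multichannel flexibility mostly comes down to data layout: hardware typically delivers samples interleaved, one frame per channel, and recording software splits them back out into per-channel tracks. A minimal sketch of that split (plain Python, not actual ASIO API code):

```python
def deinterleave(frames, num_channels):
    """Split an interleaved sample stream [ch0, ch1, ch0, ch1, ...]
    into one list per channel, e.g. one track per mixer channel."""
    return [frames[ch::num_channels] for ch in range(num_channels)]

# 8 mixer channels, 3 frames; sample values here are just the channel number
interleaved = [ch for _ in range(3) for ch in range(8)]
tracks = deinterleave(interleaved, 8)
print(tracks[0])  # [0, 0, 0]  -> channel 1's own track
print(tracks[7])  # [7, 7, 7]  -> channel 8's own track
```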

The only hardware manufacturers bother providing ASIO support for is higher-end sound cards and audio interfaces, which include most FireWire, and some USB 2.0, USB 3.0, and PCI (internal) cards.

There are “wrappers” like ASIO4ALL that will make your card look like it has ASIO support, but they are utilitarian and should only be used to solve a specific problem (like bridging devices that only support the other interfaces) and not to improve performance.

They either won’t improve performance, will make it worse (adding some latency), or will just be another confusing layer that you have to troubleshoot when things go wrong.


Wrap Up
This was a basic overview. Each of these technologies has its own settings and tricks to lower its latency or improve audio quality. Also, some of them have other limitations, like the number of simultaneous applications that can use them. Without getting into those specifics, just using the best available Windows interface will greatly improve at least your workflow and reduce your frustration, if not the quality of the sound that is produced.



* Yes, I know that WDM isn’t actually an audio driver, but that’s how it is presented to the user, and most people won’t benefit from a detailed technical discussion of the differences.

1 thought on “Understanding The Windows Sound System”

  1. Kenn Crawford

    Great explanation!
    I use ASIO4ALL when recording with my USB condenser mic, and KRISTAL Audio Engine as a free alternative to Audacity for recording, because it utilizes ASIO drivers for low-latency monitoring, which is especially helpful if your USB condenser mic does not have a built-in headphone jack with “real” latency-free monitoring.

    Turning your headphones down a little while recording not only helps prevent mic bleed (when the microphone picks up the sound coming from your headphones and it gets re-recorded), a lower volume helps keep latency from throwing you off when recording voice overs. I know if there’s too much latency I tend to talk slower and sound like Capt. Kirk where Every. Word. Is. A. Complete. Sentence. LOL

    If there’s mild latency and it’s throwing you off, turning the headphone volume down a little helps. So does monitoring in mono so you only hear it in one ear, but do not hold the headphones up to one ear like some singers do in videos (see mic bleed mentioned above.)
