There’s lots of information on the interweb for Mastering Engineers (MEs) about mastering for streaming, but I still see a lot of questions about levels for streaming. This needs to be cleared up. I think the confusion exists because the information is a bit disjointed, so here’s my attempt to tidy it all up for you, so you can just go ahead and master songs without worrying about levels.
This article applies to any streaming service (Spotify, Apple Music, Tidal, YouTube etc.), but I’ll be using actual figures from Spotify, as they’ve publicly provided the most detail on how their systems work. This gives you concrete information to work with, and you can apply the same techniques to any streamer.
If you don’t want to read through all of the detail, you can jump to the takeaway here, otherwise read on.
An ME has two possibly conflicting roles. The first is to make the music in front of you sound as good as possible to your ears, and/or your client’s. This is the creative part. The second is to make sure that the finished work is technically sound for mass release. This is the engineering part.
As we’ll see below, most of you have had the engineering part pretty easy for the last few decades; streaming requires you to up your game a bit 🙂
A long long time ago, in a studio far, far away
Being an ME meant working with vinyl, a compromised medium. You needed a creative head to exploit the vagaries of the medium, and a solid engineering head to make sure it would actually work. Levels, both average and peak, were very important, and if you got them wrong, either the tracks you’d been working on wouldn’t fit on the album, or the record would be unplayable – the needle would literally jump out of the groove!
For the last 40 years, however, we’ve all been working with digital files on CD. Anything you did in the studio, anything stored in your DAW, would be faithfully transcribed onto the CD.
This meant you could use any level, clip on your ADCs or ITB with levels or limiters, and the CD didn’t care. Creatively you could do what you wanted. Technically, the job was trivial: as long as you fed the CD authoring software the right information and copied it correctly, the job was done.
Of course this ultimately led to the loudness wars, where MEs were instructed by the labels and/or clients to compress the life out of the music, to make it as loud as possible on the CD. The CD happily obliged. In the real world, however, broadcasters started to normalise the music so it was broadcastable, and also applied their own loudness algorithms to compete with other stations. Oh happy days.
The users in turn started to get irritated. Super-loud adverts, everything sounding the same, etc. It didn’t help that they all seemed to have an aversion to using the volume controls on their music players to adjust levels.
Also, behind the scenes, you probably weren’t aware that clipped music on your CDs might sound different when played on consumers’ CD players. More on this later on.
Cometh the Streamers
Streaming has always been sold as being on the consumer’s side, not the artist’s or engineer’s. Streamers strive to make the whole process as easy as possible for the consumer. Part of this is loudness normalisation: as you stream playlists of any combination of music in the world, the music player adjusts the volume so that tracks all play at roughly the same level. This means a loud CD track plays out at the same level as a quieter, more dynamic one.
It’s this level normalisation that has caused all of the confusion, with MEs not quite sure what levels to master for streaming. Actually it’s quite easy.
This is how Spotify does it. Note, I don’t love Spotify; they’re just more publicly candid about their processes than the others, so they’re easier to explain. This is how it works:
- You send your master to Spotify, ideally as a FLAC file. At this point it is exactly the same as your studio master
- If needed, they sample-rate convert it to 44.1 kHz. The bit depth is maintained, i.e. supply a 192 kHz 24-bit file and they end up with a 44.1 kHz 24-bit file
- They convert it to multiple compressed formats, because the format you hear depends on the player (web or app), how much you pay (free or paid) and the settings you choose (low, high, very high). They make 128 and 256 kbps AAC files, and 96, 160 and 320 kbps Ogg Vorbis files
- They run level-detecting software that works out the average loudness of the compressed file, in integrated LUFS, and attaches this number as metadata to the compressed file
At this point, it’s important to note that your file has not been level adjusted or limited, just converted to a lossy format. There is a danger point here however – see Clipping later on.
On the user side:
- The user plays the file out on their phone, PC, whatever. The compression quality they hear is determined by the third point above (which is why Spotify sounds different depending on how you listen to it)
- The Spotify player plays the file out at -14 LUFS. To do this it reads the metadata in the file and either trims the level down (metadata LUFS higher than -14) or up (metadata LUFS lower than -14). If it trims it up, it also applies a limiter with a -1 dBFS ceiling and a 5 ms attack time
- If the user plays it as an album, the whole album is normalised relative to its loudest track, so the relative levels of the tracks within the album are maintained
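The player-side steps above are simple enough to model. Here’s a minimal Python sketch of the normalisation decision (the -14 LUFS target and -1 dBFS ceiling are Spotify’s published figures; the function names are mine, and a real limiter is of course dynamic rather than a simple yes/no):

```python
TARGET_LUFS = -14.0     # Spotify's default playback loudness
CEILING_DBFS = -1.0     # limiter ceiling used when trimming up

def playback_gain_db(track_lufs):
    """Gain the player applies so the track plays out at -14 LUFS.
    Negative = trimmed down, positive = trimmed up."""
    return TARGET_LUFS - track_lufs

def limiter_engaged(track_lufs, true_peak_dbfs):
    """The limiter only kicks in when the track is trimmed UP and the
    raised peaks would poke above the -1 dBFS ceiling."""
    gain = playback_gain_db(track_lufs)
    return gain > 0 and (true_peak_dbfs + gain) > CEILING_DBFS
```

So a hot master at -8 LUFS is simply turned down 6 dB untouched, while a quiet -20 LUFS master is turned up 6 dB and may then run into the limiter.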
So far so good, but the amount of clipping in the file also changes things slightly.
We’re used to the concept of clipping, and old digital meters sometimes showed us “overs”, where the signal had clipped for several consecutive samples. The problem we didn’t realise at the time was that crappy consumer gear distorted badly on overs, altering the sound. How badly? Difficult to say, but from an ME’s perspective this is a technical issue you need to have a handle on.
It happens because of the extra digital processing that lossy codecs and DACs perform on the audio data, so even with CDs, overs can cause problems. Virtually every DAC oversamples the data internally to around 2 MHz, using a sort of sample-rate converter to do it. When the data is at or above 0 dBFS, however, it takes a bit more maths and silicon to make the DAC behave nicely. In consumer gear that means extra cost, so the billions of smartphones, PCs, CD and DVD players etc. out there tend to have the cheapest DACs in them, all suffering from this issue.
With the lossy codecs, their whole purpose is to remove data we can’t hear, to reduce the file size. They weren’t designed to deal with clipped data, however, and they tend to distort the signal while compressing it. This in turn makes the loudness-measuring software see a louder file than it really is, so post-compression, the LUFS figure for a clipped piece of music is higher than it was before compression.
This is where the True Peak (TP) measurement comes in. The meter oversamples the waveform of the file and uses maths to estimate how high the real, reconstructed peak will be, even where it falls between samples or the stored samples have been flattened against the 0 dBFS ceiling. Effectively, it can read above 0 dBFS, extending the measured range of your music.
Now generally we can’t hear a dB or so of clipping, but I don’t know how much distortion a lossy codec adds to a clipped signal, nor how much this will raise the post-encoder LUFS figure. For mastering, however, all of the streamers suggest that your true peak level doesn’t exceed -1 dBFS, and for louder material Spotify say to keep your TP below -2 dBFS.
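If you’re curious what the maths looks like: a true-peak meter upsamples the signal and hunts for inter-sample peaks. Here’s a crude NumPy sketch (a simple windowed-sinc interpolator, nowhere near a compliant ITU-R BS.1770 meter, purely to show the idea):

```python
import numpy as np

def true_peak_estimate(x, oversample=4, taps=48):
    """Crude true-peak: zero-stuff by `oversample`, interpolate with a
    windowed-sinc filter, then take the largest absolute value."""
    n = np.arange(-taps, taps + 1)
    kernel = np.sinc(n / oversample) * np.hamming(len(n))
    up = np.zeros(len(x) * oversample)
    up[::oversample] = x            # original samples, zeros in between
    y = np.convolve(up, kernel, mode="same")
    return np.max(np.abs(y))

# A sine sampled so the samples always miss the crest: the sample peak
# reads ~0.71 (about -3 dBFS) but the true peak is ~1.0 (0 dBFS)
t = np.arange(1000)
x = np.sin(2 * np.pi * 0.25 * t + np.pi / 4)
sample_peak = np.max(np.abs(x))
true_peak = true_peak_estimate(x)
```

The gap between `sample_peak` and `true_peak` here is exactly the kind of over a plain sample-peak meter never shows you.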
So what’s a ME to do?
We can draw a few conclusions from the above that you can use in your work
1 – As long as your TP always sits below -1dB, you can do what you like
At the very worst, your song will be turned down, but it shouldn’t be distorted or otherwise processed. There’s a note above about louder material needing more peak headroom, but loud material is probably already heavily compressed, so perhaps you won’t hear any extra distortion?
Anyway as long as you keep to the TP rule you can be as creative as you like.
2 – Your maximum peak-to-average level is 13 dB
If you keep your TP below -1 dBFS, you get a maximum of 13 dB of peak-to-average headroom (the -1 dBFS ceiling minus the -14 LUFS target). If you’ve got really dynamic music (a classical work?) where the LUFS is low and the TP high, then on replay your music will be trimmed up and the peaks limited to fit into this magic 13 dB figure.
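The arithmetic behind the magic figure, plus a rough estimate of the limiting a dynamic track would receive (Spotify’s published target and ceiling; the function names are mine, and the gain-reduction figure is a static worst case rather than what a real programme-dependent limiter does):

```python
TARGET_LUFS = -14.0
CEILING_DBFS = -1.0

# the magic figure: ceiling minus target
MAX_CREST_DB = CEILING_DBFS - TARGET_LUFS   # 13 dB

def limiting_db(track_lufs, true_peak_dbfs):
    """dB of gain reduction the player's limiter applies after trimming
    a quiet track up to -14 LUFS (0.0 = the limiter never touches it)."""
    gain = TARGET_LUFS - track_lufs
    if gain <= 0:                  # hot tracks are only turned down
        return 0.0
    return max(0.0, (true_peak_dbfs + gain) - CEILING_DBFS)
```

A -23 LUFS classical master with true peaks at -1 dBFS gets trimmed up 9 dB, and its peaks then squashed by 9 dB to stay under the ceiling: its 22 dB crest is forced into the 13 dB window.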
3 – You can still run it hot
You can compress your work, run it hot, clip it, do whatever you need to get the sound you want. Be aware though that if the level is hotter than -14 LUFS and/or the TP is above -1 dBFS, two things will happen when it’s streamed: a) the level will be turned down, and b) it will sound more distorted.
4 – Calibrate your room, adjust your levels
Get a calibrated monitor controller and a set of proper VUs (average-reading meters for music), and just get used to working at a certain level in your room. If you do this, you’ll know instantly whether you have to trim the material up or down.
Don’t stare at the LUFS display or worry about the TPs; if you just work to what sounds right, there’s a high chance it will sound good and measure right. If, when you’ve got the sound right by ear, you still need to adjust the LUFS and TP figures, just adjust the overall level of the track to suit.
5 – Experiment and learn
When you’ve got a moment, take some loud tracks and try to emulate what the streamers will do to them, so you understand and will instinctively know what the limits are.
To get a sense of what the distortion might sound like if the TP is greater than -1 dBFS, try this:
- Note your current volume level on your monitor controller
- Get the TP level in dB of this track
- If it’s greater than 0 dBFS, note down a figure of 1 dB. If it’s less than 0 dBFS but more than -1 dBFS, use a figure of 0.5 dB
- Raise the gain of the track in the DAW by this figure – i.e., clip it even more
- If you’ve got a good monitor controller like a Crookwood, lower the current monitor level by the same amount that you’ve raised the track level, to keep the replay level the same
- Play out the track and see how it sounds
- If you want to compare it to the original, you can A/B it, using your Crookwood to auto correct for the relative levels as you A/B
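If you’d rather hear this offline before re-patching your room, the same push-into-clipping move is easy to fake on a file. A NumPy sketch (a plain hard clip at full scale, a much blunter tool than your DAW’s limiter, but it makes the point):

```python
import numpy as np

def push_and_clip(x, extra_gain_db):
    """Raise the level by `extra_gain_db` and hard-clip at full scale,
    mimicking driving an already-hot master further into 0 dBFS."""
    g = 10.0 ** (extra_gain_db / 20.0)
    return np.clip(x * g, -1.0, 1.0)

t = np.arange(48000)
x = 0.95 * np.sin(2 * np.pi * 440 * t / 48000)   # a ~-0.45 dBFS sine
y = push_and_clip(x, 1.0)                        # pushed 1 dB into clipping

clipped_fraction = np.mean(np.abs(y) >= 1.0)     # share of flattened samples
```

Play `x` and `y` back level-matched (pull the monitor down by the same 1 dB for `y`) and listen for the edge the flattened tops add.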
To see what it sounds like level wise:
- Note your current volume level on your monitor controller
- Get the LuFs of the original master track
- Copy the master track onto another track to preserve the clipping
- Reduce the level of this track in the DAW by (your LUFS + 14) dB, i.e. if your LUFS is -8, reduce the level by -8 + 14 = 6 dB
- Play out this track, keeping your monitor controller volume the same
- This is how it will sound on Spotify. If it’s lost some power, you can raise your monitor controller volume to check, but bear in mind most users keep their volume fixed while listening. Interestingly enough, if you just sent this level-adjusted track to Spotify, they’d apply no level normalisation to it, because it already fits their spec. You’re just not using any of the peak headroom now available to you
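The level part of this experiment is one line of maths. A sketch (the function name is mine; only masters hotter than the -14 LUFS target get turned down, quieter ones are left alone in this offline preview):

```python
TARGET_LUFS = -14.0

def spotify_preview_gain_db(master_lufs):
    """How far to pull the DAW copy down to hear the master at Spotify's
    playback level; 0.0 for masters already at or below the target."""
    return min(0.0, TARGET_LUFS - master_lufs)
```

For the -8 LUFS example above this returns -6.0, the 6 dB reduction from the step list.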
Between these two methods, you’ll start to get a feel as to how high you can push certain genres before they suffer at the hands of the streamers.
I hope this helps, have fun.