Introduction

In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which are the sonically important parts of the waveform that is being encoded. The PAM looks for loud sounds which may mask soft sounds, noise which may affect the level of sounds nearby, sounds which are too soft for us to hear and should be ignored and so on. The information from the PAM is used to determine which parts of the spectrum should get more bits and thus be encoded at greater quality - and which parts are inaudible/unimportant and should thus get fewer bits.

In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes a frame. For each frame the PAM outputs just 32 values (The values are the Signal to Masking Ratio [SMR] in that subband). This is important! There are only 32 values to determine how to alloctate bits for 1152 samples - this is a pretty coarse technique.

The different PAMs listed below use different techniques to decide on these 32 values. Some models are better than others - meaning that the 32 values chosen are pretty good at spreading the bits where they should go. Even with a really bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time. All of these models have strengths and weaknesses. The model you end up using will be the one that produces the best sound for your ears, for your audio.

Psychoacoustic Model -1

This PAM doesn’t actually look at the samples being encoded to decide upon the output values. There is simply a set of 32 default values which are used, regardless of input.

Pros: Faaaast. Low complexity. Surprisingly good. "Surprising" in that the other PAMs go to the effort of calculating FFTs and subbands and masking, and this one does absolutely nothing. Zip. Nada. Diddly Squat. This model might be the best example of why it is hard to make a good model - if having no computations sounds OK, how do you improve on it?

Cons: Absolutely no attempt to consider any of the masking effects that would help the audio sound better.

Psychoacoustic Model 0

This PAM looks at the sizes of the scalefactors for the audio and combines it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.

Pros: Faaast. Low complexity.

Cons: This model has absolutely no mathematical basis and does not use any perceptual model of hearing. It simply juggles some of the numbers of the input sound to determine the values. Feel free to hack the daylights out of this PAM - add multipliers, constants, log-tables anything. Tweak it until you begin to like the sound.

Psychoacoustic Model 1 and 2

These PAMs are from the ISO standard. Just because they are the standard, doesn’t mean that they are any good. Look at LAME which basically threw out the MP3 standard psycho models and made their own (GPSYCHO).

Pros: A reference for future PAMs

Cons: Terrible ISO code, buggy tables, poor documentation.

Psychoacoustic Model 3

A re-implementation of psychoacoustic model 1. ISO11172 was used as the guide for re-writing this PAM from the ground up.

Pros: No more obscure tables of values from the ISO code. Hopefully a good base to work upon for tweaking PAMs

Cons: At the moment, doesn’t really sound any better than PAM1

Psychoacoustic Model 4

A cleaned up version of PAM2.

Pros: Faster than PAM2. No more obscure tables of values from the ISO standard. Hopefully a good base to work from for improving the PAMs

Cons: Still has the same "warbling"/"Davros" problems as PAM2.

Future psychoacoustic models

There’s a heap that could be done. Unfortunately, I’ve got a set of tin ears, crappy speakers and a noisy computer room. If you’ve got the capability to do proper PAM testing then please feel free to do so. Otherwise, I’ll just keep plodding along with new ideas as they arise, such as:

  • Temporal masking (there’s no pre-echo or anything in TwoLAME)

  • Left Right Masking

  • A PAM that’s fully tuneable from the command line?

  • Graphical output of SMR values etc. Would allow better debugging of PAMs

  • Re-sampling routines

  • Low/High pass filtering