Moguta · 10:48 PM

Rimo, you bring up a very good question, and I agree that you would probably be well-served sticking to -V2. In my view, "upgrading" to -V0 does little more than offer the comfort of a higher bitrate, consuming additional space without sounding significantly better.

I want to know, honestly: how many people in this discussion have actually run a scientific ABX listening test to find out which settings work for them? (ABX test software is easy to get, for example as a component for the Foobar2000 audio player.) When differences in sensory input become small, as they are when comparing high-quality encoding modes, subconscious psychological biases can easily distort one's perception. Only double-blind tests eliminate this strong suggestive bias. They do so by hiding which sample is which. Knowing that a certain sample is encoded at a higher bitrate often distorts a listener's perception, so that they falsely "hear" it as better simply because they expect it to be better.
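To make the blinding concrete, here is a minimal Python sketch of what an ABX tool does on each trial: it secretly assigns X to either A or B, and the listener has to say which. The listener functions are hypothetical stand-ins for a real person; an actual tool (such as the foobar2000 component) plays audio rather than handing over a label.

```python
import random

def run_abx(n_trials, listener, rng):
    """Simulate an ABX session and return how many trials were answered correctly."""
    correct = 0
    for _ in range(n_trials):
        x = rng.choice("AB")       # hidden assignment: the tool knows it, the listener doesn't
        answer = listener(x, rng)  # the listener "hears" X and answers "A" or "B"
        correct += answer == x
    return correct

def guessing_listener(x, rng):
    # Hypothetical listener who cannot hear any difference: ignores X and guesses.
    return rng.choice("AB")

def discerning_listener(x, rng):
    # Hypothetical listener who reliably hears the difference.
    return x

rng = random.Random(2008)
print(run_abx(12, discerning_listener, rng))  # always 12
print(run_abx(12, guessing_listener, rng))    # usually somewhere around 6
```

Because the listener never learns which encode X is, expectations about bitrate have nothing to latch onto.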

To illustrate my point, I recently posted on OverClocked ReMix about how the size-restricted encode of Valse Aeris on the site had very noticeable, jarring artifacts because it was encoded at an average bitrate of 103Kbps. This was while listening on my Sennheiser HD 280 Pro headcans. After people disagreed with my assessment that it sounded so horrible, I tried listening to it on my Klipsch ProMedia 2.1 speakers. The difference I heard clearly on my headphones I couldn't hear AT ALL on my speakers, even at a bitrate as low as 103Kbps!

So while I could easily ABX the difference between the Valse Aeris encodes 12 out of 12 times on my headphones, I gave up trying to ABX it on my speakers after a few trials, when the program basically said I could have tossed a coin and gotten the same results.
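For anyone wondering what the ABX tool's verdict actually means: assuming each trial is an independent 50/50 guess when you can't hear a difference, the chance of a given score by luck alone is a binomial tail. A minimal Python sketch (the function name is my own, not anything from the foobar2000 component):

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by pure guessing (50% per trial)."""
    lucky_outcomes = sum(comb(trials, k) for k in range(correct, trials + 1))
    return lucky_outcomes / 2 ** trials

print(abx_p_value(12, 12))  # 0.000244140625, i.e. 1 in 4096
print(abx_p_value(6, 12))   # about 0.61: no better than a coin toss
```

A 12/12 run is very hard to get by luck, while a middling score leaves guessing as a perfectly good explanation.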

Before going on, I would like to note that I was ABX comparing the 103Kbps ABR encode against a -V2 --vbr-new encode of about 198Kbps.
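For reference, the two encodes compared here correspond roughly to the LAME invocations below. This Python sketch only builds the argument lists; the file names are hypothetical, and actually running it assumes a 3.97-era lame binary on your PATH.

```python
import subprocess

def lame_abr_cmd(src, dst, kbps):
    # Average bitrate mode: LAME aims for this average over the whole file.
    return ["lame", "--abr", str(kbps), src, dst]

def lame_vbr_cmd(src, dst, quality, vbr_new=True):
    # VBR quality preset: a lower -V number means higher quality and bitrate.
    cmd = ["lame", f"-V{quality}"]
    if vbr_new:
        cmd.append("--vbr-new")  # selects the newer VBR algorithm in LAME 3.97
    return cmd + [src, dst]

def encode(cmd):
    # Raises CalledProcessError if lame exits with an error.
    subprocess.run(cmd, check=True)

# Hypothetical file names; uncomment to actually encode.
# encode(lame_abr_cmd("valse_aeris.wav", "abr103.mp3", 103))
# encode(lame_vbr_cmd("valse_aeris.wav", "v2.mp3", 2))
```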

After noticing that the bitrate could actually be bumped up from 103Kbps and still fit OverClocked ReMix's size limit, I fooled around and got it to an average bitrate of 114Kbps while staying within the limit. Most surprising to me, with only this 11Kbps increase in average bitrate, the artifacts that had been so audible on my headphones became quite subtle! And yet -V2, the command line I live by, encoded it at about 198Kbps, a full 95Kbps higher than the 103Kbps encode. Given such a drastic decrease in flaws from an 11Kbps increase, I doubt I would need a mode as high as -V2, with its much larger increase, to get transparent sound.
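The arithmetic behind "bumping up" the bitrate is simple: file size is roughly average bitrate times duration. A small Python sketch; the track length and size cap below are made-up placeholders, not OverClocked ReMix's actual limit or the track's real length.

```python
def mp3_size_bytes(avg_kbps, seconds):
    # kbps here means 1000 bits per second; ignores ID3 tags and other overhead.
    return avg_kbps * 1000 / 8 * seconds

def max_avg_kbps(limit_bytes, seconds):
    # Highest average bitrate that still fits under a file size limit.
    return limit_bytes * 8 / seconds / 1000

# Hypothetical example: a 5-minute track under a hypothetical 4,500,000-byte cap.
print(max_avg_kbps(4_500_000, 300))  # 120.0
print(mp3_size_bytes(114, 300))      # 4275000.0
```

Since ABR and VBR only control the average, the real file can land a little over or under the estimate, which is why some trial and error is needed.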

I also fooled around with -V6 while trying to fit Valse Aeris within the file size limit. It actually went over the limit (averaging 122Kbps), and it seemed to sound even better than the 114Kbps ABR encode (not surprising). I thought I could still hear some subtle artifacts, but I knew that actually ABX testing them would start to be a challenge at this point. And remember, that was using just -V6, on my hi-fi headphones! On my mid-level speakers, I couldn't hear the difference at all!

The Hydrogen Audio wiki page on LAME describes -V3 through -V0 as "High quality: HiFi, home or quiet listening", and additionally states that "-V4 --vbr-new should be close to perceptual transparency". After my experiences with Valse Aeris, I suspect that this wiki entry is quite close to the truth. -V2 ought to be more than fine for encoding all your music, and if you're interested in greater space efficiency, -V3 and -V4 may be worth testing.

(Note that if you ABX test your own audio, you must know what to listen for. Percussion that is high in the mix, especially cymbals and castanets, tends to artifact. Several testers and developers over at Hydrogen Audio can tell you what other "problem" samples to look out for; just make sure you're talking to someone who actually knows their stuff.)

Also, I'd like to correct a few statements I saw above:

There is no reason to think that -V0 is the most you can get out of MP3, nor that -V0 is better tuned than any other preset. -b 320 will indeed get you the "best quality possible", because -V0 is unlikely to ever produce a 320Kbps MP3, even when there is enough audio information to fill all those bits usefully. However, audible differences between 320Kbps and -V0 are probably about as unlikely as those between -V1 and -V0, so this discussion about the "best quality of MP3" is more academic than practical.

And as for LAME 3.98 beta 5 being considered the "best" version of LAME: by what authority or consensus? Hydrogen Audio still lists LAME 3.97 stable as the recommended version. Be careful about living on the bleeding edge with betas and such. 3.98 beta 6 seems to have some obvious flaws, so without prudent testing, who's to say that 3.98 beta 5 doesn't have more subtle regressions?

Moguta · 06:15 PM

Originally Posted by LiquidAcid

@Moguta: CBR 320 is wasting bits when compared to VBR V0. That's why it's inferior.

I apologize; when you said "V0 maxes out what MP3 can give you", I thought you meant it gave the highest possible quality regardless of everything else. So what you're really talking about is bitrate efficiency: how to get perceptually transparent audio with the fewest bits wasted on inaudible detail.

Along the lines of your own argument, I would propose that -V2 and perhaps even -V3 or -V4 are superior to -V0 because -V0 wastes bits with rarely any additional audible quality.

However, I notice you do mention hearing artifacts in -V2 encodes. Can you provably demonstrate that you hear artifacts in -V2 that you do not hear in -V0, by way of ABX testing? I have never been able to pick up any differences, but I am not arrogant enough to deny you the opportunity to prove that you do hear something that I do not.

Originally Posted by LiquidAcid

Changelog for the interested user:
LAME Changelog

Thanks for that. I was actually thinking of that very document when Spikey mentioned the newer LAME betas, though I didn't remember where it was. It's nice to see such improvements, and although every quality change mentioned should, in principle, make things better, such changes can regress in certain cases, notably in the recent beta 6. However, I do see one line that looks promisingly definite, because it implies that careful listening tests were performed to support it: "Known problem samples for the new VBR code: many of them are at an acceptable quality level now; with a big 'Thank You' to Francis Niechcial". It's also interesting that --vbr-new (evidently now the default in the 3.98 betas? Does the old mode need --vbr-old or something?) is getting a new psymodel. This seems like the most volatile change, with the potential for either much improvement or regression.

Originally Posted by Rimo

The problem is: which high quality VBR setting to use? This goes with the fact that MP3s are being shared among people and not everybody have an identical audition/equipment. Personally, I'm not excited about the idea of downloading files encoded at -V 8 and actually consider -V 0 to be slightly too much. Similarly, there are people who frown on anything lower than -V 0.

Indeed. However, I am naturally suspicious of claims by some people with hi-fi equipment that they can regularly hear the benefit of -V0. If anything, the differences should be subtle and exist only with rare problem samples. And problem samples tend to artifact with -V0 as well as with lower settings such as -V2, since such samples tend to give trouble to the entire encoder or psymodel rather than to certain quality steps only. In such cases, the difference is usually how they artifact, not whether they artifact.

Also, I hope no one would be excited about downloading -V8 music, considering they'd be getting audio resampled to 32kHz at bitrates around 85Kbps. I imagine that's enough to hurt anyone's ears.

Originally Posted by Rimo

Limiting the possibilities to -V 2 and up, the difference between these presets are relatively small, yet there doesn't seem to be a general consensus to attest which one is the best to use. However, many people here are currently using -V 0, yet I'm far from convinced they actually hear a quality difference.

Limiting the possibilities to -V2 and up, I would definitely choose -V2. I submit that extremely few people are actually able to hear the rare, small differences between those ~245Kbps -V0 files and the more space-and-bandwidth-efficient ~190Kbps -V2 files.
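To put those averages in storage terms, here is a quick Python calculation using the approximate ~245Kbps and ~190Kbps figures (real averages vary from album to album):

```python
def mb_per_hour(avg_kbps):
    # Decimal megabytes per hour of audio at a given average bitrate.
    return avg_kbps * 1000 / 8 * 3600 / 1e6

v0, v2 = 245, 190  # approximate typical averages for -V0 and -V2
print(mb_per_hour(v0), mb_per_hour(v2))  # 110.25 85.5
print(f"{v0 / v2 - 1:.0%} larger")       # 29% larger
```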

For this reason, I really wish that #gamemp3s had not raised their encoding mode to -V0. Not only am I quite hesitant to transcode the files to a lower setting to save space, at the risk of additional degradation, but I also think #gamemp3s' distribution would benefit from smaller torrents that take up less space on users' computers. If I remember correctly, the decision to raise the encoding standard was made simply by putting a poll on the website. It's no surprise that voters wanted the "upgrade", given the common mentality that higher bitrates and higher quality switches must be better than anything lower.

Originally Posted by Rimo

In the same line of idea, I'd also be ready to use a lower preset, but kind of consider -V 2 to be the standard (I guess the name creates this effect).

I plan on messing around with -V3 and -V4 myself to see how well they hold up with my ears and headphones. But it's also prudent to remember that most people (and we're not exactly talking about the demographic that visits Hydrogen Audio or my ripping guide) don't mind 128Kbps CBR for casual listening. So even if you begin to hear what's different, it's worth asking whether those differences are enough to be annoying or jarring, especially with quality modes that border on audible transparency.

Moguta · 11:20 PM

Originally Posted by LiquidAcid

@Moguta: My definition of "wasting bits" isn't the same as yours. You define wasted bits by "not so much" improvement on the perceived audio.

That's not my "wasted bits" definition, which is much more concrete: unused bits in the MPEG bitstream, unused because the framesize is too big for the bits delivered from the transform coding step.

Ah, I see. So you're talking about the fact that it's CBR, and the disadvantages inherent in such an encoding mode.
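LiquidAcid's definition can be made concrete with the MPEG-1 Layer III frame length formula. In CBR, every frame at a given bitrate has the same size, so whatever the coded audio doesn't need is left over. A simplified Python sketch: it ignores the bit reservoir (which comes up later in the thread), and the needed-bytes figure is hypothetical.

```python
def frame_bytes(bitrate_bps, sample_rate=44100, padding=0):
    # MPEG-1 Layer III frame length: 1152 samples per frame, and 1152 / 8 = 144.
    # `padding` is the optional extra byte some frames carry to keep the average exact.
    return 144 * bitrate_bps // sample_rate + padding

def cbr_unused_bytes(needed_bytes, bitrate_bps=320_000, sample_rate=44100):
    # Simplified: ignores the bit reservoir and counts header/side info as "needed".
    return max(0, frame_bytes(bitrate_bps, sample_rate) - needed_bytes)

print(frame_bytes(320_000))   # 1044
print(frame_bytes(32_000))    # 104
print(cbr_unused_bytes(300))  # 744
```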

I agree that CBR is certainly less than optimal. But I don't see how that means that "V0 maxes out what MP3 can give you". There may be some wasted bits, yes, but it would seem that 320Kbps CBR is what "maxes out", because it can still end up encoding more audio information than a -V0 encode. Moreover, seeing that lossless audio tends to have bitrates somewhere around 900Kbps, I doubt there are many cases, besides digital silence, where there isn't 320Kbps worth of audio information present. Of course, I could be wrong, since MP3 stores different information than lossless formats do.
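For scale, here is where the ~900Kbps lossless figure sits relative to raw CD audio and to 320Kbps. A quick Python calculation, assuming standard 44.1kHz, 16-bit stereo sources:

```python
def pcm_kbps(sample_rate=44100, bits_per_sample=16, channels=2):
    # Uncompressed bitrate of CD-format audio, in kilobits (1000 bits) per second.
    return sample_rate * bits_per_sample * channels / 1000

cd = pcm_kbps()            # 1411.2
print(round(320 / cd, 3))  # 0.227: 320Kbps CBR keeps under a quarter of the raw bit budget
print(round(900 / cd, 3))  # 0.638: a typical ~900Kbps lossless file
```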

But even if I'm wrong about that, I'm not sure that "VBR V0 is the end of MP3" or that "You can't get more quality without rewriting the standard". After all, why continue developing LAME if no improvements can be made without rewriting the format? Additionally, one can always make a higher VBR preset. For example, a preset could encode every frame at 320Kbps unless the frame didn't have enough information to fill it, so that no bits are wasted.
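The proposed "320Kbps VBR" preset boils down to choosing, frame by frame, the smallest legal MP3 bitrate that holds everything the encoder wants to keep. A rough Python sketch of just that selection step, assuming the per-frame byte demand is already known (in a real encoder it comes out of the psychoacoustic model and quantization, which aren't modeled here); `pick_frame_kbps` is a hypothetical helper, not a LAME function.

```python
# Bitrates allowed by MPEG-1 Layer III (kbps); the "free format" option is ignored here.
MPEG1_L3_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)

def frame_bytes(kbps, sample_rate=44100):
    # Frame length in bytes without the padding byte (1152 samples / 8 = 144).
    return 144 * kbps * 1000 // sample_rate

def pick_frame_kbps(needed_bytes, sample_rate=44100):
    """Smallest legal bitrate whose frame can hold `needed_bytes`."""
    for kbps in MPEG1_L3_KBPS:
        if frame_bytes(kbps, sample_rate) >= needed_bytes:
            return kbps
    # Nothing fits: a real encoder would borrow from the bit reservoir or lower quality.
    return MPEG1_L3_KBPS[-1]

print([pick_frame_kbps(n) for n in (0, 500, 1044, 5000)])  # [32, 160, 320, 320]
```

Note the floor: even a silent frame costs 32Kbps in MPEG-1, which is the limitation mentioned later in the thread.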

Of course, once we accept that "Best for lossy encodings is determined by perceived audio quality divided by bitstream size", it becomes apparent that an encoding mode can be too high even if it doesn't waste bits by the strict definition. That's why I think -V0 is an unnecessary ~25% size increase, just as I'm sure my proposed 320Kbps VBR setting would be overkill.

Originally Posted by LiquidAcid

EDIT: Oh yes, also it wasn't me stating that I could hear artifacts in V2 which wasn't there with V0.

So are you saying that you can't distinguish -V0 from -V2? I thought that you were implying the ability to hear a difference here:

Originally Posted by LiquidAcid

You see, I rip all of my discs in FLAC. In case I want to have something on my portable player I can always re-encode the file to a lossy encoding. As the hardware decoder of portables isn't very accurate, the DAC often is crappy and the standard headphones don't reproduce the sound very well - I can even go below V2, e.g. V4 or even lower. I probably won't notice the degraded audio quality at all.
Encoding quality is just good enough to drive this low-end playback chain.

On the other hand when at home and listening to music through my "good" equipment (AKG k701 dyn. headphones, DIY headphone amp and DIY USB-DAC) I'm not that limited and certain flaws (like ringing artifacts when audience is applauding) are now detectable.

Furthermore I get tired when listening to highly compressed (encoded, not the compression as in loudness war - I also get tired of this one) audio. I can listen much longer when playing from the original disc (or a lossless encoding), also it's more relaxing for me.
There is a lot that's destroyed when doing lossy encodings. Stereo imaging, dynamic range, all sorts of artifacts.

I'm not saying that I can always distinguish between a lossy and a lossless encoding. But there are differences, which are annoying when at perceivable level.

Originally Posted by LiquidAcid

The fascinating thing is that it's quite easy to implement it on embedded hardware (there is the integer-only implementation vorbis-tremor, which should work on most ARM processors).
I think it's more a political thing...

As for portable Vorbis support, it's not really that trivial. For every supported format, space must be allocated in a chip's limited ROM to store the decoding routines. Optimizing the decoder for performance is also critical, given a chip's limited processing power and a portable player's battery life. I know that the first iRiver devices to support Ogg Vorbis could only play files in certain bitrate ranges; otherwise the player would skip to the next song.

I agree that it would be great if Vorbis had wider hardware adoption. But the ratio of effort to implement vs. consumer demand just seems too out of kilter for it to be worthwhile for manufacturers.

Moguta · Mar 30, 2008, 12:30 AM

So most lossy audio codecs store frequency information (as opposed to storing amplitude samples). Most lossless audio codecs, on the other hand, store small, efficient equations that predict pieces of the waveform fairly well, aiming for as little prediction error (and thus as few sample corrections, or residuals) as possible.
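That's roughly the idea on the lossless side. As an illustration, here is a Python sketch of the simplest kind of predictor FLAC offers, a fixed second-order predictor that guesses each sample by extending the line through the previous two. Real FLAC adds other fixed orders, LPC predictors and Rice coding of the residual, none of which is shown; the test tone is arbitrary.

```python
import math

def encode_fixed_order2(samples):
    # Keep the first two samples verbatim (warm-up), then store only the
    # residual: the actual sample minus the prediction 2*x[n-1] - x[n-2].
    residual = list(samples[:2])
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        residual.append(samples[n] - prediction)
    return residual

def decode_fixed_order2(residual):
    # Rebuild each sample from the same prediction plus the stored correction.
    out = list(residual[:2])
    for n in range(2, len(residual)):
        out.append(residual[n] + 2 * out[n - 1] - out[n - 2])
    return out

# A smooth test tone: integer samples with peaks around +/-1000.
signal = [round(1000 * math.sin(2 * math.pi * n / 64)) for n in range(256)]
residual = encode_fixed_order2(signal)
print(max(abs(r) for r in residual[2:]))        # small (about 10) compared with 1000
print(decode_fixed_order2(residual) == signal)  # True: perfectly lossless
```

The small residuals are what make the compression work: they need far fewer bits than the original samples once entropy-coded.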

Am I understanding it right?

Moguta · 05:14 PM

Originally Posted by sup!

Well that's not correct. Even CBR files work a bit like VBR by using the bit reservoir. The spare bits won't be filled up with zeros.

While that's mostly true, the bit reservoir is limited and can only hold a certain maximum amount for later use, so a passage of complete silence in a 320Kbps CBR file would indeed be padded with useless data. In a VBR file, that same silent passage would be about 10x (320Kbps vs 32Kbps) smaller. As a side note, it's a shame that the MP3 specification prevents silence from being encoded at 0Kbps, as in FLAC and other codecs.
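A toy Python model of why the reservoir can't absorb a long stretch of silence. It assumes the MPEG-1 cap of 511 bytes (the 9-bit main_data_begin field) and treats each frame's byte demand as given; it's a simplification for illustration, not how LAME actually does its bookkeeping.

```python
def simulate_cbr_padding(frame_bytes, demands, reservoir_cap=511):
    """Return (padding_bytes, reservoir_left) for a run of CBR frames.

    demands[i] is how many bytes frame i's audio actually needs. Leftover
    space can be carried forward in the bit reservoir, but only up to
    `reservoir_cap`; anything beyond that becomes useless padding.
    """
    reservoir = 0
    padding = 0
    for need in demands:
        available = frame_bytes + reservoir
        # If need > available, a real encoder would have to spend fewer bits;
        # this model simply uses everything that's there.
        leftover = available - min(need, available)
        reservoir = min(leftover, reservoir_cap)
        padding += leftover - reservoir
    return padding, reservoir

# Ten frames of digital silence in 320Kbps CBR (1044-byte frames at 44.1kHz):
print(simulate_cbr_padding(1044, [0] * 10))  # (9929, 511)
```

Once the reservoir is full, every further silent frame's 1044 bytes are pure padding, whereas a VBR encoder could drop those frames to the 104-byte, 32Kbps minimum.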

And, wow, reading that HydrogenAudio wiki page you linked, it seems that VBR will use the bit reservoir too if necessary. And here I had thought that VBR encoding did not have a bit reservoir at all! The more you learn...

Originally Posted by sup!

Encoding with 320cbr (preset-insane) also changes other settings, not only the bitrate. But the extra quality is probably not perceivable and therefore wasted.

And, indeed, this really is the main issue.

Originally Posted by sup!

There's also this nice chart, visualizing the quality gain (notice the jump in filesize from -V0 to 320cbr):

I don't disagree with your statement; the size jump does represent what is typical. For accuracy's sake, however, I must note that this graph was not created from any actual data, but was drawn simply to illustrate the trend of diminishing returns in MP3 encoding. So while it's a useful guide, no one should treat it as conclusive proof.