Stereo to (fake-)Surround

Before I re-created my blog, I had written two articles about a pseudo-approach to stereo-to-surround conversion. I merged and rewrote both into this article, and hopefully it will help some people. Please note that I'm not an audio engineer and may lack the necessary background; this article is based on information I found on the web, my personal taste, and things I noticed through research and experimentation. All sources I used are linked at the bottom of this article.

Note: this article is a work in progress, and I might still change a few things. I re-added Jay's comment because it contains useful information. I hope he's fine with that :)

In a perfect world, five point one (5.1) would mean that we get sounds from behind (rear/surround speakers), sounds from the left and right (front left and front right speakers), vocals from the front (center speaker), and bass from the LFE channel (subwoofer). Note that the positional estimation is important: if we watch a movie and someone is clearly speaking behind someone else, we should hear that voice from the rear speakers. Speech should come from all directions, so if someone is talking on the left, the left speaker should output that. At least, that's how I would (mis)understand "surround".

As we aren't in a perfect world, reality is somewhat different. I checked the 6-channel sound files of a few DVDs and of the open movie Sintel. The front left and right channels don't contain ANY vocals, or the vocals sound like they're in a big hall: very quiet and hard to hear. At first I thought this was related to the Haas effect, which says that what you hear first tells you where the sound comes from. So if you hear something from the front left first (vocal or not, it might be some high tone), you think the vocals are coming from the left. I cannot tell whether that is true. I found a small post in the doom9 forums which might explain the behavior: http://forum.doom9.org/showthread.php?p=717044#post717044 ← that's why you'll sometimes think there aren't any vocals on the left and right channels: they only contain the difference, while the center channel contains the sum.

While experimenting with such upconversions, it turned out (a few people on the net reported similar results) that it seems easier/better to do the conversion in steps, e.g. stereo → quadraphonic → surround, instead of stereo → surround directly. But that might be personal taste.

Anyway. We have a stereo file, 48 kHz, 16 bit. Since it's stereo, we know it has 2 channels: left and right. We can create one additional channel out of them without much trouble: the center channel. It is a mix of the left and the right channel at 50% volume each:

    sox -S -V -c 2 source.wav -c 1 combined.wav mixer 0.5,0.5
    normalize combined.wav
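Note that newer SoX releases deprecate the `mixer` effect (a commenter below ran into exactly that on Ubuntu 13.10); the same down-mix can be written with the `remix` effect. A sketch of the equivalent call, assuming the same file names as above:

```shell
# same 50/50 down-mix with the newer "remix" effect:
# output channel = 0.5 * input ch1 + 0.5 * input ch2
sox -S -V source.wav -c 1 combined.wav remix 1v0.5,2v0.5
normalize combined.wav
```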

That channel can now be used to create the new left and right channels. We invert the combined channel and merge it in, so that the new channels only contain the differences:

    sox -S -V -c 2 source.wav -c 1 sleft.wav mixer -l
    sox -S -V -c 2 source.wav -c 1 sright.wav mixer -r
    sox -S -V -M -c 1 sright.wav -c 1 -v -1 combined.wav -c 1 right.wav
    normalize right.wav
    sox -S -V -M -c 1 sleft.wav -c 1 -v -1 combined.wav -c 1 left.wav
    normalize left.wav
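Since the combined channel is (L+R)/2, the invert-and-merge step reduces algebraically to L − (L+R)/2 = (L−R)/2 per side. With a SoX that has the `remix` effect, each difference channel can therefore be created in a single call; this is a sketch of the equivalent, not the exact pipeline above:

```shell
# left difference:  L - (L+R)/2 = 0.5*L - 0.5*R
sox -S -V source.wav -c 1 left.wav remix 1v0.5,2v-0.5
# right difference: R - (L+R)/2 = 0.5*R - 0.5*L
sox -S -V source.wav -c 1 right.wav remix 1v-0.5,2v0.5
```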

The "-v -1" inverts the channel it is applied to, and -M merges the two files, so that only the difference remains. Now we can move on to the "surround" part. Let's first talk about frequencies:

Most of the information I found about frequency ranges doesn't agree. One source says deep bass is between 20 and 60 Hz, another says between 20 and 40 Hz; one puts the lower high tones between 3 and 12 kHz, another between 2 and 3.5 kHz. Here is a simplified variant (which might not be correct):

  • 20 Hz – 20 kHz: human audible range (the older you get, the lower the maximum; you'll likely hear up to 16–18 kHz at most)
  • 80 Hz – 12 kHz: voice / speech (some pages say it only goes up to 8 kHz)
  • 300 Hz – 3.4 kHz: voice / speech on analog phones
  • 20 Hz – 200 Hz: bass
  • 200 Hz – 2 kHz: mids
  • 2 kHz – 12 kHz: high tones
  • 12 kHz – 20 kHz: upper high tones

20 Hz – 80 or 100 Hz: bass that you can't locate (so for these frequencies it shouldn't matter where the subwoofer is placed)

You see, it's not as easy as one might think. However, let's take a look at some speakers. Averaged and rounded results for a few speakers I looked at (raw averages in parentheses):

  • high tone speakers (7): 2000 (2342) Hz – 20 kHz (21285)
  • middle tone speakers (5): 400 (375) Hz – 13 kHz (12800)
  • deep/low tone speakers (30): 40 (39) Hz – 6 kHz (5961)
  • subwoofer (9): 40 (35) Hz – 200 Hz (232) ← the expensive ones I checked, range to 120 Hz and 150 Hz

All these frequencies can help us with our surround audio file. For example: the really low frequencies are there for the subwoofer; we don't need them on every speaker. To reduce the load on the other speakers, we limit their frequencies. Next, the center speaker should be optimized for vocals, so it shouldn't carry overly high frequencies that could cause distortion there; we need a frequency filter again. And our rear speakers won't benefit from high frequencies. So, with all the above in mind, my suggestion is:

  • front left, right: 80 Hz – 20 kHz
  • center: 80 Hz – 12 kHz
  • surround left, right: 100 Hz – 6 kHz
  • lfe: 20 Hz – 200 Hz

The next step is to create the remaining channels and apply the frequency limits:

    sox -S -V -c 1 left.wav -c 1 left-sr.wav sinc 100-6000 reverb
    sox -S -V -c 1 right.wav -c 1 right-sr.wav sinc 100-6000 reverb
    sox -S -V -c 1 combined.wav -c 1 center.wav sinc 80-12000
    sox -S -V -c 1 combined.wav -c 1 lfe.wav sinc 20-200
    sox -S -V -c 1 left.wav -c 1 left-fr.wav sinc 80-20000
    sox -S -V -c 1 right.wav -c 1 right-fr.wav sinc 80-20000

Now we have all the channels. We just add a delay of 15 ms to the rear speakers and put the whole thing together. That's all :)

    multimux -d 0,0,15,15,0,0 \
     left-fr.wav right-fr.wav \
     left-sr.wav right-sr.wav \
     center.wav lfe.wav > final.wav
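multimux may be hard to come by on newer systems (a commenter below couldn't find it on Ubuntu 13.10). If you have ffmpeg, its `join` filter can combine the six mono files instead; a sketch, assuming the rear delay was already applied beforehand (e.g. with SoX's `delay` effect), since `join` itself doesn't delay:

```shell
# join six mono WAVs into one 5.1 file; the 5.1 layout expects the
# inputs in the order FL, FR, FC, LFE, BL (rear left), BR (rear right)
ffmpeg -i left-fr.wav -i right-fr.wav -i center.wav -i lfe.wav \
       -i left-sr.wav -i right-sr.wav \
       -filter_complex "join=inputs=6:channel_layout=5.1[a]" \
       -map "[a]" final.wav
```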

As Jay's comment states (thanks!), the LFE channel is not simply a subwoofer channel; it's meant for effects. So you might want to just remove the vocals instead of low-pass filtering and limiting the frequencies.
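If you want to try the vocal-removal route with stock SoX, its `oops` effect outputs the out-of-phase (L−R) signal, which cancels anything mixed equally into both channels, i.e. most centered vocals. A rough sketch (my own idea, not Jay's exact suggestion) for building a vocal-free LFE from it:

```shell
# "oops" = out-of-phase stereo: both output channels carry the L-R
# difference, so center-panned vocals are largely cancelled
sox -S -V source.wav novocals.wav oops
# then band-limit as before; -c 1 folds the twin-mono pair down to one channel
sox -S -V novocals.wav -c 1 lfe.wav sinc 20-200
```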

Remember, our source has only 2 channels (left and right). For 4- or 6-channel input this whole document is not very helpful. I'm trying to make "surround" out of "stereo", so I have to guess, try, and hope :-)

The whole script to do the conversion:

#!/bin/bash

###
# stereo 2 surround
###

inFile="$1";
outFile="$2";
debug="${3:-0}";

# make sure the required tools are available
run=1;
for tool in sox soxi normalize multimux; do
  if ! command -v "$tool" > /dev/null 2>&1; then
    echo "missing: $tool";
    run=0;
  fi
done

# default parameter
soxParm="";
normParm="-q";

# debug parameter
if [ "$debug" -eq 1 ]; then
  soxParm="-V -S";
  normParm="-v";
fi

if [ $run -eq 0 ]; then
  echo "Error: Requirement missing: normalize, multimux, sox or soxi";
else
  echo " Preparing Source";
  normalize $normParm "$inFile";
  rate=$(soxi "$inFile" | grep "Sample Rate" | awk '{ print $4; }');
  # if rate is 44100, we'll most likely have stuff from an audio CD,
  # which we want to deemph -- at least I assume so
  if [ "$rate" -eq 44100 ]; then
    echo " + Source is 44.1kHz, De-Emphasing & Resampling...";
    sox $soxParm -c 2 "$inFile" source.wav deemph rate -v -a 48000
  else
    sox $soxParm -c 2 "$inFile" source.wav rate -v -a 48000
  fi
  # create combined (center) channel
  sox $soxParm -c 2 source.wav -c 1 combined.wav mixer 0.5,0.5
  normalize $normParm combined.wav
  # create pre- left and right channels
  sox $soxParm -c 2 source.wav -c 1 sleft.wav mixer -l
  sox $soxParm -c 2 source.wav -c 1 sright.wav mixer -r
  # subtract the combined channel so only the differences remain
  sox $soxParm -M -c 1 sright.wav -c 1 -v -1 combined.wav -c 1 right.wav
  normalize $normParm right.wav
  sox $soxParm -M -c 1 sleft.wav -c 1 -v -1 combined.wav -c 1 left.wav
  normalize $normParm left.wav
  # frequency games
  sox $soxParm -c 1 left.wav -c 1 ls.wav sinc 100-6000 reverb
  sox $soxParm -c 1 right.wav -c 1 rs.wav sinc 100-6000 reverb
  sox $soxParm -c 1 combined.wav -c 1 c.wav sinc 80-12000
  sox $soxParm -c 1 combined.wav -c 1 lfe.wav sinc 20-200
  sox $soxParm -c 1 left.wav -c 1 lf.wav sinc 80-20000
  sox $soxParm -c 1 right.wav -c 1 rf.wav sinc 80-20000
  # normalize it in batch-mode
  normalize $normParm -b ls.wav rs.wav c.wav lfe.wav lf.wav rf.wav
  # let's mux it
  multimux -d 0,0,15,15,0,0 lf.wav rf.wav ls.wav rs.wav c.wav lfe.wav > "$outFile"
  # cleanup
  rm left.wav right.wav combined.wav source.wav sleft.wav sright.wav
  rm ls.wav rs.wav c.wav lfe.wav lf.wav rf.wav
fi

Hopefully useful links
You visit those pages at your own risk. I looked over them, but I can't guarantee that those pages are "good", "correct" or anything else.

6 Comments

  1. Jay Moore says:

    Hi,

    Thanks for the pingback…always surprises me when people find that article.

    I’ve been reading your article…and I’ve got a few things I’d like to add to it (although I haven’t read the previous one).

    For starters, 5.1 sound is far from perfect as far as our ears are concerned. The human ear picks up sound from a full 360 degrees…and the entire business of processing that sound is huge. Human hearing is really quite advanced and does all sorts of stuff you may not realize. For example…sound direction isn't determined simply by which ear obtains the information. What you said about the Haas effect is somewhat true…however, the Haas effect is often incorrectly used to describe the underlying process of the precedence effect, which basically says that when similar sounds come from different locations, our ears localize to the first one heard. But that relies on the delay being very small…and it's largely of use with sound reinforcement systems, in which you don't want to disturb the original 'soundstage' but want to make sure people can hear it…so generally they'll delay the audio to distant speakers by 20 ms and amplify it a bit…meaning people closer to those speakers still have the illusion of the sound coming from the original source rather than from a speaker. This doesn't apply to the stereo in your living room…it's more of use in, say, a concert hall where you're using sound reinforcement. The further away from the stage you are, the harder it is to hear…so the speakers further from the stage have processing applied so that, to the people near them, the sound isn't coming from the speakers behind them…but from the stage.

    The other aspect that's interesting about sound…at least in a natural environment…is the way we perceive it. I had an ear infection the other year and my right ear was 98% blocked. I couldn't hear anything out of it except muffled voices (since the ear canal naturally amplifies those sounds) and anything cranked WAY up. But one thing I noticed is that if a sound came from the right side of my head…despite not being able to hear it with the right ear, I could still tell it was coming from the right side. The best I could figure out from studying the inner ear is that there are additional bone structures that detect vibrations and help with directional sound…but the other kicker is for sounds below 1000 Hz. Below 1000 Hz the waveform is long enough that it actually wraps around the head the same way a low-frequency RF signal follows the curvature of the earth. So our brains developed a way of doing phase discrimination…and this may play into the precedence effect as well. If a sound is coming from the right…it hits the right ear, curves around the head and hits the left ear…however, its phase is changed…and it's this change in phase that helps the brain determine where the sound is coming from. This is actually why the higher the frequency…the more difficult it is for our brains to determine what direction it's coming from.

    5.1 is a different game. The amount of delay you're dealing with in your home stereo is rather small. Some systems will allow you to adjust the delay of the rear speakers…somewhat enhancing the effect…however, I still say phasing plays into it a lot. Take a 4.0 system, for example. You can still get the effect of having a center speaker by mixing audio evenly into both channels and maintaining its phase.

    When you stated that you opened a DVD that was mixed in 5.1 and the left/right contained no vocals…that's perfectly normal. In most movies…the dialog is pretty much the center of the action and therefore goes to the CENTER channel. The front left and front right basically only contain directional sounds. It wouldn't be proper for them to contain all the sound in front of you…that would create a pretty false, confusing sense of surround. They do…sometimes…contain some of the rear audio information…but delayed or with a different phase. When the sound from the rear hits your ears…likely due to the precedence effect (which I stated didn't apply as much to home theater, but I'm kind of brainstorming as I write this)…you perceive it from the rear. It also allows you to get a more accurate surround stage…say something is coming from DIRECTLY left of you…well…if you were sitting in an ideal surround setup and you evenly mixed the audio between the front and rear lefts…you'd get the effect that it's coming from the side. It's like in my truck, which has speakers in the door and speakers behind the seat: it sounds like the sound is coming from a location that has no speaker. If you had a vocal coming from the left AND center…it would sound like it was coming from just left of center…again…possibly due to Haas-like effects…but again…I'm not that kind of audio engineer, and since I don't do sound reinforcement…I've never studied it…but I've likely encountered the same thing in mixing.

    What you've done is quite simple…and while I'm not knocking it…I don't think it's how I would have done things. For starters…LFE is not just the "subwoofer" channel…far from it. LFE is actually designed for effects that contain a lot of sub-sonic components, which are routed to a subwoofer to give the extra punch…almost the physical punch of the sound. AC3 assumes you have full-range speakers on each channel, and if, for example, you have something bassy in the left…then it might be solely left and have no LFE. But…with the way modern systems are designed…we use rather frequency-limited drivers and make up for it with a subwoofer…so MOST systems have the ability to redirect the bass from EVERY channel to a subwoofer. Bose was pretty much the first to popularize this. I will admit…you can get away with it…possibly because the aforementioned effects will allow the bass to come from somewhere else but with a slight bit of direction…but…you don't need to route all the low end to the subwoofer. The stereo will do that if needed…otherwise…you're creating a mix that sounds good on YOUR system…but not for someone who has full-range speakers on their home theater (for example…I do happen to have full-range speakers on all channels, and when I'm doing something like playing music…the subwoofer is…surprise…mostly silent).

    But I think you'll find that if you actually get into some of the channel separation techniques out there that use FFT phasing…like the stuff I mentioned, or the even better Center Channel Extraction in Adobe Audition, which you don't use since (I assume) you're a Linux guy…even just doing slight, gentle stuff yields amazing results…and even if, say, your center channel sounds slightly distorted…you could use un-separated left/right audio with your separated center at half amplitude and GREATLY enhance the whole 5.1 effect.

    I can't help you much on what to do for the rear channels. Most 5.1 content…even music…is a true 5.1 mix, and taking the left/right-only content and throwing it in the rear just…kind of sounds funky.

  2. Krzysztof Marczak says:

    Thank you for publishing this interesting script. I have built a slightly improved converter based on it.
    Your script contained some mistakes and produced wrong music files: some audio channels were swapped, and the front channels were also upmixed incorrectly.
    My script only uses the $inFile parameter, which makes batch conversion easier.
    One more change: this version saves the music in compressed AC3 format.
    Below is the code:

    #!/bin/bash

    ###
    # stereo 2 surround
    ###

    inFile="$1";

    # default parameter
    soxParm="";
    normParm="-q";
    debug=0;

    # debug parameter
    if [ $debug -eq 1 ]; then
      soxParm="-V -S";
      normParm="-v";
    fi
    echo "************************************************************"
    echo "$inFile"
    echo " Preparing Source";
    #normalize $normParm "$inFile";
    rate=$(soxi "$inFile" | grep "Sample Rate" | awk '{ print $4; }');
    # if rate is 44100, we'll most likely have stuff from an audio CD,
    # which we want to deemph -- at least I assume so
    if [ $rate -eq 44100 ]; then
      echo " + Source is 44.1kHz, De-Emphasing & Resampling...";
      sox $soxParm -c 2 "$inFile" source.wav deemph rate -v -a 48000
    else
      sox $soxParm -c 2 "$inFile" source.wav rate -v -a 48000
    fi
    # create combined channel
    sox $soxParm -c 2 source.wav -c 1 combined.wav mixer 0.5,0.5
    #normalize $normParm combined.wav
    # create pre- left and right channels (run in parallel, then wait)
    sox $soxParm -c 2 source.wav -c 1 sleft.wav mixer -l &
    sox $soxParm -c 2 source.wav -c 1 sright.wav mixer -r
    wait
    sox $soxParm -M -c 1 -v -1 sright.wav -c 1 combined.wav -c 1 right.wav
    #normalize $normParm right.wav
    sox $soxParm -M -c 1 -v -1 sleft.wav -c 1 combined.wav -c 1 left.wav
    #normalize $normParm left.wav
    # frequency games (run in parallel, then wait)
    sox $soxParm -c 1 -v 1.0 left.wav -c 1 ls.wav sinc 80-20000 reverb &
    sox $soxParm -c 1 -v 1.0 right.wav -c 1 rs.wav sinc 80-20000 reverb &
    sox $soxParm -c 1 -v 0.4 combined.wav -c 1 c.wav sinc 80-12000 &
    sox $soxParm -c 1 -v 0.5 combined.wav -c 1 lfe.wav sinc 20-200 &
    sox $soxParm -c 1 -v 0.3 sleft.wav -c 1 lf.wav sinc 80-20000 &
    sox $soxParm -c 1 -v 0.3 sright.wav -c 1 rf.wav sinc 80-20000
    wait
    #normalize $normParm -b ls.wav rs.wav c.wav lfe.wav lf.wav rf.wav
    # let's mux it
    multimux -d 0,0,0,0,15,15 -a 320 -f lf.wav rf.wav c.wav lfe.wav ls.wav rs.wav > "$inFile".ac3
    # cleanup
    rm left.wav right.wav combined.wav source.wav sleft.wav sright.wav "$inFile".ac3
    rm c.wav lf.wav lfe.wav ls.wav rf.wav rs.wav
    mv 6ch.ac3 "$inFile".ac3

    • Jean says:

      In my tests the channels were correct, but I noticed that the channel order differs from format to format and from player to player. Actually, I was pretty confused by that myself. "mplayer", for example, played an AC3 5.1 file differently than, for example, an ogg file.

      De-emphasizing is (as far as I know) not required for every audio CD (I don't know if it hurts to use it anyway). If I remember correctly, it is only required for "older" audio CDs.

      Nice to see someone is reading my stuff :-)

  3. Coeur Noir says:

    Hello…

    …2 years later…

    Well, here I am on Ubuntu 13.10 and when I try your script I get :

    normalize – unable to find command
    multimux – unable to find command

    and also

    “mixer is deprecated”

    Any help welcome…
