Stereo to (fake-)Surround

Before I re-created my blog, I wrote two articles about a pseudo-possibility for stereo to surround conversion. I merged both and rewrote the result; hopefully this will help some people. Please note that I’m not an audio engineer and may lack the necessary background. This article is based on information I found on the web, my personal taste and things I noticed through research and experimenting. All sources I’ve used are linked at the bottom of this article.

Note: this article is a work in progress. I might change a few things. I re-added Jay’s comment because it contains useful information. Hope he’s fine with that 🙂

In a perfect world, five dot one (5.1) would mean that we have sounds from behind (rear/surround speakers), sounds from the left and right (front left and front right speakers), vocals from the front (center speaker) and bass from the LFE channel (subwoofer). Notice that the “positional” impression is important: if we watch a movie and someone is obviously speaking behind someone else, we should hear that sound from the back (from the rear speakers). “Vocals” as in speech should be able to come from any direction, so if someone is talking on the left, the left speaker should output that. At least, that’s how I would (mis-)understand “surround”.

As we aren’t in a perfect world, things are somewhat different. I checked the 6-channel sound files of a few DVDs and of the open movie Sintel. The front left and right channels either don’t contain ANY vocals, or the vocals sound as if they were recorded in a big hall: very quiet and hard to hear. At first I thought this was related to the Haas effect, which says that the direction you hear a sound from first is where you think it comes from. So if you hear something from the front left first (it doesn’t matter whether it’s a vocal or not, it might be some high tone), you think the vocals are coming from the left. I cannot tell whether that is true. I found a small post in the doom9 forums which might explain the behavior: http://forum.doom9.org/showthread.php?p=717044#post717044. According to it, the left and right channels contain only the difference while the center channel contains the sum, which is why you’ll sometimes think there aren’t any vocals on the left and right channels.

While experimenting with such upconversions it turned out (a few people on the net reported similar results) that it seems to be easier/better to do the conversion in steps, e.g. stereo → quadrophonic → surround instead of stereo → surround. But that might be personal taste.

Anyway. We have a stereo file, 48 kHz, 16 bit. Since it is stereo, we know it has 2 channels: left and right. We can create one additional channel out of them without much trouble: the center channel. That channel is a mix of the left and the right channel with 50% volume of each:

    sox -S -V -c 2 source.wav -c 1 combined.wav mixer 0.5,0.5
    normalize combined.wav
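
The `mixer` effect used above comes from older SoX releases; as far as I know, newer releases replace it with `remix`. Here is a minimal sketch of the same center-channel mix under that assumption. The function name `center_cmd` is made up for this example, and it only prints the command instead of running it, so you can check it against your own SoX version first. I’m also assuming that `remix -` averages the channels when mixing down to mono; since `normalize` runs afterwards, the exact scaling should not matter much.

```shell
# Hypothetical helper: prints a sox command that mixes all input channels
# down to one mono channel with the remix effect.
# Assumption: the installed sox provides 'remix' (not verified here).
# $1 = stereo input file, $2 = mono output file
center_cmd() {
  echo "sox $1 $2 remix -"
}

center_cmd source.wav combined.wav
```

If the printed command looks right for your installation, run it in place of the `mixer 0.5,0.5` line.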

That channel can now be used to create the new left and right channels. We invert the respective source channel and mix it with the combined channel, so that the result contains only the differences:

    sox -S -V -c 2 source.wav -c 1 sleft.wav mixer -l
    sox -S -V -c 2 source.wav -c 1 sright.wav mixer -r
    sox -S -V -M -c 1 -v -1 sright.wav -c 1 combined.wav -c 1 right.wav
    normalize right.wav
    sox -S -V -M -c 1 -v -1 sleft.wav -c 1 combined.wav -c 1 left.wav
    normalize left.wav
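
To see what these commands leave in the output, here is a tiny sketch of the arithmetic for single sample values. It ignores the `normalize` steps and assumes that the final mono mix-down averages its two inputs; the function name `diff_sample` is made up for this example.

```shell
# Hypothetical helper: what the invert-and-mix step does to one sample
# when building right.wav.
# $1 = left sample, $2 = right sample (plain integers)
diff_sample() {
  l=$1
  r=$2
  combined=$(( (l + r) / 2 ))            # the 50%/50% mix of both channels
  inverted=$(( 0 - r ))                  # "-v -1" flips the sign of the right channel
  echo $(( (inverted + combined) / 2 ))  # mono mix-down of the two inputs
}

diff_sample 400 400   # same signal on both sides (e.g. centered vocals): 0
diff_sample 400 0     # signal only on the left: 100
diff_sample 0 400     # signal only on the right: -100
```

Under these assumptions right.wav ends up as a scaled L − R and left.wav as a scaled R − L, so anything that is identical in both source channels cancels out.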

The “-v -1” inverts the file that follows it. The -M merges both files (the inverted left/right channel and the combined channel) into one two-channel stream, and the -c 1 on the output mixes that down to mono again, so that only the differences are left. Now we can move on to our “surround” part. Let’s first talk about frequencies:

Most of the information I found about frequencies isn’t consistent. While one source says deep bass is between 20 and 60 Hz, another says it’s between 20 and 40 Hz. Deep high tones lie between 3 and 12 kHz according to one source, between 2 and 3.5 kHz according to another. Here is a simplified variant (which might not be correct):

  • 20 Hz – 20 kHz range audible to humans (the maximum drops with age; you’ll likely hear up to 16 to 18 kHz at most)
  • 80 Hz – 12 kHz voice / speech (some pages say only up to 8 kHz)
  • 300 Hz – 3.4 kHz voice / speech on analog phones
  • 20 Hz – 200 Hz bass
  • 200 Hz – 2 kHz middles
  • 2 kHz – 12 kHz high tones
  • 12 kHz – 20 kHz upper high tones

20 Hz – 80 or 100 Hz: bass which you can’t locate (so for these frequencies it shouldn’t matter where the subwoofer is placed)

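You see, it’s not as easy as one might think. However, let’s take a look at some speakers. These are averaged and rounded results for a few speakers I looked at (number of speakers in parentheses after the type, the unrounded average in parentheses next to each rounded value):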

  • high tone speakers (7): 2000 (2342) Hz – 20 kHz (21285)
  • middle tone speakers (5): 400 (375) Hz – 13 kHz (12800)
  • deep/low tone speakers (30): 40 (39) Hz – 6 kHz (5961)
  • subwoofer (9): 40 (35) Hz – 200 Hz (232) ← the expensive ones I checked only go up to 120 Hz and 150 Hz

All these frequencies can help us with our surround audio file. For example: the really low frequencies are there for the subwoofer; we don’t need them on every speaker. To reduce the “load” on our other speakers, we’ll limit their frequencies. Then: the center speaker should be optimized for our vocals, so it shouldn’t contain frequencies that are too high, to avoid distortion there. We’ll need a frequency filter again. Our rear speakers won’t benefit from high frequencies. So, with all of the above in mind, my suggestion is:

  • front left, right: 80 Hz – 20 kHz
  • center: 80 Hz – 12 kHz
  • surround left, right: 100 Hz – 6 kHz
  • lfe: 20 Hz – 200 Hz

The next step is to create the remaining channels and to apply the frequency limits:

    sox -S -V -c 1 left.wav -c 1 left-sr.wav sinc 100-6000 reverb
    sox -S -V -c 1 right.wav -c 1 right-sr.wav sinc 100-6000 reverb
    sox -S -V -c 1 combined.wav -c 1 center.wav sinc 80-12000
    sox -S -V -c 1 combined.wav -c 1 lfe.wav sinc 20-200
    sox -S -V -c 1 left.wav -c 1 left-fr.wav sinc 80-20000
    sox -S -V -c 1 right.wav -c 1 right-fr.wav sinc 80-20000
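
These six commands differ only in input, output and band, so the bands can be kept in one place. A small sketch of that idea follows. The function name `band_for` is made up here, and it only prints the `sinc` arguments rather than calling sox.

```shell
# Hypothetical helper: maps an output channel name to its frequency band,
# using the ranges suggested in this article.
band_for() {
  case "$1" in
    left-fr|right-fr) echo "80-20000" ;;
    center)           echo "80-12000" ;;
    left-sr|right-sr) echo "100-6000" ;;
    lfe)              echo "20-200" ;;
    *)                return 1 ;;   # unknown channel name
  esac
}

# Print the sinc argument for every output channel.
for ch in left-fr right-fr center left-sr right-sr lfe; do
  echo "$ch: sinc $(band_for "$ch")"
done
```

If you want to try other ranges later, only the `case` table has to change.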

Now we have all channels. We’ll just add a delay of 15 ms to the rear speakers and put the whole thing together. That’s all 🙂

    multimux -d 0,0,15,15,0,0 \
     left-fr.wav right-fr.wav \
     left-sr.wav right-sr.wav \
     center.wav lfe.wav > final.wav
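
For a feeling of what 15 ms means at the sample level, here is a quick sketch of the calculation. `ms_to_samples` is a made-up name, and I’m assuming that the `-d` values of multimux are milliseconds, as this article uses them.

```shell
# Hypothetical helper: number of samples that correspond to a delay.
# $1 = delay in milliseconds, $2 = sample rate in Hz
ms_to_samples() {
  echo $(( $1 * $2 / 1000 ))
}

ms_to_samples 15 48000   # 720
ms_to_samples 15 44100   # 661 (integer division, 661.5 rounded down)
```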

As Jay’s comment (thanks) states, the LFE channel is not simply a subwoofer channel: it’s for effects. So you might want to just remove the vocals instead of low-pass filtering and limiting the frequencies.

Remember, our source has only 2 channels (left and right). For 4-channel or 6-channel input this whole document is not very helpful. I’m trying to make „surround“ out of „stereo“, so I have to „guess“, „try“ and „hope“ 🙂

The whole script to do the conversion:

    #!/bin/bash

    ###
    # stereo 2 surround
    # usage: <script> inFile outFile [debug]
    ###

    inFile="$1"
    outFile="$2"
    debug="${3:-0}"   # optional, 1 = verbose output

    if [ -z "$inFile" ] || [ -z "$outFile" ]; then
      echo "Usage: $0 inFile outFile [debug]"
      exit 1
    fi

    # check that all required tools are in the PATH
    run=1
    for tool in sox soxi normalize multimux; do
      if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool"
        run=0
      fi
    done

    # default parameter
    soxParm=""
    normParm="-q"

    # debug parameter
    if [ "$debug" = "1" ]; then
      soxParm="-V -S"
      normParm="-v"
    fi

    # $soxParm is left unquoted on purpose so that "-V -S" splits into two options
    if [ "$run" -eq 0 ]; then
      echo "Error: Requirement missing: normalize, multimux, sox or soxi"
      exit 1
    else
      echo " Preparing Source"
      # note: normalize changes the input file in place
      normalize $normParm "$inFile"
      rate=$(soxi -r "$inFile")
      # if rate is 44100, we'll most likely have stuff from an audio-cd,
      # which we want to deemph, at least I assume so
      if [ "$rate" -eq 44100 ]; then
        echo " + Source is 44.1kHz, De-Emphasizing & Resampling..."
        sox $soxParm -c 2 "$inFile" source.wav deemph rate -v -a 48000
      else
        sox $soxParm -c 2 "$inFile" source.wav rate -v -a 48000
      fi
      # create combined channel
      sox $soxParm -c 2 source.wav -c 1 combined.wav mixer 0.5,0.5
      normalize $normParm combined.wav
      # create pre- left and right channels
      sox $soxParm -c 2 source.wav -c 1 sleft.wav mixer -l
      sox $soxParm -c 2 source.wav -c 1 sright.wav mixer -r
      sox $soxParm -M -c 1 -v -1 sright.wav -c 1 combined.wav -c 1 right.wav
      normalize $normParm right.wav
      sox $soxParm -M -c 1 -v -1 sleft.wav -c 1 combined.wav -c 1 left.wav
      normalize $normParm left.wav
      # frequency games
      sox $soxParm -c 1 left.wav -c 1 ls.wav sinc 100-6000 reverb
      sox $soxParm -c 1 right.wav -c 1 rs.wav sinc 100-6000 reverb
      sox $soxParm -c 1 combined.wav -c 1 c.wav sinc 80-12000
      sox $soxParm -c 1 combined.wav -c 1 lfe.wav sinc 20-200
      sox $soxParm -c 1 left.wav -c 1 lf.wav sinc 80-20000
      sox $soxParm -c 1 right.wav -c 1 rf.wav sinc 80-20000
      # normalize it in batch-mode
      normalize $normParm -b ls.wav rs.wav c.wav lfe.wav lf.wav rf.wav
      # let's mux it
      multimux -d 0,0,15,15,0,0 lf.wav rf.wav ls.wav rs.wav c.wav lfe.wav > "$outFile"
      # cleanup
      rm left.wav right.wav combined.wav source.wav sleft.wav sright.wav
    fi

Hopefully useful links
You’re visiting those pages at your own risk. I took a look at them, but I can’t assure that those pages are „good“, „correct“ or anything else.
