MTU.Community - View Single Post

infoby · #11 September 5th, 2002, 03:05 PM

My Friend,

Richard Faith has written a vocal remover that works on input in ASCII format. It does a great job of retaining the music; its major drawback is that it is painfully slow. He looked at my Vogone product and was quite impressed. After playing with it, he came up with some suggestions you may want to consider. Here they are.

Here are some ideas I have come up with regarding Vogone.

1) In order to optimize the cancellation for songs wherein the main vocal is not exactly center panned, it would be good to add a slider control for the user to balance the channels.
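
A minimal sketch of the balance idea in (1), written in Python purely for illustration (nothing here is taken from Vogone itself). The function name, the `balance` parameter and its -1.0 to 1.0 range are all assumptions; the point is only that scaling one channel before the subtraction lets an off-center vocal cancel.

```python
def subtract_with_balance(left, right, balance=0.0):
    """Return left - right after applying a balance setting.

    balance is a hypothetical slider value from -1.0 to 1.0:
      0.0  leaves both channels at full gain,
      > 0  attenuates the left channel,
      < 0  attenuates the right channel.
    """
    if not -1.0 <= balance <= 1.0:
        raise ValueError("balance must be between -1.0 and 1.0")
    left_gain = 1.0 - max(balance, 0.0)
    right_gain = 1.0 + min(balance, 0.0)
    return [left_gain * l - right_gain * r for l, r in zip(left, right)]


# Toy data: a "vocal" that is twice as loud in the left channel as in the right.
vocal_left = [100, -200, 300]
vocal_right = [50, -100, 150]

centered = subtract_with_balance(vocal_left, vocal_right)       # vocal survives
balanced = subtract_with_balance(vocal_left, vocal_right, 0.5)  # vocal cancels
```

With the slider at center, half of the vocal is left over; at 0.5 the two copies match in amplitude and the difference is zero.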

2) Even when the amplitudes of right and left channels are made to match each other precisely, as would be accomplished by (1) above, perfect cancellation may be impossible because of mismatch on the time axis. An example of how this can happen is a song that was mastered on an analog tape machine with lateral misalignment of the record and/or reproduce heads. I have experimented with this to some extent, and have written a program that receives the input data in ASCII format (here Vogone has the superior advantage of working directly with the .WAV file) and allows one channel to be delayed with respect to the other by a desired (integer) number of sample intervals (e.g., 1/44100 second and integer multiples of that value).
This processing is helpful in a situation where the best null that can be accomplished with amplitude balancing contains a significant amount of residual sibilance from the vocal. How this could apply to Vogone would be to provide another slider which would normally be set at or near center position, and which would then be tweaked in real time by the user to optimize the rejection of higher-frequency parts of the main vocal.
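
The integer-sample delay described in (2) could look roughly like the following Python sketch. It is not Richard's program or Vogone's code; the function names and the signed `offset` convention are assumptions made for the example, and samples are plain Python lists rather than data read from a .WAV file.

```python
def delay(samples, n):
    """Delay a channel by n whole samples, padding the start with zeros.

    The result has the same length as the input, so the last n samples
    are dropped.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    n = min(n, len(samples))
    return [0] * n + list(samples[:len(samples) - n])


def subtract_with_offset(left, right, offset=0):
    """Return left - right with one channel delayed.

    offset is a hypothetical slider value in sample intervals:
    offset > 0 delays the right channel, offset < 0 delays the left.
    """
    if offset >= 0:
        right = delay(right, offset)
    else:
        left = delay(left, -offset)
    return [l - r for l, r in zip(left, right)]


# Toy data: the same "vocal" arrives two samples later in the left channel.
left = [0, 0, 10, -20, 30, 0]
right = [10, -20, 30, 0, 0, 0]

misaligned = subtract_with_offset(left, right, 0)  # vocal does not cancel
aligned = subtract_with_offset(left, right, 2)     # vocal cancels
```

At 44100 samples per second, each step of `offset` corresponds to 1/44100 second, which is why only whole multiples of that interval are available in this form.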

3) Although these are rare, there is occasionally a recording wherein the main vocal is recorded OUT of phase in the two channels. With such a recording, any attempt to reduce the vocal by subtraction will be counterproductive. To handle cases like this, an "Invert/Normal" switch could be added to flip the polarity of one channel only.
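
To illustrate (3), here is a small Python sketch of an "Invert/Normal" switch. The function name and the `invert` flag are hypothetical; the sketch only shows why flipping the polarity of one channel turns a counterproductive subtraction into a cancelling one.

```python
def reduce_vocal(left, right, invert=False):
    """Subtract right from left, optionally flipping right's polarity first.

    invert=False is the "Normal" position: output = left - right.
    invert=True is the "Invert" position:  output = left - (-right).
    """
    sign = -1 if invert else 1
    return [l - sign * r for l, r in zip(left, right)]


# Toy data: a "vocal" recorded out of phase in the two channels.
left = [5, -3, 8]
right = [-5, 3, -8]

normal = reduce_vocal(left, right)                  # vocal is doubled
inverted = reduce_vocal(left, right, invert=True)   # vocal cancels
```

In the "Normal" position the out-of-phase vocal comes out at twice its amplitude, while the "Invert" position brings it to zero.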

4) I believe that Vogone owes much of its speed advantage to the fact that it works directly on the .WAV file instead of requiring input in ASCII format, and also to (here I'm speculating) using integer math internally within and between stages of processing. I believe that if all the internal processing were done in double-precision floating-point format, better sound quality would result, although at a disastrous cost in speed. However, here is how the best of both worlds could be brought together: keep the program just as it is (but perhaps with features 1 through 3 above added) as the "fast" mode, wherein the user could tweak all the sliders to get the best possible vocal cancellation, and then re-run the processing in a "slow" mode, using the settings last used in the "fast" mode, with the only difference being that the internal processing would use double-precision floating-point math. A real dyed-in-the-wool karaoke nut would typically be more than happy to let his computer crunch on a file while he goes shopping or to a movie, if the result is a truly superior karaoke CD.
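
The "fast" versus "slow" idea in (4) can be sketched in Python as two functions that take the same setting. Whether Vogone really uses integer math is speculation in the text above, so everything here is an assumption for illustration: the fixed-point scale of 256, the `gain_steps` setting, and both function names. Python floats are double precision, so the second function stands in for the proposed "slow" mode.

```python
GAIN_SCALE = 256  # hypothetical fixed-point scale: a setting of 256 means gain 1.0


def clip16(x):
    """Clamp a value to the range of a signed 16-bit sample."""
    return max(-32768, min(32767, x))


def cancel_fast(left, right, gain_steps):
    """Integer-only path: the scaled sample is truncated before subtracting."""
    out = []
    for l, r in zip(left, right):
        # Floor division discards the fractional part (rounds toward -infinity).
        scaled = (r * gain_steps) // GAIN_SCALE
        out.append(clip16(l - scaled))
    return out


def cancel_slow(left, right, gain_steps):
    """Floating-point path: same setting, rounded once at the very end."""
    gain = gain_steps / GAIN_SCALE
    return [clip16(round(l - gain * r)) for l, r in zip(left, right)]


# The same setting (192/256 = 0.75) is reused for both runs.
left = [1000, 1000]
right = [4, 5]

fast = cancel_fast(left, right, 192)  # [997, 997]
slow = cancel_slow(left, right, 192)  # [997, 996]
```

The two paths agree when the scaled value happens to be a whole number and differ by one step when it is not. A real multi-stage chain would presumably accumulate more such differences, which is the quality gap the "slow" mode is meant to close.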