VOICE COMPRESSION EMPLOYING DAUBECHIES WAVELETS
Greg Lenihan, University of New Mexico, Albuquerque, NM
ABSTRACT
Much prior work has shown the superiority of the DWT (Discrete Wavelet Transform) over orthogonal transforms such as the DCT and FFT when applied to image compression. Because of its sub-band, and thus octave, structure, the DWT should also be well suited to voice compression. This paper explores the performance of the DWT, employing the Daubechies wavelet coefficients, when applied to voice compression.
1. INTRODUCTION
A typical data compression system is illustrated in Figure 1. The transformation block could be implemented via an orthogonal transformation, like the DCT, or with a sub-band filter bank. Both transformation schemes have the same goal: to remove correlation between data samples. Both schemes are simply decompositions on $L^2(\mathbb{Z})$ employing orthonormal basis functions that satisfy the following properties:
$$T\,T^{T} = I \tag{1a}$$
$$\langle \varphi_k[n],\, \varphi_p[n] \rangle = \delta[k-p] \tag{1b}$$
In (1), $T$ represents the transformation matrix and $\varphi[n]$ represents the basis function [1].
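As a small illustration of property (1a), the sketch below checks that a transform matrix multiplied by its transpose gives the identity. The 2-point Haar matrix is used only because it is the simplest orthonormal example; it is not the transform studied in this paper, and the helper names `matmul` and `transpose` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

# Assumption: 2-point Haar matrix as a stand-in orthonormal transform.
s = 1.0 / math.sqrt(2.0)
T = [[s, s],
     [s, -s]]

product = matmul(T, transpose(T))

# Property (1a): T * T^T should equal the identity up to round-off.
is_identity = all(abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
                  for i in range(2) for j in range(2))
print(is_identity)
```

The rows of `T` play the role of the basis functions in (1b): each has unit norm and the two rows are mutually orthogonal.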
With reference to Figure 1, this paper is not concerned with the quantizer and entropy coding blocks; for a more detailed analysis of their implementation, refer to [2]. One aspect of the quantizer is used, however: the criterion for deciding which transformed coefficients are sent. Selecting coefficients based on power spectral content is critical, and is therefore the criterion used in this paper [3].
The next section shall discuss the framework for implementing the Daubechies coefficients.
Implementing a wavelet transform requires PRFBs (Perfect Reconstruction Filter Banks). Figure 2 illustrates a single-stage PRFB, and Figure 3 illustrates the multi-stage PRFB topology implemented in this paper. The "perfect reconstruction" property is obtained through alias cancellation as well as by maintaining linear phase and constant attenuation. This type of filter topology uses the QMF (Quadrature Mirror Filter) scheme, in which each low-pass filter has its pass band on $[0, \pi/2]$ and each high-pass filter has its pass band on $[\pi/2, \pi]$. With reference to Figure 2, the following expressions can be derived for the upper (low-pass) and lower (high-pass) paths:
$$0.5\,[H_0(z)\,X(z) + H_0(-z)\,X(-z)]\,G_0(z) \tag{2.1a}$$
$$0.5\,[H_1(z)\,X(z) + H_1(-z)\,X(-z)]\,G_1(z) \tag{2.1b}$$
The 0.5 factor results from the decimation-by-2 operation. The output of the filter bank is then found by summing (2.1a) and (2.1b). For that output to reproduce $X(z)$ with alias cancellation and no distortion, the following constraints are required:
$$H_0(-z)\,G_0(z) + H_1(-z)\,G_1(z) = 0 \tag{2.2a}$$
$$H_0(z)\,G_0(z) + H_1(z)\,G_1(z) = 2\,z^{-p} \tag{2.2b}$$
(2.2a) is the constraint imposed on the filter bank to achieve no aliasing, and (2.2b) is the constraint that removes distortion. In (2.2b), $p$ is the overall delay of the filter bank in samples, which is set by the filter length and should be odd.
Typically, $H_0(z)$ is chosen first, and $H_1(z)$ then follows directly from $H_0(z)$. Next, $G_0(z)$ is chosen as $H_1(-z)$ and $G_1(z)$ is chosen as $-H_0(-z)$.
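The constraints (2.2a) and (2.2b) can be checked numerically for a concrete filter. The sketch below is an illustration only, not the paper's program: it takes the 4-tap coefficient set listed in Section 3, derives the high-pass filter with relation (2.5), forms the synthesis filters as described above, and multiplies the polynomials in $z^{-1}$. The helper names `polymul` and `modulate` are hypothetical.

```python
def polymul(a, b):
    """Multiply two polynomials in z^-1 (coefficient lists, lowest delay first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def modulate(h):
    """Coefficients of H(-z): negate the odd-indexed taps."""
    return [(-1) ** k * hk for k, hk in enumerate(h)]

# 4-tap low-pass set from Section 3.
h0 = [-0.129409522, 0.22414386, 0.8365163, 0.482962921]
N = len(h0)
h1 = [(-1) ** k * h0[N - 1 - k] for k in range(N)]   # relation (2.5)

g0 = modulate(h1)                    # G0(z) = H1(-z)
g1 = [-c for c in modulate(h0)]      # G1(z) = -H0(-z)

# Left-hand side of (2.2a): should vanish for every power of z^-1.
alias = [x + y for x, y in zip(polymul(modulate(h0), g0),
                               polymul(modulate(h1), g1))]

# Left-hand side of (2.2b): should be 2 at a single delay and 0 elsewhere.
distortion = [x + y for x, y in zip(polymul(h0, g0), polymul(h1, g1))]

print([round(v, 6) for v in alias])
print([round(v, 6) for v in distortion])
```

For this filter set the only surviving term of the distortion polynomial sits at delay 3, so $p = N - 1 = 3$, which is odd as required. The agreement is limited to roughly the precision of the tabulated coefficients.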
With reference to Figure 3, the collection of analysis stages represents the forward discrete wavelet transform, while the collection of synthesis stages represents the inverse discrete wavelet transform. The $H_1(z)$ (high-pass) outputs represent the wavelet basis functions that span the spaces $\{W_j\}$, and the $H_0(z)$ (low-pass) outputs represent the scaling functions that span the spaces $\{V_j\}$. Furthermore,
$$V_1 = V_0 \oplus W_0 \tag{2.3a}$$
$$V_2 = V_0 \oplus W_0 \oplus W_1 \tag{2.3b}$$
$$V_{j+1} = V_0 \oplus W_0 \oplus \dots \oplus W_j \tag{2.3c}$$
$$V_j \subset V_{j+1} \tag{2.3d}$$
Because of (2.3), any $V_j$ can be created from a set of $W_j$ by appropriate translations and dilations. In some literature, the scaling function at the output of the first stage is referred to as the "Father" scaling function, $\varphi_1[n]$, and the wavelet output of the first stage is referred to as the "Mother" wavelet, $\psi_1[n]$ [4]. The scaling and wavelet functions can be expressed by the following:
$$\varphi[n] = \sqrt{2}\,\sum_{k=0}^{N-1} h_0[k]\,\varphi[2n-k] \tag{2.4a}$$
$$\psi[n] = \sqrt{2}\,\sum_{k=0}^{N-1} h_1[k]\,\varphi[2n-k] \tag{2.4b}$$
In (2.4), $h_0[k]$ and $h_1[k]$ represent the low-pass and high-pass filter coefficients, respectively. The high-pass coefficients can be found from the low-pass coefficients by the following relation, in which $N$ is the filter length:
$$h_1[k] = (-1)^k\,h_0[N-1-k], \quad k = 0, \dots, N-1 \tag{2.5}$$
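Relation (2.5) is easy to exercise directly. The following sketch applies it to the 4-tap set listed in Section 3 and evaluates a few properties that an orthonormal low-pass/high-pass pair is expected to have. The helper `shifted_dot` and the dictionary keys are hypothetical names, and the values hold only to about the precision of the tabulated coefficients.

```python
import math

# 4-tap low-pass set from Section 3.
h0 = [-0.129409522, 0.22414386, 0.8365163, 0.482962921]
N = len(h0)

# Relation (2.5): reverse the low-pass taps and alternate their signs.
h1 = [(-1) ** k * h0[N - 1 - k] for k in range(N)]

def shifted_dot(a, b, shift):
    """Inner product of a[k] with b[k + shift]; taps outside b count as zero."""
    return sum(a[k] * b[k + shift]
               for k in range(len(a)) if 0 <= k + shift < len(b))

checks = {
    "sum_h0": sum(h0),                           # near sqrt(2): low-pass
    "sum_h1": sum(h1),                           # near 0: high-pass
    "energy_h0": shifted_dot(h0, h0, 0),         # near 1: unit norm
    "h0_vs_h0_shift2": shifted_dot(h0, h0, 2),   # near 0: double-shift orthogonality
    "h0_vs_h1": shifted_dot(h0, h1, 0),          # near 0: the two filters are orthogonal
}

for name, value in checks.items():
    print(f"{name}: {value:.9f}")
```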
3. SPEECH COMPRESSION
DWTs should lend themselves well to speech compression, since the dyadic shifts produced by the successive decimation at each stage yield an octave filter band structure. This paper uses 4 sub-bands: 0-500 Hz, 500-1,000 Hz, 1,000-2,000 Hz, and 2,000-4,000 Hz. Accordingly, 8 kHz sampling is employed.
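The octave band edges follow from repeatedly halving the low-pass branch, starting at the Nyquist frequency. The sketch below computes them; the function name `octave_bands` is hypothetical, and the number of two-channel splits is a parameter. Three splits are assumed here only because that count reproduces the four bands listed above at 8 kHz sampling.

```python
def octave_bands(sample_rate_hz, splits):
    """Return (low, high) band edges in Hz, lowest band first.

    Each split halves the current low-pass band; the upper half becomes
    a wavelet (high-pass) band and the lower half is split again.
    """
    high = sample_rate_hz / 2.0          # Nyquist frequency
    bands = []
    for _ in range(splits):
        low = high / 2.0
        bands.append((low, high))        # high-pass band of this split
        high = low
    bands.append((0.0, high))            # remaining low-pass band
    return bands[::-1]

# Assumption: three splits, chosen to match the four bands in the text.
bands = octave_bands(8000, 3)
for low, high in bands:
    print(f"{low:.0f}-{high:.0f} Hz")
```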
The data set is a real speech signal: the enunciated phrase "DO YOU", illustrated in Figure 4. It is processed with a 4-stage DWT and IDWT using the 4-tap Daubechies set below:
h[0] = -0.129409522
h[1] = 0.22414386
h[2] = 0.8365163
h[3] = 0.482962921.
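A minimal sketch of a multi-stage DWT and IDWT built from these four coefficients is given below. It is not the paper's encoder or decoder, which are not listed. It assumes periodic (circular) extension at the signal boundaries, uses the transpose of the analysis stage for synthesis (valid because each stage is orthonormal), and runs on a synthetic two-tone test signal rather than on the speech data. All function names are hypothetical.

```python
import math

H0 = [-0.129409522, 0.22414386, 0.8365163, 0.482962921]
H1 = [(-1) ** k * H0[len(H0) - 1 - k] for k in range(len(H0))]  # relation (2.5)

def analyze(x):
    """One analysis stage: filter with H0/H1 and keep every second output.

    Assumption: the signal is extended periodically (circular indexing).
    """
    n = len(x)
    c = [sum(H0[k] * x[(2 * i + k) % n] for k in range(len(H0)))
         for i in range(n // 2)]
    d = [sum(H1[k] * x[(2 * i + k) % n] for k in range(len(H1)))
         for i in range(n // 2)]
    return c, d

def synthesize(c, d):
    """One synthesis stage: the transpose of analyze()."""
    n = 2 * len(c)
    x = [0.0] * n
    for i in range(len(c)):
        for k in range(len(H0)):
            x[(2 * i + k) % n] += H0[k] * c[i] + H1[k] * d[i]
    return x

def dwt(x, stages):
    """Split the low-pass branch `stages` times; return (scaling, [wavelet...])."""
    details = []
    c = list(x)
    for _ in range(stages):
        c, d = analyze(c)
        details.append(d)
    return c, details

def idwt(c, details):
    """Rebuild the signal by undoing the stages in reverse order."""
    for d in reversed(details):
        c = synthesize(c, d)
    return c

# Assumption: synthetic two-tone signal, 64 samples, standing in for real data.
signal = [math.sin(2 * math.pi * 5 * n / 64) + 0.5 * math.sin(2 * math.pi * 13 * n / 64)
          for n in range(64)]

approx, details = dwt(signal, 4)
rebuilt = idwt(approx, details)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(len(approx), [len(d) for d in details], max_error)
```

With periodic extension and transpose synthesis there is no delay between input and output, unlike a causal filter-bank implementation. The residual error reflects the limited precision of the tabulated coefficients.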
Three programs are used in this paper. The first, called "DWAV_ENC," performs the forward transform using the 4 stages. The output coefficients, namely the $c_i[k]$ (scaling function coefficients) and the $d_i[k]$ (wavelet coefficients), are written to distinct files. The wavelet coefficients and the final-stage scaling coefficients, $c_0[k]$, are then read into a power detection program called "P_DET.C." The power detection routine computes the power over a 32-sample window via the following IIR equation:
$$p_{\mathrm{ref}}[n] = (1-\alpha)\,p_{\mathrm{ref}}[n-1] + \alpha\,u[n]^2 \tag{2.6}$$
In (2.6), $u[n]$ represents the input data sequence, which is either the wavelet or the scaling coefficients. If $p_{\mathrm{ref}}[n]$ is equal to or greater than the specified compression level, the respective block is saved; otherwise, the respective block is zeroed out. A window of 32 samples was chosen because, at 8 kHz sampling, 32 samples represent 4 ms, and voice is typically wide-sense stationary for 2-4 ms. Finally, the output coefficients are synthesized by the "DWAV_DEC.C" routine.
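The power detection and block gating described above might be sketched as follows. This is not the "P_DET.C" source, which is not listed. Two details are assumptions made for illustration: the smoothing constant is taken as 1/32 to match the 32-sample window, and the smoothed power is compared with the threshold once, at the end of each block. The function name `gate_blocks` is hypothetical.

```python
def gate_blocks(u, threshold, block=32, alpha=1.0 / 32.0):
    """Zero each block whose smoothed power falls below `threshold`.

    Assumptions (not stated in the text): alpha = 1/32, and the decision
    for a block uses the smoothed power at the block's last sample.
    Returns the gated sequence and a keep/zero flag per block.
    """
    out = list(u)
    kept = []
    p_ref = 0.0
    for start in range(0, len(u), block):
        segment = u[start:start + block]
        for sample in segment:
            # First-order IIR power estimate, equation (2.6).
            p_ref = (1.0 - alpha) * p_ref + alpha * sample * sample
        keep = p_ref >= threshold
        kept.append(keep)
        if not keep:
            out[start:start + len(segment)] = [0.0] * len(segment)
    return out, kept

# Synthetic coefficients: one quiet block followed by one loud block.
coeffs = [0.001] * 32 + [0.1] * 32
gated, kept = gate_blocks(coeffs, threshold=0.0005)
print(kept)
```

Because the estimate carries memory from one block to the next, a quiet block that immediately follows a loud one can still exceed the threshold; a per-block average would behave differently.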
For clarity (visual resolution), "snap-shots" of the compression results for the "DO YOU" signal are analyzed: a snap-shot of the low frequency part (a zoom in on the "DO" part) and a snap-shot of the high frequency part (a zoom in on the "YOU" part). For each part, two distinct compression thresholds are employed.
The first set of plots, Figures 5 and 6, are snap-shot comparisons, without compression, of the original signal with the wavelet reconstruction, illustrating "near" perfect reconstruction. There is an obvious phase delay. The slight amplitude dissimilarity is due to round-off error in the program.
The second set of plots represents the low and high frequency portions of the "DO YOU" signal when compression is employed. Figures 7 and 8 illustrate zoomed-in comparisons of the original and compressed signals when a discrimination threshold of 0.0005, or 20.8 dB of separation, is employed. Figure 9 illustrates a zoomed-in comparison of the original and compressed signals for the high frequency portion when a discrimination threshold of 0.0029, or 13.25 dB of separation, is employed.
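As a consistency check on the two quoted figures, and assuming (the text does not say so) that both "dB of separation" values are power ratios measured against the same reference level, the ratio of the two thresholds should match the difference of the two separations:

$$10\log_{10}\!\left(\frac{0.0029}{0.0005}\right) = 10\log_{10}(5.8) \approx 7.6\ \text{dB}, \qquad 20.8 - 13.25 = 7.55\ \text{dB}$$

Under that assumption the two thresholds agree to within about 0.1 dB.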
The low frequency portion using the latter threshold is not illustrated, since it is the same as in Figure 7. This is because all of the significant signal energy is concentrated in the 4th band, the 0-500 Hz band; any further compression would eliminate the low frequency portion altogether.
The results seem to indicate that compressing the high frequency portion is effective, while compressing the low frequency portion is not as effective.
4. CONCLUSIONS
Preliminary findings look good, although more research must be performed to determine why the high frequency content seems to yield better compression characteristics than the low frequency content. Double precision was not employed in this study, and the plot routine itself introduced some unwarranted truncation. Double precision was avoided primarily to see how practical it would be to port the DWT/IDWT to a real-time platform.
Future work shall include utilizing more filter banks as well as more coefficients, and studying oversampling versus critical sampling.
As of now, it would appear that any real-time implementation would need a built-in FFT so that dynamic compression could be facilitated, i.e., so that more compression could be achieved during higher frequency signal portions.
5. REFERENCES
[1] Martin Vetterli and Jelena Kovacevic, "Wavelets and Subband Coding," Prentice-Hall, 1995.
[2] Richard A. Haddad and Thomas W. Parsons, "Digital Signal Processing: Theory, Applications, and Hardware," Computer Science Press, 1991.
[3] N. Ahmed and K.R. Rao, "Orthogonal Transforms for Digital Signal Processing," Springer-Verlag, Heidelberg, New York, 1975.
[4] Barbara Burke Hubbard, "The World According to Wavelets," A K Peters, Ltd., 1996.






