29 May 1999
Source: US Patent Office Online:
http://www.uspto.gov/
Search "National Security Agency" though none of the patents disclose the full name.
For related images see IBM's patent server: http://www.patents.ibm.com/ibm.html
United States Patent 4,897,878
Boll, et al. January 30, 1990
A method and apparatus for noise suppression for speech recognition systems which employs the principle of least mean square estimation, implemented with conditional expected values. According to this method, one computes a series of optimal estimators; these estimators and their variances are then employed to implement a noise immune metric. This noise immune metric enables the system to substitute a noisy distance with an expected value, which is calculated according to combined speech and noise data occurring in the bandpass filter domain. Thus the system can be used with any set of speech parameters and is relatively independent of a specific speech recognition apparatus structure.
Inventors: Boll; Steven F. (San Diego, CA); Porter; Jack E. (San Diego, CA)
Assignee: ITT Corporation (New York, NY)
Appl. No.: 769215
Filed: August 26, 1985
U.S. Class: 381/43; 381/47
Intern'l Class: G10L 007/08
Field of Search: 381/41-50; 364/513.5
U.S. Patent Documents
4499594   Feb., 1985   Lewinter        381/46
4567606   Jan., 1986   Vensko et al.   381/43
4624008   Nov., 1986   Vensko et al.   381/43
Foreign Patent Documents
0216118   Apr., 1987   EP              381/46
Other References
Boll, Suppression of Acoustic Noise in Speech Using Spectral Subtraction, IEEE Trans. on ASSP, vol. ASSP-27, No. 3, Apr. 1979, pp. 113-120.
Tierney, A Study of LPC Analysis of Speech in Additive Noise, IEEE Trans. on ASSP, vol. ASSP-28, No. 4, Aug. 1980, pp. 389-397.
F. Itakura, Minimum Prediction Residual Principle Applied to Speech Recognition, IEEE Trans. on ASSP, vol. ASSP-23, pp. 67-72, Feb. 1975.
Porter and Boll, "Optimal Estimators for Spectral Restoration of Noisy Speech", ICASSP, San Diego, CA, Mar. 1984, pp. 18A.2.1-18A.2.4.
Primary Examiner: Clark; David L.
Assistant Examiner: Knepper; David D.
Attorney, Agent or Firm: Twomey; Thomas N. Werner; Mary C.
TABLE 5.1
______________________________________
Average Mean Square Error Values
Condition                      mse
______________________________________
noisy - clean                  9.4
optimal parameters - clean     3.3
noise metric - clean           2.5
______________________________________
Although this represents only a coarse examination of performance, it does
demonstrate that the metric is performing as desired. A more realistic test
requires examining its performance in a wordspotting experiment as defined
below.
WORDSPOTTING USING UNNORMALIZED PARAMETERS
The wordspotter was modified to use unnormalized 4th root parameters and
Euclidean distance with or without the variance terms added. All other aspects
of the wordspotting program remained the same, i.e. standard blind deconvolution,
overlap removal, biasing, etc. Results are presented using the same scoring
procedure as described in App. III. The table shows the average ROC curve
differences for each template talker.
______________________________________
Wordspotting Results Using Unnormalized Parameters
Condition            50    51   joco  gara  chwa  caol    ave
______________________________________
clean               -19   -19    -7   -15   -10   -12   -13.6
noisy               -25   -27   -21   -21   -21   -26   -13.3
Optimal Params      -20   -23    -8   -14   -22   -12   -16.6
Only Noise Metric   -20   -22    -9   -16   -22   -12   -16.7
______________________________________
Although overall performance using unnormalized parameters is lower than
using normalized features, these experiments show some interesting
characteristics. Specifically, for five of the six template talkers, use
of the optimal parameters and/or the noise metric returned performance to
levels nearly equal to the clean unknown data. This degree of restoration
is not found in the normalized case. Stated another way, normalization tends
to minimize the deleterious effect of noise and the restoring effect of the
optimal parameters.
NOISE METRIC USING NORMALIZED PARAMETERS
In a preliminary development of the noise metric, the analysis used first
order terms in the power series expansion of the reciprocal square root.
Use of only first order terms leads to results which differ slightly from
the results when second order terms are included. The development with second
order terms is given below. Wordspotting performance is presented using the
corrected formulation.
BACKGROUND
Let x, x, x represent noisy, noise-free and estimated noise-free parameter
vectors, and let primes denote l.sub.2 normalization: ##EQU2##
The (unnormalized) estimator error is
.epsilon.=x-x
We define .eta. to be the error in estimating the normalized noise-free
parameters by normalizing the (unnormalized) estimator, ##EQU3##
.eta. can be expressed to first order in .epsilon. as ##EQU4##
The previous analysis proceeded to use this first order approximation as
a basis for computing the effect of second order statistics of .epsilon.,
in the form of variances of the components of .epsilon.. This leads to the
conclusion, for example, that expected value of .eta. is zero since the expected
value of .epsilon. is
E(.epsilon.)=E(x-x.vertline.x)
which is zero when we ignore cross channel effects. This treatment is
inconsistent and leads to error, as second order effects are ignored some
places and used in other places. The previous analysis can be corrected by
carrying all second order terms in .epsilon. forward. This leads, among other
things, to the result that the expectation of .eta. is not zero.
The analysis is now repeated by carrying forward all second order terms in
.epsilon.. Other than this change, the development is little different from
the previous one. When the expectation of the noise-free l.sub.2 - normalized
distance given the noisy observations has been expressed to second and higher
order in .epsilon. we will then assume third moments vanish and then ignore
cross-channel covariances.
We start from the definition of the noise-free distance given the noisy
observations: ##EQU5##
To simplify notation, we drop the notation specifying the noisy observation
conditioning. Then, using the fact that we are dealing with unit vectors,
the expected value can be expressed in terms of dot products as: ##EQU6##
where .eta. is defined above. Expanding the dot products gives: ##EQU7##
The product term in .eta..sub.u and .eta..sub.t is an interesting problem.
For the most part, the error term .eta. will be the result of noise, and
noise at the template recording and at the unknown recording are very reasonably
assumed to be uncorrelated, so approximately,
E(.eta..sub.u .eta..sub.t)=E(.eta..sub.u)E(.eta..sub.t)
But this is not quite correct, as the expectation is over speech and noise.
Correlations between .eta..sub.u and .eta..sub.t can therefore arise due
to the speech aspect of the expectation, (and, in fact, it can be expected
to differ in match and no-match conditions). Fortunately, in the present
analysis, where we're willing to have templates treated as noise free,
.eta..sub.t.ident. 0 and the problem doesn't arise.
We continue by computing the expectation of a clean normalized parameter
vector. Since the treatment applies to both template and unknown, we don't
distinguish between them. ##EQU8##
Substituting these in the expression for the expectation of x gives ##EQU9##
This expression for the expectation of the noise-free normalized parameter
vector is true for any estimator x which is a function of the noisy observations.
It is complete in the second order of residual error of the estimator, hence
is an adequate model for computing the effect of second order statistics.
We now specialize to the optimal estimator we have been using. (Notice we
have made no simplifying assumptions yet.)
First some subtler points. From the definition of .eta., we have ##EQU10##
Since it is not known whether a distance calculation is for a match or no
match condition, correlations which exist between the template and the error
in the unknown cannot be used. It is therefore reasonable to make:
Assumption 1a:
E(.epsilon..sub.u .vertline.x.sub.u,x.sub.t)=E(.epsilon..sub.u .vertline.x.sub.u)
Assumption 1b:
cov(.epsilon..sub.u .vertline.x.sub.u,x.sub.t)=cov(.epsilon..sub.u .vertline.x.sub.u)
Since the expectation of a vector is a vector of expectations, ##EQU11##
and each component can be expressed
E(x.sub.u,i -x.sub.u,i .vertline.x.sub.u)=x.sub.u,i -E(x.sub.u,i
.vertline.x.sub.u)
where
E(x.sub.u,i .vertline.x.sub.u)=E(x.sub.u,i .vertline.x.sub.u,1, . . . ,
x.sub.u,n)
Our optimal estimators are derived independently for each channel; that is,
x.sub.u,i .ident.E(x.sub.u,i .vertline.x.sub.u,i)
In doing this, we ignore inter-channel dependencies. Thus we make
Assumption (2a) For any i
E(x.sub.u,i .vertline.x.sub.u)=E(x.sub.u,i .vertline.x.sub.u,i)
Assumption (2b) For any i
var(x.sub.u,i .vertline.x.sub.u)=var(x.sub.u,i .vertline.x.sub.u,i)
The effect of assumptions (1a) and (2a) is to make ##EQU12##
Next we make the assumptions needed to compute the estimator residual
error statistics. We have the within-channel variances, but don't want to
deal with the multitude of cross-channel covariances or higher moments. So
we make
Assumption (3a) ##EQU13##
Assumption (3b) Higher order moments of .epsilon..sub.u vanish, i.e.
E(O.sub.3 (.epsilon..sub.u))=0
Under these conditions, the expectation of the i.sup.th component of the
normalized parameter vector is ##EQU14##
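The equation behind ##EQU14## is not recoverable from this text, so the following is only a sketch of what a second-order expansion yields under Assumptions 1 through 3. It assumes the error is defined as estimate minus clean value, so the clean vector is x_hat - eps, with zero-mean eps, a diagonal covariance holding the per-channel variances `var`, and vanishing third moments. The function names and the two-point check distribution are illustrative and do not come from the patent.

```python
import itertools
import numpy as np

def expected_normalized(x_hat, var):
    """Second-order approximation of E[x / |x|] where x = x_hat - eps."""
    x_hat = np.asarray(x_hat, dtype=float)
    var = np.asarray(var, dtype=float)
    norm2 = x_hat @ x_hat                 # squared l2 norm of the estimate
    u = x_hat / np.sqrt(norm2)            # normalized estimate
    # Terms from expanding 1/|x_hat - eps| to second order in eps and
    # taking expectations with a diagonal covariance.
    correction = (-0.5 * var.sum() + 1.5 * (u**2 @ var) - var) / norm2
    return u * (1.0 + correction)

def exact_two_point(x_hat, var):
    """Exact E[x / |x|] when each eps_i is +/- sqrt(var_i) with equal odds.

    This symmetric distribution has zero mean, diagonal covariance and
    zero third moments, so it satisfies the assumptions exactly.
    """
    x_hat = np.asarray(x_hat, dtype=float)
    sd = np.sqrt(np.asarray(var, dtype=float))
    combos = list(itertools.product((-1.0, 1.0), repeat=len(x_hat)))
    total = np.zeros_like(x_hat)
    for signs in combos:
        x = x_hat - np.array(signs) * sd
        total += x / np.linalg.norm(x)
    return total / len(combos)
```

Comparing the two functions on a small example shows that the expectation of the normalized clean vector is not simply the normalized estimate, which is the point made in the text about the expectation of .eta. being non-zero.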
To find the noise immune metric, we first need the expectation of .eta..
##EQU15## and using the expression above, we find the components of this
expectation are given by ##EQU16## and similarly for the template vector,
when assumptions 1 through 3 are extended to it.
As shown in the first part of this section, ##EQU17##
To estimate it using the previous results, we formalize previous remarks
with Assumption 4: ##EQU18## and .gamma. and .beta. are defined similarly
for the template.
In the wordspotting case, we generally assume the template is noise free,
so the .beta. and .gamma. terms for the template vanish. In that case the
result simplifies to ##EQU19##
WORDSPOTTING RESULTS
Wordspotting runs were made with and without the corrected metric on 10 dB
noisy speech. The table lists the results of the wordspotting experiments.
The standard scoring approach is given. That is, for each condition and each
template talker, the number represents the average amount that the ROC curve
differs from a selected baseline consisting of speaker 50 on clean unknown
speech.
The legend for the table is as follows:
base: clean templates vs. clean unknowns
base_noisy: clean templates vs. noisy unknowns at 10 dB SNR
base_opt: clean templates vs. optimally restored unknowns.
TABLE
______________________________________
Wordspotting Performance
Condition      50    51   joco  gara  chwa  caol   ave
______________________________________
base            3     7     9    15     3    17    9.0
base_noisy    -12   -18    -2    -3   -10    -6   -8.5
base_opt      -10   -12     2    -2   -17     1   -6.3
______________________________________
Referring to FIG. 18, there is shown a simple block diagram of a speech
recognizer apparatus which can be employed in this invention. Essentially,
the speech recognizer includes an input microphone 104 which microphone has
its output coupled to a preamplifier 106. The output of the preamplifier
is coupled to a bank of bandpass filters 108. The bank of bandpass filters
is coupled to a microprocessor 110. The function of the microprocessor is
to process the digital inputs from the bandpass filter bank in accordance
with the noise immune distance metric described above.
Also shown in FIG. 18 are an operator's interface 111, a non-volatile mass
storage device 114 and a speech synthesizer 116. Examples of such apparatus
are well known in the field. See for example, a patent application entitled
APPARATUS AND METHOD FOR AUTOMATIC SPEECH RECOGNITION, Ser. No. 473,422,
filed on Mar. 9, 1983, now U.S. Pat. No. 4,624,008 for G. Vensko et al and
assigned to the assignee herein.
As indicated above, the algorithm or metric which has been described is suitable
for operation with any type of speech recognizer system, and hence the structures
of such systems are not pertinent as the use of the above described algorithm
will enhance system operation. In any event, as indicated above, such speech
recognition systems operate to compare sound patterns with stored templates.
A template which is also well known is a plurality of previously created
processed frames of parametric values representing a word, which when taken
together form the reference vocabulary of the speech recognizer. Such templates
are normally compared in accordance with predetermined algorithms such as
the dynamic programming algorithm (DPA) described in an article by F. Itakura
entitled MINIMUM PREDICTION RESIDUAL PRINCIPLE APPLIED TO SPEECH RECOGNITION,
IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-23,
pages 67-72, Feb. 1975.
The algorithm allows one to find the best time alignment path or match between
a given template and a spoken word. Hence as should be apparent from FIG.
18, modern speech recognition systems employ templates and incorporate such
templates in computer memory for making a comparison of the templates with
the digital signals indicative of unknown speech sounds which signals are
developed in the bandpass filter bank. The techniques for generating the
digital signal from unknown speech signals have been extensively described
in regard to the above noted co-pending application.
See also a co-pending application entitled A DATA PROCESSING APPARATUS AND
METHOD FOR USE IN SPEECH RECOGNITION, filed on Mar. 9, 1983, Ser. No. 439,018,
now U.S. Pat. No. 4,567,606 by G. Vensko et al and assigned to the assignee
herein. This co-pending application describes a continuous speech recognition
apparatus which also extensively employs the use of templates. In any event,
as can be understood from the above, this metric compensates for noise by
replacing the noisy distance with its expected value. Hence a speech recognizer
operates to measure the similarity between segments of unknown and template
speech by computing, based on an algorithm, the Euclidean distance between
respective segment parameters. The addition of noise to either the unknown
speech or the template speech or both causes this distance to become either
too large or too small. Hence based on the algorithm of this invention, the
problem is solved by replacing the noisy distance with its expected value.
In order to do so, as explained above, there are two forms of information
required. The first is the expected values of the parameters and the second
is their variance.
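For the unnormalized case with a noise-free template, the role of these two pieces of information can be sketched with the standard identity E[(X - t)^2] = (E[X] - t)^2 + var(X), applied per channel and summed. The function and argument names below are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def expected_sq_distance(est_mean, est_var, template):
    """Expected squared Euclidean distance to a noise-free template.

    est_mean: per-channel conditional means of the clean unknown parameters
    est_var:  per-channel conditional variances of those parameters
    template: noise-free template parameters
    """
    est_mean = np.asarray(est_mean, dtype=float)
    est_var = np.asarray(est_var, dtype=float)
    template = np.asarray(template, dtype=float)
    # Squared distance of the means plus the variance terms.
    return float(np.sum((est_mean - template) ** 2 + est_var))
```

Dropping the `est_var` term gives the plain Euclidean distance on the optimal parameters, which corresponds to the "with or without the variance terms added" comparison in the wordspotting experiments.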
Thus based on the above description, as further supplemented by Appendices
II, III and IV, there is described the necessary calculations to enable one
to calculate the required parameters while the specification teaches one
how to combine the parameters to form the noise immune metric. As indicated,
the processing can be implemented by the system shown in FIG. 18 by storing
both the parameters and their variances in either memory 114 or in the
microprocessor memory 110.
In accordance with the invention, a method of compensating for noisy input
speech in order to improve the recognition result of the speech recognition
apparatus comprises the following steps for producing an improved minimum
mean square error estimate conditioned by compensatory characteristics of
the noisy input speech:
(a) computing optimal estimated distance values over the given range of
frequencies for noise-free template speech, based upon comparing known speech
segments, which are input in a noise-free environment and converted into
corresponding templates of known speech signals t.sub.s, with unknown speech
segments, which are input in a noise-free environment and converted to unknown
speech signals u.sub.s ;
(b) computing estimated variance values corresponding to the optimal estimated
distance values for a sample population of noise-free speech segments;
(c) storing said optimal estimated distance values and variance values on
a look-up table associated with the template speech;
(d) computing squared distance values over the given range of frequencies
for input noisy unknown speech signals u.sub.s+n compared with signals t.sub.s+n
representing template speech to which a spectral representation of noise
n in the actual input environment is added;
(e) replacing the computed squared distance values for the unknown speech
signals with conditional expected distance values calculated using the optimal
estimated distance values and variance values obtained from the look-up table,
in order to derive noise-immune metric values for the unknown speech signals;
and
(f) computing the minimum mean square error of the noise-immune metric values
for the unknown speech signals compared with the noise-free template speech
signals, whereby an improved recognition result is obtained.
In regard to the above, the implementation of the noise immune distance metric
is mathematically explained in Appendix V. Appendix V describes how the metric
or algorithm can be stored into an existing metric which is widely known
in the field as the Olano metric. As indicated, noise immunity is obtained
in this system by replacing the Euclidean square distance between the template
and unknown frames of speech by its conditional expectation of the square
distance without noise, given the noisy observations. As can be seen from
Appendix V, the conditional expected value is the minimum mean square error
estimate of the distance.
It, therefore, will reduce the noise on the frame-to-frame distance values
to its minimum possible value for given data. In order to implement the use
of the above described system, the noise metrics can be installed in any
system by substituting the optimal parameter values as derived and as explained
and by augmenting the feature vector with the variance information. Thus
for each signal frame which, as indicated above, is implemented in a voice
recognition system by means of the bandpass filter outputs after they have
been digitized, one performs the following steps:
1. Replace (by table lookup) the noisy estimate with optimal estimate.
2. Obtain the variance, (also by table lookup).
3. Normalize the filterbank parameters.
4. Normalize the variance to account for parameter normalization.
5. Augment the feature vectors with the variance information.
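The five steps above can be sketched as follows. This is a minimal illustration only: the lookup tables are hypothetical arrays indexed by a grid of noisy values, lookup is done by linear interpolation, and the variance is rescaled by the squared norm of the estimate, which is a first-order assumption rather than the rule derived in Appendix V.

```python
import numpy as np

def noise_immune_features(noisy, grid, mean_table, var_table):
    """Build an augmented feature vector for one frame of filterbank outputs.

    noisy:      noisy filterbank parameters for the frame
    grid:       noisy values at which the tables were tabulated (increasing)
    mean_table: optimal estimate at each grid point (hypothetical table)
    var_table:  estimator variance at each grid point (hypothetical table)
    """
    noisy = np.asarray(noisy, dtype=float)
    est = np.interp(noisy, grid, mean_table)   # step 1: optimal estimate
    var = np.interp(noisy, grid, var_table)    # step 2: its variance
    norm = np.linalg.norm(est)
    est_n = est / norm                         # step 3: normalize parameters
    var_n = var / norm**2                      # step 4: rescale variance (assumed rule)
    return np.concatenate([est_n, var_n])      # step 5: augment feature vector
```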
The mathematics, as indicated, are explained in great detail in Appendix
V and particularly show how to modify the Olano metric.
The following Appendices are included herein and are referred to during the
course of the specification to establish the mathematics used in accordance
with this invention:
1. Appendix II--OPTIMAL ESTIMATORS FOR RESTORATION OF NOISY DFT SPECTRA.
2. Appendix III--MEAN AND VARIANCE OF TRANSFORMED OPTIMAL PARAMETERS.
3. Appendix IV--COMBINING SPEECH AND NOISE IN THE BANDPASS FILTER DOMAIN.
4. Appendix V--UNNORMALIZED NOISE METRIC STUDIES.
APPENDIX II
OPTIMAL ESTIMATORS FOR RESTORATION OF NOISY DFT SPECTRA
This Appendix considers processes which optimally restore the corrupted spectrum,
x=s+n, to a spectrum which minimizes the expected value of the norm squared
error between a function of the clean speech, f(s), and the same function
of the estimate, f(ŝ), using only the noisy spectrum x and the average noise
energy at each frequency, P.sub.N. The restoration is done for each frequency
individually, and any correlation which might exist between spectral values
at different frequencies is ignored. The functions, f , to be considered
include: ##EQU20##
These compression functions are commonly used in both speech recognition
and speech compression applications. Having optimal estimators for each case
allows the estimation to be matched to the type of compression used prior
to the distance calculation. That is if the recognizer matches cepstral
parameters, then the appropriate function to select would be log, etc. The
power function was estimated to measure performance differences with spectral
subtraction techniques based on the power function. Each of these minimizations
is described below.
ESTIMATING THE MAGNITUDE SPECTRUM
Many speech recognition algorithms are sensitive only to spectral magnitude
information. Human perception is also generally more sensitive to signal
amplitude than to phase. A speech enhancement system used to restore speech
for human listeners, or as a preprocessor for an automatic speech recognition
device, might therefore be expected to perform better if it is designed to
restore the spectral magnitude or power, ignoring phase. In this case,
appropriate optimization criterion functions, f , are:
f(s)=.vertline.s.vertline.,
or
f(s)=.vertline.s.vertline..sup.2,
and the optimal restoration function will minimize the ensemble average of
the error quantity
E[(.vertline.s.vertline.-.vertline.ŝ.vertline.).sup.2 .vertline.x,P.sub.N ]
or
E[(.vertline.s.vertline..sup.2 -.vertline.ŝ.vertline..sup.2).sup.2 .vertline.x,P.sub.N ].
ESTIMATING THE COMPRESSED MAGNITUDE SPECTRUM
Studies of audition suggest that there is an effective compression active
in some perceptual phenomena (especially the sensation of loudness). Some
speech recognition devices also incorporate compression in the feature extraction
process. This suggests the criterion function:
f(s)=c(.vertline.s.vertline.)
where c is a compression function. In this case, the optimal restoration
function will minimize the ensemble average of the error quantity
E[(c(.vertline.s.vertline.)-c(.vertline.ŝ.vertline.)).sup.2 .vertline.x,P.sub.N ]
We shall consider two compression functions: the logarithm and the square
root.
Note that since the cepstrum is the Fourier Transform of the logarithm of
the magnitude spectrum, and the Fourier Transform is a linear process,
minimization of the mean square error in the cepstrum is obtained when the
optimality criterion is f(s)=log .vertline.s.vertline..
ESTIMATING THE COMPLEX SPECTRUM
Adopting the identity function for f , leads to a complex spectrum estimator
which minimizes the error quantity
E[.vertline.s-ŝ.vertline..sup.2 .vertline.x,P.sub.N ].
RELATION TO WIENER FILTER
By integrating over all time, the Wiener filter minimizes the mean square
error of a time waveform estimate, subject to the constraint that the estimate
is a linear function of the observed values. In the time domain the Wiener
filtering operation can be represented as a convolution, and in the frequency
domain as multiplication by the filter gain function. At a single frequency
the Wiener filter spectrum estimate is therefore a constant times the corrupted
spectrum value, x, i.e., a linear function of the spectral magnitude. If speech
spectral values, s, had a Gaussian distribution, then the spectrum estimator
which minimizes the error quantity mentioned above would also be linear,
i.e., a constant times x. However, the distribution of speech differs greatly
from a Gaussian distribution, and the true optimal estimator function is
highly non-linear. FIG. 1 shows the cumulative distribution of real speech
spectral magnitudes and the cumulative distribution of the spectral magnitude
of a complex Gaussian time signal of equal power. The speech distribution
was obtained using a 1000 frame subset from the 27,000 magnitude frames used
to compute the estimators described in the implementation section. The Gaussian
signal was generated by averaging 20 uniformly distributed random numbers
of equal energy. The optimal linear estimator, corresponding to a Wiener
filter, is shown with the non-linear estimator averaged over all frequencies
and the mapping for Spectral Subtraction in FIG. 2.
MINIMUM MEAN SQUARE ERROR ESTIMATORS
The minimum mean square error estimate of a function of the short-term speech
spectral value is the a posteriori conditional mean of that function given
the speech and noise statistics and the noisy spectral value. This estimate
can be calculated as follows. Let f represent the function of the spectrum
to be estimated. Let s and x be clean and noisy complex spectral values,
respectively, and n the complex noise. Let f̂(s) denote the optimal estimator
of the function f(s) and let E{.}.sub.p denote expectation with respect
to the probability distribution p. Then the minimum mean square estimate
is given by:
f̂(s)=E{f(s)}.sub.s.vertline.x =.intg.f(s)p(s.vertline.x)ds.
When speech and noise are independent,
p(x.vertline.s)=p(s+n.vertline.s)=p.sub.n (n)=p.sub.n (x-s),
where p.sub.n is the a priori noise density function. Thus the density of
the joint distribution of clean and noisy spectral values is given by:
p(s,x)=p(x.vertline.s)p.sub.s (s)=p.sub.n (x-s)p.sub.s (s),
where p.sub.s is the a priori speech probability density. Substituting gives
##EQU21##
Thus, the optimal estimator, f̂(s), equals the ratio of expected values of
two random variables with respect to the distribution of clean speech spectra.
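The substitution can be written out step by step. The following is a reconstruction from the definitions given in this section (Bayes' rule applied to the joint density of clean and noisy values), offered as a sketch and not as a transcription of the patent's own equation:

```latex
\hat{f}(s)
  = \int f(s)\, p(s \mid x)\, ds
  = \frac{\int f(s)\, p(s, x)\, ds}{\int p(s, x)\, ds}
  = \frac{\int f(s)\, p_n(x - s)\, p_s(s)\, ds}{\int p_n(x - s)\, p_s(s)\, ds}
  = \frac{E\{ f(s)\, p_n(x - s) \}_{p_s}}{E\{ p_n(x - s) \}_{p_s}}
```

Both numerator and denominator are expectations over the clean speech density p_s, which is what allows them to be approximated later by averages over a sample of clean speech.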
SPECIALIZATION TO A GAUSSIAN NOISE MODEL
Assume that the noise has a zero mean, is uniform in phase, and has a Gaussian
distribution with power P.sub.N. Then the noise density function is:
p.sub.n (n)=.gamma.exp(-.vertline.n.vertline..sup.2 /P.sub.N)
where .gamma. is a normalization factor.
Substituting x-s for n in the expression for the optimal estimator gives:
##EQU22## Clean speech spectral values are observed to be uniformly distributed
in phase so p.sub.s (s) depends only on .vertline.s.vertline.. The density,
p.sub..vertline.s.vertline., of .vertline.s.vertline. on the positive real
line is then related to p.sub.s, the density of s, in the complex plane by:
##EQU23## The integrals in the expression for the optimal estimator are evaluated
in the complex plane using polar coordinates. Using the fact that ##EQU24##
where I.sub.n is the nth order modified Bessel function, the integrals can
be reduced to the real line.
Two cases are considered, f(s)=s and f(s)=c(.vertline.s.vertline.), where
c is a compression function to be specified.
In the first case the estimator reduces to: ##EQU25## where .psi. is the
phase of the corrupted spectral value, x. This shows that the phase of the
best estimate of the complex spectral value is the noisy phase.
In the second case the estimator reduces to: ##EQU26##
EVALUATION USING A LARGE SAMPLE OF SPEECH
The estimates given above can be evaluated by interpreting them as ratios
of expectations with respect to the distribution of .vertline.s.vertline.
on the real line. Each integral in the expressions above is an expected value
with respect to the distribution of .vertline.s.vertline., as characterized
by its density, p.sub..vertline.s.vertline.. These expected values are functions
of .vertline.s.vertline., .vertline.x.vertline. and P.sub.N. They can be
conveniently approximated as average values of the given functions summed
over a large sample of clean speech.
Using the ratio of sample averages to approximate the optimal estimator
has the significant practical advantage that the a priori distribution of
.vertline.s.vertline. need not be known or approximated. In view of the
significant error introduced by the fairly common erroneous assumption that
speech spectral values have a Gaussian distribution, this distribution-free
approach to finding the optimal estimator is particularly attractive. From
a theoretical point of view, the ratio of sample averages can be defended
as giving a consistent estimate of the optimal estimator. Although it is
a biased estimate, the bias can, in practice, be made negligible by using
a large sample. For this study, 27,000 samples of spectral magnitude were
taken from the marked speech of the six males and two females in the X data
base.
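A ratio of sample averages for the compressed-magnitude case can be sketched as below. The weight formula is a reconstruction under the stated Gaussian noise model and uniform clean-speech phase: integrating the noise density over phase leaves exp(-|s|^2/P_N) times I_0(2|x||s|/P_N), with the factor depending only on |x| cancelling in the ratio. The function and argument names are assumptions made for illustration.

```python
import numpy as np

def mmse_compressed_magnitude(x_mag, clean_mags, noise_power, compress=np.sqrt):
    """Estimate c(|s|) from a noisy magnitude using clean-speech samples.

    x_mag:       magnitude |x| of the noisy spectral value
    clean_mags:  sample of clean spectral magnitudes |s|
    noise_power: average noise power P_N at this frequency
    compress:    compression function c (square root by default)
    """
    s = np.asarray(clean_mags, dtype=float)
    # Phase-integrated Gaussian likelihood of observing |x| given each |s|.
    # np.i0 grows exponentially, so this direct form is only suitable for
    # moderate arguments (it overflows for arguments above roughly 700).
    weights = np.exp(-s**2 / noise_power) * np.i0(2.0 * x_mag * s / noise_power)
    return float(np.sum(compress(s) * weights) / np.sum(weights))
```

No model of the clean speech distribution appears anywhere in the function; the sample itself stands in for it, which is the distribution-free property described above.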
Of course, an optimal estimator obtained in this way is optimal with respect
to the distribution of .vertline.s.vertline. in the population of speech
from which the sample is taken. We have observed the distribution of
conversational speech spectral magnitude to be stable and reproducible when
averaged over twenty seconds or more, and normalized with respect to the
rms value after removal of silence. To make this normalization explicit,
with respect to speech power, we introduce the normalized spectral magnitude:
##EQU27## where P.sub.S is the average speech power in the sample S of speech.
TABLE GENERATION
The expressions given above for the optimal estimators can be expressed as
tables in terms of the speech-to-noise ratio SNR=P.sub.S /P.sub.N, the noise
power, P.sub.N, and the distribution of the dimensionless clean speech spectral
magnitude, .sigma.. For restoration of speech it is convenient to implement
an optimal estimator in the form of tables which gives the spectral component
magnitude estimate as a function of the noisy spectral component magnitude,
.vertline.x.vertline., using a different table for each SNR value of interest.
It has been found useful to normalize the table input and output by
.sqroot.P.sub.N, since the tables are then only weakly dependent on SNR.
Accordingly, we introduce the dimensionless input quantity: ##EQU28## Tables,
t(.xi.,SNR), for estimating the complex spectrum, are then computed using
the expressions above for ŝ, with the expectations converted to averages.
The estimator for the first case reduces to: ##EQU29## The estimate is then
implemented as: ##EQU30##
Defining .vertline.s.vertline..sub.c as the spectral component magnitude
estimate which leads to the minimum mean square error in estimating
c(.vertline.s.vertline.), the tables are defined by: ##EQU31## which, when the
compression function c is any power or the logarithm function, reduces to
##EQU32## The estimate is then implemented as: ##EQU33##
IMPLEMENTATION
The restoration procedure consists of generating a table for mapping noisy
magnitude spectra into optimal estimates of the clean speech spectra. Values
for the table are calculated using a large population of talkers to obtain
a speaker independent process. The table is incorporated into a short time
spectral analysis-synthesis program which replaces noisy speech with restored
speech.
TABLE GENERATION
The optimal estimators are functions of the distribution of .vertline.s.vertline.
in the DFT frequency bin, the SNR in that bin, and the spectral magnitude
.vertline.x.vertline. of the noisy signal divided by .sqroot.P.sub.N. A large sample
of conversational speech (27,000 frames) was taken from the wordspotting
data base, and a Gaussian noise model was used to build a set of tables
specifying the optimal estimates at a preselected set of five frequencies
and three SNR values. The five frequencies selected were a subset of the
center frequencies of the bandpass filterbank used to measure the spectral
parameters in the speech recognition system. The frequencies were 300, 425,
1063, 2129, and 3230 Hz. The optimal estimator tables were calculated at
each of these node frequencies. For the initial experiments, estimates at
other DFT bin frequencies were obtained by linear interpolation from these
five tables. Subsequent experiments used a single table representing the
average over all frequencies.
GENERATION OF A REPRESENTATIVE SPEECH POPULATION
A marked data base was used as a representative conversational speech sample
for calculating the estimators. The data base consists of eight speakers
(six males and two females). Each 10 ms frame of speech has been marked as
either speech or non-speech by a trained listener, and only frames marked
as speech were used. For each frame, the DFT complex spectrum is calculated
at each of the specified node frequency bins. A total of 27,000 frames of
speech were used to estimate each table.
SIGNAL-TO-NOISE RATIO ESTIMATION
Table values for the optimal estimator are dependent upon the speech distribution
and the noise power. Thus, they are dependent upon the local signal-to-noise
ratio in each frequency bin. Tables were generated based on average
signal-to-noise ratios, across all frequencies, of 0 dB, 10 dB, and 20 dB.
At each of these levels the average noise power was measured.
Average speech power was measured separately by first generating a histogram
of speech power from the multi-speaker conversational data base. The contribution
in the histogram due to silence is suppressed by noting that non-speech manifests
itself in the histogram as a chi-squared distribution at the low end of the
histogram. Non-speech power is removed by subtracting a least squares fit
to the chi-square distribution using the low end histogram samples from the
overall distribution. Speech power is then calculated by summing the difference.
Table entries are computed for normalized magnitude values, .xi., from 0
to 10 in steps of 0.2. The table is linearly extended from 70. to 700. Each
entry is calculated by specifying the value of P.sub.N based upon the average
signal-to-noise ratio and the value of .xi.. The tables are calculated by
averaging over all speech samples at a given frequency.
OPTIMAL ESTIMATORS
The optimal estimators for each criterion function, f, are presented in FIG.
3. These tables were calculated based upon an average signal-to-noise ratio
of 10 dB.
The estimator is a function of the signal-to-noise ratio. This is demonstrated
by computing the tables based upon signal-to-noise ratios of 0, 10, and 20
dB. Examining the resulting estimator shows that the signal-to-noise ratio
dependence is similar for all frequencies. FIG. 4 gives an example of the
SNR dependence for the complex spectrum estimate at frequency 1063 Hz.
ANALYSIS-SYNTHESIS PROCEDURES
The analysis-synthesis procedures were implemented using an algorithm similar
to that used to implement Spectral Subtraction. The input noisy speech was
analyzed using 200-point, half-overlapped Hanning windows. A 256-point DFT
is taken and converted to polar coordinates. The magnitude spectrum is normalized
at each frequency by the square root of the average noise spectrum, P.sub.N.
The restored magnitude spectrum is found using the optimal estimator tables
at the five node frequencies and linearly interpolating at other frequencies.
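The analysis-synthesis loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses a single estimator table for all bins (the averaged-table variant), a periodic Hann window so that half-overlapped frames sum to one, and clamps the lookup at a normalized magnitude of 10 rather than extending the table. The names `restore`, `XI_GRID`, and `table` are hypothetical.

```python
import numpy as np

FRAME = 200                          # window length given in the text
HOP = FRAME // 2                     # half-overlapped frames
NFFT = 256                           # DFT size given in the text
XI_GRID = np.linspace(0.0, 10.0, 51) # normalized magnitude grid, step 0.2

def restore(noisy, noise_power, table):
    """Replace noisy magnitudes by table estimates and resynthesize.

    noisy       : 1-D array of noisy speech samples.
    noise_power : average noise power P_N per DFT bin, length NFFT//2 + 1.
    table       : restored normalized magnitude at each XI_GRID point
                  (hypothetical single table shared by all bins).
    """
    # Periodic Hann window: w[n] + w[n + HOP] == 1, so overlap-add is exact.
    window = np.hanning(FRAME + 1)[:FRAME]
    root_pn = np.sqrt(noise_power)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - FRAME + 1, HOP):
        frame = noisy[start:start + FRAME] * window
        spec = np.fft.rfft(frame, NFFT)              # zero-padded 256-point DFT
        mag, phase = np.abs(spec), np.angle(spec)    # polar coordinates
        xi = mag / root_pn                           # normalize by sqrt(P_N)
        # Table lookup; np.interp clamps above xi = 10 (assumption).
        restored = np.interp(xi, XI_GRID, table) * root_pn
        synth = np.fft.irfft(restored * np.exp(1j * phase), NFFT)[:FRAME]
        out[start:start + FRAME] += synth            # overlap-add
    return out
```

With an identity table (`table = XI_GRID`) the interior of the output reproduces the input, which is a convenient check that the windowing and overlap-add are consistent before a real estimator table is substituted.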
EVALUATION ON CONNECTED DIGITS
The effectiveness of the estimator as a noise suppression preprocessor was
measured both qualitatively by listening to the synthesis, and quantitatively
by measuring the improvement in performance of a connected digit recognition
algorithm using noisy speech with and without noise stripping at a
signal-to-noise ratio of 10 dB. Recognition performance is compared with
other approaches to noise stripping [8], performance without noise stripping,
and performance using alternative optimality criterion functions, f.
RECOGNITION EXPERIMENT
The recognition experiment used a 3, 4, 5, and 7 connected digit data base
spoken by eight talkers (four males and four females). Template information
consisted of nine tokens per digit per speaker. Three of the tokens were
spoken in isolation and six of the tokens were manually extracted. For each
speaker there were 680 trials. The recognition experiments were done speaker
dependently. The feature vectors from templates and unknowns were matched
using a prior art metric. White Gaussian noise was added to the unknown data
to give an average signal-to-noise ratio of 10 dB.
SUMMARY OF RESULTS
Results are presented in terms of recognition error rates averaged over eight
speakers as a function of the type of preprocessing. Also given is the error
rate and the percent recovery from the noisy error rate, i.e., ##EQU34##
The need for two dimensional interpolation was also tested by collapsing
the five frequency tables into a single averaged table. The averaged table
for the root estimator is presented in FIG. 2.
______________________________________
The legend for the table is:
Clean:             Speech recorded using a 12 bit analog-to-digital converter in a quiet environment.
Noisy:             Speech with Gaussian Noise added to give a 10 dB signal-to-noise ratio.
SS:                Noisy Processed by Spectral Subtraction [8]
Spectrum:          Noisy Processed by using .function.(s) = s.
Power:             Noisy Processed by using .function.(s) = .vertline.s.vertline..sup.2.
Mag:               Noisy Processed by using .function.(s) = .vertline.s.vertline..
Root:              ##STR1##
Single Table Root: Noisy Processed by Single Table of Root
Log:               Noisy Processed by using .function.(s) = log.vertline.s.vertline..
______________________________________
Unknown      Template     Score (%)   Error Rate (%)   Recovery (%)
______________________________________
Clean        Clean        98.4        1.6              100
10 dB        Clean        58.1        41.9             0
SS           Clean        88.7        11.3             76
Root         Clean        89.8        10.2             79
Root-Ave.    Clean        88.6        11.4             76
Log          Clean        91.1        8.9              82
Mag          Clean        87.9        12.1             74
Power        Clean        81.2        18.8             57
Spect        Clean        86.5        13.5             70
10 dB        10 dB        96.4        3.6              95
SS           SS           95.5        4.5              93
Spect        Spect        96.5        3.5              95
Power        Power        97.6        2.4              98
Mag          Mag          97.7        2.3              98
Log          Log          97.9        2.1              99
Root         Root         97.9        2.1              99
Root-Ave.    Root-Ave.    97.8        2.2              99
______________________________________
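The equation defining percent recovery did not survive in this text. The tabulated values are, however, consistent with recovery measured as the fraction of the noise-induced error that the processing removes, relative to the clean and noisy (10 dB unknown, clean template) error rates. The sketch below encodes that inferred definition; it is an assumption checked against the table, not the patent's stated formula, and the function name is hypothetical.

```python
def percent_recovery(error_rate, noisy_error=41.9, clean_error=1.6):
    """Percent of the noise-induced error removed by the processing.

    Inferred definition (assumption):
        100 * (noisy_error - error_rate) / (noisy_error - clean_error)
    The defaults are the 10 dB / Clean and Clean / Clean error rates
    from the results table.
    """
    return 100.0 * (noisy_error - error_rate) / (noisy_error - clean_error)
```

For example, the Spectral Subtraction row with clean templates has an error rate of 11.3%, and the Log row has 8.9%; after rounding, this definition gives the tabulated 76 and 82.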
OBSERVATIONS
Use of the optimal estimators reduces the error rate for a speaker dependent
connected digit speech recognition experiment using a 10 dB signal-to-noise
data base from 42% to 10%. In addition, by processing the template data in
the same way as the unknown data, the error rate can be further reduced from
10% to 2%. Standard Spectral Subtraction techniques perform at a level near
that of the optimal estimator.
The use of a single table reduced performance by 1.1% compared to multiple
tables when the recognizer used clean templates, but resulted in essentially
no degradation when the recognizer used processed templates.
LISTENING TESTS
Informal listening tests were conducted to compare the alternative forms
of processing. The results can be divided into roughly three characterizations:
(1) speech plus musical noise; (2) speech plus white noise; and, (3) a blend
of 1 and 2. The spectral subtraction, SS, and complex spectral estimate, Spect,
clearly fall in the first category. The Mag and Pow estimates are characterized
by the second category. Finally, the Root and Log processes are characterized
by the third category.
These results can be correlated with the transfer function characteristics
by noting how the low amplitude signals are treated. When the low amplitude
magnitudes are severely attenuated, as in the Spect and SS options, the spectrum
is "more spike-like" with many narrow bands of signal separated by low energy
intervals giving rise to the musical quality. When the low amplitude signals
are set to a constant, as in the Mag and Pow options, the effect is to fill
in between the spikes with white noise.
APPENDIX III
MEAN AND VARIANCE OF TRANSFORMED OPTIMAL PARAMETERS
Introduction
This section addresses two topics: estimation of the magnitude value which
minimizes the mean square error between compressed magnitudes; and estimation
of the variance of the estimator in terms of precomputed mean value tables.
This section shows that using this approach to magnitude estimation produces
an unbiased estimator. It also shows that for monomial compression functions
such as square root or power, the variance can be calculated directly from
the mean tables.
Optimal Magnitude Estimator
Define the output power in a bandpass channel (either BPF or DFT), to be
P in the absence of noise, with magnitude, M=.sqroot.P. Define P* as the
noisy power due to the presence of stationary noise with mean power value
P.sub.n. Let the mean power value of the clean speech signal be P.sub.s.
The general form for the optimal estimator, $\hat{c}(M)$, of c(M) (not necessarily
a compressed function of the magnitude), which minimizes the error quantity:
##EQU35## is the conditional expected value: ##EQU36##
In Appendix IV there are described methods for computing estimators of the spectral
magnitude, M=.sqroot.P, which minimize this mean square error, with respect
to various compression functions, c. The compression functions considered
include the identity, log, square and square root. This section presents
this formulation again from a perspective which emphasizes the relation between
the compression function and the conditional expected value.
The optimal magnitude estimator M.sub.c must satisfy
$c(M_{c}) = \hat{c}(M)$.
We can solve for M.sub.c by considering compression functions c which are
one to one on the real line, R.sup.+, and thus have inverses on this domain.
Then ##EQU37##
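The relation between the compression function and the estimator can be illustrated numerically: the estimate is the inverse compression applied to the conditional mean of the compressed clean magnitude. In the sketch below the conditional expectation is replaced by a sample average over clean magnitudes that share the same noisy observation; the function name, the dictionary of compression functions, and the sample values are hypothetical.

```python
import numpy as np

def optimal_magnitude(clean_mags, c, c_inv):
    """M_c = c^{-1}( E[c(M) | noisy observation] ).

    The conditional expectation is approximated by the sample mean of
    c(M) over clean magnitudes observed with the same noisy value.
    """
    clean_mags = np.asarray(clean_mags, dtype=float)
    return float(c_inv(np.mean(c(clean_mags))))

# Compression functions named in the text, each paired with its inverse
# on the positive real line.
COMPRESSIONS = {
    "identity": (lambda m: m, lambda y: y),
    "square": (np.square, np.sqrt),
    "root": (np.sqrt, np.square),
    "log": (np.log, np.exp),
}
```

For any spread of clean magnitudes the four estimates are ordered log, root, identity, square from smallest to largest (the power-mean inequality), which shows why the choice of compression function changes how strongly low-amplitude values are attenuated.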
The Optimal Estimator As a Table Lookup
Our method of computing M.sub.c uses the distribution of the noise (assumed
to be Gaussian) and the distribution of clean speech. The Gaussian noise
is completely characterized by its mean power P.sub.n. We assume that the
speech power is scaled by P.sub.s. Thus normalizing the instantaneous speech
power by P.sub.s, results in a fixed distribution which is obtained from
any sample of speech, by just normalizing.
Under these conditions and also that c is a power or the logarithm, it can
be shown that a scaling factor can be extracted from M.sub.c, permitting
the table lookup to be a function of two variables. We chose to use
.sqroot.P.sub.n, as it had the dominant effect.
The optimal estimator is implemented as a table lookup with .sqroot.P*/P.sub.n
as the argument and SNR=P.sub.s /P.sub.n as a parameter, i.e., different
tables for different values of SNR. Define the estimator, M.sub.c in terms
of the table t.sub.c as: ##EQU38## where t.sub.c represents the table lookup
function based upon compression function c. Solving for t.sub.c gives: ##EQU39##
The form actually implemented normalizes M by .sqroot.P.sub.n first before
forming the expected values.
Compression Functions
We use various compression functions applied to the magnitude to form the
recognition parameters. For example, the Olano metric uses the square root
of magnitude. In general, we get a recognition parameter, x, from compression
function k on the magnitude:
x=k(M).
In the presence of noise we use the optimal estimator for M rather than the
noisy value. Suppose we use the optimal estimator M.sub.c. Then we will be
using recognition parameters:
x=k(M.sub.c).
which will differ from the true, noise-free value by
.epsilon.=k(M.sub.c)-k(M).
Statistics of Recognition Parameters
In this section we derive the statistics of the recognition parameters with
respect to noise effects. Thus ##EQU40## When the compression functions k
and c are the same, ##EQU41## Evaluating the bias of the estimator gives:
##EQU42## So that in this case the estimator is unbiased.
The variance of the estimation parameters can be obtained as: ##EQU43## Since
E{c(M)}=c(M.sub.c)
substituting for M.sub.c gives: ##EQU44##
By the same approach, when c.sup.2 is one to one on the real line, R.sup.+,
and thus has an inverse, ##EQU45## where $t_{c^{2}}$ is the table lookup estimator
for the compression function c.sup.2 =c.times.c (multiplication of functions,
not composition).
Variance of the Square Root Estimator
If we use Olano parameters (before normalization) with a square-root c table,
then c=.sqroot., denoted by r, and c.sup.2 =id. In that case ##EQU46## where t.sub.m
and t.sub.r are the tables for the magnitude and root estimators.
Variance of the Magnitude Estimator
If we were to use magnitude values themselves without compression as the
recognition parameters, c=id and c.sup.2 is the square law, which we have
called p. In that case ##EQU47##
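The identity behind both variance results can be written out from the definitions already given. The block below is a sketch in terms of the estimators themselves; the exact scaling by the noise power inside the tables is not recoverable from this text and is left out. The symbols $M_{\mathrm{id}}$, $M_{r}$ and $M_{p}$ denote the optimal estimators for the identity, square root and square-law compression functions, a notation introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}\{c(M)\mid P^{*}\}
  &= E\{c^{2}(M)\mid P^{*}\} - \bigl(E\{c(M)\mid P^{*}\}\bigr)^{2} \\
  &= c^{2}(M_{c^{2}}) - \bigl(c(M_{c})\bigr)^{2}. \\[4pt]
\text{Square root } (c = r,\; c^{2} = \mathrm{id}):\quad
\operatorname{Var}\{\sqrt{M}\mid P^{*}\}
  &= M_{\mathrm{id}} - M_{r}. \\[4pt]
\text{Magnitude } (c = \mathrm{id},\; c^{2} = p):\quad
\operatorname{Var}\{M\mid P^{*}\}
  &= M_{p}^{2} - M_{\mathrm{id}}^{2}.
\end{align*}
```

Both differences are non-negative by Jensen's inequality, so a variance computed from the mean tables in this way cannot go negative except through table quantization error.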
Examples of Mean and Variance Data
Mean and variance data based upon the square root function were generated
with frames marked with SxSy categories 0 through 5 and displayed with two
types of scatter plots. Spectral outputs from the ninth filter were selected
as approximately representing the average signal to noise ratio over the
entire baseband. FIG. 5 shows the clean versus noisy 4th root of power spectral
frames using just speech frames. In FIG. 6 both speech and non-speech frames
are included. Along any vertical axis the estimator lies at the mean value.
Likewise the standard deviation represents about 30 percent of the scatter
away from the mean. The dark band in FIG. 6 corresponds to frames where the
clean speech was near zero. Since the optimal parameter tables were trained
on speech only frames, the mean distance is not biased by this non-speech
concentration.
APPENDIX IV
COMBINING SPEECH AND NOISE IN THE BANDPASS FILTER DOMAIN
Introduction
This section derives the probability density function for the noisy filterbank
parameter, X, given the clean filterbank parameter, S, P(X.vertline.S). This
density describes how speech and noise combine in the filterbank domain.
It is needed in order to generate the conditional expected value, $\hat{S}$, and
its variance, of the clean filterbank parameter given the noisy parameter,
E[S.vertline.X]. As discussed in Appendix III, this conditional expected
value, $\hat{S}$, minimizes the mean square error:
$E[(S-\hat{S})^{2} \mid X]$.
Each bandpass filter (BPF) channel is modeled as the sum of independent DFT
channels. The number of independent channels will be less than or equal to
the number of bins combined to form a filterbank output.
Each DFT channel is modeled as an additive complex process, with the signal
$s^{k} = \xi^{k} + i\eta^{k}$ in the kth channel and noise
$n^{k} = \xi_{n}^{k} + i\eta_{n}^{k}$. The noise is assumed Gaussian and
uniformly distributed in phase. Define the noisy signal as
$x^{k} = s^{k} + n^{k} = (\xi^{k} + \xi_{n}^{k}) + i(\eta^{k} + \eta_{n}^{k})$
The noisy channel signals add as: ##EQU48##
We assume the noise in each independent channel has the same value,
.sigma..sub.n, (to be determined). Then the density of the noisy signal given
the clean signal in the complex plane is: ##EQU49##
Let X be the BPF channel output with noise and S the BPF channel output without
noise.
The joint density of the individual channel observation (x.sup.1, . . . ,
x.sup..chi.) given the signal values, (s.sup.1,. . . , s.sup..chi.), is the
product density: ##EQU50## where .chi. is the number of individual channels.
We see that the conditional distribution of X given the .chi. signal values
(s.sup.1, . . . , s.sup..chi.), is just the distribution of a sum of squares
of 2.chi. normal variates which are independent, each having variance
.sigma..sub.n.sup.2, and means .xi..sup.1, .eta..sup.1, .xi..sup.2, .eta..sup.2,
. . . , .xi..sup..chi., .eta..sup..chi.. This is the non-central chi-squared
distribution in 2.chi. degrees of freedom.
Kendall and Stuart (Vol. II, page 244, Advanced Theory of Statistics) show
that the density of the quantity ##EQU51## where each x.sub.i is unit variance
Gaussian with mean .mu..sub.i, (all independent) is ##EQU52##
(We note that the density of Z depends on the means .mu..sub.1, . . . ,
.mu..sub.n, only through the sum of their squares, .lambda., which is fortuitous,
as it makes the density of X depend on the individual DFT channel means
(.xi..sup.k, .eta..sup.k) only through the sum of their squares, which is S.)
To apply this, note that ##EQU53## showing that X/.sigma..sub.n.sup.2 is distributed
as the sum of squares of 2.chi. unit variance, independent Gaussians with means
##EQU54## Therefore the density of ##EQU55##
Simplification Using Bessel Functions
Abramowitz and Stegun (AMS55) formula 9.6.1 shows that the modified Bessel
function of the first kind of order .upsilon. is ##EQU56## Comparing this
with the expression for P(X.vertline.S), we see that ##EQU57## For later use,
we note the special case, obtainable directly from the power series expansion,
##EQU58## This agrees exactly with the (central) chi-squared distribution
in 2.chi. degrees of freedom, as it should.
Determination of the Number of Independent Degrees of Freedom
The special case S=0 predicts that the BPF channel will pass Gaussian noise
yielding outputs with the statistics of the chi-squared distribution with
2.chi. degrees of freedom. The mean and variance are ##EQU59## Let's
define P.sub.n to be the mean noise power in the channel with no signal present:
that is ##EQU60##
By measuring the average and the variance of the output of the BPF channel,
we can therefore estimate the properties of the channel by the way it passes
Gaussian noise. The channel can be characterized by any two of the four
parameters .chi.,.sigma..sub.n.sup.2,P.sub.n,var(P.sub.n).
We choose .chi. and P.sub.n because the former should be independent of the
channel input and the second should be a constant gain times the variance
or power of the input noise.
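The measurement described above can be sketched with the standard moments of the chi-squared distribution. If the noise-only output X is the sum of squares of 2.chi. independent zero-mean Gaussians of variance .sigma..sub.n.sup.2, then E[X] = 2.chi..sigma..sub.n.sup.2 and Var[X] = 4.chi..sigma..sub.n.sup.4, so the equivalent channel count is the squared mean divided by the variance. These moment formulas are textbook results supplied here as an assumption, since the patent's own expressions did not survive in this text; the function name is hypothetical.

```python
import numpy as np

def channel_parameters(noise_only_output):
    """Estimate (chi, P_n) from noise-only BPF channel power outputs.

    Assumes X / sigma_n^2 is chi-squared with 2*chi degrees of freedom, so
        E[X]   = 2 * chi * sigma_n**2   (this is P_n)
        Var[X] = 4 * chi * sigma_n**4
    and therefore chi = E[X]**2 / Var[X]. The estimate need not be an
    integer, matching the notion of an equivalent independent count.
    """
    x = np.asarray(noise_only_output, dtype=float)
    p_n = x.mean()
    chi = p_n ** 2 / x.var()
    return chi, p_n
```

A quick self-check is to synthesize the output of a channel with a known count, for example by summing the squares of six Gaussians (count 3), and confirm that the estimate comes back close to 3.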
Density In Terms of Measurement Parameters
The model predicts the distribution of noisy output, given the clean output,
in terms of the channel characteristic .chi., and the noise-only mean output
P.sub.n : ##EQU61## and, in the special case of noise only, ##EQU62## FIG.
7 shows an example of how well the noise-only case fits observation. Shown
is the actual cumulative distribution for Gaussian white noise through the
first channel and the predicted distribution, P(X.vertline.S=0). The first
channel has an independent equivalent count, .chi., of 1.81. A sample of
1103 frames was used to generate the distribution. FIG. 8 shows an example
of how well the noisy speech case fits the predicted distribution,
P(X.vertline.S). Shown are the fractiles for the distribution superimposed
on a scatter plot of WIJA's clean versus noisy channel parameters taken from
the 25th channel.
Distribution Using Normalized Parameters
We have been using quantities with the dimension of magnitude (versus power)
and normalizing by the rms magnitude of noise. What is actually required
is the distribution of the non-dimensional quantities ##EQU63##
As one check of these results, we see that they reduce to the previous case
of individual DFT channels when .chi.=1. In that case, these distributions
are
$P_{1}(\xi \mid \eta) = 2\xi\, e^{-(\xi^{2}+\eta^{2})}\, I_{0}(2\xi\eta)$
and
$P_{1}(\xi \mid \eta = 0) = 2\xi\, e^{-\xi^{2}}$
The first formula is used to find optimal DFT channel estimators and the
second is a normalized Rayleigh distribution.
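The two single-channel densities can be checked numerically. The sketch below evaluates the first formula with NumPy's modified Bessel function `np.i0` and confirms that it integrates to one and reduces to the normalized Rayleigh case when the clean value is zero. The helper names and the integration grid are illustrative choices.

```python
import numpy as np

def p1(xi, eta):
    """Density of normalized noisy magnitude xi given clean magnitude eta:
    P_1(xi | eta) = 2 xi exp(-(xi^2 + eta^2)) I_0(2 xi eta)."""
    xi = np.asarray(xi, dtype=float)
    return 2.0 * xi * np.exp(-(xi ** 2 + eta ** 2)) * np.i0(2.0 * xi * eta)

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific helpers."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

xi_grid = np.linspace(0.0, 12.0, 12001)
areas = [trapezoid(p1(xi_grid, eta), xi_grid) for eta in (0.0, 1.0, 2.0)]
rayleigh_mean = trapezoid(xi_grid * p1(xi_grid, 0.0), xi_grid)
```

Each entry of `areas` should be close to one, and `rayleigh_mean` should be close to the mean of the normalized Rayleigh density, which is the square root of pi divided by two.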
For scaling purposes we have found it convenient to use a different
transformation for the Bessel function calculation. We use ##EQU64##
APPENDIX V
UNNORMALIZED NOISE METRIC STUDIES
Introduction
This section presents an analysis of the noise immune distance metric without
normalization. Parameters consist of the fourth root of filterbank output
power. To demonstrate that the metric has been calculated properly, scatter
plots of clean distances versus noisy parameters superimposed with the noise
metric are presented. Also in order to verify the installation of the variance
tables and new metric code, wordspotting runs using unnormalized parameters
were made. As expected, use of these parameters reduces overall performance
by 10% to 20%. However, the intent of these experiments was to verify the
code and correlate a reduction in rms error to increased wordspotting
performance. The results demonstrate that the combination of optimal parameters
plus variance terms improves performance to the same level as using clean
speech for five of the six template talkers.
Minimum Error Estimate
The noise metric is based upon the premise that adding noise to the unknown
or template generates noisy distances. Noise immunity is obtained by replacing
the Euclidean squared distance between the template and unknown frames by
its conditional expectation of the squared distance without noise, given
the noisy observations. That is,
d.sup.2 =E[(t.sub.s -u.sub.s).sup.2 .vertline.t.sub.s+n, u.sub.s+n, P.sub.s,
P.sub.n ]
where
t.sub.s =4th root of power for template
u.sub.s =4th root of power for unknown
t.sub.s+n =noisy template filterbank parameter
u.sub.s+n =noisy unknown filterbank parameter
P.sub.s =Average Power of Speech
P.sub.n =Average Power of Noise.
The conditional expected value is the minimum mean squared error estimate
of the distance, given the observations. It will reduce the noise on the
frame-to-frame distance values to its minimum possible value for the given
data. It is also an unbiased estimate.
Expanding the expected value and replacing mean values by their optimal estimates
gives:
$d^{2} = \sum_{i} \left[ (t_{i} - u_{i})^{2} + \sigma_{t_i}^{2} + \sigma_{u_i}^{2} \right]$
The quantities t.sub.i and u.sub.i are the expected values of the template
and unknown for each channel and .sigma..sub.t.sbsb.i and .sigma..sub.u.sbsb.i
are the variances of these estimates for each channel. Notice that this metric
model reduces to a standard Euclidean norm in the absence of noise. The metric
model is also symmetric and can be applied when either the template or unknown
or both are noisy.
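The metric can be sketched directly from its description: a squared Euclidean distance between the per-channel expected values, plus the per-channel variances of both estimates. This follows from the identity E[(t-u).sup.2] = (E[t]-E[u]).sup.2 + Var[t] + Var[u], which holds when the template and unknown estimation errors are independent (an assumption of this sketch). The function and argument names are hypothetical.

```python
import numpy as np

def noise_immune_distance(t_mean, u_mean, t_var=0.0, u_var=0.0):
    """Expected squared distance between template and unknown frames.

    t_mean, u_mean : per-channel optimal estimates (table lookups).
    t_var, u_var   : per-channel variances of those estimates; a clean
                     frame has zero variance, the default.
    """
    t_mean = np.asarray(t_mean, dtype=float)
    u_mean = np.asarray(u_mean, dtype=float)
    per_channel = (t_mean - u_mean) ** 2 + t_var + u_var
    return float(np.sum(per_channel))
```

Leaving both variance arguments at zero gives the ordinary squared Euclidean distance, and swapping the template and unknown arguments leaves the result unchanged, matching the two properties noted in the text.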
Values for these means and variances are obtained by table lookup. These
tables are generated using filterbank parameters as previously described.
To establish that the metric was working properly, two types of experiments
were conducted. First, scatter plots of clean distances versus noisy filterbank
parameters were generated and superimposed with the Euclidean metrics using
noisy and optimal parameters and with the optimal parameters plus the variance
terms. Second, wordspotting runs with these parameters and metrics were made.
Verification of Expected Value
In the same manner as used in Appendix III the validity of the noise metric
as a conditional expected value can be examined by plotting clean distances
versus noisy parameters. The distance requires a noisy unknown frame and
a clean or noisy template frame. In order to plot in just two dimensions,
the template frame was held constant and a set of distances were generated
for various unknown conditions and metrics. Three template frames, 0, 10,
and 50 were selected from the Boonsburo template of speaker 50 representing
the minimum, average and maximum spectral values. Distances and spectral
outputs from the ninth filter were selected as approximately representing
the average signal to noise ratio over the entire baseband. FIGS. 9 through
11 show the scatter data along with the noisy distance, (straight parabola),
Euclidean distance with optimal parameters, and the noisy metric. For this
single channel, single template frame configuration, there is little difference
between using just the optimal parameters and the parameters plus the variance
term. However in each case the noisy metric passes through the mean of the
clean distances given the noisy unknown parameter. The dark band in each
figure corresponds to distances where the clean speech was near zero, resulting
in a distance equal to the square of the template parameter. Since the optimal
parameter tables were trained on speech only frames, the mean distance is
not biased by this non-speech concentration. Note that for large values of
the noise parameter, all three distances agree. This is to be expected,
since the mean has approached the identity and the variance has approached
zero (See FIGS. 10 and 11).
Reduction in Mean Square Error
The mean square error for each of these cases was also computed. The error
was calculated as: ##EQU65## As expected the error reduced monotonically
going from noisy to the optimal parameters, to the noise metric. Below is
the computed mean square error between clean distance and the distances computed
with each of the following parameters: noisy, optimal estimator and optimal estimator
plus variance, i.e., noise metric. The distance is straight Euclidean, i.e.
the sum of the squares of the differences between the unknown spectral values and the template
spectral values. These distances for the mean square error calculation were
computed by selecting the 10th frame from the Boonsburo template for speaker
50 and dragging it by 1100 speech frames from the first section of WIJA.
The average mean square error values are:
______________________________________
Condition                       mse
______________________________________
noisy - clean                   9.4
optimal parameters - clean      3.3
noise metric - clean            2.5
______________________________________
Although this represents only a coarse examination of performance, it does
demonstrate that the metric is performing as desired. A more realistic test
requires examining its performance in a wordspotting experiment as defined
below.
Wordspotting Using Unnormalized Parameters
The wordspotter was modified to use unnormalized 4th root parameters and
Euclidean distance with or without the variance terms added. All other aspects
of the wordspotting program remained the same, i.e. standard blind deconvolution,
overlap removal, biasing, etc. FIGS. 12 through 17 show the ROC curves for
each template talker.
Observations
Although overall performance using unnormalized parameters is lower than
using normalized features, these experiments show some interesting
characteristics. Specifically, for five of the six template talkers, use
of the optimal parameters and/or the noise metric returned performance to
levels equal to the clean unknown data. This degree of restoration is not
found in the normalized case. Stated another way, normalization tends to
minimize the deleterious effect of noise and the restoring effect of the
optimal parameters.