[R] spec.pgram() normalized too what?

Tue Jan 24 10:57:16 CET 2006

Dear list,

What on earth is spec.pgram() normalized too? If you would like to skip my
proof as to why it's not normed too the mean squared or sum squared
amplitude of the discrete function a[], feel free too skip the rest of the
message. If it is, but you know why it's not exact in spec.pgram() when it
should be, skip the rest of this message. The issue I refer herein refers
only too a single univariate time series, and may not reflect the issues
encountered in 3 or more dimensions.

I've been using Numerical Recipes too confirm my own routines, because the
periodogram estimate is not normalized to the sum squared, nor the mean
squared of the original discrete function, a[]. I'd expect the
non-overlapped, and non-tapered version of the periodogram too sum to
EXACTLY too some normalization of the discrete signal a[], and the word
"estimate" should come into play only when making inferences about the real
signal 'a' that the discretely sampled signal came from. 

>a	<-	sin(2*pi*(0:127)/16)
>N	<-	length(a)	# 128
>PSD	<-	spec.pgram(a, spans=NULL, detrend=F)

## Sum Squared amplitude of a[t] on [0, 127].
>sum(abs(a)^2)
[1] 64

## Mean Squared amplitude of a[t] on [0, 127]
sum(a^2)/N
[1] 0.5

By Parseval's theorem, the integral of the one-sided PSD over zero and
positive frequencies should equal the mean squared or sum squared amplitude
of the discrete signal a[], assuming the PSD is normalized too the
mean-square or sum squared of the signal a[] respectively. 

## Integral of the PSD returned by spec.pgram() does not equal MS or SS of
a[]
>sum(PSD$spec)
[1] 32.28501

Since the integral over positive frequencies does not equal the mean or sum
squared of the signal, I did some digging. The documentation for
spec.pgram() states that only the power at positive frequencies is returned
'due to symmetry'. Press, et al (2002) make mention of this situation.

"Be warned that one occasionally sees PSDs defined without this factor two.
These, strictly speaking, are called two-sided power spectral densities, but
some books are not careful about stating whether one- or two-sided is to be
assumed" (2002, p. 504).

As a result, I infer that spec.pgram() is returning what Press, et al. would
call a two-sided PSD. Thus, the power spectrum returned by spec.pgram()
should be multiplied by 2 between (DC, nyquist) non-inclusive, and the mean
can be resolved separately as the DC component is not returned by
spec.pgram(), as:

## N/2 used here because the length of PSD$spec is N/2 rather than N/2 + 1
#  as it does not return a DC value.

>mean(a) + PSD$spec[floor(N/2)]  + 2 * sum(PSD$spec[2:floor(N/2)-1])
[1] 64.54347

This situation is closer, and indicates that the normalization is likely too
the sum squared amplitude of a[], but it is not EXACT, as I would expect.
But I also checked a manual PSD estimate just to show the power spectrum of
a single periodicity would actually match the mean or sum squared amplitude
of a[] (the discrete function) exactly.

>a.fft	<-	fft(a)

# Two-sided estimate normed to SS Amplitude
>1/N * sum(Mod(a.fft[1:floor(N/2)+1])^2)
[1] 32

# One-sided estimate normed to SS Amplitude. The first part gets the
# quantities across the nyquist range from 0 too fc. The next 2 lines
# take out the factor 2 from DC and nyquist since those 2 terms are not 
# doubled in the one-sided estimate.

>a.PSD			<-	1/N * 2 * Mod(a.fft[1:(floor(N/2)+1)])^2
>a.PSD[1]			<-	a.PSD[1]/2	#DC freq
>a.PSD[floor(N/2)+1]	<-	a.PSD[floor(N/2)+1]/2
>sum(a.PSD)
[1] 64

This proves that the integral of the one-sided PSD estimate of the discrete
function a[] is actually exactly the same as the sum squared amplitude of
a[], at least in this simple case. Likewise when the periodogram is
normalized to the mean-squared amplitude of a[],

>sum(a.PSD)/N	# MS amplitude of a[]
[1] 0.5

## So I don't have to dimension a.PSD[], I took twice the modulus squared
#  of f[0] too f[c] inclusive all at once, and took out the erroneous
#  factor 2 from f[1] and f[c] independently

>a.PSD			<-	1/(N^2) * 2 * Mod(a.fft[1:(floor(N/2)+1)])^2
>a.PSD[1]			<-	a.PSD[1]/2
>a.PSD[floor(N/2)+1]	<-	a.PSD[floor(N/2)+1]/2

Of note, is that I have yet to encounter a case where the normalized raw
periodogram didn't equal the quantity it was normalized too, even when
segmenting data into multiple PSD estimates, and taking their average at
each frequency. I have not applied a 'window function' (Welch, Hanning, etc)
yet, but the only case I encountered where it's not exact is with
overlapping segments, and that, I suspect, is because the first section of
the first window, and last section of the last window are only used once as
opposed to twice like all of the other data points. If the last segment
wrapped around too the first in order too form the last window, I'd expect
the norming to be exact again. I have yet to encounter an estimate returned
by spec.pgram() which actually did equal the norming conventions that I've
expected too see (MS or SS amplitude).

I have tried various combinations of parameters in spec.pgram() to turn off
tapering and spans, so that essentially there is no leakage correction of
the data so I could reproduce the MS and SS amplitudes exactly, just as I
did calculating them independently. Either that can not be done with
spec.pgram(), or spec.pgram() is normalized too something other than the sum
squared amplitude of the signal a[]. 

What does spec.pgram() normalize too? Perhaps the periodogram returned by
spec.pgram() is normalized to the variance or some other quantity? If so,
perhaps someone could indicate why that particular norming convention is
used? 

The reason that this is so important is that the harmonic content fit by
least squares used in the chi-square periodogram is normalized to the
variance (I think), and so too is the Lomb periodogram. Those periodograms
can be used to get p-values from chi-square and exponential distributions
respectively, too assess "important" frequencies, but they're very slow
transforms. The fast fourier transform and/or periodograms can be related
too both methods, for fast computation (albeit, especially for the Lomb
method, it's not a direct relationship), so if the justification for
relating the fft's to any particular method for fast computation depends
partly on what the data are normed too, then it is important to be very
explicit about what conventions are used in any given routine. That way any
transforms on fourier amplitudes or, in the case of a periodogram, powers -
can be translated too and from one form of estimate or another; or if they
cannot be translated, the reason would be evident in the stated conventions.
Thus, I should hope that if spec.pgram() is normed too the SS amplitude of
the discrete function, someone could communicate why it isn't exactly the SS
amplitude, or conversely what the norming convention is.

Thank you very much for your feedback,
KeithC. - an aspiring signal analysis guru & honorary enginerd
chamberk at colorado.edu
Psych Undergrad, CU Boulder
RE McNair Scholar

"Perhaps we're the reason they call it 'psycho'physics..."