Non-Periodic Phenomena in Variable Stars
IAU Colloquium, Budapest, 1968

SPECTRAL ANALYSIS OF THE LIGHT CURVES OF T TAURI STARS AND OTHER OBJECTS

STEPHEN PLAGEMANN
Institute of Theoretical Astronomy, Cambridge, England

LIST OF SYMBOLS

(1) f frequency in units of cycles per day.
(2) f_N Nyquist frequency, where f_N = 1/(2 Delta t).
(3) omega angular frequency in radians, omega_N = pi/Delta t.
(4) Delta t sampling interval.
(5) n number of observations.
(6) lambda(k) lag window in the time domain, also known as the covariance averaging kernel.
(7) k number of lags in the time domain.
(8) m the total number of lags.
(9) X_i(t) observed time series (light curve).
(10) sigma^2 variance.
(11) mu_i mean of time series.
(12) gamma_11(k) ensemble autocovariance of lag k uncorrected for the mean.
(13) gamma_12(k) crosscovariance of X_1(t) and X_2(t).
(14) rho_11(k) autocorrelation coefficient.
(15) c_11(k) sample estimate of the autocovariance.
(16) r_12(k) ordinary product-moment correlation between X_1(t) and X_2(t).
(17) U_k series used as a filter in the time domain by convolution.
(18) T(omega) spectral window or kernel.
(19) b(omega) bias.
(20) nu degrees of freedom.
(21) B_omega bandwidth.
(22) A_11(k) autocorrelation function of X_1(t).
(23) A_21(k) crosscorrelation function of X_2(t) with X_1(t).
(24) I_11(omega) spectral estimate for harmonic analysis.
(25) eta(omega - y) spectral window in the frequency domain.
(26) f(omega) spectral density function of X_1(t); the cosine transform of the autocovariance of X_1(t).
(27) R_j^2 contribution to the sample variance from the jth harmonic.
(28) E(k) the energy per unit interval of f, centred upon k = 0, 1, ..., m.
(29) Z(k) the co-spectrum; the cosine transform of the crosscorrelation.
(30) W(k) the quadrature spectrum; the sine transform of the crosscorrelation.
(31) R(k) the coherence.
(32) theta(k) phase difference.
(33) G(omega) the gain.
(34) T(i omega) transfer function.

I.
INTRODUCTION

The purpose of computing power spectra from the light curves of irregular stars is both theoretical and practical. A power spectrum is an ideal tool for studying a complicated light or velocity curve whose inherent periodicities may be obscured by noise. If one lacks theoretical light or velocity curves, power spectrum analysis of the empirical data can reveal the presence of peaks above the noise level, giving an exact value for the periodicities and their amplitudes, and the distribution of energy at different vibrational frequencies. We can also study the changes in time of the period, amplitude and energy distribution. This information can in turn be compared with other properties of the T Tauri variables, such as their peculiar absorption and emission spectra, their spectral type, and their age and mass when compared to a theoretical H-R diagram.

The preparation of power spectra of T Tauri stars involved a research scheme carried out in three stages. Stage I began with an extensive search of the literature of the past fifty years for published light curves of all the 130 T Tauri stars given in a table by Herbig (1962), plus some additions. This task was eased by references in two bibliographical works, "Geschichte und Literatur des Lichtwechsels der Veränderlichen Sterne" and the "Astronomischer Jahresbericht". It was necessary to check through each reference to discover which contained useful data. The light curves were then put on to punched cards, each card holding a single Julian Date and magnitude. After the light curves were punched on IBM cards (some 27,000 readings in all), they were sorted serially and counted on an IBM 1401 computer; the light curves were then printed out and checked visually for the presence of flares. The criterion used was an increase in brightness of more than 0.4 mag in the course of one hour.
Stage II involved the calculation of pilot power spectra of T Tauri stars for different sampling intervals Delta t and lags k in order to select stable spectral estimates. Checks for stationarity and normality were made by breaking long data sequences into halves and comparing the power spectra of each. Stage III, the estimation of the final power spectra of T Tauri stars, was done using optimal sampling intervals and lags, i.e. those for which all the essential details appeared in the spectrum. An important factor in revealing the data was a prewhitening or filtering of the data in order to reduce the effects of power leakage from higher frequencies, which tends to distort the power spectrum. As a further check of the power spectrum's stability, the light curves were converted to a new time series by the addition of Gaussian-distributed random numbers within one standard deviation of the mean, and the resulting power spectrum was then computed. Definitions and procedural details will be presented below. All of the computations involving time series used a system of programmes (BOMM) developed by Bullard et al. (1966) for geophysical applications. The calculations were carried out on the IBM 7090 computer at Imperial College, London. Total computation times for each light curve analysed and other details will be presented in a later paper.

II. TIME DOMAIN AUTOCORRELATION

In the following pages, two forms of spectral analysis will be discussed: a) auto-spectral analysis of a single time series, and b) cross-spectral analysis of pairs of time series, for the purpose of comparing sets of empirical data with theoretical models. We must first consider in the time domain some statistical limitations upon the data. The spectrum of the discrete series X_1(t) is only defined for frequencies up to f_N = 1/(2 Delta t) cycles per day. As we shall see, some arrangement must be made so that the process contains negligible power at frequencies higher than f_N.
If there are no obvious linear trends in the data we can assume stationarity, a type of statistical equilibrium which implies that the random variables X_1 and X_2 have the same probability density function, i.e. f(X_1) and f(X_2) are independent of t. In practice we can estimate f_1(X_1) and f_2(X_2) by forming histograms of the observed values of X_1(t) and X_2(t). A small visual difference is sufficient to confirm the stationarity assumption (Jenkins, 1965). Tukey (1961) observed that unless one is sure about the processes, it is wise not to average the spectrum over too long a time, since stationarity may hold only for a short time. In fact we are frequently more interested in changes of the energy distribution at each frequency than in the power spectrum itself, since these may give physical insight into the mechanism producing the irregular light curves. Another way of formulating stationarity is to insist that all multivariate distributions involving X_1 and X_2 depend only on time differences t - t' and not on t and t' separately. The joint distribution of X_1(t) and X_2(t) is then the same for time points k lags apart, allowing us to construct a scatter diagram of X_1(t) and X_2(t - k). If this joint distribution can be approximated reasonably closely by a multivariate normal distribution, then it is characterized by its means and its covariance matrix. If the errors are Gaussian or normal, then mu_i and sigma^2 characterize the distribution completely, and the condition of normality is fulfilled. We must now define more carefully the covariance matrix, its components and its relation to other statistical quantities. The variance sigma^2 is: (2.1) where (2.2) and also (2.3) where p(x) is the probability density function of the errors and mu_i is the mean value of X under p(x). We can also write the sample estimate c_11(k) of the autocovariance of lag k uncorrected for the mean.
(2.4) Since c_11(k) is even (2.5) If the number of observations n increases indefinitely, c_11(k) becomes the ensemble autocovariance gamma_11(k). Priestley (1965) points out that c_11(k) has a bias b(omega) of order m/n, but since most estimates of the spectral density use only the first m (<< n) values of c_11(k), the bias is unimportant. If we wish to compute only the autocorrelation, then we use an unbiased estimate of c_11(k). However, calculating the power spectrum of X_1(t) requires a biased estimate of c_11(k) (see equation (3.14) and the discussion immediately following), so a weighted autocorrelation becomes: (2.6) The autocorrelation coefficients are rho_11(k) = gamma_11(k)/gamma_11(0); these are the same as the regression coefficients of X_1(t) on X_1(t - k). The cross correlation coefficients rho_12(k) become (2.7) Thus if the time series is stationary and normal, its statistical process, i.e. the joint distribution of X_1(t) for times t_1, t_2, ..., t_n, is characterized sufficiently by mu_i, sigma^2 and rho_12(k) (Jenkins, 1961). More formally, the estimators r_12(k) for rho_12(k) are the ordinary product-moment correlations between sequences (2.8) where mu_1 and mu_2 are means calculated from the first n - k and last n - k terms, respectively. The plot of r_11(k) against the lag k is called the autocorrelation function A_11(k), and that of r_12(k) against k the cross-correlation function. It is worth noting that the means mu_1 and mu_2 must be subtracted to produce proper autocorrelation coefficients. If we do not remove the sample mean before calculating the autocovariances, it will dominate the contribution to the cosine transform not only at zero frequency, but also at frequencies close to zero. After the mean is removed, its contribution to the Fourier transform of rho_11(k) is confined to zero frequency; but if a linear trend is present as well, there will be an additional contribution to the power at all frequencies, predominantly at low frequencies.
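The quantities defined above can be sketched in a few lines. This is an illustrative reconstruction of the biased sample autocovariance (2.4) and the autocorrelation coefficients, under the usual definitions, and not the BOMM code itself; the function names are the author's own.

```python
import numpy as np

def autocovariance(x, max_lag):
    """Biased sample autocovariance c_11(k), k = 0 .. max_lag.
    The 1/n normalisation (rather than 1/(n - k)) gives the
    biased estimate the text requires for power-spectrum work."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()                      # remove the sample mean first
    return np.array([np.dot(x[:n - k], x[k:]) / n
                     for k in range(max_lag + 1)])

def autocorrelation(x, max_lag):
    """rho_11(k) = c_11(k) / c_11(0)."""
    c = autocovariance(x, max_lag)
    return c / c[0]

# For a pure sinusoid the autocorrelation itself oscillates at the
# same frequency, damped slightly by the 1/n bias.
t = np.arange(500)
rho = autocorrelation(np.sin(0.3 * t), 20)
```

Note that with the 1/n normalisation the coefficients always satisfy |rho_11(k)| <= 1, which the unbiased 1/(n - k) estimate does not guarantee.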
We note also that if the lag k becomes too large, r_12(k) becomes erratic in appearance, since fewer observations fall into each group, so that the variance of the estimate increases. Thus resolution, which grows with the number of lags k, works in the opposite direction to the variance, and a compromise between the two must be effected to estimate autocorrelation coefficients accurately.

III. HARMONIC ANALYSIS OF STRICTLY PERIODIC TIME SERIES

The sample autocorrelation function is a difficult quantity to interpret physically, since neighbouring correlations tend to be correlated. We shall therefore look at the time series transformed into the frequency domain. We shall first decompose a time series in a manner that is valid only for strictly periodic phenomena. Afterwards we shall show how certain types of noisy spectra can be represented, i.e. we shall estimate the power spectrum of a time series containing both periodic signals and noise. Following Jenkins (1965), we take N = 2n sampling intervals of a strictly periodic time series (light curve), which will be fitted to a harmonic regression of the form (3.1) or (3.2) where The observations are represented by a mixture of sine and cosine waves whose frequencies are multiples of the fundamental frequency 2 pi/N. Then (3.3) (3.4) and (3.5) If only a few of the harmonics are used, and the remainder are regarded as error, then (3.4) and (3.5) give the usual least squares estimates of the constants. The total sum of squares (variance) about the mean can be decomposed: (3.6) If we assume no harmonic terms are present, then (N/2) R_j^2 is distributed as chi-squared with two degrees of freedom.
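Since equations (3.3)-(3.5) are not reproduced in this text, the sketch below assumes the standard least-squares estimates of the harmonic coefficients, a_j = (2/N) sum x_t cos(2 pi j t/N) and b_j = (2/N) sum x_t sin(2 pi j t/N). By orthogonality they recover an injected harmonic exactly, and (a_j^2 + b_j^2)/2 is that harmonic's contribution to the sample variance.

```python
import numpy as np

def harmonic_coefficients(x, j):
    """Least-squares cosine and sine amplitudes of the jth harmonic
    of the fundamental frequency 2*pi/N (assumed standard form of
    the unprinted equations (3.4)-(3.5))."""
    N = len(x)
    t = np.arange(N)
    a = (2.0 / N) * np.sum(x * np.cos(2 * np.pi * j * t / N))
    b = (2.0 / N) * np.sum(x * np.sin(2 * np.pi * j * t / N))
    return a, b

# A strictly periodic series: a single harmonic (j = 5) with known amplitudes.
N = 240
t = np.arange(N)
x = 1.5 * np.cos(2 * np.pi * 5 * t / N) + 0.8 * np.sin(2 * np.pi * 5 * t / N)
a5, b5 = harmonic_coefficients(x, 5)
# (a5**2 + b5**2) / 2 is this harmonic's share of the sample variance
```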
If we define the sample estimate of the autocovariance of lag k uncorrected for the mean, c_11(k), then the sample spectral density function is (3.7) We can now carry out a harmonic analysis for strictly periodic phenomena as (3.8) If the series X_1(t) is stationary, and if the number of observations N increases indefinitely, the histogram of the R_j^2 becomes a continuous curve known as the spectral density function f(omega), and c_11(k) is replaced by gamma_11(k), so that (3.9) It is worth noting that f(omega) and gamma_11(k) are Fourier transforms of each other. We have (3.10) and we write rho_11(k) = gamma_11(k)/sigma^2 (3.11) The Wiener-Khinchin theorem states this more formally: a function F(omega), the integrated spectrum of X_1(t), has the physical interpretation that follows from the Fourier expansion of X_1(t), i.e. (3.12) where dM(omega) is an orthogonal stochastic process in the interval (-infinity, infinity) with the estimate (3.13) Thus dF(omega) represents the average "power" associated with the components of X_1(t) whose frequencies lie between omega and omega + d omega. We shall now see why this easy progression in the proof does not yield valid results for physically meaningful spectra: simply, the tenets of the basic assumption of a strictly periodic X_1(t) are violated for stars with irregularly varying light curves. As we have stated, harmonic analysis works only for strictly periodic phenomena. For mixed time series harmonic analysis does not give a natural estimate of the spectrum. In fact, engineers and statisticians know that harmonic analysis of noise or random numbers produces a highly spiked spectrum. For large sample sizes the mean of the periodogram I_11(omega) does tend to the spectral density, but the variance of the fluctuations of I_11(omega) about f(omega) does not tend to zero as n -> infinity. For large n, the distribution of I_11(omega) is a multiple of a chi-squared distribution with two degrees of freedom, independently of n.
Another way to see that for mixed spectra harmonic estimates of the periodogram I_11(omega) are spurious is that (3.14) where As we see, the weights tend to zero as k tends to N, giving erratic and incorrect estimates. We shall now turn to a sound method of estimating time series spectra as developed by Wiener (1967), Blackman and Tukey (1958), and others.

IV. ESTIMATION OF MIXED AUTOSPECTRA

We define a mixed spectrum by decomposing F(omega), the integrated spectrum of X_1(t), as (4.1) where F_1(omega) is an absolutely continuous function having a derivative f(omega), also called the spectral density function, and F_2(omega) is a step function with jumps at certain frequencies omega_n, n = 1, 2, ... If a_1 = 1, a_2 = 0, the spectrum is purely continuous; if a_1 = 0, a_2 = 1, it is purely discrete. If a_1 ≠ 0 and a_2 ≠ 0, the time series is said to possess a mixed spectrum. If a process contains periodic terms plus a residual process which itself has a continuous spectrum, then we have to estimate (1) the number of sine waves, (2) their amplitudes and frequencies, and (3) the spectral density function of the residual process. To estimate mixed spectra one defines both a series lambda(k) as a time domain window and a spectral window tau(omega) as a window in the frequency domain. We choose for mixed spectra to define spectral estimates in the form (4.2) where we assume that the width of the spectral window associated with lambda(k) is so small that the spectrum varies little over the window. We can also write for gamma_11(k) (4.3) Another expression of (4.2) is (4.4) where (4.5) with (4.6) and (4.7) Here tau(omega - y) is the spectral window. In changing omega we move the slit through the entire frequency range, with a bandwidth of +-2 pi/N about y = omega. Jenkins (1961) gives several physical definitions of bandwidth which approximate a rectangular band-pass window.
In general we replace the autocovariances gamma_11(k) by the autocorrelations rho_11(k), so that (4.8) For the choice of moving weights lambda(k), we can choose from a number of cosine-like series, all nearly the same. We have used (4.9) This weight gives variance f(omega) ~ 3m/4n; thus making m small increases the bandwidth, hence decreasing the variance, for this choice of lambda(k). This operation can be regarded as a filtering process using a rectangular-like band pass with small side lobes in order to increase the accuracy of the spectral density f(omega). Unless one is faced with a peaky or steeply slanted spectrum, the exact shape of the window is not too critical. This window (4.9) and others like it tend to a normal distribution, hence the transform has the same shape as the window itself: it retains its shape in both time and frequency domains. For this particular window, the eta(omega - y) appearing in the spectral estimate is given by Jenkins (1961) as (4.10) Most windows are similar to those generated from the window lambda(k) = 1, giving a shape similar to the probability density function of a rectangular random variable. This particular window (4.9) has both small bandwidth and small side lobes, cutting the leakage of spectral power from frequencies distant from y = omega, which would distort the true picture at y = omega_0. It must be emphasized that there are a number of spectral windows, each with different bandwidths. We shall see later that filters similar to lambda(k) are used to suppress certain undesirable features in the spectrum such as power leakage. More important than the choice of window shape is the choice of the total number of lag steps k, which we call m. The nature of the analysis attempted here is to use a fixed run of observations and a fixed sampling interval Delta t. Priestley (1962) has attempted to establish quantitative expressions for designing an optimum spectral estimate, involving the relation of variance sigma^2 and bias b(omega).
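As an illustration of the lag-window estimate, here is a minimal Blackman-Tukey spectrum. The window (4.9) itself is not reproduced in this text, so the Tukey-Hanning form lambda(k) = (1/2)(1 + cos(pi k/m)) is assumed; it is a cosine-like window with variance ratio of order 3m/4n, as quoted above.

```python
import numpy as np

def bt_spectrum(x, m, dt=1.0):
    """Blackman-Tukey spectral estimate at frequencies
    f_j = j / (2 m dt), j = 0..m, using the (assumed)
    Tukey-Hanning lag window lambda(k) = 0.5*(1 + cos(pi k / m))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    k = np.arange(m + 1)
    # biased autocovariances c_11(k), k = 0..m
    c = np.array([np.dot(x[:n - j], x[j:]) / n for j in k])
    lam = 0.5 * (1 + np.cos(np.pi * k / m))
    spec = np.array([2 * dt * (c[0] / 2 +
                     np.sum(lam[1:] * c[1:] * np.cos(np.pi * j * k[1:] / m)))
                    for j in k])
    freqs = k / (2.0 * m * dt)
    return freqs, spec

# A sinusoid at 0.1 cycles per unit time should give a peak there
# (index j = 8 of 0..40, since 8/80 = 0.1).
t = np.arange(600)
freqs, spec = bt_spectrum(np.sin(2 * np.pi * 0.1 * t), m=40)
```

Increasing m narrows the bandwidth and sharpens the peak, at the cost of a larger variance, exactly the trade-off discussed in the text.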
As we have mentioned earlier, one must first fix the form of the weight sequence and the number of lags m. A not always realizable suggestion by Priestley (1962) is to have the bandwidth B_omega not greater than the width of the narrowest peak of f(omega); more specifically, B_omega is the distance between the 'half-power' points in the main lobe. Blackman and Tukey (1958) discuss an optimal design in terms of sampling errors of f(omega), assuming a chi-squared distribution with nu degrees of freedom, including a measure of f(omega) in terms of variance, but not of bias b(omega). We shall discuss these points later when considering the errors inherent in any estimate of f(omega). These considerations of sigma^2 and b(omega) are not strictly applicable to this particular type of power-spectrum analysis of light curves of irregular variable stars taken from the astronomical literature. Unless one studies only short period fluctuations of variable stars, or has several observatories around the world keeping constant watch on certain stars, one's sampling rate will always be disturbed by the day-night variation. The weather will also confound the most diligent observer, and tend to frustrate any decrease of Delta t even for very long records of light curves. Jenkins (1965) suggests an empirical method which has been followed in this work, and which will be illustrated by a few examples and diagrams. Those who try to determine an optimal m always find that the answer depends on the spectral density f(omega) itself. We therefore start with a small m (a large bandwidth). We then increase m, decreasing the bandwidth and improving the resolution, stopping when we are satisfied about the detail of the spectrum (following the suggestion of Priestley). The matter may be judged in the light of the consideration that the variance tends to increase with m; we must in fact balance the accuracy of the spectral estimate (the variance) against the number of lags m, i.e.
the resolution of fine structure in the spectrum. The ideal method to increase resolution at low frequencies would be to increase the total length of the data, giving a range of possible periods of up to 1,000 days. Along with this increase in n we could safely increase m and still retain a high level of confidence in our statistical estimates of the spectra. The diurnal variation of the Earth, infrequent bad weather and the new Moon put a very real limitation on an accurate f(omega), short of an enormous increase in the time and effort of the observational astronomers. The irregular spacing of the observations that make up a light curve is dealt with by a subroutine of BOMM that interpolates the time series, using second differences, at regular intervals Delta t over the range of observations that one chooses. In pilot studies, we usually choose several values of Delta t. We conclude this section with the specific formulation of the spectral density as computed by a BOMM subroutine. The actual cosine transform of the autocorrelation is written: (4.11) where epsilon_i = 1/2 if i = 0, epsilon_i = 1 otherwise; j = 0, 1, ..., l', where l' is the greatest integer such that omega' + l'/(2 Delta t) <= omega_N, omega' being the low frequency limit (usually 0) and omega_N the Nyquist frequency. Jenkins (1961) observes that the logarithmic transformation of the spectral density f(omega) produces a distribution whose variance is nearly independent of the frequency. We see that (4.12) where beta_i depends on the window used for the spectral estimate and n is the total number of observations. If we then write an expression independent of the frequency (4.13) log_10 f(omega) produces estimates that do not rely so heavily upon normality; in fact it produces distributions closer to normality. It is assumed that the errors here are produced by the variance and not by leakage of power from other frequencies or by aliasing.
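The interpolation step can be sketched as follows. BOMM's actual subroutine uses second differences (quadratic interpolation); as that code is not reproduced here, plain linear interpolation via numpy's np.interp stands in, purely to show the resampling of an irregular light curve onto a regular grid of spacing Delta t. The toy Julian Dates and magnitudes are invented.

```python
import numpy as np

def resample(jd, mag, dt):
    """Resample an irregularly observed light curve (Julian Dates jd,
    magnitudes mag) onto a regular grid of spacing dt.  Linear
    interpolation is a stand-in for BOMM's second-difference scheme."""
    jd = np.asarray(jd, dtype=float)
    mag = np.asarray(mag, dtype=float)
    t = np.arange(jd[0], jd[-1] + 0.5 * dt, dt)   # regular grid over the record
    return t, np.interp(t, jd, mag)

# A toy light curve: six irregular observations, resampled at dt = 1 day.
jd = np.array([0.0, 1.3, 2.1, 4.0, 5.2, 7.0])
mag = np.array([10.0, 10.4, 10.1, 9.8, 10.2, 10.0])
t, m = resample(jd, mag, dt=1.0)
```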
It is worth noting that this leads us easily to the method engineers use for expressing spectral power, since (4.14) We see that an increase of power by a factor of 2 is equivalent to a change of 3 decibels.

V. ESTIMATION OF MIXED CROSS SPECTRA

Often we need to compare two empirical time series X_1(t) and X_2(t) that arise in similar fashions. To do this, we shall show how one estimates the cross spectrum. We shall also show that if X_1(t) and X_2(t) do not have similar origins, then one can estimate the gain of a system, and perhaps the coherent energy. Let one of the time series X_1(t) be a theoretical complex input and X_2(t) the actual recording of the light curve. We can write (5.1) where n(t) is the noise term which arises because other input variables are not well controlled. If the fluctuations in X_1(t) are not too large, then we linearize the above equation to (5.2) where n(t) now contains quadratic and higher terms. This relationship is a linear dynamic equation; the regression or impulse response w(u) can be estimated in several ways, by changing X_1(t) in a well defined step pattern or sinusoidally. In the latter case we let X_1(t) = delta cos omega t and (5.3) then the transfer function becomes (5.4) T(i omega) is the transfer function, G(omega) the gain and Phi(omega) the phase shift. If n(t) is small, the gain G(omega) may be obtained from the ratio of the amplitudes of the output and input, and Phi(omega) by matching up the two waves. In this case we would use a sine wave generator to actuate X_1(t) and sweep out a range of frequencies to generate all the information. We shall now estimate w(u) from the existing noisy fluctuations in X_1(t) and X_2(t).
These two series are now taken to be stationary time series, with X_1(t) and n(t) uncorrelated, so we can write the crosscovariance by multiplying (5.2) by X_1(t - k) and averaging: (5.5) where gamma_12(k) is the crosscovariance function of lag k between X_1(t) and X_2(t), and gamma_11(k - u) is the autocovariance function. If gamma_12(k) and gamma_11(k - u) are known, then w(u) may be estimated by solving (5.5), usually called the Wiener-Hopf equation. If we take the Fourier transform of both sides, T(omega) is then (5.6) where f_12(omega) is the cross spectral density function, and sigma_1 and sigma_2 are the standard deviations of X_1(t) and X_2(t). The spectral density f_12(omega) is given by (5.7) We have shown that the frequency response T(omega) is given by the ratio of the cross spectral density and the input auto spectrum. We now write gamma_12(k) as: (5.8) We have already defined gamma_11(k) and f_11(omega) in an earlier section of the paper. Since gamma_12(k) ≠ gamma_21(k) in general, (5.7) gives both a cosine and a sine transform of the crosscorrelation (5.9) (5.10) where (5.11) (5.12) and (5.13) Here W_12(omega) is the quadrature or out-of-phase spectrum, and Z_12(omega) is called the co- or in-phase spectrum. Using (5.6) and (5.8) we can write the gain G(omega) and phase Phi(omega) as (5.14) (5.15) tan Phi(omega) here is the phase difference. This formulation is due to Munk et al. (1959). If we are examining the covariance between two stationary time series X_1(t) and X_2(t), and not treating them as input and output of some linear system, one usually plots the coherence R^2(omega), which is normalized to run between 0 and 1, so that (5.16) The coherence R^2(omega) is analogous to the correlation coefficient, calculated at each frequency omega. The spectrum of the noise term n(t) in (5.2) is now (5.17) which is analogous to the residual sum of squares in linear regression.
(5.18) We see that if R^2(omega) is large, the noise spectral density is small relative to the output spectral density, and vice versa. We can also write R^2(omega) as (5.19) If G(omega) is specified, the coherence is large if the signal-to-noise ratio S(omega)/N(omega) is large (5.20) Munk, Snodgrass and Tucker (1959) point out that if X_1(t) and X_2(t) are identical and simultaneous, Z_12(omega) = +1, W_12(omega) = 0. If they are identical but there is a time lag in one record corresponding to a phase difference Phi(omega), then the coherence is still +1. Jenkins (1965) points out that positive cross correlations will result in a cross amplitude spectrum with most of its power at low frequencies; negative cross correlations will result in a high-frequency cross amplitude spectrum. We follow Jenkins (1963, 1965) in choosing w(u) in (5.2) so that the integrated squared error (5.21) is minimized and is small compared to some epsilon, (5.22) Here the ensemble cross covariances gamma_12(k) are replaced by the sample cross covariances c_12(k). Since X_1(t) and X_2(t) are not phase shifted sine waves, but stationary time series, we must introduce a weight function lambda(k) as in the univariate spectral analysis. We then write the estimates for the co-spectra, quadrature spectra and auto-spectra as (5.23) (5.24) (5.25) where alpha_12(k) and beta_12(k) are estimates of alpha_12 and beta_12 as defined earlier in (5.12) and (5.13). It has been pointed out by Jenkins (1965) that coherences may sometimes be low because of the influence of the individual autospectra. One can also take two independent time series and produce a large spurious coherence between them, arising from peaks in the individual spectra or from too small a bandwidth. However, we can calculate the autospectra first, then pre-filter each series to make their spectra more nearly uniform or white. This leads to a large increase in accuracy in the estimation of coherence.
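A coherence estimate can be sketched as follows. Instead of the lag-window smoothing of (5.23)-(5.25), this illustration averages periodograms over segments, a substitute smoothing route chosen for brevity; some smoothing is essential, since raw unsmoothed periodograms give R^2(omega) = 1 identically for any pair of series.

```python
import numpy as np

def coherence(x, y, nseg):
    """Squared coherence R^2 at the Fourier frequencies of one segment,
    estimated by averaging cross- and auto-periodograms over nseg
    segments (a stand-in for the lag-window smoothing in the text)."""
    n = (len(x) // nseg) * nseg
    xs = np.asarray(x[:n], dtype=float).reshape(nseg, -1)
    ys = np.asarray(y[:n], dtype=float).reshape(nseg, -1)
    xs -= xs.mean(axis=1, keepdims=True)
    ys -= ys.mean(axis=1, keepdims=True)
    fx = np.fft.rfft(xs, axis=1)
    fy = np.fft.rfft(ys, axis=1)
    pxx = np.mean(np.abs(fx) ** 2, axis=0)     # smoothed auto-spectrum of x
    pyy = np.mean(np.abs(fy) ** 2, axis=0)     # smoothed auto-spectrum of y
    pxy = np.mean(fx * np.conj(fy), axis=0)    # smoothed cross-spectrum
    denom = np.maximum(pxx * pyy, np.finfo(float).tiny)
    return np.abs(pxy) ** 2 / denom

rng = np.random.default_rng(1)
x = rng.normal(size=1024)
r2_same = coherence(x, x, nseg=8)                   # identical series
r2_indep = coherence(x, rng.normal(size=1024), nseg=8)
```

Two identical series give R^2 = 1 at every frequency, while two independent noise series give values scattered near 1/nseg, illustrating why an unsmoothed (nseg = 1) estimate is useless.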
If one likes to think of an input function into the atmosphere of a star as some type of periodic wave, then following Munk and Cartwright (1966) we can separate the estimate of the energy into two parts, namely (5.26) and (5.27) The latter will be effectively noise energy, but may contain energy coherent with other input functions.

VI. FILTERS

Once the basic spectrum has been created and stabilized for a fixed m and Delta t, it becomes important to examine further the processes that tend to distort the true nature of f(omega). We shall show that it is possible to construct a digital filter U_k such that convolution upon the data in the time domain will eliminate unwanted spectral energy at some specified frequency. Given our light curve X_1(t) and a series which we shall term our filter, we create a new series W_j(t) by the process described. We can write W_j(t) as (6.1) where p' is the number of terms by which the weight factors are advanced for unit increase in j, and 1/(l' + 1) is a normalization factor. If we want weights that are symmetrical, then W_j assumes the following form (6.2) where We can also construct a frequency response function T(omega) to accept or reject any range of frequencies, so that (6.3) The BOMM programme generates these series of moving weights either for use as lag windows or as filters. We have the choice of symmetrical filters or those fading to the left or right. If we want a high-pass or low-pass filter, then for some frequency f the terms are multiplied by a factor which gives a filter whose transmission there is down 6 db (1/2 amplitude). Thus: (6.4) (6.5) If we want a low pass filter to be convolved as in equation (6.1), then we write (6.6) and for a high pass filter (6.7) where epsilon_j = 1/2 for j = 0 and epsilon_j = 1 otherwise; delta(j) = 1 for j = 0 and 0 otherwise. As we stated earlier, the filter type is determined by the sharpness of cut-off and by the side band characteristics.
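A symmetric low-pass filter of the kind described, with its half-amplitude (6 db) point at a chosen cut-off, can be sketched as a cosine-tapered truncated sinc series. The exact BOMM weights (6.6)-(6.7) are not reproduced in this text, so the construction below is an assumption, checked by the cosine-transform amplitude response.

```python
import numpy as np

def lowpass_weights(fc, l, dt=1.0):
    """Symmetric low-pass filter weights U_k, k = -l..l, with its
    half-amplitude point near the cut-off frequency fc (cycles per
    unit dt).  A cosine-tapered truncated sinc series; an assumed
    construction, not the BOMM formula (6.6)."""
    k = np.arange(-l, l + 1)
    u = 2.0 * fc * dt * np.sinc(2.0 * fc * dt * k)   # ideal low-pass weights
    u *= 0.5 * (1.0 + np.cos(np.pi * k / (l + 1)))   # cosine taper cuts side lobes
    return u / u.sum()                               # unit transmission at f = 0

def amplitude_response(u, f, dt=1.0):
    """Cosine transform of the (even) weights = amplitude response at f."""
    l = (len(u) - 1) // 2
    k = np.arange(-l, l + 1)
    return float(np.sum(u * np.cos(2.0 * np.pi * f * k * dt)))

u = lowpass_weights(fc=0.1, l=20)
# filtering a light curve x is then convolution: np.convolve(x, u, mode='valid')
```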
The amplitude response of any filter can be obtained by taking its cosine transform. It may be worthwhile explaining some of the conditions which necessitate the use of filters, such as the removal of trends and the leakage of power from high frequencies due to aliasing. Often non-stationarity of a time series may be due to the presence of a linear trend. In general, one can use a high-pass digital filter to suppress low frequency trends and then go on to perform a spectral analysis of the residual series. Another use of filters in time series analysis is where one suspects that a series consists of a mixture of stationary series, i.e. differing frequency bands may have different physical origins. In such a case, we can construct a series of digital band pass filters to separate the original spectrum into several distinct spectra. A more complete discussion of the use of filters of all types will come in the next section with a discussion of the errors involved in calculating the spectral density.

VII. ERRORS IN SPECTRAL DENSITY ESTIMATES

Munk, Snodgrass and Tucker (1959) have given several quantitative estimates of errors in spectral density. We shall deal with those errors that are due to random errors, the properties of the spectral window lambda(k), and leakage of power due to aliasing, and give some treatment of systematic errors. Within the range of this examination is the problem of detecting signals in the presence of noise, i.e. estimation of the range of error allowable for any peak that appears in a mixed spectrum. Parzen (1961) gives some design relations in terms of the signal-to-noise estimate, i.e. (7.1) We see that SNR[f(omega)] is the mean divided by the standard deviation, i.e. the reciprocal of the coefficient of variation, of the random variable f(omega). If f(omega) is normally distributed and we seek a 95% confidence level for our spectral estimates, then we can estimate SNR[f(omega)] in terms of equivalent degrees of freedom.
If X_1(t) is a random variable (i.e. Gaussian noise), so that the spectral estimate has a chi-squared distribution with nu degrees of freedom, then (7.2) and (7.3) If X_1(t) is a positive random variable, then nu is just the equivalent number of degrees of freedom. Several authors have put forward views on design relations to maximize SNR[f(omega)] under various circumstances (Parzen, 1961; Priestley, 1962, 1965), but we shall have to outline this in more detail later because of the peculiar nature of the light curves of T Tauri stars and the fixed amount and nature of the data available. The assumption of a chi-squared distribution is approximately correct where f(omega) can be expressed as a weighted sum of chi^2 variables. A similar method of treatment is to sum the squares of the amplitudes of the components of a Fourier series (7.4) where n = 0, 1, 2, ..., and the energy density varies slowly with the frequency separation 1/T of the harmonics. Then p_n and q_n are normal and p_n^2 = q_n^2 = true energy density/T, i.e. (7.5) Thus for p harmonics the number of degrees of freedom is nu = 2p, and from the chi-squared distribution "confidence limits" can be calculated. This is because the chi-squared distribution gives (7.6) The effective number of degrees of freedom of a record is approximately nu. These ideas can be used with tables given in Blackman and Tukey (1958) to calculate the random errors based on the assumptions above. Thus (7.7) It seems likely that one must consider both the bias and the variance when carrying out practical estimates of the spectral density f(omega), but it is difficult to formulate any practical numerical criteria due to the presence in f(omega) of non-random statistical errors, i.e. side band leakage, for which f(omega) differs with the type of spectral window chosen, and aliasing. Leakage of power from peaks at frequencies > omega_N confuses the estimate at neighbouring frequencies omega_N +- Delta omega.
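The relations (7.2)-(7.3) can be illustrated numerically. For a chi-squared variable with nu degrees of freedom the mean is nu and the variance 2 nu, so SNR = nu/sqrt(2 nu) = sqrt(nu/2); combining this with the variance ratio var[f]/f^2 ~ 3m/4n quoted in Section IV gives nu ≈ 8n/3m for that window. The particular n and m below are hypothetical.

```python
import math

def equiv_dof(n, m):
    """Equivalent degrees of freedom for the lag-window estimate:
    a chi-squared_nu variable has var/mean^2 = 2/nu, and the text's
    window gives var[f]/f^2 ~ 3m/(4n), hence nu = 8n/(3m)."""
    return 8.0 * n / (3.0 * m)

def snr(nu):
    """SNR[f(omega)] = mean / standard deviation of a chi-squared_nu
    variable = nu / sqrt(2 nu) = sqrt(nu / 2)."""
    return math.sqrt(nu / 2.0)

# e.g. a 1000-point light curve analysed with m = 50 lags:
nu = equiv_dof(n=1000, m=50)     # about 53 degrees of freedom
ratio = snr(nu)                  # SNR of roughly 5
```

Doubling m halves nu and so cuts the SNR by a factor sqrt(2): the resolution-versus-variance compromise again, in quantitative form.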
Parzen (1961) and Priestley (1962) both use a formula (7.8) to give an estimate at a particular frequency that does not have more than a 10% proportional error at the 95% confidence level. Here the variance is defined by (7.9) and the bias is given as (7.10) where lambda(theta-omega) is the weight function. Implementation of these formulae for eta(omega) and b(omega) rests critically on one's choice of the width of the weight function lambda(k). In fact, since both n and Delta t are fixed for the light curves, it is possible to calculate the optimum number of lags m for a given spectrum if we accept the mean square error at frequency omega as (7.11) But other sources of error creep in, as we have mentioned earlier, and the chi-squared error estimates account for only one source of error, those of a random nature. The difficulty is that both eta(omega) and b(omega) are asymptotically proportional to f(omega). The method we have used is to vary the resolution until increasing m any further does not reveal any more features in the spectra, while at the same time attempting to prewhiten the spectra as much as possible, as suggested by Blackman and Tukey (1958). This is done during Stage II of the estimation process, which is mostly exploratory. Pre-whitening is useful in eliminating the systematic errors that occur due to the curvature of f(omega). If f(omega) is flat, or varies linearly with frequency, then E[f(omega)] = f(omega). If not, then spectral energy diffuses down toward lower spectral levels (Munk et al., 1959) and one must worry about leakage of power from higher to lower frequencies. We shall now discuss the question of leakage in the estimates of spectral density. The window lambda(k) we choose is usually some kind of polynomial which vanishes beyond a finite number of lags.
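The procedure of varying the resolution m can be sketched as follows. This is an illustrative modern reconstruction of a Blackman-Tukey type estimate, not the author's actual program (which was BOMM; Bullard et al., 1966): the autocovariance is computed out to lag m, weighted by a hanning-type lag window lambda(k), and cosine-transformed. The test series, a 10-day sinusoid sampled daily, is an assumed example:

```python
import math

def autocov(x, k):
    """Sample autocovariance c_11(k), corrected for the mean."""
    n = len(x)
    mu = sum(x) / n
    return sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / n

def bt_spectrum(x, m, dt=1.0):
    """Blackman-Tukey estimate: cosine transform of the first m lags of
    the autocovariance through a hanning lag window lambda(k)."""
    c = [autocov(x, k) for k in range(m + 1)]
    lam = [0.5 * (1.0 + math.cos(math.pi * k / m)) for k in range(m + 1)]
    spec = []
    for j in range(m + 1):
        f = j / (2.0 * m * dt)                 # grid from 0 to Nyquist
        s = c[0] + 2.0 * sum(
            lam[k] * c[k] * math.cos(2.0 * math.pi * f * k * dt)
            for k in range(1, m + 1))
        spec.append(2.0 * dt * s)
    return spec

# sinusoid of period 10 days: the peak sharpens as the lag number m grows,
# while its frequency stays near 0.1 cycles per day
x = [math.sin(2.0 * math.pi * t / 10.0) for t in range(200)]
for m in (10, 20, 40):
    s = bt_spectrum(x, m)
    peak = max(range(len(s)), key=lambda j: s[j])
    print(m, peak / (2.0 * m))                 # peak frequency, c/d
```

Increasing m narrows the bandwidth (more resolution) but raises the variance of each estimate, which is the mean-square-error trade-off of (7.11); one stops when a larger m reveals no new features.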
Its main lobe, centred on the band of frequencies over which we wish to estimate f(omega), is inevitably accompanied by side bands or minor lobes, which allow leakage from the part of the spectrum outside the desired band to affect our estimate inside the band. This can be cured by taking two estimates, one with the side lobes positive, the other negative. One of the more critical problems recognised by Tukey is aliasing, the problem of sampling at discrete intervals Delta t. Sampling is equivalent to multiplying the record by a train of spikes of frequency 1/Delta t, so that sum and difference frequencies are produced. An analysis for energy density at any frequency f_0 will also include the densities near the frequencies 1/Delta t - f_0, 1/Delta t + f_0, 2/Delta t - f_0, 2/Delta t + f_0, etc. One of the major uses of filters comes in dealing with the leakage of power associated with aliasing, caused by the loss of information at frequencies above the Nyquist frequency, namely omega_N = pi/Delta t, or f_N = 1/(2 Delta t) cycles per day. (Figs. 1 and 2.) Since there is no information available about X_1(t) between the data points, there is no means of estimating the amplitude of frequencies higher than the Nyquist frequency. In fact, at the Nyquist frequency itself we are unable to estimate, since f(omega_N) is confused with all the frequencies that are indistinguishable from omega_N. If f*(omega) is the spectral density corresponding to X(t), then the spectral density of the sampled trace is (7.12) We see the sampled spectrum is obtained by folding the unsampled spectrum about the even multiples (2 pi k)/(Delta t) of the Nyquist frequency and adding these contributions into the range (0, omega_N). Thus, to be able to measure f*(omega) in (0, omega_N), we must hope that f(omega) ~ 0 for omega > omega_N. In our case there is certainly a need to use a low-pass filter to cut any peaks that may occur near f_N.
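The folding of frequencies about the Nyquist frequency can be demonstrated directly. In the sketch below (a modern illustration, not from the paper, assuming Delta t = 1 day so f_N = 0.5 cycles per day), a 0.7 c/d signal sampled daily yields exactly the same data points, up to sign, as a 0.3 c/d signal, so the two are indistinguishable in the sampled spectrum:

```python
import math

def alias(f, dt=1.0):
    """Frequency in (0, f_N) onto which f is folded by sampling at
    interval dt, where f_N = 1/(2 dt)."""
    fs = 1.0 / dt                  # sampling frequency
    f = f % fs                     # fold about multiples of 1/dt
    return fs - f if f > fs / 2.0 else f

dt = 1.0
x_hi = [math.sin(2.0 * math.pi * 0.7 * t * dt) for t in range(50)]
x_lo = [math.sin(2.0 * math.pi * 0.3 * t * dt) for t in range(50)]

print(alias(0.7, dt))              # folds to ~0.3 c/d
# the daily samples of the two sinusoids agree up to sign: x_hi = -x_lo
print(max(abs(a + b) for a, b in zip(x_hi, x_lo)))
```

This is why a pilot spectrum with a smaller Delta t (or a low-pass filter applied before decimation) is needed to be sure that peaks near f_N are not aliases of power above the Nyquist frequency.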
The details of this process and the practical results will be shown when we present the power spectra for the particular T Tauri stars. The use of different Delta t's in calculating pilot spectra will tell us if we are in danger of confusing f(omega) through higher energies at higher frequencies.

REFERENCES

Blackman, R. B. and Tukey, J. W., 1958, The Measurement of Power Spectra. Dover Publications.
Bullard, E. C. et al., 1966, A User's Guide to BOMM. A Series of Programs for the Analysis of Time Series. Institute of Geophysics and Planetary Physics, La Jolla.
Herbig, G. H., 1962, Adv. Astr. Astrophys. 1, 47.
Jenkins, G. M., 1961, Technometrics, 3, 167.
Jenkins, G. M., 1963, Technometrics, 5, 227.
Jenkins, G. M., 1965, Applied Statistics, 14, 205.
Munk, W. H., Snodgrass, F. E. and Tucker, M. J., 1959, Bull. Scripps Inst. Oceanography, 7, 283.
Munk, W. H. and Cartwright, D. E., 1966, Phil. Trans. R. Soc. London, A 259, 533.
Parzen, E., 1961, Technometrics, 3, 167.
Priestley, M. B., 1962, Technometrics, 4, 551.
Priestley, M. B., 1965, Applied Statistics, 14, 33.
Tukey, J. W., 1961, Technometrics, 3, 191.
Wiener, N., 1967, Time Series, Dover Publications.

DISCUSSION

Detre: How is it possible that analyses of the same observational material, by the same method, executed by two different teams - as I mentioned in my introductory report - lead to opposite results for mu Cephei?
Plagemann: I have not read these contributions, but I shall study them soon.
Herbig: Does your technique recover the periods of a few days found by Hoffmeister when you analyze the observations of southern RW Aurigae-type variables made by him?
Plagemann: I have not yet analyzed his data, although the material is already on computer cards. It will be done when I return to Cambridge.
Penston: Do I understand that you find Kinman's 80 day period for 3C 345 is not statistically significant?
Plagemann: That is correct - my preliminary results do not show his 80d period.