timeseries.surrogates¶

Provides classes for analyzing spatially embedded complex networks, handling multivariate data and generating time series surrogates.

class pyunicorn.timeseries.surrogates.Surrogates(original_data, silence_level=1)[source]¶

Bases: object

Encapsulates structures and methods related to surrogate time series.

Provides data structures and methods to generate surrogate data sets from a set of time series and to evaluate the significance of various correlation measures using these surrogates.

More information on time series surrogates can be found in [Schreiber2000] and [Kantz2006].

AAFT_surrogates(original_data)[source]¶

Return surrogates using the amplitude adjusted Fourier transform method.

Reference: [Schreiber2000]

Parameters:: original_data (2D array [index, time]) – The original time series.
Return type:: 2D array [index, time]
Returns:: The surrogate time series.

static SmallTestData()[source]¶

Return Surrogates instance representing test a data set of 6 time series.

Return type:: Surrogates instance
Returns:: a Surrogates instance for testing purposes.

__init__(original_data, silence_level=1)[source]¶

Initialize an instance of Surrogates.

Note

The order of array dimensions is different from the standard of core. Here it is [index, time] for reasons of computational speed!

Parameters:

original_data (2D array [index, time]) – The original time series for surrogate generation.
silence_level (int) – The inverse level of verbosity of the object.

__str__()[source]¶: Returns a string representation.

__weakref__¶: list of weak references to the object

clear_cache()[source]¶: Clean up cache.

correlated_noise_surrogates(original_data)[source]¶

Return Fourier surrogates.

Generate surrogates by Fourier transforming the original_data time series (assumed to be real valued), randomizing the phases and then applying an inverse Fourier transform. Correlated noise surrogates share their power spectrum and autocorrelation function with the original_data time series.

The Fast Fourier transforms of all time series are cached to facilitate a faster generation of several surrogates for each time series. Hence, clear_cache() has to be called before generating surrogates from a different set of time series!

Note

The amplitudes are not adjusted here, i.e., the individual amplitude distributions are not conserved!

Examples:

The power spectrum is conserved up to small numerical deviations:

>>> ts = Surrogates.SmallTestData().original_data
>>> surrogates = Surrogates.                SmallTestData().correlated_noise_surrogates(ts)
>>> all(r(np.abs(np.fft.fft(ts,         axis=1))[0,1:10]) ==                 r(np.abs(np.fft.fft(surrogates, axis=1))[0,1:10]))
True

However, the time series amplitude distributions differ:

>>> all(np.histogram(ts[0,:])[0] == np.histogram(surrogates[0,:])[0])
False

Parameters:: original_data (2D array [index, time]) – The original time series.
Return type:: 2D array [index, time]
Returns:: The surrogate time series.

embed_time_series_array(time_series_array, dimension, delay)[source]¶

Return a delay embedding of all time series.

Note

Only works for scalar time series!

Example:

>>> ts = Surrogates.SmallTestData().original_data
>>> Surrogates.SmallTestData().embed_time_series_array(
...     time_series_array=ts, dimension=3, delay=2)[0,:6,:]
array([[ 0.        ,  0.61464833,  1.14988147],
       [ 0.31244015,  0.89680225,  1.3660254 ],
       [ 0.61464833,  1.14988147,  1.53884177],
       [ 0.89680225,  1.3660254 ,  1.6636525 ],
       [ 1.14988147,  1.53884177,  1.73766672],
       [ 1.3660254 ,  1.6636525 ,  1.76007351]])

Parameters:

time_series_array (2D array [index, time]) – The time series array to be normalized.
dimension (int) – The embedding dimension.
delay (int) – The embedding delay.

Return type:

3D array [index, time, dimension]

Returns:

the embedded time series.

static eval_fast_code(function, original_data, surrogates)[source]¶

Evaluate performance of fast and slow versions of algorithms.

Designed for evaluating fast and dirty C code against cleaner code using Blitz arrays. Does some profiling and returns the total error between the results.

Parameters:

function (Python function) – The function to be evaluated.
original_data (2D array [index, time]) – The original time series.
surrogates (2D array [index, time]) – The surrogate time series.

Return float:

The total squared difference between resulting matrices.

static normalize_time_series_array(time_series_array)[source]¶

Normalize an array of time series to zero mean and unit variance individually for each individual time series.

Modifies the given array in place!

Examples:

>>> ts = Surrogates.SmallTestData().original_data
>>> Surrogates.SmallTestData().normalize_time_series_array(ts)
>>> r(ts.mean(axis=1))
array([ 0., 0., 0., 0., 0., 0.])
>>> r(ts.std(axis=1))
array([ 1., 1., 1., 1., 1., 1.])

Parameters:: time_series_array (2D array [index, time]) – The time series array to be normalized.

original_data¶: The original time series for surrogate generation.

original_distribution(test_function, original_data, n_bins=100)[source]¶

Return a normalized histogram of a similarity measure matrix.

The absolute value of the similarity measure is used, since only the degree of similarity was of interest originally.

Parameters:

test_function (Python function) – The function implementing the similarity measure.
original_data (2D array [index, time]) – The original time series.
n_bins (int) – The number of bins for estimating prob. distributions.

Return type:

tuple of 1D arrays ([bins],[bins])

Returns:

the similarity measure histogram and lower bin boundaries.

recurrence_plot(embedding, threshold)[source]¶

Return the recurrence plot from an embedding of a time series.

Uses supremum norm.

Parameters:

embedding (2D array [time, dimension]) – The embedded time series.
threshold (float) – The recurrence threshold.

Return type:

2D array [time, time]

Returns:

the recurrence matrix.

refined_AAFT_surrogates(original_data, n_iterations, output='true_amplitudes')[source]¶

Return surrogates using the iteratively refined amplitude adjusted Fourier transform method.

A set of AAFT surrogates (AAFT_surrogates()) is iteratively refined to produce a closer match of both amplitude distribution and power spectrum of surrogate and original data.

Reference: [Schreiber2000]

Parameters:

original_data (2D array [index, time]) – The original time series.
n_iterations (int) – Number of iterations / refinement steps
output (str) – Type of surrogate to return. “true_amplitudes”: surrogates with correct amplitude distribution, “true_spectrum”: surrogates with correct power spectrum, “both”: return both outputs of the algorithm.

Return type:

2D array [index, time]

Returns:

The surrogate time series.

silence_level¶: (string) - The inverse level of verbosity of the object.

static test_mutual_information(original_data, surrogates, n_bins=32)[source]¶

Return a test matrix of mutual information (zero lag).

The test matrix’s entry \((i,j)\) contains the mutual information between original time series i and surrogate time series j at zero lag. The resulting matrix is useful for significance tests based on the mutual information matrix of the original data.

Note

Assumes, that original_data and surrogates are already normalized.

Parameters:

original_data (2D array [index, time]) – The original time series.
surrogates (2D Numpy array [index, time]) – The surrogate time series.
n_bins (int) – Number of bins for estimating prob. distributions.

Return type:

2D array [index, index]

Returns:

the mutual information test matrix.

static test_pearson_correlation(original_data, surrogates)[source]¶

Return a test matrix of the Pearson correlation coefficient (zero lag).

The test matrix’s entry \((i,j)\) contains the Pearson correlation coefficient between original time series i and surrogate time series j at lag zero. The resulting matrix is useful for significance tests based on the Pearson correlation matrix of the original data.

Note

Assumes, that original_data and surrogates are already normalized.

Parameters:

original_data (2D array [index, time]) – The original time series.
surrogates (2D array [index, time]) – The surrogate time series.

Return type:

2D array [index, index]

Returns:

the Pearson correlation test matrix.

test_threshold_significance(surrogate_function, test_function, realizations=1, n_bins=100, interval=(-1, 1))[source]¶

Return a test distribution for a similarity measure.

Perform a significance test on the values of a correlation measure based on original_data time series and surrogate data. Returns a density estimate (histogram) of the absolute value of the correlation measure over all realizations.

The resulting distribution of the values of similarity measure from original and surrogate time series is of use for testing the statistical significance of a selected threshold value for climate network generation.

Parameters:

surrogate_function (Python function) – The function implementing the surrogates.
test_function (Python function) – The function implementing the similarity measure.
realizations (int) – The number of surrogates to be created for each time series.
n_bins (int) – The number of bins for estimating probability distribution of test similarity measure.
interval ((float, float)) – The range over which to estimate similarity measure distribution.

Return type:

tuple of 1D arrays ([bins],[bins])

Returns:

similarity measure test histogram and lower bin boundaries.

twin_surrogates(original_data, dimension, delay, threshold, min_dist=7)[source]¶

Return surrogates using the twin surrogate method.

Scalar twin surrogates are created by isolating the first component (dimension) of the twin surrogate trajectories.

Twin surrogates share linear and nonlinear properties with the original time series, since they correspond to realizations of trajectories of the same dynamical systems with different initial conditions.

References: [Thiel2006] [*], [Marwan2007].

The twin lists of all time series are cached to facilitate a faster generation of several surrogates for each time series. Hence, clear_cache() has to be called before generating twin surrogates from a different set of time series!

Parameters:

original_data (2D array [index, time]) – The original time series.
dimension (int) – The embedding dimension.
delay (int) – The embedding delay.
threshold (float) – The recurrence threshold.
min_dist (number) – The minimum temporal distance for twins.

Return type:

2D array [index, time]

Returns:

the twin surrogates.

twins(embedding_array, threshold, min_dist=7)[source]¶

Return list of the twins of each state vector for all time series.

Two state vectors are said to be twins if they share the same recurrences, i.e., if the corresponding rows or columns in the recurrence plot are identical.

References: [Thiel2006], [Marwan2007].

Parameters:

embedding_array (3D array [index, time, dimension]) – The embedded time series array.
threshold (float) – The recurrence threshold.
min_dist (number) – The minimum temporal distance for twins.

Return type:

[[number]]

Returns:

the list of twins for each state vector in the time series.

white_noise_surrogates(original_data)[source]¶

Return a shuffled copy of a time series array.

Each time series is shuffled individually. The surrogates correspond to realizations of white noise consistent with the original_data time series’ amplitude distribution.

Example (Distributions of white noise surrogates should the same as for the original data):

>>> ts = Surrogates.SmallTestData().original_data
>>> surrogates = Surrogates.                SmallTestData().white_noise_surrogates(ts)
>>> np.allclose(np.histogram(ts[0,:])[0],
...             np.histogram(surrogates[0,:])[0])
True

Parameters:: original_data (2D array [index, time]) – The original time series.
Return type:: 2D array [index, time]
Returns:: The surrogate time series.