timeseries.surrogates¶

Provides classes for analyzing spatially embedded complex networks, handling multivariate data and generating time series surrogates.

class pyunicorn.timeseries.surrogates.Surrogates(original_data, silence_level=1)[source]¶

Bases: Cached

Encapsulates structures and methods related to surrogate time series.

Provides data structures and methods to generate surrogate data sets from a set of time series and to evaluate the significance of various correlation measures using these surrogates.

More information on time series surrogates can be found in [Schreiber2000] and [Kantz2006].

AAFT_surrogates()[source]¶

Return surrogates using the amplitude adjusted Fourier transform method.

Reference: [Schreiber2000]

Return type:: 2D array [index, time]
Returns:: The surrogate time series.

static SmallTestData()[source]¶

Return Surrogates instance representing test a data set of 6 time series.

Return type:: Surrogates instance
Returns:: a Surrogates instance for testing purposes.

__cache_state__() → Tuple[Hashable, ...][source]¶

Hashable tuple of mutable object attributes, which will determine the instance identity for ALL cached method lookups in this class, in addition to the built-in object id(). Returning an empty tuple amounts to declaring the object immutable in general. Mutable dependencies that are specific to a method should instead be declared via @Cached.method(attrs=(…)).

NOTE: A subclass is responsible for the consistency and cost of this state descriptor. For example, hashing a large array attribute may be circumvented by declaring it as a property, with a custom setter method that increments a dedicated mutation counter.

__init__(original_data, silence_level=1)[source]¶

Initialize an instance of Surrogates.

Note

The order of array dimensions is different from the standard of core. Here it is [index, time] for reasons of computational speed!

Parameters:

original_data (2D array [index, time]) – The original time series for surrogate generation.
silence_level (int) – The inverse level of verbosity of the object.

__str__()[source]¶: Returns a string representation.

_embedding¶: The embedded times series

correlated_noise_surrogates()[source]¶

Return Fourier surrogates.

Generate surrogates by Fourier transforming the original_data time series (assumed to be real valued), randomizing the phases and then applying an inverse Fourier transform. Correlated noise surrogates share their power spectrum and autocorrelation function with the original_data time series.

Note

The amplitudes are not adjusted here, i.e., the individual amplitude distributions are not conserved!

Examples:

The power spectrum is conserved up to small numerical deviations:

>>> ts = Surrogates.SmallTestData()
>>> surrogates = ts.correlated_noise_surrogates()
>>> all(np.abs(np.fft.fft(
...         ts.original_data, axis=1))[0,1:10]).round(4) ==
...     np.abs(np.fft.fft(
...         surrogates,       axis=1))[0,1:10]).round(4))
True

However, the time series amplitude distributions differ:

>>> all(np.histogram(ts[0,:])[0] == np.histogram(surrogates[0,:])[0])
False

Parameters:: original_data (2D array [index, time]) – The original time series.
Return type:: 2D array [index, time]
Returns:: The surrogate time series.

static embed_time_series_array(time_series_array, dimension, delay, silence_level=1)[source]¶

Return a delay embedding of all time series.

Example:

>>> ts = Surrogates.SmallTestData().original_data
>>> Surrogates.embed_time_series_array(
...     time_series_array=ts, dimension=3, delay=2)[0,:6,:]
array([[ 0.        ,  0.61464833,  1.14988147],
       [ 0.31244015,  0.89680225,  1.3660254 ],
       [ 0.61464833,  1.14988147,  1.53884177],
       [ 0.89680225,  1.3660254 ,  1.6636525 ],
       [ 1.14988147,  1.53884177,  1.73766672],
       [ 1.3660254 ,  1.6636525 ,  1.76007351]])

Parameters:

time_series_array (2D array [index, time]) – The time series array to be normalized.
dimension (int) – The embedding dimension.
delay (int) – The embedding delay.

Return type:

3D array [index, time, dimension]

Returns:

the embedded time series.

property embedding: ndarray¶: The embedded time series / phase space trajectory (time, embedding dimension).

static eval_fast_code(function, original_data, surrogates)[source]¶

Evaluate performance of fast and slow versions of algorithms.

Designed for evaluating fast and dirty C code against cleaner code using Blitz arrays. Does some profiling and returns the total error between the results.

Parameters:

function (Python function) – The function to be evaluated.
original_data (2D array [index, time]) – The original time series.
surrogates (2D array [index, time]) – The surrogate time series.

Return float:

The total squared difference between resulting matrices.

normalize_original_data()[source]¶

Normalize the original data to zero mean and unit variance individually for each individual time series.

Examples:

>>> ts = Surrogates.SmallTestData()
>>> ts.normalize_original_data()
>>> r(ts.original_data.mean(axis=1))
array([ 0., 0., 0., 0., 0., 0.])
>>> r(ts.original_data.std(axis=1))
array([ 1., 1., 1., 1., 1., 1.])

original_data¶: The original time series for surrogate generation.

original_data_fft()[source]¶

Return one-dimensional discrete Fourier Transform via numpy.fft.rfft()

Return type:: 2D array [index, frequency]
Returns:: The original time series’ FFT.

original_distribution(test_function, n_bins=100)[source]¶

Return a normalized histogram of a similarity measure matrix.

The absolute value of the similarity measure is used, since only the degree of similarity was of interest originally.

Parameters:

test_function (Python function) – The function implementing the similarity measure.
n_bins (int) – The number of bins for estimating prob. distributions.

Return type:

tuple of 1D arrays ([bins],[bins])

Returns:

the similarity measure histogram and lower bin boundaries.

static recurrence_plot(embedding, threshold, silence_level=1)[source]¶

Return the recurrence plot from an embedding of a time series.

Uses supremum norm.

Example:

>>> ts = Surrogates.SmallTestData().original_data
>>> embedding = Surrogates.         ...     embed_time_series_array(ts, dimension=3, delay=2)
>>> Surrogates.recurrence_plot(embedding[0], threshold=.8)[:5, :5]
array([[1, 1, 0, 0, 0],
       [1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1]], dtype=int8)

Parameters:

embedding (2D array [time, dimension]) – The embedded time series.
threshold (float) – The recurrence threshold.

Return type:

2D array [time, time]

Returns:

the recurrence matrix.

refined_AAFT_surrogates(n_iterations, output='true_amplitudes')[source]¶

Return surrogates using the iteratively refined amplitude adjusted Fourier transform method.

A set of AAFT surrogates (AAFT_surrogates()) is iteratively refined to produce a closer match of both amplitude distribution and power spectrum of surrogate and original data.

Reference: [Schreiber2000]

Parameters:

n_iterations (int) – Number of iterations / refinement steps
output (str) – Type of surrogate to return. “true_amplitudes”: surrogates with correct amplitude distribution, “true_spectrum”: surrogates with correct power spectrum, “both”: return both outputs of the algorithm.

Return type:

2D array [index, time]

Returns:

The surrogate time series.

silence_level¶: (string) - The inverse level of verbosity of the object.

static test_mutual_information(original_data, surrogates, n_bins=32)[source]¶

Return a test matrix of mutual information (zero lag).

The test matrix’s entry \((i,j)\) contains the mutual information between original time series i and surrogate time series j at zero lag. The resulting matrix is useful for significance tests based on the mutual information matrix of the original data.

Note

Assumes, that original_data and surrogates are already normalized.

Parameters:

original_data (2D array [index, time]) – The original time series.
surrogates (2D Numpy array [index, time]) – The surrogate time series.
n_bins (int) – Number of bins for estimating prob. distributions.

Return type:

2D array [index, index]

Returns:

the mutual information test matrix.

static test_pearson_correlation(original_data, surrogates)[source]¶

Return a test matrix of the Pearson correlation coefficient (zero lag).

The test matrix’s entry \((i,j)\) contains the Pearson correlation coefficient between original time series i and surrogate time series j at lag zero. The resulting matrix is useful for significance tests based on the Pearson correlation matrix of the original data.

Note

Assumes, that original_data and surrogates are already normalized.

Parameters:

original_data (2D array [index, time]) – The original time series.
surrogates (2D array [index, time]) – The surrogate time series.

Return type:

2D array [index, index]

Returns:

the Pearson correlation test matrix.

test_threshold_significance(surrogate_function, test_function, realizations=1, n_bins=100, interval=(-1, 1))[source]¶

Return a test distribution for a similarity measure.

Perform a significance test on the values of a correlation measure based on original_data time series and surrogate data. Returns a density estimate (histogram) of the absolute value of the correlation measure over all realizations.

The resulting distribution of the values of similarity measure from original and surrogate time series is of use for testing the statistical significance of a selected threshold value for climate network generation.

Parameters:

surrogate_function (Python function) – The function implementing the surrogates.
test_function (Python function) – The function implementing the similarity measure.
realizations (int) – The number of surrogates to be created for each time series.
n_bins (int) – The number of bins for estimating probability distribution of test similarity measure.
interval ((float, float)) – The range over which to estimate similarity measure distribution.

Return type:

tuple of 1D arrays ([bins],[bins])

Returns:

similarity measure test histogram and lower bin boundaries.

twin_surrogates(dimension, delay, threshold, min_dist=7)[source]¶

Return surrogates using the twin surrogate method.

Scalar twin surrogates are created by isolating the first component (dimension) of the twin surrogate trajectories.

Twin surrogates share linear and nonlinear properties with the original time series, since they correspond to realizations of trajectories of the same dynamical systems with different initial conditions.

References: [Thiel2006] [*], [Marwan2007].

Parameters:

dimension (int) – The embedding dimension.
delay (int) – The embedding delay.
threshold (float) – The recurrence threshold.
min_dist (number) – The minimum temporal distance for twins.

Return type:

2D array [index, time]

Returns:

the twin surrogates.

twins(threshold, min_dist=7)[source]¶

Return list of the twins of each state vector for all time series.

Two state vectors are said to be twins if they share the same recurrences, i.e., if the corresponding rows or columns in the recurrence plot are identical.

References: [Thiel2006], [Marwan2007].

Parameters:

embedding_array (3D array [index, time, dimension]) – The embedded time series array.
threshold (float) – The recurrence threshold.
min_dist (number) – The minimum temporal distance for twins.

Return type:

[[number]]

Returns:

the list of twins for each state vector in the time series.

white_noise_surrogates()[source]¶

Return a shuffled copy of a time series array.

Each time series is shuffled individually. The surrogates correspond to realizations of white noise consistent with the original_data time series’ amplitude distribution.

Example (Distributions of white noise surrogates should the same as for the original data):

>>> ts = Surrogates.SmallTestData().original_data
>>> surrogates = Surrogates.        ...     SmallTestData().white_noise_surrogates()
>>> np.allclose(np.histogram(ts[0,:])[0],
...             np.histogram(surrogates[0,:])[0])
True

Return type:: 2D array [index, time]
Returns:: The surrogate time series.