core.data

Provides classes for analyzing spatially embedded complex networks, handling multivariate data and generating time series surrogates.

class pyunicorn.core.data.Data(observable: ndarray, grid: GeoGrid, observable_name: str = None, observable_long_name: str = None, window: dict | None = None, silence_level: int = 0)[source]

Bases: object

Encapsulates general spatio-temporal data.

Also contains methods to load data from various file formats (currently NetCDF and ASCII).

Mainly an abstract class.

classmethod Load(file_name, observable_name, file_type, dimension_names=None, window=None, vertical_level=None, silence_level=0)[source]

Initialize an instance of Data.

Supported values of file_type are:
  • “NetCDF” for regular (rectangular) grids

  • “iNetCDF” for irregular (e.g. geodesic) grids or station data.

The spatio-temporal window is described by the following dictionary:

window = {"time_min": 0., "time_max": 0., "lat_min": 0.,
          "lat_max": 0., "lon_min": 0., "lon_max": 0.}

Note

It is assumed that the NetCDF file to be loaded uses the following dimension names: lat, lon, time (e.g., as is the case for NCEP/NCAR reanalysis 1 data). These standard dimension names can be modified using the dimension_names argument. Alternatively, the standard class constructor __init__() needs to be used after loading the data manually, e.g., employing netcdf4-python or scipy.io.netcdf functionality.

Parameters:
  • file_name (str) – The name of the data file.

  • observable_name (str) – The short name of the observable within the data file (particularly relevant for NetCDF).

  • file_type (str) – The type of the data file.

  • dimension_names (dict) – The names of the dimensions as used in the NetCDF file. Default: {“lat”: “lat”, “lon”: “lon”, “time”: “time”}

  • window (dict) – Spatio-temporal window to select a view on the data.

  • vertical_level (int) – The vertical level to be extracted from the data file. Is ignored for horizontal data sets. If None, the first level in the data file is chosen.

  • silence_level (int) – The inverse level of verbosity of the object.
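As a rough usage sketch, the arguments documented above could be assembled as follows. The file name air.mon.mean.nc, the observable name air and the window bounds are placeholders invented for illustration; only the keyword names and the import path are taken from the signatures shown in this section.

```python
# Hypothetical inputs: replace with a real NetCDF file and variable name.
FILE_NAME = "air.mon.mean.nc"
OBSERVABLE_NAME = "air"

# Map the standard dimension names to those used in the file.
# These are the documented defaults, spelled out explicitly.
dimension_names = {"lat": "lat", "lon": "lon", "time": "time"}

# Spatio-temporal window with the six documented keys (placeholder bounds).
# Equal time bounds are meant to select the full time range, following
# the rule given in the set_window description.
window = {"time_min": 0., "time_max": 0., "lat_min": -30.,
          "lat_max": 30., "lon_min": 0., "lon_max": 180.}


def load_sketch():
    """Call Data.Load with the arguments assembled above."""
    # Imported lazily so the dictionaries can be inspected without pyunicorn.
    from pyunicorn.core.data import Data

    return Data.Load(file_name=FILE_NAME, observable_name=OBSERVABLE_NAME,
                     file_type="NetCDF", dimension_names=dimension_names,
                     window=window, vertical_level=None, silence_level=0)
```

Calling load_sketch() requires pyunicorn and an existing file with matching dimension names.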

static SmallTestData()[source]

Return test data set of 6 time series with 10 sampling points each.

Example:

>>> Data.SmallTestData().observable()
array([[  0.00000000e+00,   1.00000000e+00,   1.22464680e-16,
         -1.00000000e+00,  -2.44929360e-16,   1.00000000e+00],
       [  3.09016994e-01,   9.51056516e-01,  -3.09016994e-01,
         -9.51056516e-01,   3.09016994e-01,   9.51056516e-01],
       [  5.87785252e-01,   8.09016994e-01,  -5.87785252e-01,
         -8.09016994e-01,   5.87785252e-01,   8.09016994e-01],
       [  8.09016994e-01,   5.87785252e-01,  -8.09016994e-01,
         -5.87785252e-01,   8.09016994e-01,   5.87785252e-01],
       [  9.51056516e-01,   3.09016994e-01,  -9.51056516e-01,
         -3.09016994e-01,   9.51056516e-01,   3.09016994e-01],
       [  1.00000000e+00,   1.22464680e-16,  -1.00000000e+00,
         -2.44929360e-16,   1.00000000e+00,   3.67394040e-16],
       [  9.51056516e-01,  -3.09016994e-01,  -9.51056516e-01,
          3.09016994e-01,   9.51056516e-01,  -3.09016994e-01],
       [  8.09016994e-01,  -5.87785252e-01,  -8.09016994e-01,
          5.87785252e-01,   8.09016994e-01,  -5.87785252e-01],
       [  5.87785252e-01,  -8.09016994e-01,  -5.87785252e-01,
          8.09016994e-01,   5.87785252e-01,  -8.09016994e-01],
       [  3.09016994e-01,  -9.51056516e-01,  -3.09016994e-01,
          9.51056516e-01,   3.09016994e-01,  -9.51056516e-01]])
Return type:

Data instance

Returns:

a Data instance for testing purposes.

__init__(observable: ndarray, grid: GeoGrid, observable_name: str = None, observable_long_name: str = None, window: dict | None = None, silence_level: int = 0)[source]

Initialize an instance of Data.

The spatio-temporal window is described by the following dictionary:

window = {"time_min": 0., "time_max": 0., "lat_min": 0.,
          "lat_max": 0., "lon_min": 0., "lon_max": 0.}
Parameters:
  • observable (2D array [time, index]) – The array of time series to be represented by the Data instance.

  • grid (GeoGrid instance) – The GeoGrid representing the spatial coordinates associated with the time series and their temporal sampling.

  • observable_name (str) – A short name for the observable.

  • observable_long_name (str) – A long name for the observable.

  • window (dict) – Spatio-temporal window to select a view on the data.

  • silence_level (int) – The inverse level of verbosity of the object.

__str__()[source]

Return a string representation of the object.

__weakref__

list of weak references to the object

classmethod _get_netcdf_data(file_name, file_type, observable_name, dimension_names, vertical_level=None, silence_level=0)[source]

Import data from a NetCDF file with a regular and rectangular grid.

Supported values of file_type are:
  • “NetCDF” for regular (rectangular) grids

  • “iNetCDF” for irregular (e.g. geodesic) grids or station data

Parameters:
  • file_name (str) – The name of the data file.

  • file_type (str) – The format of the data file.

  • observable_name (str) – The short name of the observable within the data file (particularly relevant for NetCDF).

  • dimension_names (dict) – The names of the dimensions as used in the NetCDF file. E.g., dimension_names = {“lat”: “lat”, “lon”: “lon”, “time”: “time”}.

  • vertical_level (int) – The vertical level to be extracted from the data file. Is ignored for horizontal data sets. If None, the first level in the data file is chosen.

  • silence_level (int) – The inverse level of verbosity of the object.

classmethod _load_data(file_name, file_type, observable_name, dimension_names, vertical_level=None, silence_level=0)[source]

Load data into a Numpy array and create a corresponding GeoGrid object.

Supported values of file_type are:
  • “NetCDF” for regular (rectangular) grids

  • “iNetCDF” for irregular (e.g. geodesic) grids or station data

Parameters:
  • file_name (str) – The name of the data file.

  • file_type (str) – The format of the data file.

  • observable_name (str) – The short name of the observable within the data file (particularly relevant for NetCDF).

  • dimension_names (dict) – The names of the dimensions as used in the NetCDF file. E.g., dimension_names = {“lat”: “lat”, “lon”: “lon”, “time”: “time”}.

  • vertical_level (int) – The vertical level to be extracted from the data file. Is ignored for horizontal data sets. If None, the first level in the data file is chosen.

  • silence_level (int) – The inverse level of verbosity of the object.

_observable

Current spatio-temporal view on the data.

static cos_window(data, gamma)[source]

Return a cosine window fitting the shape of the data argument.

The window is one for most of the time and goes to zero at the boundaries of each time series in the data array.

The width of the cosine shaped decay region is controlled by the shape parameter gamma:

  • gamma=1 means that each of the two decay regions extends over half of the time series.

  • gamma=0 means that the decay regions vanish and the window transformation becomes the identity.

Example:

>>> ts = np.arange(24).reshape(12,2)
>>> Data.cos_window(data=ts, gamma=0.75)
array([[ 0.        ,  0.        ], [ 0.14644661,  0.14644661],
       [ 0.5       ,  0.5       ], [ 0.85355339,  0.85355339],
       [ 1.        ,  1.        ], [ 1.        ,  1.        ],
       [ 1.        ,  1.        ], [ 1.        ,  1.        ],
       [ 0.85355339,  0.85355339], [ 0.5       ,  0.5       ],
       [ 0.14644661,  0.14644661], [ 0.        ,  0.        ]])
Parameters:
  • data (2D Numpy array [time, index]) – The data array to be fitted by cosine window.

  • gamma (number (float)) – The cosine window shape parameter.

Return type:

2D Numpy array [time, index]

Returns:

the cosine window fitting data array.
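The shape of the window can be illustrated with a small NumPy sketch. The function cos_window_sketch is a hypothetical stand-in written to reproduce the example output above; the truncation of the decay length to an integer is inferred from that example and is not necessarily how the library computes it.

```python
import numpy as np


def cos_window_sketch(data, gamma):
    """Return a tapered cosine window with the same shape as data."""
    n_time = data.shape[0]
    # Number of samples in each decay region (assumed: truncated to int).
    n_decay = int(gamma * n_time / 2)

    window = np.ones(n_time)
    if n_decay > 0:
        # Half-cosine ramp rising from 0 towards 1.
        ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_decay) / n_decay))
        window[:n_decay] = ramp
        window[n_time - n_decay:] = ramp[::-1]

    # Repeat the 1D window for every time series (column) in data.
    return np.broadcast_to(window[:, np.newaxis], data.shape).copy()
```

With gamma=0 the ramp is empty and the window is one everywhere, which matches the identity case described above.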

grid

The GeoGrid object associated with the data.

static next_power_2(i)[source]

Return the smallest power of two 2^n that is greater than or equal to i.

Example:

>>> Data.next_power_2(253)
256
Parameters:

i (number (float)) – Some real number.

Return type:

number (float)

Returns:

the smallest power of two greater than or equal to the given value.
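One way to compute such a value is to round the base-2 logarithm up to the next integer. The helper next_power_2_sketch below is a hypothetical illustration of that idea, valid for positive i only, and is not claimed to be the library's implementation.

```python
import math


def next_power_2_sketch(i):
    """Return the smallest power of two >= i, for i > 0."""
    # ceil(log2(i)) is the smallest integer exponent n with 2**n >= i.
    return 2.0 ** math.ceil(math.log2(i))
```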

static normalize_time_series_array(time_series_array)[source]

Normalize each time series in an array to zero mean and unit variance.

Also works for complex-valued time series.

Modifies the given array in place!

Example:

>>> ts = np.arange(16).reshape(4,4).astype("float")
>>> Data.normalize_time_series_array(ts)
>>> ts.mean(axis=0)
array([ 0.,  0.,  0.,  0.])
>>> ts.std(axis=0)
array([ 1.,  1.,  1.,  1.])
>>> ts[:,0]
array([-1.34164079, -0.4472136 ,  0.4472136 ,  1.34164079])
Parameters:

time_series_array (2D Numpy array [time, index]) – The time series array to be normalized.
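The in-place normalization can be sketched with plain NumPy as follows. The function normalize_sketch is a hypothetical illustration; it assumes a floating-point (or complex) array and does not handle constant time series, whose standard deviation is zero.

```python
import numpy as np


def normalize_sketch(time_series_array):
    """Normalize each column to zero mean and unit variance, in place."""
    # Augmented assignment modifies the caller's array instead of
    # creating a new one; this requires a float or complex dtype.
    time_series_array -= time_series_array.mean(axis=0)
    time_series_array /= time_series_array.std(axis=0)
```

Because nothing is returned, the caller keeps using the original variable, as in the doctest above.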

observable()[source]

Return the current spatio-temporal view on the data.

Example:

>>> Data.SmallTestData().observable()[0,:]
array([  0.00000000e+00,   1.00000000e+00,   1.22464680e-16,
        -1.00000000e+00,  -2.44929360e-16,   1.00000000e+00])
Return type:

2D Numpy array [time, space]

Returns:

the current spatio-temporal view on the data.

observable_long_name

(str) - The long name of the observable within the data file.

observable_name

(str) - The short name of the observable within the data file (particularly relevant for NetCDF).

print_data_info()[source]

Print information on the data encapsulated by the Data object.

static rescale(array, var_type)[source]

Rescale an array to a given data type.

Returns the tuple (scaled_array, scale_factor, add_offset, actual_range). Allows flexible handling of final amount of used storage volume for the file.

Parameters:
  • array – The array to be rescaled.

  • var_type (str) – Determines the desired final data type of the array.
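The names in the returned tuple match the attributes commonly used for packed NetCDF variables, where the original value is recovered as packed * scale_factor + add_offset. The sketch below illustrates that general convention for a signed integer target type. It is an assumption that rescale follows the same formulas, and pack_sketch and its dtype argument are hypothetical.

```python
import numpy as np


def pack_sketch(array, dtype=np.int16):
    """Pack a float array into an integer dtype (NetCDF-style convention)."""
    info = np.iinfo(dtype)
    a_min, a_max = float(array.min()), float(array.max())

    # Spread the data range over all representable integers.
    span = a_max - a_min
    scale_factor = span / (info.max - info.min) if span > 0 else 1.0
    # Chosen so that a_min maps to the smallest representable integer.
    add_offset = a_min - info.min * scale_factor

    packed = np.round((array - add_offset) / scale_factor)
    packed = np.clip(packed, info.min, info.max).astype(dtype)
    actual_range = np.array([a_min, a_max])
    return packed, scale_factor, add_offset, actual_range
```

A smaller integer type reduces storage at the cost of a quantization error of up to half a scale_factor per value.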

set_global_window()[source]

Set the view on the whole data set.

Select the full data set and create a data array as well as a corresponding GeoGrid object to access this window from outside.

Example (Set smaller window and subsequently restore global window):

>>> data = Data.SmallTestData()
>>> data.set_window(window={"time_min": 0., "time_max": 4.,
...                 "lat_min": 10., "lat_max": 20., "lon_min": 5.,
...                 "lon_max": 10.})
>>> data.grid.grid()["lat"]
array([ 10.,  15.], dtype=float32)
>>> data.set_global_window()
>>> data.grid.grid()["lat"]
array([  0.,   5.,  10.,  15.,  20.,  25.], dtype=float32)

set_silence_level(silence_level)[source]

Set the silence level.

Includes dependent objects such as grid.

Parameters:

silence_level (number (int)) – The inverse level of verbosity of the object.

set_window(window)[source]

Select a rectangular spatio-temporal region from the data set.

Create a data array as well as a corresponding GeoGrid object to access this window.

The time axis of the underlying raw data is assumed to be ordered and increasing. The latitude and longitude sequences can be arbitrarily chosen, i.e., no ordering and no regular grid is required.

The spatio-temporal window is described by the following dictionary:

window = {"time_min": 0., "time_max": 0., "lat_min": 0.,
          "lat_max": 0., "lon_min": 0., "lon_max": 0.}

If the temporal boundaries are equal, the data’s full time range is selected. If either pair of corresponding spatial boundaries is equal, the data’s full spatial extent is included.

Example:

>>> data = Data.SmallTestData()
>>> data.set_window(window={
...     "time_min": 0., "time_max": 4., "lat_min": 10.,
...     "lat_max": 20., "lon_min": 5., "lon_max": 10.})
>>> data.observable()
array([[  1.22464680e-16,  -1.00000000e+00],
       [ -3.09016994e-01,  -9.51056516e-01],
       [ -5.87785252e-01,  -8.09016994e-01],
       [ -8.09016994e-01,  -5.87785252e-01],
       [ -9.51056516e-01,  -3.09016994e-01]])
Parameters:

window (dictionary) – A spatio-temporal window to select a view on the data.
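The selection rules can be sketched with boolean masks on plain NumPy arrays. Everything in the block below is a hypothetical illustration that does not use the Data or GeoGrid classes: the coordinate arrays and the function select_window_sketch are invented, and the inclusive treatment of the boundaries is inferred from the example above.

```python
import numpy as np


def in_range(values, lower, upper):
    """Boolean mask of values inside [lower, upper] (bounds included)."""
    return (values >= lower) & (values <= upper)


def select_window_sketch(observable, time, lat, lon, window):
    """Return the view of observable [time, index] inside window."""
    # Equal temporal bounds: keep the full time range.
    if window["time_min"] == window["time_max"]:
        time_mask = np.ones(time.shape, dtype=bool)
    else:
        time_mask = in_range(time, window["time_min"], window["time_max"])

    # Equal bounds in either spatial pair: keep every node.
    if (window["lat_min"] == window["lat_max"]
            or window["lon_min"] == window["lon_max"]):
        node_mask = np.ones(lat.shape, dtype=bool)
    else:
        node_mask = (in_range(lat, window["lat_min"], window["lat_max"])
                     & in_range(lon, window["lon_min"], window["lon_max"]))

    return observable[time_mask][:, node_mask], lat[node_mask], lon[node_mask]


# Hypothetical data set: 10 time steps at 6 nodes.
time = np.arange(10.)
lat = np.array([0., 5., 10., 15., 20., 25.])
lon = np.array([2.5, 5., 7.5, 10., 12.5, 15.])
observable = np.arange(60.).reshape(10, 6)

window = {"time_min": 0., "time_max": 4., "lat_min": 10.,
          "lat_max": 20., "lon_min": 5., "lon_max": 10.}
view, view_lat, view_lon = select_window_sketch(observable, time, lat, lon,
                                                window)
```

Because each node carries its own latitude and longitude, the masks work equally for regular grids and for unordered station data.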

silence_level

(int) - The inverse level of verbosity of the object.

window()[source]

Return the current spatio-temporal window.

Examples:

>>> Data.SmallTestData().window()["lon_min"]
2.5
>>> Data.SmallTestData().window()["lon_max"]
15.0
Return type:

dictionary

Returns:

the current spatio-temporal window.

static zero_pad_data(data)[source]

Return zero padded data, such that the length of individual time series is a power of 2.

Example:

>>> ts = np.arange(20).reshape(5,4)
>>> Data.zero_pad_data(ts)
array([[  0.,   0.,   0.,   0.], [  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.], [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.], [ 16.,  17.,  18.,  19.],
       [  0.,   0.,   0.,   0.], [  0.,   0.,   0.,   0.]])
Parameters:

data (2D Numpy array [time, index]) – The data array to be zero padded.

Return type:

2D Numpy array [time, index]

Returns:

the zero padded data array.
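A minimal sketch of such a padding step is given below. The function zero_pad_sketch is hypothetical; placing the smaller half of the padding before the data and the rest after it is inferred from the example output above and may differ from the library in other cases.

```python
import numpy as np


def zero_pad_sketch(data):
    """Pad data [time, index] with zero rows up to a power-of-two length."""
    n_time, n_index = data.shape
    # Smallest power of two >= n_time, using integer bit arithmetic.
    n_padded = 1 << (n_time - 1).bit_length()

    # Assumed split: floor(half) of the padding in front, the rest behind.
    pad_front = (n_padded - n_time) // 2

    padded = np.zeros((n_padded, n_index))
    padded[pad_front:pad_front + n_time] = data
    return padded
```

Power-of-two lengths are the usual motivation for this step, since they suit fast Fourier transform routines.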