savitzky_golay

sofia_redux.toolkit.convolve.kernel.savitzky_golay(*args, order=2, error=1, mask=None, do_error=False, robust=0, eps=0.01, maxiter=100, model=False, scale=False, ignorenans=True, **kwargs)[source]

Apply a least-squares (Savitzky-Golay) polynomial filter

Convenience wrapper for toolkit.convolve.kernel.SavgolConvolve

Any NaN or False “mask” values will be interpolated over before filtering. Error will be propagated correctly and will account for interpolation over masked/NaN values or outliers removed during “robust” iteration.

Arguments are supplied in the following order: [x0, x1, … xn], values, window, [optional parameters]

where x are dimension coordinates supplied in the same order as the axes of values. Therefore, if you supplied arguments in the order x, y, z then you should expect the zeroth axis of z to relate to the x-coordinates and the first axis of z to relate to the y-coordinates. This is contrary to standard numpy (row (y), col (x)) ordering. For consistency with numpy, one would supply the arguments in the y, x, z order.

The x0, x1,… independent values do not need to be supplied, and instead, the features of values will be used to generate a regular grid of coordinates. Be aware however that the independent variables are used to rescale window widths if “scale” is set to True, and also used for linear interpolation over masked, NaN, or identified outliers (if “robust”).

Parameters:
args(ndim + 2)-tuple or 2-tuple
  • args[-1] contains the filtering window width. This should be a positive integer greater than order and less than the size of the image.

  • args[-2] contains the dependent variables as an array of shape (shape). If independent variables were supplied, (shape) must match across args[:ndim + 1]. Otherwise, the dimension and shape is determined by args[-2].

  • args[:ndim] contain the independent variables. These are used for interpolation purposes or “scale” and should be of size (shape). However, they do not need to be specified if interpolation is being carried out on a regular grid. In this case, the user should just supply the dependent variables as an array of the correct dimension and shape.

orderint, optional

Order of polynomial to use in designing the filter. Higher orders produce sharper features.

errorfloat or array_like of float (shape), optional

Dependent variable errors to propagate

maskarray_like or bool (shape), optional

Mask indicating good (True) and bad (False) samples. Bad values will not be included in the fit and will be replaced with linear interpolation prior to filtering.

do_errorbool, optional

If True, return the propagated error in addition to the filtered data.

robustint or float, optional

If > 0, taken as the sigma threshold above which to identify outliers. Outliers are those identified as abs(x_i - x_med) / MAD > “robust” where x is the residual of (data - fit) x_med is the median, MAD is the median absolute deviation defined as 1.482 * median(abs(x_i - x_med)). The fit will be iterated upon until the set of identified outliers does not change, or any change in the relative RMS is less than “eps”, or “maxiter” iterations have occured.

epsfloat, optional

The limit of fractional check in RMS if “robust” is > 0

maxiterint, optional

The maximum number iterations if “robust” is > 0

modelbool, optional

If set, return the fitting model with lots of nice little methods and statistics to play with.

scalebool, optional

If True, scale “window” to the average spacing between samples over each dimension. Note that this replaces “width” in the old IDL version. This option should not be used if working in multiple non-orthogonal features, as average spacing per dimension is taken as the average separation between ordered dimensional coordinates.

ignorenansbool, optional

If True (default and highly recommended), NaNs will be removed and interpolated over.

kwargsdict, optional.

Optional keywords and values that would be passed into scipy.signal.savgol_filter.

Returns:
filtered_datanumpy.ndarray of numpy.float64 (shape) or Model

The filtered data or an instance of toolkit.base.Model.

Notes

Edge conditions are handled differently according to whether the data is one dimensional or not. If it is, then extrapolation is permitted. However, extrapolation is not permitted in features of > 2. In these cases, everything should work as expected unless the corner samples are NaN. Interpolation is not possible and NaNs will be applied to the filtered array according to width/2 of the window centered on the corner NaN.

Also note, the IDL version “robustsg” was only 1-dimensional and savitzky_golay will perform exactly the same in this case.

Examples

Generate coefficients for cubic-polynomial with a window of 5

>>> import numpy as np
>>> x = np.arange(5)
>>> y = [0, 0, 1, 0, 0]
>>> filtered = savitzky_golay(x, y, 5, order=2)
>>> assert np.allclose(filtered, [-3/35, 12/35, 17/35, 12/35, -3/35])

2-dimensional example

>>> (y, x), z = np.mgrid[:11, :11], np.zeros((11, 11))
>>> z[5, 5] = 1
>>> x = x / 2

Coordinates are inferred if not supplied

>>> filtered1 = savitzky_golay(x, y, z, 5, order=2)
>>> filtered2 = savitzky_golay(z, 5, order=2)
>>> assert np.allclose(filtered1, filtered2)
>>> filtered3 = savitzky_golay(x, y, z, 5, order=2, scale=True)
>>> print(np.sum(filtered1 != 0))  # window = (5, 5) x, y pixels
25
>>> print(np.sum(filtered3 != 0))  # window = (11, 5) x, y, pixels
55

Note that in the last case, scale scaled the x window width to 11 pixels since the average spacing in x is 0.5. i.e. a window width of 1 is equal to 2 x-pixels. Therefore, the window is expanded to 2 * 5 = 10, and then an extra pixel is added on to ensure that the filter is centered (odd number of pixels).