grig package

Submodules

grig.clean_image module

grig.clean_image.clean_image(image, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, mode=None, leaf_size=None, **kwargs)[source]

Uses ResamplePolynomial to correct NaN values in image and/or data points flagged as bad in a supplied mask.

Parameters:
image : array_like of float

(M, N) array of data values. NaN values will automatically be considered ‘bad’ and replaced.

error : array_like of float, optional

(M, N) array of error (1-sigma) values associated with the data array. error will be used to weight fits and also be propagated to the output error values.

mask : array_like of bool, optional

(M, N) array of bool where True indicates a valid data point that can be included in the fitting and False indicates data points that should be excluded from the fit and replaced. Masked points will be reflected in the output counts array.

window : array_like or float or int, optional

(2,) array or single float value specifying the maximum Euclidean distance of a data sample from a resampling point such that it can be included in a local fit. window may be declared for each feature. For example, when fitting 2-dimensional (x, y) data, a window of 1.0 would create a circular fitting window around each resampling point, whereas a window of (1.0, 0.5) would create an elliptical fitting window with a semi-major axis of 1.0 in x and semi-minor axis of 0.5 in y. If not supplied, window is calculated based on an estimate of the median population density of the data for each feature.

order : array_like or int, optional

(2,) array or single integer value specifying the polynomial fit order for each feature.

fix_order : bool, optional

In order for local polynomial fitting to occur, the default requirement is that nsamples >= (order + 1) ** 2, where nsamples is the number of data samples within window. If fix_order is True and this condition is not met, then local fitting will be aborted for that point and a value of cval will be returned instead. If fix_order is False, then order will be reduced to the maximum value where this condition can be met. NOTE: this is only available if order is symmetrical, i.e., it was passed in as a single integer to be applied across both x and y. Otherwise, it is unclear which dimension order should be reduced in to meet the condition.

robust : float, optional

Specifies an outlier rejection threshold for data. A data point is identified as an outlier if:

\[\frac{|x_i - x_{med}|}{\text{MAD}} > \text{robust},\]

where \(x_{med}\) is the median, and MAD is the Median Absolute Deviation defined as:

\[\text{MAD} = 1.482 \times \text{median}(|x_i - x_{med}|).\]

negthresh : float, optional

Specifies a negative value rejection threshold such that data < (-stddev(data) * negthresh) will be excluded from the fit.

mode : str, optional

The type of check to perform on whether the sample distribution for each resampling point is adequate to derive a polynomial fit. Depending on order and fix_order, if the distribution does not meet the criteria for mode, either the fit will be aborted, returning a value of cval, or the fit order will be reduced. Available modes are:

  • ‘edges’: Require that there are order samples in both the negative and positive directions of each feature from the resampling point.

  • ‘counts’: Require that there are (order + 1) ** 2 samples within the window of each resampling point.

  • ‘extrapolate’: Attempt to fit regardless of the sample distribution.

Note that ‘edges’ is the most robust mode as it ensures that no singular values will be encountered during the least-squares fitting of polynomial coefficients.

leaf_size : int, optional

Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbors.BallTree for further details.

Returns:
cleaned_image, [variance_out], [counts] : n-tuple of numpy.ndarray (M, N)

See grig.ResamplePolynomial
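
A minimal usage sketch (the image, window, and order values here are illustrative assumptions, not defaults):

import numpy as np
from grig.clean_image import clean_image

# Hypothetical 64 x 64 image with a couple of bad pixels
image = np.random.random((64, 64))
image[10, 10] = np.nan  # NaN values are automatically treated as 'bad'
image[32, 40] = np.nan

# Replace the bad pixels using local 2nd-order polynomial fits
cleaned = clean_image(image, window=5.0, order=2)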

grig.resample module

grig.resample.Resample

alias of ResamplePolynomial

grig.resample.resampler(coordinates, data, *locations, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=False, smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', error_weighting=True, estimate_covariance=False, is_covar=False, jobs=None, get_error=False, get_counts=False, get_weights=False, get_distance_weights=False, get_rchi2=False, get_cross_derivatives=False, get_offset_variance=False, **distance_kwargs)

Resample data using local polynomial fitting.

Initializes and then calls the ResamplePolynomial class. For further details on all available parameters, please see ResamplePolynomial.__init__() and ResamplePolynomial.__call__().

Parameters:
coordinates
data
locations
error
mask
window
order
fix_order
robust
negthresh
window_estimate_bins
window_estimate_percentile
window_estimate_oversample
leaf_size
large_data
smoothing
relative_smooth
adaptive_threshold
adaptive_algorithm
fit_threshold
cval
edge_threshold
edge_algorithm
order_algorithm
error_weighting
estimate_covariance
is_covar
jobs
get_error
get_counts
get_weights
get_distance_weights
get_rchi2
get_cross_derivatives
get_offset_variance
distance_kwargs
Returns:
results : float or numpy.ndarray or n-tuple of (float or numpy.ndarray)

If a fit is performed at a single location, the output will consist of int or float scalar values. Multiple fits result in numpy arrays. The exact output shape depends on the number of data sets, the number of fitted points, and the dimensions of the fit locations. Assuming that all get_* keywords are set to True, the output order is:

results[0] = fitted values
results[1] = error on the fit
results[2] = sample counts for each fit
results[3] = total weight of all samples in fit
results[4] = total distance weight sum of all samples in fit
results[5] = reduced chi-squared statistic of the fit
results[6] = derivative mean squared cross products
results[7] = offset variance of fit from sample distribution
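
As a minimal usage sketch (assuming 1-D coordinates may be passed as a flat array; the data and keyword values are illustrative):

import numpy as np
from grig.resample import resampler

# Irregularly sampled, noisy sine curve
x = np.random.uniform(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + np.random.normal(0.0, 0.05, 100)

# Fit onto a regular output grid using local 2nd-order polynomial fits
x_out = np.linspace(0.0, 2.0 * np.pi, 50)
fit, error = resampler(x, y, x_out, order=2, window=0.5, get_error=True)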

grig.resample_base module

class grig.resample_base.ResampleBase(coordinates, data, error=None, mask=None, window=None, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=None, **distance_kwargs)[source]

Bases: object

Attributes:
features

int : number of data features (dimensions)

fit_settings

dict : Fit reduction settings applied during last call

fit_tree

Return the fitting tree representative of points to fit.

grid_class

Return the grid class of the resampler

multi_set

bool : True if solving for multiple data sets

n_samples

int : The number of samples in each data set.

n_sets

int : The number of data sets to fit.

window

numpy.ndarray (n_features,) : Window radius in each dimension.

Methods

__call__(*args[, fit_threshold, cval, ...])

Resample data defined during initialization onto new coordinates.

block_loop(sample_values, error, mask, ...)

Perform resampling reduction in parallel or series.

calculate_minimum_points()

Return the minimum number of points for a fit.

combine_blocks(blocks, n_sets, n_fits, ...)

Combines the results from multiple reductions into one set.

estimate_feature_windows(coordinates[, ...])

Estimates the radius of the fitting window for each feature.

estimate_max_bytes(coordinates[, window, ...])

Estimate the maximum number of bytes required by a reduction.

get_grid_class()

Return the appropriate grid class for the resampler.

global_resampling_values()

Return the global resampling values.

pre_fit(settings, *args)

Perform pre-fitting steps and build the fitting tree.

process_block(args, block)

Run solve_fits() on each block.

process_blocks(args, kwargs, settings, ...)

Wrapper for handling block resampling in a multiprocessing environment.

reduction_settings([error_weighting, ...])

Define a set of reduction instructions based on user input.

set_sample_tree(coordinates[, radius, ...])

Build the sample tree from input coordinates.

sufficient_memory_for(memory)

Return whether there is sufficient available memory for the task.

classmethod block_loop(sample_values, error, mask, fit_tree, sample_tree, settings, iteration, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True, jobs=None)[source]

Perform resampling reduction in parallel or series.

Utility function to allow the resampling algorithm to process blocks of data in series or parallel, recombining the data once complete. Please see ResamplePolynomial.__call__() for descriptions of the arguments.

Parameters:
sample_values : numpy.ndarray
error : numpy.ndarray
mask : numpy.ndarray
fit_tree : BaseTree
sample_tree : BaseTree
settings : dict
iteration : int
get_error : bool, optional
get_counts : bool, optional
get_weights : bool, optional
get_distance_weights : bool, optional
get_rchi2 : bool, optional
get_cross_derivatives : bool, optional
get_offset_variance : bool, optional
jobs : int, optional
Returns:
combined_results : 8-tuple of numpy.ndarray

In order: fit, error, counts, weights, distance weights, reduced chi-squared, MSCP derivatives, distribution offset.

calculate_minimum_points()[source]

Return the minimum number of points for a fit.

Parameters:
args : n-tuple

Input arguments used to determine the number of points required for a fit.

Returns:
minimum_points : int

The minimum number of points for a fit.

static combine_blocks(blocks, n_sets, n_fits, n_dimensions, cval, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]

Combines the results from multiple reductions into one set.

The resampling reduction may be performed in serial or parallel over multiple “blocks”, where each block contains a set of spatially close fit coordinates and all samples necessary to perform a fit over all points.

Parameters:
blocks : n-tuple of processed reductions for n blocks.
n_sets : int

The number of data sets in the reduction. Each set contains the same sample coordinates as all other sets, but the sample values may vary.

n_fits : int

The number of fitting points over all blocks.

n_dimensions : int

The number of coordinate dimensions.

cval : float

The fill value for missing data in the output fit value arrays.

get_error : bool, optional

If True, indicates that errors on the fit were calculated.

get_counts : bool, optional

If True, indicates that the number of samples used for each fit should be returned.

get_weights : bool, optional

If True, indicates that the total weight sum of all samples used in each fit should be returned.

get_distance_weights : bool, optional

If True, indicates that the distance weight sum of all samples used in each fit should be returned.

get_rchi2 : bool, optional

If True, indicates that the reduced chi-squared statistic for each fit should be returned.

get_cross_derivatives : bool, optional

If True, indicates that the derivative MSCP should be returned.

get_offset_variance : bool, optional

If True, indicates that the offset variance of the fit from the sample distribution should be returned.

Returns:
results : 8-tuple of numpy.ndarray

results[0] = fitted values
results[1] = error on the fit
results[2] = counts
results[3] = total weight sums
results[4] = total distance weight sums
results[5] = reduced chi-squared statistic
results[6] = derivative MSCP
results[7] = offset variance

Notes

The return value is always an 8-tuple, and the get_* keywords indicate whether the calculated values in the block reductions are valid. If False, the corresponding output array will have the correct number of axes, but be of zero size.

estimate_feature_windows(coordinates, feature_bins=10, percentile=50, oversample=2.0)[source]

Estimates the radius of the fitting window for each feature.

The estimate of the window is given as the minimum required to theoretically allow for a polynomial fit based on the number of samples within such a window. Since the window is a constant of the resampling algorithm, it is difficult to calculate a precise value that will allow all fits to occur at every point within (or close to) the sample distribution.

Therefore, the sample distribution is divided into feature_bins bins along each feature, forming n-dimensional boxes over the sample space. For example, if feature_bins=10 for 2-dimensional data, we would divide the sample distribution into a total of 100 (10 * 10) equal boxes before counting the number of samples inside each box.

The number of samples used to then calculate the final window radius is determined from a given percentile (default = 50) of the box counts.

For a fit of polynomial order \(p\), the minimum number of samples required for fitting is:

\[N = (p + 1)^K\]

for \(K\) features if the order is “symmetrical” (i.e., the same for each feature), or

\[N_{required} = \prod_{k=1}^{K}{(p_k + 1)}\]

if the fit order varies by feature. Coordinates are then scaled so that:

\[x_k^{\prime} = \text{feature\_bins} \times \frac{x_k - min(x_k)} {\beta_k}\]

where the scaling factor \(\beta_k = max(x_k) - min(x_k)\). In this scheme, the coordinates have been normalized such that each bin has a width of one in each dimension, and therefore unit volume. The sample density (\(\rho\)) of a bin is then equal to the number of samples it contains. The volume of a unit radius spheroid is calculated as:

\[V_{r=1} = \frac{\pi^{K/2}}{\Gamma(\frac{K}{2} + 1)}\]

and the number of samples expected to fall inside the spheroid is given as:

\[N_{r=1} = \rho V_{r=1}\]

We can then set the radius of the spheroid to give the required number of points as:

\[r_{scaled} = \left( \frac{N_{required} \nu}{N_{r=1}} \right)^{\frac{1}{K}} + \epsilon\]

where \(\nu\) is the oversample factor and \(\epsilon\) is added to ensure that if resampling on a uniform grid of samples, fitting at a point between two samples will always result in enough samples available to perform a fit. \(\epsilon\) is given as:

\[\epsilon = 0.5 \rho^{\frac{-1}{K}}\]

or half the average spacing between samples. We can then define the final window radius for dimension \(k\) as:

\[\Omega_k = \frac{\beta_k r_{scaled}}{\text{feature\_bins}}\]
Parameters:
coordinates : numpy.ndarray (n_features, n_coordinates)

The coordinates of all samples to be fit.

feature_bins : int, optional

The number of bins to divide the sample coordinates into, per feature, when determining the sample density (default = 10).

percentile : int or float, optional

The percentile used to define a representative value for samples per bin. The default (50) gives the median of all bin populations.

oversample : int or float, optional

The oversampling factor for the window region. A value of one will result in a window that should provide the exact number of samples required for a polynomial fit of the given order assuming uniform density of the samples.

Returns:
window : numpy.ndarray (n_features,)

The principal axes of an ellipsoid used to create a fitting region around each resampling point.
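
The documented calculation can be sketched in plain NumPy for a symmetric fit order (an illustration of the formulas above, not the class implementation; the function name and the order argument are assumptions):

import numpy as np
from scipy.special import gamma

def sketch_feature_windows(coordinates, order, feature_bins=10,
                           percentile=50, oversample=2.0):
    # coordinates: (n_features, n_samples); symmetric polynomial order
    k = coordinates.shape[0]
    cmin = coordinates.min(axis=1)
    beta = coordinates.max(axis=1) - cmin                # scaling factors
    # Scale so that each bin has unit width (and therefore unit volume)
    scaled = feature_bins * (coordinates - cmin[:, None]) / beta[:, None]
    counts, _ = np.histogramdd(scaled.T, bins=feature_bins)
    rho = np.percentile(counts, percentile)              # representative density
    n_required = (order + 1) ** k                        # samples needed for a fit
    v_unit = np.pi ** (k / 2) / gamma(k / 2 + 1)         # unit-radius spheroid volume
    r_scaled = (n_required * oversample / (rho * v_unit)) ** (1.0 / k)
    r_scaled += 0.5 * rho ** (-1.0 / k)                  # epsilon: half sample spacing
    return beta * r_scaled / feature_bins                # window radius per feature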

classmethod estimate_max_bytes(coordinates, window=1, n_sets=1, leaf_size=40, full_tree=True, **kwargs)[source]

Estimate the maximum number of bytes required by a reduction.

Assumes that many factors are precalculated to reduce processing time and that the ball-tree is constructed in the least efficient configuration.

Parameters:
coordinates : numpy.ndarray

The coordinates for the tree of shape (n_dimensions, n).

window : int or float, optional

The size of the tree windows.

n_sets : int, optional

The number of data sets, sharing the same coordinates, to be reduced.

leaf_size : int, optional

The number of leaves used to construct the ball-tree.

full_tree : bool, optional

Calculate the maximum number of bytes if the full ball-tree is pre-calculated. Otherwise, calculates the size of a single neighborhood sized ball-tree.

kwargs : dict, optional

Optional keyword arguments to pass into the calculation for subclasses of the Reduction. Will include order (int) for polynomial resampling.

Returns:
max_size : int

An upper limit for the maximum number of bytes used to construct the reduction including pre-calculation and all possible inputs.
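
For example, the estimate may be checked before committing to a full reduction (a sketch; order is passed through kwargs for polynomial resampling as noted above, and the coordinate values are illustrative):

import numpy as np
from grig.resample_polynomial import ResamplePolynomial

coordinates = np.random.random((2, 100000))  # hypothetical 2-D samples
n_bytes = ResamplePolynomial.estimate_max_bytes(
    coordinates, window=0.05, order=2)
print(f"Upper memory estimate: {n_bytes / 1e9:.2f} GB")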

property features

int : number of data features (dimensions)

property fit_settings

dict : Fit reduction settings applied during last call

property fit_tree

Return the fitting tree representative of points to fit.

Returns:
BaseTree
get_grid_class()[source]

Return the appropriate grid class for the resampler.

Returns:
BaseGrid subclass
classmethod global_resampling_values()[source]

Return the global resampling values.

The global resampling values are of main importance when performing multiprocessing, and allow each process to gain fast access to the necessary data.

Returns:
dict
property grid_class

Return the grid class of the resampler

Returns:
BaseGrid subclass
property multi_set

bool : True if solving for multiple data sets

property n_samples

int : The number of samples in each data set.

property n_sets

int : The number of data sets to fit.

pre_fit(settings, *args)[source]

Perform pre-fitting steps and build the fitting tree.

Parameters:
settings : dict

Settings calculated via reduction_settings to be applied if necessary.

args : n-tuple

The call input arguments.

Returns:
None
classmethod process_block(args, block)[source]

Run solve_fits() on each block.

Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.

Parameters:
args : 2-tuple

A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.

block : int

The block index to process.

Returns:
results : 9-tuple of numpy.ndarray

The first element contains the fit point indices to be fit. For the remaining elements, please see solve_fits() return values.

classmethod process_blocks(args, kwargs, settings, sample_tree, fit_tree, jobs, iteration)[source]

Wrapper for handling block resampling in a multiprocessing environment.

Parameters:
args : n-tuple

The arguments to pass into multitask().

kwargs : dict or None

The keyword arguments to pass into multitask().

settings : dict

Reduction settings.

sample_tree : BaseTree

The resampling tree in sample space.

fit_tree : BaseTree

The fitting tree.

jobs : int

The number of jobs to perform in parallel.

iteration : int

The current resampling iteration. This is simply used as a marker to distinguish pickled files on each resampling run.

Returns:
blocks : list

A list of the return values from the process_block method for each block.

reduction_settings(error_weighting=True, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', jobs=None, use_threading=None, use_processes=None, **kwargs)[source]

Define a set of reduction instructions based on user input.

This method is responsible for determining, formatting, and checking a number of variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see ResamplePolynomial.__call__().

Parameters:
error_weighting : bool, optional
fit_threshold : float, optional
cval : float, optional
edge_threshold : float or array_like (n_features,), optional
edge_algorithm : str, optional
jobs : int, optional
use_threading : bool, optional

If True, force use of threads during multiprocessing.

use_processes : bool, optional

If True, force use of sub-processes during multiprocessing.

kwargs : dict

Optional keyword arguments to the reduction settings.

Returns:
settings : dict

The reduction settings. Also stored as ResamplePolynomial.fit_settings().

set_sample_tree(coordinates, radius=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=None, memory_kwargs=None, **distance_kwargs)[source]

Build the sample tree from input coordinates.

Parameters:
coordinates : numpy.ndarray (float)

The input coordinates of shape (n_features, n_samples).

radius : float or sequence (float), optional

The radius of the window around each fitting point used to determine sample selection for fit. If not supplied, will be estimated using ResamplePolynomial.estimate_feature_windows().

window_estimate_bins : int, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

window_estimate_percentile : int or float, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

window_estimate_oversample : int or float, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

leaf_size : int, optional

Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbors.BallTree for further details.

large_data : bool or None, optional

If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.

check_memory : bool, optional

If True, check the memory requirements for resampling the supplied data.

memory_buffer : float, optional

A fraction (positive or negative) with which to modify the memory estimates for the process memory requirements. Memory estimates are scaled by the factor 1 + memory_buffer.

memory_kwargs : dict, optional
distance_kwargs : dict, optional

Optional keyword arguments passed into sklearn.neighbors.DistanceMetric(). The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.

Returns:
None
classmethod sufficient_memory_for(memory)[source]

Return whether there is sufficient available memory for the task.

Parameters:
memory : int or None

The memory required for the task in bytes.

Returns:
perform_task : bool
property window

numpy.ndarray (n_features,) : Window radius in each dimension.

grig.resample_kernel module

class grig.resample_kernel.ResampleKernel(coordinates, data, kernel, error=None, mask=None, robust=None, negthresh=None, kernel_spacing=None, kernel_offsets=None, kernel_weights=None, limits=None, degrees=3, smoothing=0.0, knots=None, knot_estimate=None, eps=1e-08, fix_knots=None, tolerance=0.001, max_iteration=20, exact=False, reduce_degrees=False, imperfect=False, leaf_size=40, large_data=False, **distance_kwargs)[source]

Bases: ResampleBase

Attributes:
degrees

Return the degrees of the spline fit to the kernel.

exit_code

Return the exit code of the spline fit.

exit_message

Return the spline exit message.

features

int : number of data features (dimensions)

fit_settings

dict : Fit reduction settings applied during last call

fit_tree

Return the fitting tree representative of points to fit.

grid_class

Return the grid class of the resampler

kernel

Return the resampling kernel.

kernel_offsets

Return the coordinate offsets for the kernel.

kernel_spacing

Return the spacing between kernel grid points.

multi_set

bool : True if solving for multiple data sets

n_samples

int : The number of samples in each data set.

n_sets

int : The number of data sets to fit.

spline

Return the spline object for the kernel fit.

window

numpy.ndarray (n_features,) : Window radius in each dimension.

Methods

__call__(*args[, cval, edge_threshold, ...])

Resample data defined during initialization onto new coordinates.

block_loop(sample_values, error, mask, ...)

Perform resampling reduction in parallel or series.

calculate_minimum_points()

Return the minimum number of points for a fit.

combine_blocks(blocks, n_sets, n_fits, ...)

Combines the results from multiple reductions into one set.

estimate_feature_windows(*args, **kwargs)

Estimates the radius of the fitting window for each feature.

estimate_max_bytes(coordinates[, window, ...])

Estimate the maximum number of bytes required by a reduction.

get_grid_class()

Return the appropriate grid class for the resampler.

global_resampling_values()

Return the global resampling values.

pre_fit(settings, *args)

Perform pre-fitting steps and build the fitting tree.

process_block(args, block)

Run solve_fits() on each block.

process_blocks(args, kwargs, settings, ...)

Wrapper for handling block resampling in a multiprocessing environment.

reduction_settings([error_weighting, ...])

Define a set of reduction instructions based on user input.

set_kernel(kernel[, kernel_spacing, ...])

Set the kernel for subsequent fitting.

set_sample_tree(coordinates[, leaf_size, ...])

Build the sample tree from input coordinates.

sufficient_memory_for(memory)

Return whether there is sufficient available memory for the task.

classmethod block_loop(sample_values, error, mask, fit_tree, sample_tree, settings, iteration, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, jobs=None, **kwargs)[source]

Perform resampling reduction in parallel or series.

Utility function to allow the resampling algorithm to process blocks of data in series or parallel, recombining the data once complete. Please see ResamplePolynomial.__call__() for descriptions of the arguments.

Parameters:
sample_values : numpy.ndarray
error : numpy.ndarray
mask : numpy.ndarray
fit_tree : resampling.tree.PolynomialTree object
sample_tree : resampling.tree.PolynomialTree object
settings : dict
iteration : int
get_error : bool, optional
get_counts : bool, optional
get_weights : bool, optional
get_distance_weights : bool, optional
get_rchi2 : bool, optional
get_offset_variance : bool, optional
jobs : int, optional
kwargs : dict, optional

For consistency with the resampler base only.

Returns:
combined_results : 8-tuple of numpy.ndarray

In order: fit, error, counts, weights, distance weights, reduced chi-squared, MSCP derivatives, distribution offset.

static combine_blocks(blocks, n_sets, n_fits, n_dimensions, cval, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, **kwargs)[source]

Combines the results from multiple reductions into one set.

The resampling reduction may be performed in serial or parallel over multiple “blocks”, where each block contains a set of spatially close fit coordinates and all samples necessary to perform a fit over all points.

Parameters:
blocks : n-tuple of processed reductions for n blocks.
n_sets : int

The number of data sets in the reduction. Each set contains the same sample coordinates as all other sets, but the sample values may vary.

n_fits : int

The number of fitting points over all blocks.

n_dimensions : int

The number of coordinate dimensions.

cval : float

The fill value for missing data in the output fit value arrays.

get_error : bool, optional

If True, indicates that errors on the fit were calculated.

get_counts : bool, optional

If True, indicates that the number of samples used for each fit should be returned.

get_weights : bool, optional

If True, indicates that the total weight sum of all samples used in each fit should be returned.

get_distance_weights : bool, optional

If True, indicates that the distance weight sum of all samples used in each fit should be returned.

get_rchi2 : bool, optional

If True, indicates that the reduced chi-squared statistic for each fit should be returned.

get_offset_variance : bool, optional

If True, indicates that the offset variance of the fit from the sample distribution should be returned.

kwargs : dict, optional

For consistency with ResampleBase only (not used).

Returns:
results : 7-tuple of numpy.ndarray

results[0] = fitted values
results[1] = error on the fit
results[2] = counts
results[3] = total weight sums
results[4] = total distance weight sums
results[5] = reduced chi-squared statistic
results[6] = offset variance

Notes

The return value is always a 7-tuple, and the get_* keywords indicate whether the calculated values in the block reductions are valid. If False, the corresponding output array will have the correct number of axes, but be of zero size.

property degrees

Return the degrees of the spline fit to the kernel.

Returns:
numpy.ndarray (int)
estimate_feature_windows(*args, **kwargs)[source]

Estimates the radius of the fitting window for each feature.

The window for the resampling algorithm will be set to encompass the kernel extent over all dimensions. Unlike the standard resampling algorithm, the coordinates of the data samples are irrelevant.

The window radius is based on the kernel extent in this implementation.

Since the resampling algorithm uses an ellipsoid window to determine possible candidates for fitting, the window is determined as an ellipsoid which circumscribes the cuboid kernel array. Although there are an infinite number of possible ellipsoids, this particular method uses the constraint that the principal axes of the ellipsoid and the cuboid widths are at a constant ratio. For example, in two dimensions where a and b are the principal axes of an ellipsoid, and w and h are the widths of the kernel:

\[\frac{a}{w} = \frac{b}{h}, \quad \left( \frac{w}{2a} \right)^2 + \left( \frac{h}{2b} \right)^2 = 1.\]
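
Since the axes are held at a constant ratio to the widths, each term of the circumscribing condition is equal in \(n\) dimensions, so that:

\[\left( \frac{w_i}{2 a_i} \right)^2 = \frac{1}{n}
\quad \Rightarrow \quad
a_i = \sqrt{\frac{n}{4}}\, w_i\]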

This leads us to set the principal axes as:

\[a_i = \sqrt{\frac{n}{4}}\, w_i + \delta\]

where \(w_i\) is the width of the kernel in dimension \(i\) in \(n\)-dimensions, and \(\delta = \text{spacing} / 10^6\) so that any edge cases can be included safely. This is also valid in one dimension (\(a_i = w_i / 2\)). Note that kernels are assumed to be exactly centered, so the width of a kernel in any dimension will be:

\[w_i = \text{spacing}_i \times (\text{kernel.shape}[i] - 1)\]

Note that this implementation takes the width as 2 * the maximum absolute offset of the kernel for each dimension, so that irregular kernels may be safely used.

Returns:
window : numpy.ndarray (n_features,)

The principal axes of an ellipsoid used to create a fitting region around each resampling point.

property exit_code

Return the exit code of the spline fit.

Please see the Spline class for further details on the meanings of each code.

Returns:
int
property exit_message

Return the spline exit message.

Returns:
str
property kernel

Return the resampling kernel.

Returns:
numpy.ndarray (float)
property kernel_offsets

Return the coordinate offsets for the kernel.

Returns:
numpy.ndarray (float)
property kernel_spacing

Return the spacing between kernel grid points.

Returns:
numpy.ndarray (float)
classmethod process_block(args, block)[source]

Run solve_fits() on each block.

Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.

Parameters:
args : 2-tuple

A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.

block : int

The block index to process.

Returns:
results : 9-tuple of numpy.ndarray

The first element contains the fit point indices to be fit. For the remaining elements, please see solve_fits() return values.

reduction_settings(error_weighting=True, absolute_weight=None, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', is_covar=False, jobs=None, use_threading=None, use_processes=None, **kwargs)[source]

Define a set of reduction instructions based on user input.

This method is responsible for determining, formatting, and checking a number of variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see ResamplePolynomial.__call__().

Parameters:
error_weighting : bool, optional

If True (default), weight polynomial fitting by the error values of each sample.

absolute_weight : bool, optional

If the kernel weights are negative, this can lead to near zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights is used for normalization.

fit_threshold : float, optional

Not implemented for kernel fitting.

cval : float, optional

During fitting, any fit that does not meet the order_algorithm requirement will be set to this value. This will be NaN by default.

edge_threshold : float or array_like of float, optional

If set to a value > 0, edges of the fit will be masked out according to edge_algorithm. Values close to zero will result in a low degree of edge clipping, while values close to 1 clip edges to a greater extent. The exact definition of edge_threshold depends on the algorithm. For further details, please see resampling.resample_utils.check_edges().

edge_algorithm : str, optional

Describes how to clip edges if edge_threshold is non-zero. The available algorithms are:

  • ‘distribution’ (default): Statistics on sample distributions are calculated, and if the resampling point is > 1/threshold standard deviations away from the sample mean, it will be clipped.

  • ‘ellipsoid’: If the samples used to fit a resampling point deviate from the resampling point location by more than this amount, it will be clipped.

  • ‘box’: If the flattened 1-dimensional distribution of samples center-of-mass deviates from the resampling point location in any dimension, it will be clipped.

  • ‘range’: Over each dimension, check the distribution of points is greater than edge_threshold to the “left” and “right” of the resampling point.

is_covar : bool, optional

If True, the input data is treated as a covariance instead of a flux, and is propagated as if through a weighted mean.

jobs : int, optional

Specifies the maximum number of concurrently running jobs. An attempt will be made to parallel process using a thread-pool if available, but will otherwise revert to the “loky” backend. Values of 0 or 1 will result in serial processing. A negative value sets jobs to n_cpus + 1 + jobs such that -1 would use all cpus, and -2 would use all but one cpu.

use_threading : bool, optional

If True, force use of threads during multiprocessing.

use_processes : bool, optional

If True, force use of sub-processes during multiprocessing.

kwargs : dict

Optional keyword arguments to the reduction settings.

Returns:
settings : dict

The reduction settings. Also stored as ResamplePolynomial.fit_settings().

set_kernel(kernel, kernel_spacing=None, kernel_offsets=None, weights=None, limits=None, degrees=3, smoothing=0.0, knots=None, knot_estimate=None, eps=1e-08, fix_knots=None, tolerance=0.001, max_iteration=20, exact=False, reduce_degrees=False, imperfect=False)[source]

Set the kernel for subsequent fitting.

During this process, a spline will be fit to describe the kernel at all intermediate points.

Parameters:
kernel : numpy.ndarray (float)

The kernel to set. Must have n_features dimensions.

kernel_spacing : float or numpy.ndarray (float), optional

The spacing between each kernel element in units of the coordinates. Either supplied as a single value for all features, or as an array of shape (n_features,) giving the kernel spacing for each feature.

kernel_offsets : tuple or array_like, optional

If the kernel is regular, should be an n-dimensional tuple containing the grid indices in each dimension. Otherwise, should be an array of shape (n_dimensions, kernel.size).

weights : numpy.ndarray, optional

Optional weights to supply to the spline fit for each data point. Should be the same shape as the supplied kernel.

limits : numpy.ndarray (float), optional

An array of shape (n_dimensions, 2) that may be supplied to set the minimum and maximum coordinate values used during the spline fit. For example, limits[1, 0] sets the minimum knot value in the second dimension and limits[1, 1] sets the maximum knot value in the second dimension. By default this is set to the minimum and maximum values of the coordinates in each dimension.

degrees : int or numpy.ndarray (int), optional

The degree of spline to fit in each dimension. Either a scalar can be supplied pertaining to all dimensions, or an array of shape (n_dimensions,) can be used.

smoothing : float, optional

Used to specify the smoothing factor. If set to None, the smoothing will be determined based on user settings or input data. If exact is True, smoothing will be disabled (zero). If exact is False, smoothing will be set to n - sqrt(2 * n) where n is the number of data values. If supplied, smoothing must be greater than zero. See above for further details. Note that if smoothing is zero, and the degrees are not equal over each dimension, smoothing will be set to eps due to numerical instabilities.

knots : list or tuple or numpy.ndarray, optional

A set of starting knot coordinates for each dimension. If a list or tuple is supplied it should be of length n_dimensions where element i is an array of shape (n_knots[i]) for dimension i. If an array is supplied, it should be of shape (n_dimensions, max(n_knots)). Note that there must be at least 2 * (degree + 1) knots for each dimension. Unused or invalid knots may be set to NaN at the end of each array. Knot coordinates must also be monotonically increasing in each dimension.

knot_estimate : numpy.ndarray (int), optional

The maximum number of knots required for the spline fit in each dimension and of shape (n_dimensions,). If not supplied, the knot estimate will be set to int((n / n_dimensions) ** (1 / n_dimensions)) or n_knots if knots were supplied and fixed.

eps : float, optional

A value where 0 < eps < 1. This defines the magnitude used to identify singular values in the spline observation matrix (A). If any row of A[:, 0] < (eps * max(A[:, 0])), it will be considered singular.

fix_knots : bool, optional

If True, do not attempt to modify or add knots to the spline fit. Only the initial supplied user knots will be used.

tolerance : float, optional

A value in the range 0 < tolerance < 1 used to determine the exit criteria for the spline fit. See above for further details.

max_iteration : int, optional

The maximum number of iterations to perform when solving for the spline fit.

exact : bool, optional

If True, the initial knots used will coincide with the actual input coordinates and smoothing will be set to zero. No knots should be supplied by the user in this instance.

reduce_degrees : bool, optional

Only relevant if exact is True. If set to True, the maximum allowable degree in each dimension will be limited to (len(unique(x)) // 2) - 1 where x are the coordinate values in any dimension.

imperfect : bool, optional

If a spline fit to the kernel is allowed to be imperfect (True), an error will only be raised on spline fitting if a major error was encountered. Otherwise, fits will be permitted so long as a solution was reached, even if that solution did not meet expectations.

Returns:
None
set_sample_tree(coordinates, leaf_size=40, large_data=False, **distance_kwargs)[source]

Build the sample tree from input coordinates.

Parameters:
coordinates : numpy.ndarray (float)

The input coordinates of shape (n_features, n_samples).

leaf_size : int, optional

Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbors.BallTree for further details.

large_data : bool, optional

If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.

distance_kwargs : dict, optional

Optional keyword arguments passed into sklearn.neighbors.DistanceMetric(). The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.

Returns:
None
property spline

Return the spline object for the kernel fit.

Returns:
Spline
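
A minimal usage sketch for the class (a 1-D Gaussian kernel applied to irregular samples; all values are illustrative):

import numpy as np
from grig.resample_kernel import ResampleKernel

x = np.random.uniform(0.0, 10.0, 200)  # irregular 1-D sample coordinates
y = np.sin(x)                          # sample values

# A regularly gridded Gaussian kernel, spaced 0.1 coordinate units apart
kx = np.linspace(-2.0, 2.0, 41)
kernel = np.exp(-kx ** 2 / 0.5)

resample = ResampleKernel(x, y, kernel, kernel_spacing=0.1)
x_out = np.linspace(0.0, 10.0, 100)
y_out = resample(x_out)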

grig.resample_kernel_utils module

grig.resample_kernel_utils.apply_mask_to_kernel_set_arrays(mask, data, error, weights, counts=None)[source]

Set certain arrays to a fixed size based on a mask array.

Parameters:
mask : numpy.ndarray of bool (N,)

Mask where True values indicate the associated element should be kept, and False will result in exclusion from the output arrays.

data : numpy.ndarray (N,)

The data array.

error : numpy.ndarray

An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.

weights : numpy.ndarray

An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.

counts : int, optional

The number of True values in the mask, optionally passed in for speed. Determines the output size of all arrays.

Returns:
data_out, error_out, weight_out : 3-tuple of numpy.ndarray

Resized arrays in which the last axis is of size counts.

grig.resample_kernel_utils.calculate_kernel_weights(coordinates, reference, knots, n_knots, coefficients, degrees, panel_mapping, panel_steps, knot_steps, nk1, spline_mapping, eps=1e-08)[source]

Calculate values of a kernel centered on a reference for given coordinates.

The kernel must have been transformed into a spline representation. A value of zero on the spline knots represents the center of the kernel.

Parameters:
coordinates : numpy.ndarray (float)

The coordinates for which to calculate weights of shape (n_dimensions, n).

reference : numpy.ndarray (float)

The location of the kernel center of shape (n_dimensions,).

knots : numpy.ndarray (float)

The kernel spline knots of shape (n_dimensions, >max(n_knots)).

n_knots : numpy.ndarray (int)

The number of spline knots in each dimension of shape (n_dimensions,).

coefficients : numpy.ndarray (float)

The spline coefficients of shape (n_coefficients,).

degrees : numpy.ndarray (int)

The spline degrees in each dimension of shape (n_dimensions,).

panel_mapping : numpy.ndarray (int)

An array containing the panel mapping (flat to n-D) indices. This is created by passing the panel shape (n_knots - (2 * degrees) - 1) into flat_index_mapping(). Should be an array of shape (n_dimensions, n_panels).

panel_steps : numpy.ndarray (int)

The flat index mapping steps in panel-space of shape (n_dimensions,). These are returned by passing the shape Spline.panel_shape into flat_index_mapping().

knot_steps : numpy.ndarray (int)

The flat index mapping steps in knot-space of shape (n_dimensions,). These are returned by passing the shape (n_knots - degrees - 1) into flat_index_mapping().

nk1 : numpy.ndarray (int)

An array of shape (n_dimensions,) containing the values n_knots - k1 where n_knots are the number of knots in each dimension, and k1 are the spline degrees + 1 in each dimension.

spline_mapping : numpy.ndarray (int)

An array containing the spline mapping (flat to n-D) indices. This is created by passing the spline shape (degrees + 1) into flat_index_mapping(). Should be an array of shape (n_dimensions, n_spline_coefficients).

eps : float, optional

Due to rounding errors, sometimes a value is flagged as invalid if it is exactly on the edge of the kernel. This value allows a certain tolerance to those incorrect results.

Returns:
weights : numpy.ndarray (float)

The interpolated weights of shape (n,).

grig.resample_kernel_utils.solve_kernel_fit(window_coordinates, window_values, window_error, window_mask, kernel_weights, fit_coordinate, is_covar=False, cval=nan, error_weighting=True, absolute_weight=False, edge_algorithm_idx=1, edge_threshold=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True)[source]

Solve for a kernel convolution at a single coordinate.

Generally, the kernel convolution value is of the form:

\[f(x) = \sum_{j}{d_j k_j} / \sum_{j}{k_j}\]

where \(x\) is the coordinate at the fit point, \(j\) are the indices of all samples within the fitting window, \(d\) are the sample values, and \(k\) are the kernel weights.
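
As a minimal NumPy illustration of this formula (not the compiled implementation; the function is hypothetical, and the normalization mirrors the absolute_weight option described below):

import numpy as np

def kernel_fit_value(values, kernel_weights, absolute_weight=False):
    # f(x) = sum(d_j * k_j) / sum(k_j) over samples in the window
    norm = (np.sum(np.abs(kernel_weights)) if absolute_weight
            else np.sum(kernel_weights))
    if norm == 0:
        return np.nan  # no valid normalization; analogous to a cval fill
    return np.sum(values * kernel_weights) / norm

d = np.array([1.0, 2.0, 3.0])  # hypothetical sample values in the window
k = np.array([0.2, 0.5, 0.3])  # kernel weights evaluated at the samples
print(kernel_fit_value(d, k))  # 2.1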

EDGE CHECKING

The first of the on-the-fly calculations is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see check_edges() for details on available algorithms.

FITTING

If the above checks pass, a fit can be attempted. There are two types of fit that may be performed. The first is the standard kernel fit described above. However, if is_covar was set to True, the window_values are considered covariances to propagate, and a fit will be derived by propagating a weighted variance.

Parameters:
window_coordinates : numpy.ndarray (n_dimensions, n_samples)

The independent coordinates within the window region of the fitting coordinate.

window_values : numpy.ndarray (n_samples,)

The dependent values of the samples.

window_error : numpy.ndarray (n_samples,)

The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (1,) in which case all samples in a set will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.

window_mask : numpy.ndarray (n_samples,)

A mask where False indicates that the associated sample should be excluded from the fit.

kernel_weights : numpy.ndarray (n_samples,)

The kernel weighting factors applied to each sample in the fit.

fit_coordinate : numpy.ndarray (n_dimensions,)

The coordinate of the fitting point.

is_covar : bool, optional

If True, indicates that window_values contains covariance values that should be propagated through the algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.

cval : float, optional

In the case that a fit cannot be calculated at a certain location, cval determines the returned fit value.

error_weighting : bool, optional

If True, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.

absolute_weight : bool, optional

If the kernel weights are negative, this can lead to near zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights is used for normalization.

edge_algorithm_idx : int, optional

Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see check_edges() for further information. The default (1), is always the most robust of the available algorithms.

edge_threshold : numpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution during check_edges(). Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

get_error : bool, optional

If True, return the error on the fit.

get_weights : bool, optional

If True, return the sum of all sample weights used in determining the fit.

get_distance_weights : bool, optional

If True, return the sum of only the distance weights used in determining the fit.

get_rchi2 : bool, optional

If True, return the reduced chi-squared statistic for each of the fits.

get_offset_variance : bool, optional

If True, return the offset of the fitting point from the sample distribution. See offset_variance() for further information.

Returns:
fit_result : 7-tuple

fit_result[0]: Fitted value (float). Set to cval on fit failure.
fit_result[1]: Error on the fit (float). Set to NaN on fit failure.
fit_result[2]: Number of samples included in the fit (int). Set to 0 on fit failure.
fit_result[3]: Weight sum (float). Set to 0.0 on fit failure.
fit_result[4]: Distance weight sum (float). Set to 0.0 on fit failure.
fit_result[5]: Reduced chi-squared statistic (float). Set to NaN on fit failure.
fit_result[6]: Offset variance from the distribution center (float). Set to NaN on fit failure.

grig.resample_kernel_utils.solve_kernel_fits(sample_indices, sample_coordinates, sample_data, sample_error, sample_mask, fit_coordinates, knots, coefficients, degrees, panel_mapping, panel_steps, knot_steps, nk1, spline_mapping, n_knots, is_covar=False, cval=nan, error_weighting=True, absolute_weight=False, edge_algorithm_idx=1, edge_threshold=None, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True)[source]

Solve all fits within one intersection block.

This function is a wrapper for solve_kernel_fit() over all data sets and fit points. The main computations here involve:

  1. Creating and populating the output arrays.

  2. Selecting the correct samples within the region of each fitting window.

  3. Calculating the full weighting factors for the fits.

For further details on the actual fitting, please see solve_kernel_fit().

Parameters:
sample_indices : numba.typed.List

A list of 1-dimensional numpy.ndarray (dtype=int) of length n_fits. Each list element sample_indices[i] contains the indices of samples within the “window” region of fit_indices[i].

sample_coordinates : numpy.ndarray (n_dimensions, n_samples)

The independent coordinates for each sample in n_dimensions.

sample_data : numpy.ndarray (n_sets, n_samples)

The dependent values of the samples for n_sets, each containing n_samples.

sample_error : numpy.ndarray (n_sets, n_samples)

The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (n_sets, 1) in which case all samples in a set will share the same associated error value. If the shape is set to (n_sets, 0), this indicates that no error values are available for the samples.

sample_mask : numpy.ndarray (n_sets, n_samples)

A mask where False indicates that the associated sample should be excluded from all fits.

fit_coordinates : numpy.ndarray (n_dimensions, n_fits)

The independent variables at each fit coordinate in n_dimensions.

knots : list or tuple or numpy.ndarray, optional

A set of starting knot coordinates for each dimension. If a list or tuple is supplied it should be of length n_dimensions where element i is an array of shape (n_knots[i]) for dimension i. If an array is supplied, it should be of shape (n_dimensions, max(n_knots)). Note that there must be at least 2 * (degree + 1) knots for each dimension. Unused or invalid knots may be set to NaN at the end of each array. Knot coordinates must also be monotonically increasing in each dimension.

coefficients : numpy.ndarray (float)

The spline coefficients of shape (n_coefficients,).

degrees : numpy.ndarray (int)

The degrees of the spline in each dimension (n_dimensions,).

panel_mapping : numpy.ndarray (int)

An array containing the panel mapping (flat to n-D) indices. This is created by passing the panel shape (n_knots - (2 * degrees) - 1) into flat_index_mapping(). Should be an array of shape (n_dimensions, n_panels).

panel_steps : numpy.ndarray (int)

The flat index mapping steps in panel-space of shape (n_dimensions,). These are returned by passing the shape Spline.panel_shape into flat_index_mapping().

knot_steps : numpy.ndarray (int)

The flat index mapping steps in knot-space of shape (n_dimensions,). These are returned by passing the shape (n_knots - degrees - 1) into flat_index_mapping().

nk1 : numpy.ndarray (int)

An array of shape (n_dimensions,) containing the values n_knots - k1 where n_knots are the number of knots in each dimension, and k1 are the spline degrees + 1 in each dimension.

spline_mapping : numpy.ndarray (int)

An array containing the spline mapping (flat to n-D) indices. This is created by passing the spline shape (degrees + 1) into flat_index_mapping(). Should be an array of shape (n_dimensions, n_spline_coefficients).

n_knots : numpy.ndarray (int)

The number of knots in each dimension (n_dimensions,).

is_covar : bool, optional

If True, indicates that sample_data contains covariance values that should be propagated through the algorithm.

cval : float, optional

In the case that a fit cannot be calculated at a certain location, cval determines the fill value for the output fit array at those locations.

error_weighting : bool, optional

If True, weight the samples in the fit by the inverse variance (1 / sample_error^2) in addition to distance weighting.

absolute_weight : bool, optional

If the kernel weights are negative, this can lead to near zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights is used for normalization.

edge_algorithm_idx : int, optional

Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see check_edges() for further information. The default (1), is always the most robust of the available algorithms.

edge_threshold : numpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution during check_edges(). Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

get_error : bool, optional

If True, return the error on the fit.

get_counts : bool, optional

If True, return the number of samples used when determining the fit at each fitting point.

get_weights : bool, optional

If True, return the sum of all sample weights used in determining the fit at each point.

get_distance_weights : bool, optional

If True, return the sum of only the distance weights used in determining the fit at each point.

get_rchi2 : bool, optional

If True, return the reduced chi-squared statistic for each of the fitted points.

get_offset_variance : bool, optional

If True, return the offset of the fitting point from the sample distribution. See offset_variance() for further information.

Returns:
fit_results : 7-tuple of numpy.ndarray

fit_results[0]: Fitted values.
fit_results[1]: Error on the fit.
fit_results[2]: Number of samples in each fit.
fit_results[3]: Weight sums.
fit_results[4]: Distance weight sums.
fit_results[5]: Reduced chi-squared statistic.
fit_results[6]: Offset variances from the sample distribution center.

All arrays have the shape (n_sets, n_fits) or (0, 0) depending on whether get_<name> is True or False respectively.

grig.resample_polynomial module

class grig.resample_polynomial.ResamplePolynomial(coordinates, data, error=None, mask=None, window=None, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, order=1, fix_order=True, large_data=None, check_memory=True, memory_buffer=0.25, **distance_kwargs)[source]

Bases: ResampleBase

Attributes:
features

int : number of data features (dimensions)

fit_settings

dict : Fit reduction settings applied during last call

fit_tree

Return the fitting tree representative of points to fit.

grid_class

Return the grid class of the resampler

multi_set

bool : True if solving for multiple data sets

n_samples

int : The number of samples in each data set.

n_sets

int : The number of data sets to fit.

order

Return the order of polynomial fit.

window

numpy.ndarray (n_features,) : Window radius in each dimension.

Methods

__call__(*args[, smoothing, ...])

Resample data defined during initialization to new coordinates.

block_loop(sample_values, error, mask, ...)

Perform resampling reduction in parallel or series.

calculate_adaptive_smoothing(settings)

Calculate the adaptive distance weighting kernel.

calculate_minimum_points()

Return the minimum number of points for a polynomial fit.

check_order(order, n_features, n_samples)

Check the order is of correct format and enough samples exist.

combine_blocks(blocks, n_sets, n_fits, ...)

Combines the results from multiple reductions into one set.

estimate_feature_windows(coordinates[, ...])

Estimates the radius of the fitting window for each feature.

estimate_max_bytes(coordinates[, window, ...])

Estimate the maximum number of bytes required by a reduction.

get_grid_class()

Return the appropriate grid class for the resampler.

global_resampling_values()

Return the global resampling values.

pre_fit(settings, *args[, ...])

Perform pre-fitting steps and build the fitting tree.

process_block(args, block)

Run solve_fits() on each block.

process_blocks(args, kwargs, settings, ...)

Wrapper for handling block resampling in a multiprocessing environment.

reduction_settings([smoothing, ...])

Define a set of reduction instructions based on user input.

set_sample_tree(coordinates[, radius, ...])

Build the sample tree from input coordinates.

sufficient_memory_for(memory)

Return whether there is sufficient available memory for the task.

calculate_adaptive_smoothing(settings)[source]

Calculate the adaptive distance weighting kernel.

Calculates a weighting kernel for each sample in the reduction, i.e., each sample will have a different weighting factor based on its distance (and optionally, direction) from each point at which a fit is required. This is done by performing a fit centered on each sample, and measuring how that fit deviates from the actual observed sample value. The first step is to decide upon the distance weighting factor used to perform the initial fit.

It is assumed that the coordinate error is known (or approximately known), and supplied as a \(1 \sigma\) error via the ‘alpha’ keyword in settings. For example, the error in astronomical observations taken using a Gaussian beam with known FWHM is:

\[\sigma = \frac{\text{FWHM}}{2 \sqrt{2 \ln{2}}}\]

However, adaptive weighting may be calculated over select dimensions, leaving some fixed. In this case, the alpha keyword will always apply a weighting factor in a “fixed” dimension \(k\) as:

\[w_k = exp \left( \frac{-\Delta x_k^2}{\alpha_{fixed, k} \Omega_k^2} \right)\]

where \(\Delta x_k\) is the distance from the resampling point to the sample in question, and \(\Omega_k\) is the window principal axis in dimension \(k\). Note that in the weighting function, \(\sigma\) is related to \(\alpha\) by:

\[\alpha_k = 2 \sigma_k^2\]

and weighting is applied using the definition of \(w_k\) above. To signify that \(\alpha_{fixed, k}\) should be used instead of \(\sigma\) for dimension \(k\), the “adaptive_threshold” array in settings should be set to zero for the corresponding dimension. For example, if we had:

settings['alpha'] = [0.3, 0.3]
settings['adaptive_threshold'] = [1, 0]

the first dimension would have \(\alpha_0=0.18\) (\(2 \times 0.3^2\)), and the second would have \(\alpha_1=0.3\). In this example, \(\alpha_0\) would be allowed to vary per sample, while \(\alpha_1\) would be fixed for each sample at 0.3. An initial fit is then performed at each sample coordinate using a test distance weighting parameter:

\[ \begin{align}\begin{aligned}\sigma_{test, k} = \frac{\pi \sigma_k}{\sqrt{2 \ln{2}}}\\\alpha_{test, k} = 2 \sigma_{test, k}^2\end{aligned}\end{align} \]

Using this initial weighting parameter, the adaptive kernels are derived using either resample_utils.shaped_adaptive_weight_matrix() or resample_utils.scaled_adaptive_weight_matrix() depending on whether the “shaped” keyword in settings is set to True or False respectively.

The weighting kernels are stored in the “adaptive_alpha” keyword value in settings.
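
For example, the effective \(\alpha\) values for the settings shown above can be sketched with plain NumPy (illustrative only; the package derives these internally):

import numpy as np

alpha = np.array([0.3, 0.3])               # settings['alpha']
adaptive_threshold = np.array([1, 0])      # settings['adaptive_threshold']

# Adaptive dimensions use alpha_k = 2 * sigma_k ** 2, while "fixed"
# dimensions (adaptive_threshold == 0) use the supplied value directly.
effective_alpha = np.where(adaptive_threshold > 0, 2 * alpha ** 2, alpha)
# -> [0.18, 0.3]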

Parameters:
settingsdict

Reduction settings, as returned by ResamplePolynomial.reduction_settings().

Returns:
None
calculate_minimum_points()[source]

Return the minimum number of points for a polynomial fit.

Returns:
minimum_pointsint

The minimum number of points for a polynomial fit

static check_order(order, n_features, n_samples)[source]

Check the order is of correct format and enough samples exist.

Parameters:
orderint or array_like of int (n_features,)

The polynomial order to fit, either supplied as an integer to be applied over all dimensions, or as an array to give the order in each dimension.

n_featuresint

The number of features (dimensions) of the sample coordinates.

n_samplesint

The number of samples.

Returns:
orderint or numpy.ndarray of int (n_features,)

The formatted polynomial order.

Raises:
ValueError

If there are too few samples for the desired order, or order is not formatted correctly.
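
A hypothetical usage sketch (values are illustrative; for 2-dimensional data, a symmetric order of 2 requires at least (2 + 1) ** 2 = 9 samples):

from grig.resample_polynomial import ResamplePolynomial

# Returns the formatted order (2 here); raises ValueError if
# n_samples is too small for the requested order.
order = ResamplePolynomial.check_order(2, 2, 9)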

property fit_tree

Return the fitting tree representative of points to fit.

Returns:
PolynomialTree
property order

Return the order of polynomial fit.

Returns:
orderint or numpy.ndarray (int)

A symmetrical order, or the order for each feature.

pre_fit(settings, *args, adaptive_region_coordinates=None)[source]

Perform pre-fitting steps and build the fitting tree.

Parameters:
settingsdict

Settings calculated via reduction_settings to be applied if necessary.

argsn-tuple

The call input arguments.

adaptive_region_coordinatesnumpy.ndarray (float), optional

The coordinates determined from a previous adaptive smoothing algorithm.

Returns:
None
classmethod process_block(args, block)[source]

Run solve_fits() on each block.

Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.

Parameters:
args2-tuple

A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.

blockint

The block index to process.

Returns:
results9-tuple of numpy.ndarray

The first element contains the fit point indices to be fit. For the remaining elements, please see solve_fits() return values.

reduction_settings(smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', error_weighting=True, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', is_covar=False, estimate_covariance=False, jobs=None, adaptive_region_coordinates=None, use_threading=None, use_processes=None)[source]

Define a set of reduction instructions based on user input.

This method is responsible for determining, formatting, and checking a number of variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see ResamplePolynomial.__call__().

Parameters:
smoothingfloat or array_like (n_features,), optional
relative_smoothbool, optional
adaptive_thresholdfloat or array_like (n_features,), optional
adaptive_algorithmstr, optional
error_weightingbool, optional
fit_thresholdfloat, optional
cvalfloat, optional
edge_thresholdfloat or array_like (n_features,), optional
edge_algorithmstr, optional
order_algorithmstr, optional
is_covarbool, optional
estimate_covariancebool, optional
jobsint, optional
adaptive_region_coordinatesarray_like, optional
use_threadingbool, optional

If True, force use of threads during multiprocessing.

use_processesbool, optional

If True, force use of sub-processes during multiprocessing.

Returns:
settingsdict

The reduction settings. These are also stored as ResamplePolynomial.fit_settings.

set_sample_tree(coordinates, radius=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=0.25, **distance_kwargs)[source]

Build the sample tree from input coordinates.

Parameters:
coordinatesnumpy.ndarray (float)

The input coordinates of shape (n_features, n_samples).

radiusfloat or sequence (float), optional

The radius of the window around each fitting point used to determine sample selection for fit. If not supplied, will be estimated using ResamplePolynomial.estimate_feature_windows().

window_estimate_binsint, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

window_estimate_percentileint or float, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

window_estimate_oversampleint or float, optional

Used to estimate the window if not supplied using ResamplePolynomial.estimate_feature_windows().

leaf_sizeint, optional

Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbors.BallTree for further details.

large_databool, optional

If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.

check_memorybool, optional

If True, check the memory requirements for resampling the supplied data.

memory_bufferfloat, optional

A fraction (positive or negative) with which to modify the memory estimates for the process memory requirements. Memory estimates are scaled by the factor 1 + memory_buffer.

distance_kwargsdict, optional

Optional keyword arguments passed into sklearn.neighbors.DistanceMetric(). The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.

Returns:
None
grig.resample_polynomial.resamp(coordinates, data, *locations, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=False, smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', error_weighting=True, estimate_covariance=False, is_covar=False, jobs=None, get_error=False, get_counts=False, get_weights=False, get_distance_weights=False, get_rchi2=False, get_cross_derivatives=False, get_offset_variance=False, **distance_kwargs)[source]

Resample data using local polynomial fitting.

Initializes and then calls the ResamplePolynomial class. For further details on all available parameters, please see ResamplePolynomial.__init__() and ResamplePolynomial.__call__().

Parameters:
coordinates
data
locations
error
mask
window
order
fix_order
robust
negthresh
window_estimate_bins
window_estimate_percentile
window_estimate_oversample
leaf_size
large_data
smoothing
relative_smooth
adaptive_threshold
adaptive_algorithm
fit_threshold
cval
edge_threshold
edge_algorithm
order_algorithm
error_weighting
estimate_covariance
is_covar
jobs
get_error
get_counts
get_weights
get_distance_weights
get_rchi2
get_cross_derivatives
get_offset_variance
distance_kwargs
Returns:
resultsfloat or numpy.ndarray or n-tuple of (float or numpy.ndarray)

If a fit is performed at a single location, the output will consist of int or float scalar values. Multiple fits result in numpy arrays. The exact output shape depends on the number of data sets, the number of fitted points, and the dimensions of the fit locations. Assuming that all get_* keywords are set to True, the output order is:

results[0] = fitted values
results[1] = error on the fit
results[2] = sample counts for each fit
results[3] = total weight of all samples in fit
results[4] = total distance weight sum of all samples in fit
results[5] = reduced chi-squared statistic of the fit
results[6] = derivative mean squared cross products
results[7] = offset variance of fit from sample distribution
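
A minimal usage sketch for 1-dimensional data (assuming single-feature coordinates may be supplied as flat arrays; all parameter values are illustrative):

import numpy as np
from grig.resample_polynomial import resamp

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)                 # sample coordinates
y = np.sin(x) + rng.normal(0, 0.05, 200)    # noisy data values

x_out = np.linspace(0, 10, 50)              # fit locations
fit, error, counts = resamp(
    x, y, x_out, order=2, window=1.0,
    get_error=True, get_counts=True)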

grig.resample_utils module

grig.resample_utils.apply_mask_to_set_arrays(mask, data, phi, error, weights, counts)[source]

Set certain arrays to a fixed size based on a mask array.

Parameters:
masknumpy.ndarray of bool (N,)

Mask where True values indicate the associated element should be kept, and False will result in exclusion from the output arrays.

datanumpy.ndarray (N,)

The data array.

phinumpy.ndarray (n_terms, N)

The polynomial terms of the fit equation.

errornumpy.ndarray

An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of size counts.

weightsnumpy.ndarray

An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of size counts.

countsint

The number of True values in the mask. Determines the output size of all arrays.

Returns:
data_out, phi_out, error_out, weight_out4-tuple of numpy.ndarray.

Resized arrays in which the last axis is of size counts.

grig.resample_utils.array_sum(mask)[source]

Return the sum of an array.

Utility function for fast numba calculation of the sum of a 1-D numpy array.

Parameters:
masknumpy.ndarray (N,)

The input array.

Returns:
countint or float

The sum of values.

grig.resample_utils.calculate_adaptive_distance_weights_scaled(coordinates, reference, adaptive_alpha)[source]

Returns distance weights based on offsets and scaled adaptive weighting.

Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(x_{ref}\)), and scaling factors adaptive_alpha (\(A\)), returns the weighting factor:

\[w(x) = exp \left( -\sum_{i=1}^{K} \sum_{j=1}^{K} {{\Delta x}_i A_{i,j} {\Delta x}_j} \right)\]

where \({\Delta x}_k = x_{ref, k} - x_k\) for dimension \(k\).

Unlike calculate_distance_weights_from_matrix(), this function applies the weighting over multiple sets. In this context, a single set will contain the same independent values (coordinates) as all other sets in a reduction, but the dependent data values may vary between sets. Therefore, it is necessary for the adaptive weighting values to also vary between sets.

The algorithm is equivalent to calculate_adaptive_distance_weights_shaped() except when no rotation is applied, and therefore only a 1-dimensional array, rather than a matrix, is required to perform the transform. The third axis of adaptive_alpha is set to one to allow numba to compile the function successfully, also indicating that there is no need to store the off-diagonal elements since they are all zero. All stretching will therefore occur along the dimensional axes.

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
referencenumpy.ndarray (n_dimensions,)
adaptive_alphanumpy.ndarray

(n_coordinates, n_sets, 1, n_dimensions) array containing the scaled adaptive weighting factors.

Returns:
weightsnumpy.ndarray (n_sets, n_coordinates)

The distance weighting factors.

grig.resample_utils.calculate_adaptive_distance_weights_shaped(coordinates, reference, shape_matrices)[source]

Returns distance weights based on offsets and shaped adaptive weighting.

Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(o\)), and the symmetric matrix shape_matrices (\(A\)), returns the weighting factor:

\[w(x) = exp(-\Delta x^T A^{-1} \Delta x)\]

where \({\Delta x}_k = o_k - x_k\) for dimension \(k\).

This function is applied to multiple coordinates over multiple sets. In this context, a single set will contain the same independent values (coordinates) as all other sets in a reduction, but the dependent data values may vary between sets. Therefore, it is necessary for the adaptive weighting values to also vary between sets.

Unlike calculate_adaptive_distance_weights_scaled(), the matrix \(A\) allows for the kernel weighting function to be stretched along arbitrarily rotated orthogonal axes.

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
referencenumpy.ndarray (n_dimensions,)
shape_matricesnumpy.ndarray

(n_coordinates, n_sets, n_dimensions, n_dimensions) array containing the shaped adaptive weighting matrix for all coordinates and sets.

Returns:
weightsnumpy.ndarray (n_sets, n_coordinates)

The distance weighting factors.

grig.resample_utils.calculate_distance_weights(coordinates, reference, alpha)[source]

Returns a distance weighting based on coordinate offsets.

Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(x_{ref}\)), and the scaling factor alpha (\(\alpha\)), returns the weighting factor:

\[w(x) = exp \left( \sum_{k=1}^{K}{\frac{-(x_{ref, k} - x_k)^2}{\alpha_k}} \right)\]
Parameters:
coordinatesnumpy.ndarray (n_dimensions, N)

An array of N coordinates in n_dimensions.

referencenumpy.ndarray (n_dimensions,)

The reference position from which to determine distance offsets for the weighting function.

alphanumpy.ndarray (1 or n_dimensions,)

The distance scaling factor. If an array of size 1 is supplied, it will be applied over all dimensions. Otherwise, a value must be provided for each dimension.

Returns:
weightsnumpy.ndarray (N,)

The distance weighting factors.

Notes

alpha relates to the standard deviation (\(\sigma\)) in a normal distribution as \(\alpha = 2\sigma^2\).
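
The formula above may be reproduced with plain NumPy (a hypothetical sketch, not the jitted implementation):

import numpy as np

def distance_weights(coordinates, reference, alpha):
    # coordinates: (n_dimensions, N); reference: (n_dimensions,)
    # alpha: (n_dimensions,) scaling factors, with alpha = 2 * sigma ** 2
    dx2 = (reference[:, None] - coordinates) ** 2
    return np.exp(-np.sum(dx2 / alpha[:, None], axis=0))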

grig.resample_utils.calculate_distance_weights_from_matrix(coordinates, reference, alpha_matrix)[source]

Returns distance weights based on coordinate offsets and matrix operation.

Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(o\)), and the symmetric matrix alpha_matrix (\(A\)), returns the weighting factor:

\[w(x) = exp(-\Delta x^T A \Delta x)\]

where \({\Delta x}_k = o_k - x_k\) for dimension \(k\).

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

An array of N coordinates in n_dimensions.

referencenumpy.ndarray (n_dimensions,)

The reference position from which to determine distance offsets for the weighting function.

alpha_matrixnumpy.ndarray (n_dimensions, n_dimensions)

Defines the matrix operation to perform on the coordinate offsets. Note that in this implementation, it should be a symmetric matrix such that \(A = A^T\). As such, only the upper triangle is iterated over and any off-diagonal elements in the lower triangle are ignored.

Returns:
weightsnumpy.ndarray (N,)

The distance weighting factors.

grig.resample_utils.calculate_fitting_weights(errors, weights, error_weighting=True)[source]

Calculate the final weighting factor based on errors and other weights.

If error_weighting is applied, the return value is weights / errors^2. Otherwise, simply returns weights.

Parameters:
errorsnumpy.ndarray (N,)

1-sigma error values to apply to weighting.

weightsnumpy.ndarray (N,)

Other weighting factors, not including any type of error weighting.

error_weightingbool

If False, returns weights, otherwise returns weights / errors^2.

Returns:
fit_weightingnumpy.ndarray (N,)

The final fitting weights.

Notes

The square root of the weight is used in the polynomial system of equations rather than the actual weight due to how the least-squares solution is derived.

For the linear system of equations A.x = B, we are solving (A^T.W.A).C = A^T.W.B, where C are the coefficients we wish to find. If we set X = sqrt(W).A, and Y = sqrt(W).B, this is the same as (X^T.X).C = X^T.Y, which can be solved easily. Another way of thinking about it is that we are minimizing the squared residuals (y - f(x))^2.
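
A sketch of this scheme applied to an ordinary least-squares solver (names are hypothetical; illustrative only):

import numpy as np

def weighted_fit_coefficients(phi, data, errors, weights):
    # Final fitting weights as described above: weights / errors ** 2.
    w = weights / errors ** 2
    sw = np.sqrt(w)
    # Scaling both sides by sqrt(W) turns (A^T.W.A).C = A^T.W.B into
    # the standard problem (X^T.X).C = X^T.Y.
    x = (phi * sw).T          # (N, n_terms)
    y = data * sw
    coefficients, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coefficients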

grig.resample_utils.check_edge_with_box(coordinates, reference, mask, threshold)[source]

Defines a hyperrectangle edge around a coordinate distribution.

Given a distribution (\(X\)) of \(N\) samples in \(K\) dimensions, the center of mass for dimension \(k\) is:

\[\bar{X}_{k} = \frac{1}{N} \sum_{i=1}^{N}{X_{ik}}\]

The hyperrectangle center is at \(\bar{X}\) and its width in each dimension \(k\) is \(2\beta_k\) where \(\beta = 1 - \text{threshold}\). Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] = 0 or 1, then no edge will be defined for dimension \(k\).

In this definition, the “edge” for dimension \(k\) is at \(\bar{X}_k \pm \beta_k\), and reference (\(X_{ref}\)) is considered inside the edge if:

\[| \bar{X}_k - X_{ref, k} | \leq \beta_k, \, \forall k\]
Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_samples,)

A mask where False values exclude the corresponding sample from the center-of-mass calculation.

thresholdnumpy.ndarray (n_dimensions,)

Defines the half-width dimensions (1 - threshold) of the hyperrectangle centered on the coordinate center of mass in units of the resampling window parameter. Must be in the range 0 < threshold < 1.

Returns:
insidebool

True if reference is inside the distribution “edge” and False otherwise.

grig.resample_utils.check_edge_with_distribution(coordinates, reference, mask, threshold)[source]

Defines an edge based on statistical deviation from a sample distribution.

Given a sample distribution (\(X\)) of coordinates, the deviation of a reference coordinate \(X_{ref}\) from the mean sample distribution \(\bar{X}\) is given as:

\[\sigma_{ref} = \sqrt{(X_{ref} - \bar{X})^{T} \Sigma^{-1} (X_{ref} - \bar{X})}\]

where \(\Sigma\) is the sample covariance of \(X\). In this definition, the “edge” of the distribution is defined at \(\beta = 1 / \text{threshold}\) so that reference locations where \(\sigma_{ref} \leq \beta\) are considered inside the distribution edge. For example, setting threshold = 2 will return a False value if \(\sigma_{ref} > 0.5\).

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_samples,)

A mask where False values exclude the corresponding sample from any distribution statistics.

thresholdint or float

The “edge” of the distribution is given by 1 / threshold. If the deviation of the reference coordinate from the distribution mean is greater than the edge, a False value will be returned. Setting threshold to zero results in an edge at infinity, i.e., all reference coordinates will be considered inside the distribution edge.

Returns:
insidebool

True if reference is inside the distribution “edge” and False otherwise.
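
A rough NumPy equivalent of this check (a sketch under the definitions above, not the package’s numba implementation):

import numpy as np

def inside_distribution_edge(coordinates, reference, mask, threshold):
    # coordinates: (n_dimensions, n_samples); reference: (n_dimensions,)
    x = coordinates[:, mask]
    mean = x.mean(axis=1)
    sigma_inv = np.linalg.inv(np.cov(x, ddof=1))
    delta = reference - mean
    deviation = np.sqrt(delta @ sigma_inv @ delta)
    # threshold = 0 places the edge at infinity.
    return True if threshold == 0 else deviation <= 1.0 / threshold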

grig.resample_utils.check_edge_with_ellipsoid(coordinates, reference, mask, threshold)[source]

Defines an ellipsoid edge around a coordinate distribution.

Given a distribution (\(X\)) of \(N\) samples in \(K\) dimensions, the center of mass for dimension \(k\) is:

\[\bar{X}_{k} = \frac{1}{N} \sum_{i=1}^{N}{X_{ik}}\]

The ellipsoid center is at \(\bar{X}\) with principal axes given by \(\beta\), where \(\beta = 1 - \text{threshold}\). Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] = 0 or 1, then no edge will be defined for dimension \(k\), and the ellipsoid definition will only apply over remaining dimensions (dimension \(k\) will be ignored in all calculations).

A reference coordinate (\(X_{ref}\)) is considered inside the ellipsoid edge if:

\[\sum_{k=1}^{K}{\frac{(X_{ref, k} - \bar{X}_k)^2}{\beta_k^2}} \leq 1\]
Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_samples,)

A mask where False values exclude the corresponding sample from the center-of-mass calculation.

thresholdnumpy.ndarray (n_dimensions,)

Defines the principal axes (1 - threshold) of an ellipsoid centered on the coordinate center of mass in units of the resampling window parameter. Must be in the range 0 < threshold < 1.

Returns:
insidebool

True if reference is inside the distribution “edge” and False otherwise.

grig.resample_utils.check_edge_with_range(coordinates, reference, mask, threshold)[source]

Defines an edge based on the range of coordinates in each dimension.

Given a distribution of sample coordinates (\(X\)), and a reference position (\(X_{ref}\)) in \(K\) dimensions, check for each dimension \(k\):

\[\{-\beta_k, \beta_k\} \in [min(X_k - X_{ref, k}), max(X_k - X_{ref, k})] , \, \forall k\]

where \(\beta = 1 - \text{threshold}\). In other words, in each dimension \(k\), there must be at least one member of \(X_k\) at a distance of \(\beta_k\) from \(X_{ref, k}\) for both the positive and negative directions along \(k\).

Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] < 0 or threshold[k] > 1, then no check will be performed for dimension \(k\).

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_samples,)

A mask where False values exclude the corresponding sample from the range check.

thresholdnumpy.ndarray (n_dimensions,)

Defines the threshold. Must be in the range 0 < threshold < 1.

Returns:
insidebool

True if reference is inside the distribution “edge” and False otherwise.

grig.resample_utils.check_edges(coordinates, reference, mask, threshold, algorithm=1)[source]

Determine whether a reference position is within a distribution “edge”.

The purpose of this function is to allow the resampling algorithm to determine whether a fit should be performed at a reference location given a sample distribution of coordinates. If a fit is attempted too far from the mean sample distribution, it will lead to a misleading result which becomes more pronounced at higher fit orders.

Therefore, the sample distribution is assigned an “edge” based on one of four definitions. If the reference position is outside of this edge, a False value will be returned, and no fitting will occur.

For all edge definition algorithms, the threshold parameter will determine the distance of the edge from the sample mean. As threshold is increased, the edge approaches the sample mean resulting in a more severe clipping of fit locations away from the center of a distribution. If threshold = 0, no edge clipping will occur.

Since the main engine of the resampling algorithm relies on numba, the edge algorithm should be supplied as an integer. Please see the relevant function listed below for further details on how the “edge” is defined.

algorithm   Function
1           check_edge_with_distribution()
2           check_edge_with_ellipsoid()
3           check_edge_with_box()
4           check_edge_with_range()

Generally, algorithms are ordered from most to least robust, and from slowest to fastest, so the default (1) is considered the most robust (although slowest) of the available algorithms.

When dealing with more than one dimension, the check_edge_with_distribution() algorithm is recommended as it accounts for the shape of the distribution. If the sample distribution is unknown (as opposed to a set of uniformly spaced coordinates), there is a chance for some samples to be (or to approach) a collinear distribution. Attempting a fit at any location in a tangential direction away from the distribution would likely result in a very poor fit.

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate to test.

masknumpy.ndarray of bool (n_samples,)

A mask where True values indicate a sample should be included in the edge determination.

thresholdnumpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

algorithmint, optional

Integer specifying which edge definition to use. Please see above for the associated functions. Invalid choices will disable edge checking.

Returns:
insidebool

True if the reference coordinate is inside the edge of the sample distribution, and False otherwise.

grig.resample_utils.check_orders(orders, coordinates, reference, algorithm=1, mask=None, minimum_points=None, required=False, counts=-1)[source]

Checks the sample distribution is suitable for a polynomial fit order.

For a polynomial fit to be successful at a given order, one needs to ensure that the sample distribution is suitable for such a fit. At a minimum, there need to be enough samples to derive a fit. However, unless the samples are known to be distributed in such a way where a fit is always possible (such as regularly spaced samples), there is the possibility that the fit may be under-determined. For example, it is not possible to perform any other fit than the mean of sample values (order 0) if the samples all share the same coordinate.

This problem is compounded by the use of numba which does not allow any exception handling. If a fit fails at a single reference position, the whole algorithm will fail.

There are several algorithms available, ordered in decreasing robustness, but increasing speed. The chosen algorithm must be supplied using an integer label (because numba). Please see the relevant function listed in the table below for a more detailed description of each. The order_algorithm parameter is supplied to the main resampling algorithm during __init__ and is used to select the relevant algorithm.

algorithm   Function                        order_algorithm
1           check_orders_with_bounds()      ‘bounded’
2           check_orders_without_bounds()   ‘extrapolate’
3           check_orders_with_counts()      ‘counts’

Generally, check_orders_with_bounds() is the most robust and ensures that a fit will not only succeed, but is not likely to deviate widely from the expected result. The check_orders_without_bounds() function should also allow fits to succeed, but allow fits to be generated away from the sample distribution. This may be desirable, for example, if one is resampling an image and wishes to perform fits close to the edge. Finally, check_orders_with_counts() is fast, but there is a decent possibility that the fit will fail if the user cannot guarantee that the samples are distributed appropriately.

In rare cases, a distribution of collinear samples, not aligned along any dimensional axis, may pass the check_orders test, causing the resampling algorithm to fail. In this case, please consider using check_edge_with_distribution() to perform such a check. This may be invoked by supplying edge_algorithm=’distribution’ during the initialization of the main resampling algorithm.

Parameters:
algorithmint

An integer specifying the order checking algorithm to use. If an invalid option is supplied (not listed in the table above), the return value of -1 will abort any subsequent fitting.

ordersnumpy.ndarray of int

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.

coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution. Not used by the counts algorithm, but must still be supplied as an array with the correct dimensions.

referencenumpy.ndarray (n_dimensions,)

The coordinates of the point at which to perform a fit. Only required by the ‘bounded’ algorithm, but a value of the correct array shape must still be supplied.

masknumpy.ndarray of bool (n_samples,), optional

An optional mask where False values indicate the associated sample should not be included in determining the maximum order.

minimum_pointsint, optional

The minimum number of points required to perform a fit of the desired order, optionally passed in for speed if pre-calculated. Only used by the ‘counts’ algorithm.

requiredbool, optional

If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.

countsint, optional

This is required by the ‘counts’ algorithm. If counts < 0, it will be determined from sum(mask). If counts is less than zero and a mask is not supplied, no fit will be performed.

Returns:
maximum_ordersnumpy.ndarray

An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.

grig.resample_utils.check_orders_with_bounds(orders, coordinates, reference, mask=None, required=False)[source]

Checks maximum order for sample coordinates bounding a reference.

Given the coordinates of a sample distribution (\(X\)), a reference position (\(X_{ref}\)), and the desired orders of fit (\(o\)), returns the maximum available order of fit.

For dimension \(k\) define the sets of unique values:

\[\begin{split}s_k^- = \{ x \in X_k |\, x < X_{ref, k} \} \\ s_k^+ = \{ x \in X_k |\, x > X_{ref, k} \}\end{split}\]

The maximum order is then given as:

\[o_k^{max} = min\{ |s_k^-|, |s_k^+|, o_k \}\]

where \(|.|\) represents the cardinality (size) of the set.

For example, consider a 1-dimensional set of coordinates:

\[X = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 6]\]

and we wish to perform an order=3 polynomial fit at \(X_{ref}=2.5\). There are 4 unique values of \(X > X_{ref}\) (\(\{3, 4, 5, 6\}\)), but only 2 unique values of \(X < X_{ref}\) (\(\{1, 2\}\)). The return value will be 2, indicating that only a 2nd order polynomial fit should be attempted. If a 1st order polynomial fit was requested, the return value would be 1 since there are enough points less than and greater than the reference.
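
The worked example above, sketched in NumPy (an illustration of the rule, not the package code):

import numpy as np

x = np.array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 6])
x_ref = 2.5
below = np.unique(x[x < x_ref]).size   # |s-| = 2 -> {1, 2}
above = np.unique(x[x > x_ref]).size   # |s+| = 4 -> {3, 4, 5, 6}
max_order = min(below, above, 3)       # requested order 3 -> returns 2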

Parameters:
ordersnumpy.ndarray of int

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.

coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinates of the sample distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_coordinates,), optional

An optional mask where False values indicate the associated sample should not be included in determining the maximum order.

requiredbool, optional

If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.

Returns:
maximum_ordersnumpy.ndarray

An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.

grig.resample_utils.check_orders_with_counts(orders, counts, mask=None, minimum_points=None, n_dimensions=None, required=False)[source]

Checks maximum order based only on the number of samples.

For \(N\) samples of \(K\) dimensional data, the minimum number of samples required to perform a polynomial fit with orders \(o\) is:

\[N_{min} = \prod_{k=1}^{K}{(o_k + 1)}\]

If \(o_k = o_0, \, \forall k\), then it is possible to suggest a lower order over all dimensions in the case where \(N < N_{min}\). This is given as:

\[o_k^{max} = min\{ floor(N ^ {1 / K} - 1), o_k \}\]

The suggested maximum order is returned by setting the required keyword to False. If the orders vary between dimensions or required is True, the value of \(o_0\) is set to -1 indicating a polynomial fit of the desired order is not possible.
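
For example (a sketch of the formulas above):

import numpy as np

orders = np.array([2, 2])                # 2nd order in 2 dimensions
n_min = np.prod(orders + 1)              # (2 + 1) * (2 + 1) = 9 samples

# With only N = 6 samples, the suggested symmetric maximum order is:
n, k = 6, 2
o_max = int(np.floor(n ** (1 / k) - 1))  # floor(sqrt(6) - 1) = 1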

Parameters:
ordersnumpy.ndarray of int

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.

countsint

The number of samples available for the fit.

masknumpy.ndarray of bool (n_coordinates,), optional

An optional mask where False values indicate the associated sample should not be included in determining the maximum order. The counts parameter will be ignored in favor of sum(mask).

minimum_pointsint, optional

The minimum number of points required to perform a fit of the desired order, optionally passed in for speed.

n_dimensionsint, optional

If orders was supplied as an array of size 1, but should be applied over multiple dimensions, the number of dimensions should be supplied. Otherwise, the number of dimensions is taken to be equal to the number of orders.

requiredbool, optional

If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.

Returns:
maximum_ordersnumpy.ndarray

An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Unlike check_orders_with_bounds() or check_orders_without_bounds(), a suggested maximum order can only be returned by setting required to False if orders are equal in all dimensions. Otherwise, it is impossible to know which dimension the order should be reduced for. In this case, the first element of maximum_orders will be set to -1.

grig.resample_utils.check_orders_without_bounds(orders, coordinates, mask=None, required=False)[source]

Checks maximum order based on unique samples, irrespective of reference.

Given the coordinates of a sample distribution (\(X\)), and the desired orders of fit (\(o\)), returns the maximum available order of fit. Unlike check_orders_with_bounds(), the location of the fit is unimportant. All that is required is for enough unique sample coordinates to be available for fitting.

For dimension \(k\), the maximum fit order is given as:

\[o_k^{max} = min\{ |X_k| - 1, o_k \}\]

where \(|.|\) represents the cardinality (size) of the set.

For example, consider a 1-dimensional set of coordinates:

\[X = [1, 1, 1, 2, 3, 4]\]

The maximum order of fit would be 3 since there are 4 unique values (1, 2, 3, 4). Therefore, when orders >= 3, the return value would be 3. For orders < 3, orders would be returned.

If we had a 2-dimensional set of data:

\[X = [(1, 0), (1, 0), (1, 0), (2, 1), (3, 1), (4, 1)]\]

The maximum order of fit would be 3 in the first dimension, and 1 in the second.
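
The 1-dimensional example above, sketched in NumPy:

import numpy as np

x = np.array([1, 1, 1, 2, 3, 4])
# 4 unique values allow at most a 3rd-order fit:
max_order = min(np.unique(x).size - 1, 3)   # requested order 3 -> 3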

Parameters:
ordersnumpy.ndarray of int

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.

coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinates of the sample distribution.

masknumpy.ndarray of bool (n_coordinates,), optional

An optional mask where False values indicate the associated sample should not be included in determining the maximum order.

requiredbool, optional

If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.

Returns:
maximum_ordersnumpy.ndarray

An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.

grig.resample_utils.convert_to_numba_list(thing)[source]

Converts a Python iterable to a Numba list for use in jitted functions.

Parameters:
thingiterable
Returns:
numba.typed.List()
grig.resample_utils.coordinate_covariance(coordinates, mean=None, mask=None, dof=1)[source]

Calculate the covariance of a distribution.

Given the sample distribution of \(N\) coordinates (\(X\)) in \(K\) dimensions, the sample covariance is given as:

\[\Sigma = E[(X - E[X])(X - E[X])^T]\]

where \(\Sigma\) is a \(K \times K\) matrix and \(E\) denotes the expected value. In the general case where the expected value of \(X\) is unknown and derived from the distribution itself, the covariance of the samples between dimension \(i\) and \(j\) is:

\[\Sigma_{ij} = \frac{1}{N - M} \sum_{k=1}^{N} {(X_{ki} - \bar{X}_i)(X_{kj} - \bar{X}_j)}\]

where \(M\) is the number of degrees of freedom lost (dof) in determining the mean (\(\bar{X}\)). If the mean is not provided, it will be calculated using coordinate_mean() in which case the default dof of 1 is appropriate.

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinates of the distribution.

meannumpy.ndarray (n_dimensions,), optional

The mean of the coordinate distribution in each dimension. If not provided, the expected value in each dimension will be calculated using coordinate_mean().

masknumpy.ndarray (n_coordinates,), optional

An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.

dofint or float, optional

The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.

Returns:
covariancenumpy.ndarray of numpy.float64 (n_dimensions, n_dimensions)

The covariance of the sample distribution.
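
With the default dof of 1, this matches NumPy’s sample covariance (a quick cross-check, not part of the package):

import numpy as np

rng = np.random.default_rng(1)
coordinates = rng.normal(size=(2, 100))    # (n_dimensions, n_coordinates)
covariance = np.cov(coordinates, ddof=1)   # (2, 2) sample covariance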

grig.resample_utils.coordinate_mean(coordinates, mask=None)[source]

Returns the mean coordinate of a distribution.

Given a distribution of \(N\) coordinates in \(K\) dimensions, the mean coordinate in dimension \(k\) is given as:

\[\bar{x_k} = \frac{\sum_{i=1}^{N}{w_i x_{k,i}}}{\sum_{i=1}^{N}{w_i}}\]

where \(w_i = 0\) if mask[i] is False and \(w_i = 1\) if mask[i] is True.

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinate distribution.

masknumpy.ndarray of bool (n_coordinates,), optional

An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.

Returns:
mean_coordinatenumpy.ndarray (n_dimensions,)

The mean coordinate of the distribution.

grig.resample_utils.covariance_matrix_inverse(amat, phi, error, weights, rank=None)[source]

Calculates the inverse covariance matrix of the fit coefficients.

If the least-squares solution to a fit is given as \(y = \hat{c} \cdot \Phi\) when \(y = c \cdot \Phi + \epsilon\), the inverse covariance on the estimated fit coefficients \(\hat{c}\) is given as:

\[\Sigma^{-1} = \frac{N}{N - M} (\Phi^T W Var(y) \Phi)^{-1}\]

where \(N\) is the number of samples fit, \(M\) is the number of lost degrees of freedom, \(W\) are the fit weights, and \(Var(y)\) is related to the error (\(\sigma\)) by:

\[Var(y) = diag(1 / \sigma^2)\]

Note that during the fitting process, it is common for the inverse of the covariance matrix \(\Sigma\) to have already been calculated (not factoring in lost degrees of freedom) if inverse variance was included as a weighting factor for the fit. If so, it should be passed in as amat (\(A\)), and the final covariance is simply given by:

\[\Sigma^{-1} = \frac{N}{N - M}A^{-1}\]

If amat was not calculated, it should be supplied as an array of shape (0, 0) (because numba).

Parameters:
amatnumpy.ndarray (n_terms, n_terms)

The matrix A as described above. If the shape of amat is set to (0, 0), it will be calculated using the error, weights, and phi terms. This should only be done if A was weighted by both error and weights.

phinumpy.ndarray (n_terms, N)

The polynomial terms for each of the N data samples.

errornumpy.ndarray (N,)

The 1-sigma errors.

weightsnumpy.ndarray (N,)

The weighting factor applied to each sample when determining the least squares solution of the fit. Note that this must not factor in any error based weighting. Therefore, in the case of the resampling algorithm, it should refer to the distance weighting factor of each sample from the resampling point, or an array of ones (np.ones(N)) if distance weighting was not applied.

rankint or float, optional

The matrix rank of amat, optionally passed in for speed if pre-calculated.

Returns:
inverse_covariancenumpy.ndarray (nterms, nterms)

The covariance matrix inverse of fit coefficients.

grig.resample_utils.derivative_mscp(coefficients, phi_samples, derivative_map, sample_weights)[source]

Return the weighted mean-square-cross-product (mscp) of sample derivatives.

Given a polynomial equation of the form:

\[f(\Phi) = c \cdot \Phi\]

The derivative is calculated as:

\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]

for an equation of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative_map and \(c\) are the coefficients. Please see polynomial_derivative_map() for a more complete description of the derivative calculation.

Once the derivatives (\(g = \frac{df}{dX}\)) are calculated for all samples, they are averaged, and the cross-product is returned as:

\[\bar{g}^2 = \frac{1}{tr(W W^T)} g^T W W^T g\]

where \(W = diag(\text{weights})\).

For example, for polynomial fit of 2-dimensional data \(f(x, y)\), the returned matrix will be:

\[\begin{split}\bar{g}^2 = \begin{bmatrix} \frac{\partial f}{\partial x} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial y} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \frac{\partial f}{\partial y} \end{bmatrix}\end{split}\]
Parameters:
coefficientsnumpy.ndarray (n_terms,)

The coefficients of a polynomial fit for each term.

phi_samplesnumpy.ndarray (n_terms, n_samples)

The polynomial terms of the sample coordinates. Please see polynomial_exponents() for a description of this variable.

derivative_mapnumpy.ndarray

An array of shape (n_dimensions, 3, n_valid_terms). Please see polynomial_derivative_map() for an explanation of this variable.

sample_weightsnumpy.ndarray (n_samples,)

The weighting to apply to each sample when determining the weighted mean (as a multiplier).

Returns:
mscpnumpy.ndarray (n_dimensions, n_dimensions)

An array where mscp[i, j] = derivative[i] * derivative[j], where derivative is the weighted mean derivative over all samples.

grig.resample_utils.distribution_variances(coordinates, mean=None, covariance=None, mask=None, sigma_inv=None, dof=1)[source]

Return variance at each coordinate based on coordinate distribution.

Given a normal sample distribution \(X\), returns the variance at each sample coordinate. For example, consider a population of zero mean (\(\bar{X} = 0\)) and a standard deviation of one (\(\sigma = 1\)). Samples located at \(\bar{X} \pm \sigma\) will return a variance of 1, while samples located at \(\bar{X} \pm 2\sigma\) will return a variance of 4.

By default, the distribution variance is derived using coordinate_covariance(), and the sample mean is derived using coordinate_mean() assuming the loss of 1 degree of freedom. However, the expected value (\(E[X]\)) may be supplied with the mean optional parameter along with the covariance, and the number of lost degrees of freedom (dof).

Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_samples)

The coordinates of the sample distribution.

meannumpy.ndarray (n_dimensions,), optional

The expected mean value of the distribution.

covariancenumpy.ndarray (n_dimensions, n_dimensions), optional

The covariance matrix (if known) for the sample distribution.

masknumpy.ndarray of bool (n_samples,), optional

An array of bool values where True indicates a sample should be included when calculating the mean and covariance, and False indicates that a sample should be ignored. By default, all samples are included. The output variance will still be calculated for all samples.

sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional

The inverse of the covariance matrix, optionally passed in for speed if the covariance is known and its inverse has been pre-calculated.

dofint or float, optional

The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.

Returns:
variancenumpy.ndarray (n_samples,)

The variance at each sample coordinate based on the sample distribution.
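
A sketch of this calculation with NumPy, assuming the mean and covariance are derived from the samples themselves:

import numpy as np

rng = np.random.default_rng(2)
coordinates = rng.normal(size=(2, 200))       # (n_dimensions, n_samples)
mean = coordinates.mean(axis=1)
sigma_inv = np.linalg.inv(np.cov(coordinates, ddof=1))

# Mahalanobis-style variance per sample: delta^T Sigma^-1 delta.
delta = coordinates - mean[:, None]
variance = np.einsum('ik,ij,jk->k', delta, sigma_inv, delta)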

grig.resample_utils.estimated_covariance_matrix_inverse(phi, error, weights, rank=None)[source]

Calculates covariance matrix inverse of fit coefficients from mean error.

An estimate to the covariance of fit parameters is given by:

\[\Sigma = \frac{N}{N - M} \bar{\sigma}^2 (\Phi^T W \Phi)^{-1}\]

where \(N\) is the number of samples used in the fit, \(M\) is the number of lost degrees of freedom, \(W\) are the weights applied to the samples during the fit, and the estimated coefficients \(\hat{c}\) are used to fit \(y = \hat{c} \cdot \Phi + \sigma\) to the sample population \(y = c \cdot \Phi + \epsilon\). The weighted mean of the squared error (\(\bar{\sigma}^2\)) is given by:

\[\bar{\sigma}^2 = \frac{\sum_{i=1}^{N}{w_i e_i^2}} {\sum_{i=1}^{N}{w_i}}\]

The final returned matrix is \(\Sigma^{-1}\).

Parameters:
phinumpy.ndarray (nterms, N)

The polynomial terms for each of the N data samples.

errornumpy.ndarray (N,)

The 1-sigma errors or residuals to the fit.

weightsnumpy.ndarray (N,)

The weighting of each sample in the fit, not including any error weighting.

rankint or float, optional

The matrix rank of \(\Phi^T W \Phi\), optionally passed in for speed if pre-calculated.

Returns:
covariance_inversenumpy.ndarray (nterms, nterms)

The inverse covariance matrix.

grig.resample_utils.evaluate_derivative(coefficients, phi_point, derivative_map)[source]

Calculates the derivative of a polynomial at a single point.

Please see polynomial_derivative_map() for a full description of how the derivatives are calculated from a polynomial equation defined by polynomial_exponents(). These also explain how one should transform the independent variables to the “phi” (\(\Phi\)) terms (which may be done using polynomial_terms()).

The derivative at a point is calculated by:

\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]

for a polynomial equation consisting of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative map.

Parameters:
coefficientsnumpy.ndarray (n_terms,)

The coefficients of the polynomial equation for each term.

phi_pointnumpy.ndarray (n_terms,)

The polynomial terms of the fitting equation at a single coordinate.

derivative_mapnumpy.ndarray

An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the third element. The third dimension will generally be of a smaller size than the number of terms in the polynomial equation as not all are required to calculate the derivative. Due to the fact that some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms still remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.

Returns:
derivativenumpy.ndarray (n_dimensions,)

The partial derivative of the polynomial equation with respect to each dimension.

grig.resample_utils.evaluate_derivatives(coefficients, phi_points, derivative_map)[source]

Calculates the derivative of a polynomial at multiple points.

Please see polynomial_derivative_map() for a full description of how the derivatives are calculated from a polynomial equation defined by polynomial_exponents(). These also explain how one should transform the independent variables to the “phi” (\(\Phi\)) terms (which may be done using polynomial_terms()).

The derivative at a point is calculated by:

\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]

for a polynomial equation consisting of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative map.

Parameters:
coefficientsnumpy.ndarray (n_terms,)

The coefficients of the polynomial equation for each term.

phi_pointsnumpy.ndarray (n_terms, n_points)

The polynomial terms of the fitting equation at multiple points.

derivative_mapnumpy.ndarray

An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the third element. The third dimension will generally be of a smaller size than the number of terms in the polynomial equation as not all are required to calculate the derivative. Due to the fact that some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms still remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.

Returns:
derivativesnumpy.ndarray (n_dimensions, n_points)

The partial derivative of the polynomial equation with respect to each dimension at each point.

grig.resample_utils.fasttrapz(y, x)[source]

Fast 1-D integration using Trapezium method.

Approximates the integration of a 1-D discrete valued function \(y_i = f(x_i)\) with \(N\) measurements as:

\[\int_a^b f(x) \approx \frac{1}{2} \sum_{i=1}^{N}{ \left( y_{i - 1} + y_i \right) \left( x_i - x_{i - 1} \right) }\]
Parameters:
ynumpy.ndarray (N,)

Dependent variable

xnumpy.ndarray (N,)

Independent variable

Returns:
areafloat

The integrated area
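
The rule above is straightforward to verify against a direct NumPy computation (a cross-check, not the jitted implementation):

import numpy as np

x = np.linspace(0, np.pi, 101)
y = np.sin(x)
# Direct evaluation of the trapezium sum; fasttrapz(y, x) applies
# the same rule. The result is ~2.0 for this integrand.
area = 0.5 * np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1]))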

grig.resample_utils.fit_phi_value(phi, coefficients)[source]

Returns the dot product of phi and coefficients.

A utility function for use in calculating the polynomial fit based on the polynomial terms of the independent values (phi), and a set of calculated coefficients.

The return value for phi (\(\Phi\)) terms and coefficients (\(c\)) each consisting of \(L\) terms is:

\[y = \sum_{l=1}^{L}{c_l \Phi_l}\]

The polynomial terms \(\Phi\) are pre-calculated and used in place of regular independent values \(x\) in the resampling algorithm to avoid the unnecessary recalculation of terms in a polynomial equation. For example, if fitting

\[y = 5 x_0 x_1 + 6 x_0 x_3^2 + 7 x_1^3 x_4^2 + 8 x_0 x_1 x_2 x_3\]

we set

\[ \begin{align}\begin{aligned}c = [5,\, 6,\, 7,\, 8]\\\Phi = [x_0 x_1,\, x_0 x_3^2,\, x_1^3 x_4^2,\, x_0 x_1 x_2 x_3]\end{aligned}\end{align} \]

and then only need to perform the simple fast calculation

\[y = c \cdot \Phi\]
Parameters:
phinumpy.ndarray (n_coefficients,)

Polynomial terms of independent values.

coefficientsnumpy.ndarray (n_coefficients,)

Coefficients used to determine fit.

Returns:
fitfloat

The fitted value.
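
Evaluating the example fit above at a hypothetical point \(x = (2, 3, 5, 7, 11)\):

import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
c = np.array([5.0, 6.0, 7.0, 8.0])
phi = np.array([x[0] * x[1],
                x[0] * x[3] ** 2,
                x[1] ** 3 * x[4] ** 2,
                x[0] * x[1] * x[2] * x[3]])
y = c @ phi   # equivalent to fit_phi_value(phi, c)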

grig.resample_utils.fit_phi_variance(phi, inv_covariance)[source]

Calculates variance given the polynomial terms of a coordinate.

The output variance for given polynomial terms phi (\(\Phi\)) is given as:

\[V = \Phi^T Var(\hat{c}) \Phi\]

where \(Var(\hat{c})\) is the covariance matrix inverse of the fit coefficients (inv_covariance) such that \(Var(\hat{c})_{i, j}\) gives the covariance between the coefficients for terms \(i\) and \(j\), and the coefficients \(\hat{c}\) define the fit:

\[y = \hat{c} \cdot \Phi\]
Parameters:
phinumpy.ndarray (ncoeffs,)

The polynomial terms.

inv_covariancenumpy.ndarray (ncoeffs, ncoeffs)

The covariance matrix inverse of the fit coefficients.

Returns:
variancefloat

The calculated variance.

grig.resample_utils.fit_residual(data, phi, coefficients)[source]

Calculates the residual of a polynomial fit to data.

The residual is calculated using the matrix operation Y - CX, where Y is the dataset, C are the coefficients, and X are the phi polynomial terms.

Parameters:
datanumpy.ndarray (n_samples,)

Data from which coefficients were derived.

phinumpy.ndarray (n_terms, n_samples)

Polynomial terms of independent values of each sample.

coefficientsnumpy.ndarray (n_terms,)

Coefficient values.

Returns:
residualnumpy.ndarray (n_samples,)

The residual
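
A minimal numpy sketch of the residual calculation for a 1-dimensional linear fit over three samples:

import numpy as np

x = np.array([0.0, 1.0, 2.0])
phi = np.array([np.ones(3), x])      # terms [1, x] for each sample
coefficients = np.array([1.0, 2.0])  # fit y = 1 + 2x
data = np.array([1.1, 2.9, 5.2])

residual = data - coefficients @ phi  # Y - CX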

grig.resample_utils.half_max_sigmoid(x, x_half=0.0, k=1.0, a=0.0, c=1.0, q=1.0, b=1.0, v=1.0)[source]

Evaluate a special case of the logistic function where f(x_half) = (K + A) / 2 (0.5 for the default asymptotes).

The generalized logistic function is given as:

\[f(x) = A + \frac{K - A} {\left( C + Q e^{-B(x - x_0)} \right)^{1/\nu}}\]

and may be evaluated with logistic_curve().

We can manipulate this function so that \(f(x_{half}) = (K + A) / 2\) (the midpoint of the function) by setting the location of maximum growth (\(x_0\)) to occur at:

\[x_0 = x_{half} + \frac{1}{B} \ln{\left( \frac{2^\nu - C}{Q} \right)}\]

Since a logarithm is required, it is incumbent on the user to ensure that no logarithms are taken of any quantity \(\leq 0\), i.e., \((2^\nu - C) / Q > 0\).
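
A quick numeric check of this relation using the default parameters, for which \(x_0 = x_{half}\) since \(\ln((2 - 1) / 1) = 0\):

import numpy as np

x_half, a, k, c, q, b, v = 2.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0
x0 = x_half + np.log((2.0 ** v - c) / q) / b  # x0 = x_half here

f = a + (k - a) / (c + q * np.exp(-b * (x_half - x0))) ** (1.0 / v)
# f = 0.5, i.e. the midpoint (k + a) / 2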

Parameters:
xint or float or numpy.ndarray (shape)

The independent variable.

x_halfint or float or numpy.ndarray (shape), optional

The x value at which f(x) reaches the midpoint of the asymptotes (0.5 with the default parameters).

kint or float or numpy.ndarray (shape), optional

The upper asymptote when c is one.

aint or float or numpy.ndarray (shape), optional

The lower asymptote.

cint or float or numpy.ndarray (shape), optional

Typically takes a value of 1. Otherwise, the upper asymptote is a + ((k - a) / c^(1/v)).

qint or float or numpy.ndarray (shape), optional

Related to the value of f(0). Fixes the point of inflection. In this implementation, q is completely factored out after simplification and does not have any effect on the result.

bint or float or numpy.ndarray (shape), optional

The growth rate.

vint or float or numpy.ndarray (shape), optional

Must be greater than zero. Affects near which asymptote the maximum growth occurs (point of inflection).

Returns:
resultfloat or numpy.ndarray

The half-max sigmoid evaluated at x.

grig.resample_utils.logistic_curve(x, x0=0.0, k=1.0, a=0.0, c=1.0, q=1.0, b=1.0, v=1.0)[source]

Evaluate the generalized logistic function.

The generalized logistic function is given as:

\[f(x) = A + \frac{K - A} {\left( C + Q e^{-B(x - x_0)} \right)^{1/\nu}}\]

Taken from Wikipedia contributors. (2020, June 11). Generalised logistic function. In Wikipedia, The Free Encyclopedia. Retrieved 23:51, July 6, 2020, from https://en.wikipedia.org/w/index.php?title=Generalised_logistic_function&oldid=961965809

Parameters:
xint or float or numpy.ndarray (shape)

The independent variable.

x0int or float or numpy.ndarray (shape), optional

An offset applied to x.

kint or float or numpy.ndarray (shape), optional

The upper asymptote when c is one.

aint or float or numpy.ndarray (shape), optional

The lower asymptote.

cint or float or numpy.ndarray (shape), optional

Typically takes a value of 1. Otherwise, the upper asymptote is a + ((k - a) / c^(1/v)).

qint or float or numpy.ndarray (shape), optional

Related to the value of f(0).

bint or float or numpy.ndarray (shape), optional

The growth rate.

vint or float or numpy.ndarray (shape), optional

Must be greater than zero. Affects near which asymptote the maximum growth occurs.

Returns:
resultfloat or numpy.ndarray (shape)

The logistic function evaluated at x.
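
With the default parameters, the generalized form reduces to the standard logistic function \(1 / (1 + e^{-x})\). A minimal numpy sketch (not the compiled implementation):

import numpy as np

def logistic(x, x0=0.0, k=1.0, a=0.0, c=1.0, q=1.0, b=1.0, v=1.0):
    # A + (K - A) / (C + Q exp(-B (x - x0)))^(1 / nu)
    return a + (k - a) / (c + q * np.exp(-b * (x - x0))) ** (1.0 / v)

logistic(0.0)  # 0.5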

grig.resample_utils.multiple_polynomial_terms(coordinates, exponents)[source]

Derive polynomial terms for a coordinate set given polynomial exponents.

Raises multiple coordinates by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).

Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. The return value will be a 2-dimensional array with the size of the first dimension equal to the number of exponent sets, and the size of the second dimension equal to the number of vector sets.

Parameters:
coordinatesnumpy.ndarray (N, n_vectors)

Sets of vectors in N-dimensions.

exponentsnumpy.ndarray (n_exponents, N)

Sets of exponents by which to raise the vector.

Returns:
numpy.ndarray of numpy.float64 (n_exponents, n_vectors)

The product of the exponentiation of the vectors.
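
A minimal numpy sketch of the broadcasting involved, reproducing the \(x^2 y^3\) example above for three (x, y) vectors:

import numpy as np

coordinates = np.array([[1.0, 2.0, 3.0],   # x values
                        [2.0, 2.0, 2.0]])  # y values
exponents = np.array([[2, 3]])             # a single set: x^2 y^3

# terms[i, j] = product over k of coordinates[k, j] ** exponents[i, k]
terms = np.prod(coordinates[None] ** exponents[:, :, None], axis=1)
# terms = [[ 8. 32. 72.]]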

grig.resample_utils.multivariate_gaussian(covariance, coordinates, center=None, normalize=False)[source]

Return values of a multivariate Gaussian in K-dimensional coordinates.

The density of a multivariate Gaussian is given as:

\[f_{\mathbf X}(x_1, \ldots, x_k) = \frac {\exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1}(x - \mu) \right)} {\sqrt{(2 \pi)^K |\Sigma|}}\]

where the coordinates \(X\) are real \(K\) dimensional vectors, \(\Sigma\) is the covariance matrix with determinant \(|\Sigma|\), and the center of the distribution is given by \(\mu\).

Note that by default, the factor \({\sqrt{(2 \pi)^K |\Sigma|}}\) is not applied unless normalize is True.

Parameters:
covariancenumpy.ndarray (n_dimensions, n_dimensions)

The covariance matrix (\(\Sigma\)). Should be symmetric and positive definite.

coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinates at which to evaluate the multivariate Gaussian.

centernumpy.ndarray (n_dimensions,), optional

The center of the distribution. If not supplied, the center is assumed to be zero in all dimensions.

normalizebool, optional

If True, normalize by dividing by \(\sqrt{(2 \pi)^K |\Sigma|}\).

Returns:
densitynumpy.ndarray (n_coordinates,)

The density evaluated at each coordinate.
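
A minimal numpy sketch of the unnormalized density for 2-dimensional coordinates about a zero center:

import numpy as np

covariance = np.array([[1.0, 0.3],
                       [0.3, 2.0]])
coordinates = np.array([[0.0, 1.0, 2.0],
                        [0.0, 0.5, 1.0]])  # (n_dimensions, n_coordinates)

sigma_inv = np.linalg.inv(covariance)
# (x - mu)^T Sigma^-1 (x - mu) for each coordinate, with mu = 0
mahalanobis = np.einsum('ij,ik,kj->j', coordinates, sigma_inv, coordinates)
density = np.exp(-0.5 * mahalanobis)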

grig.resample_utils.no_fit_solution(set_index, point_index, fit_out, error_out, counts_out, weights_out, distance_weights_out, rchi2_out, offset_variance_out, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, cval=nan)[source]

Fill output arrays with set values on fit failure.

On fit failure, the output arrays are filled with values indicating that a fit is not possible. Count and weight arrays contain zeros; the error, reduced chi-squared, and offset variance arrays are set to NaN; finally, the fit array is set to cval, a user-set float value.

Parameters:
set_indexint

An integer representing the data set for which a fit cannot be performed.

point_indexint

An integer representing the index of the fit coordinate at which the fit cannot be performed.

fit_outnumpy.ndarray (n_sets, n_coordinates)

The output fit values.

error_outnumpy.ndarray (n_sets, n_coordinates)

The output error values on the fit.

counts_outnumpy.ndarray (n_sets, n_coordinates)

The number of samples used to create the fit.

weights_outnumpy.ndarray (n_sets, n_coordinates)

The sum of full weights applied to samples in the fit.

distance_weights_outnumpy.ndarray (n_sets, n_coordinates)

The sum of only the distance weights applied to samples in the fit.

rchi2_outnumpy.ndarray (n_sets, n_coordinates)

The reduced chi-squared statistic of the fit.

offset_variance_outnumpy.ndarray (n_sets, n_coordinates)

The variance as derived from the offset of the fit coordinate from the sample distribution.

get_errorbool, optional

If False do not update error_out.

get_countsbool, optional

If False do not update counts_out.

get_weightsbool, optional

If False do not update weights_out.

get_distance_weightsbool, optional

If False do not update distance_weights_out.

get_rchi2bool, optional

If False do not update rchi2_out.

get_offset_variancebool, optional

If False do not update offset_variance_out.

cvalfloat, optional

The fill value for fit_out on fit failure.

Returns:
None

All arrays are updated in-place.

grig.resample_utils.offset_variance(coordinates, reference, mask=None, mean=None, sigma_inv=None, scale=1.0, dof=1)[source]

Variance at reference coordinate derived from distribution uncertainty.

Given a distribution of coordinates (\(X\)), calculate the variance at a reference coordinate (\(X_{ref}\)) based upon the uncertainty in the coordinate distribution. Firstly, the distribution covariance matrix (\(\Sigma\)) is calculated by coordinate_covariance() enabling the variance at the reference position to be given as:

\[V(X_{ref}) = (X_{ref} - E[X])^{T} \Sigma^{-1} (X_{ref} - E[X])\]

If the expected value of the distribution is known (\(E[X]\)) or pre-calculated, it can be passed in to the function using the mean optional parameter along with the lost degrees of freedom (dof) spent in determining \(E[X]\). If not, the default is to use \(\bar{X}\) and dof = 1.

The user may optionally specify a scale factor (\(\beta\)) such that:

\[V(X_{ref}) = (\beta(X_{ref} - E[X]))^{T} \Sigma^{-1} (\beta(X_{ref} - E[X]))\]

or:

\[V(X_{ref}) = \beta^2 (X_{ref} - E[X])^{T} \Sigma^{-1} (X_{ref} - E[X])\]
Parameters:
coordinatesnumpy.ndarray (n_dimensions, n_coordinates)

The coordinates of the distribution.

referencenumpy.ndarray (n_dimensions,)

The reference coordinate.

masknumpy.ndarray of bool (n_coordinates,), optional

An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.

meannumpy.ndarray (n_dimensions,), optional

The mean of the coordinate distribution in each dimension. If not provided, the expected value in each dimension will be calculated using coordinate_mean().

scaleint or float, optional

The scaling factor described above.

dofint or float, optional

The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.

sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional

If the covariance matrix of the coordinate distribution has already been calculated, the matrix inverse may be passed in as sigma_inv for speed.

Returns:
variancefloat

The variance at the reference coordinate determined from the coordinate distribution.
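
A minimal numpy sketch of the calculation, using the sample mean and covariance of a random 2-dimensional distribution:

import numpy as np

rng = np.random.default_rng(0)
coordinates = rng.normal(size=(2, 100))  # (n_dimensions, n_coordinates)
reference = np.array([2.0, 0.0])

mean = coordinates.mean(axis=1)
sigma = np.cov(coordinates)              # distribution covariance
delta = reference - mean
variance = delta @ np.linalg.inv(sigma) @ delta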

grig.resample_utils.polynomial_derivative_map(exponents)[source]

Creates a mapping from polynomial exponents to derivatives.

Please see polynomial_exponents() for details on how a polynomial equation is defined within the resampling algorithm, and the use of the exponents array in defining the polynomial terms (\(\Phi\)).

Within the confines of the resampling algorithm, the polynomial exponents are always defined in a way that allows the derivative of a polynomial fit to be calculated from existing, pre-calculated \(\Phi\) terms.

For example, consider the 2-dimensional 2nd order polynomial equation and its derivatives in each dimension:

\[f(x, y) = c_1 + c_2 x + c_3 x^2 + c_4 y + c_5 x y + c_6 y^2\]
\[\frac{\partial f}{\partial x} = c_2 + 2 c_3 x + c_5 y\]
\[\frac{\partial f}{\partial y} = c_4 + c_5 x + 2 c_6 y\]

Converting \(f(x, y) \rightarrow f(\Phi)\) we get:

\[f(\Phi) = c_1 \Phi_1 + c_2 \Phi_2 + c_3 \Phi_3 + c_4 \Phi_4 + c_5 \Phi_5 + c_6 \Phi_6\]

It can then be seen that

\[\frac{\partial f}{\partial x} = c_2 \Phi_1 + 2 c_3 \Phi_2 + c_5 \Phi_4\]
\[\frac{\partial f}{\partial y} = c_4 \Phi_1 + c_5 \Phi_2 + 2 c_6 \Phi_4\]

Generalizing for a polynomial equation consisting of \(M\) terms of the independent variable \(X\) in \(K-\text{dimensions}\), a mapping (\(h\)) can be devised enabling calculation of the derivatives from pre-existing terms. For dimension \(k\):

\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]

This allows the derivative to be calculated from the existing polynomial terms (\(\Phi\)) and coefficients (\(c\)). In addition, the mapping can be calculated prior to reduction and will therefore only need to be calculated once along with \(\Phi\). Once the coefficients are known, the derivatives can then be calculated using \(h\).

Derivatives may be evaluated using evaluate_derivative() and evaluate_derivatives().
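
As an illustrative sketch, the mapping for the 2-dimensional quadratic example above may be built by hand (0-based indices; h[k, 0] holds the multipliers, h[k, 1] the coefficient indices, and h[k, 2] the phi indices for dimension k):

import numpy as np

h = np.array([[[1, 2, 1], [1, 2, 4], [0, 1, 3]],   # df/dx
              [[1, 1, 2], [3, 4, 5], [0, 1, 3]]])  # df/dy

c = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x, y = 1.5, -0.5
phi = np.array([1.0, x, x ** 2, y, x * y, y ** 2])

# derivative[k] = sum_m h[k, 0, m] * c[h[k, 1, m]] * phi[h[k, 2, m]]
derivatives = np.array(
    [np.sum(h[k, 0] * c[h[k, 1]] * phi[h[k, 2]]) for k in range(2)])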

Parameters:
exponentsnumpy.ndarray (n_terms, n_dimensions)

The exponents defining a polynomial equation.

Returns:
derivative_mapnumpy.ndarray of int

An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the third element. The third dimension will generally be smaller than the number of terms in the polynomial equation, as not all terms are required to calculate the derivative. Because some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.

grig.resample_utils.polynomial_exponents(order, ndim=None, use_max_order=False)[source]

Define a set of polynomial exponents.

The resampling algorithm defines a set of polynomial exponents as an array of shape (n_terms, n_dimensions) for an equation of the form:

\[f( \Phi ) = \sum_{m=1}^{M}{c_m \Phi_m}\]

for \(M\) terms. Here, \(\Phi_m\) represents the product of independent variables, each raised to an appropriate power as defined by exponents. For example, consider the equation for 2-dimensional data with independent variables \(x\) and \(y\):

\[f(x, y) = c_1 + c_2 x + c_3 x^2 + c_4 y + c_5 x y + c_6 y^2\]

In this case:

exponents = [[0, 0],  # represents a constant or x^0 y^0
             [1, 0],  # represents x
             [2, 0],  # represents x^2
             [0, 1],  # represents y
             [1, 1],  # represents xy
             [0, 2]]  # represents y^2

The resampling algorithm solves for the coefficients (\(c\)) by converting \(f(X) \rightarrow f(\Phi)\) for \(K-\text{dimensional}\) independent variables (\(X\)) and exponents (\(p\)) by setting:

\[\Phi_m = \prod_{k=1}^{K}{X_{k}^{p_{m, k}}}\]

In most of the code, the \(\Phi\) terms are interchangeable with “polynomial terms”, and in the above example \(\Phi_5 = xy\) since exponents[4] = [1, 1], representing \(x^1 y^1\).

Note that for each term \(m\), \(\sum_{k=1}^{K}{p_{m, k}} \leq \max(\text{order})\). In addition, if use_max_order is False (default), \(p_{m,k} \leq \text{order}[k]\) for each dimension \(k\).

Parameters:
orderint or array_like of int

Polynomial order for which to generate exponents. If an array is supplied, full polynomial exponents will be created over all len(order) dimensions.

ndimint, optional

If set and order is not an array, return the Taylor expansion exponents for ndim dimensions at the given order.

use_max_orderbool, optional

This keyword is only applicable for multi-dimensional data when orders are unequal across dimensions. When True, the maximum exponent for each dimension is equal to max(order). If False, the maximum available exponent for dimension k is equal to order[k].

Returns:
exponentsnumpy.ndarray

(n_terms, n_dimensions) array of polynomial exponents.

Examples

>>> polynomial_exponents(3)
array([[0],
       [1],
       [2],
       [3]])
>>> polynomial_exponents([1, 2])
array([[0, 0],
       [1, 0],
       [0, 1],
       [1, 1],
       [0, 2]])
>>> polynomial_exponents(3, ndim=2)
array([[0, 0],
       [1, 0],
       [2, 0],
       [3, 0],
       [0, 1],
       [1, 1],
       [2, 1],
       [0, 2],
       [1, 2],
       [0, 3]])
grig.resample_utils.polynomial_terms(coordinates, exponents)[source]

Derive polynomial terms given coordinates and polynomial exponents.

Raises a single coordinate or multiple coordinates by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).

Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. If a single N-dimensional vector is provided, the output will be a 1-dimensional array with a single value for each exponent set. If multiple vectors are provided, the output will be of shape (number of exponent sets, number of vectors).

Parameters:
coordinatesnumpy.ndarray (N, n_vectors) or (N,)

Sets of coordinates in N-dimensions or a single coordinate of N-dimensions.

exponentsnumpy.ndarray (n_exponents, N)

Sets of polynomial exponents to apply to coordinates.

Returns:
numpy.ndarray of numpy.float64 (n_exponents, n_vectors) or (n_exponents,)

The polynomial terms.

grig.resample_utils.relative_density(sigma, counts, weight_sum, tolerance=None, max_dim=4)[source]

Returns the relative density of samples compared to a uniform distribution.

The relative local density is defined as 1 for a uniform distribution, < 1 for a distribution that is sparse near the center, and > 1 when clustered around the center.

The sum of the distance_weights returned from a Gaussian weighting function on the samples is required for this calculation. The weighting function should be of the form:

\[w(\Delta x) = \exp \left( -\sum_{k=1}^{K}{\frac{\Delta x_k^2}{2 \sigma_k^2}} \right)\]

over \(K\) dimensions where \(\Delta x_k\) is the offset of a sample in dimension \(k\) from the point of interest, and \(\sigma\) must be supplied to relative_density as sigma, where weight_sum = \(\sum_{i=1}^{N}{w(x_i)}\) and counts = \(N\). Note that \(\sigma\) and \(x\) must be scaled such that the principal axes of an ellipsoid window containing all samples are equal to unity (the principal axis in dimension \(k\) is \(\Omega_k = 1\), such that \(\prod_{k=1}^{K}{\Omega_k} = 1\) below).

The local relative density is then given as:

\[\rho = \frac{\rho(\text{measured})}{\rho(\text{uniform})}\]

where,

\[\rho(\text{uniform}) = N \frac{\Gamma \left( 1 + \frac{K}{2} \right)} {\pi^{K/2} \prod_{k=1}^{K}{\Omega_k}}\]
\[\rho(\text{measured}) = \frac {\sum_{i=1}^{N}{w_i}} {\int \cdots \int_{R} w(\mathbf{\Delta x}) \, {dx}_1 \cdots {dx}_K}\]

and region \(R\) satisfies the requirement \(\| \mathbf{\Delta x} \|_2 \leq 1\)

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviation of the Gaussian weighting function used to calculate the distance_weights for each dimension.

countsint or float or numpy.ndarray (N,)

The number of data samples included in the sum of distance weights.

weight_sumint or float or numpy.ndarray (N,)

The sum of weights as returned from a Gaussian weighting function.

tolerancefloat, optional

Relative error tolerance passed to scipy.integrate.quad when determining the integral of the weighting function. The default of 10^(2*dim - 7) was determined by testing, balancing precision against speed and convergence.

max_dimint, optional

If the number of dimensions is greater than max_dim, do not attempt to calculate the relative density since the integral calculation is unlikely to converge and will take a vast amount of time. The return output will be 1.0 or an array of ones (N,). The maximum recommended number of dimensions is 4 (default).

Returns:
float or numpy.ndarray of float64 (N,)

The relative density.

grig.resample_utils.scale_coordinates(coordinates, scale, offset, reverse=False)[source]

Apply scaling factors and offsets to N-dimensional data.

The two available transforms are controlled by the reverse parameter. The transform functions apply the following operations:

Reverse          f(x)
False (default)  (x - offset) / scale
True             (x * scale) + offset
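
A minimal numpy sketch of the round trip between the two transforms:

import numpy as np

coordinates = np.array([[0.0, 2.0, 4.0],
                        [1.0, 3.0, 5.0]])  # (N, M) = (2, 3)
scale = np.array([2.0, 2.0])
offset = np.array([1.0, 1.0])

forward = (coordinates - offset[:, None]) / scale[:, None]
reverse = forward * scale[:, None] + offset[:, None]
# reverse recovers the original coordinates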

Parameters:
coordinatesnumpy.ndarray (N, M) or (N,)

Either a 1 or 2-dimensional array may be supplied. If a 1-dimensional array is supplied, it is assumed to represent a single coordinate in N-dimensions. If a 2-dimensional array is supplied, it should be of shape (N, M) where N is the number of dimensions, and M is the number of coordinates.

scalenumpy.ndarray (N,)

The scaling factor to apply to each dimension.

offsetnumpy.ndarray (N,)

The offset to apply to each dimension.

reversebool, optional

If True, apply the reverse transform. The default is False.

Returns:
numpy.ndarray of numpy.float64 (N, M) or (N,)

The scaled coordinates array.

grig.resample_utils.scale_forward_scalar(coordinate, scale, offset)[source]

Applies the function f(x) = (x - offset) / scale to a single coordinate.

This is a numba jit compiled function.

Parameters:
coordinatenumpy.ndarray (N,)

An array where N is the number of dimensions.

scalenumpy.ndarray (N,)

The scaling factor to apply to each dimension.

offsetnumpy.ndarray (N,)

The offset to apply to each dimension.

Returns:
numpy.ndarray of numpy.float64 (N,)

The scaled coordinates array.

grig.resample_utils.scale_forward_vector(coordinates, scale, offset)[source]

Applies the function f(x) = (x - offset) / scale to a coordinate array.

This is a numba jit compiled function.

Parameters:
coordinatesnumpy.ndarray (N, M)

An array where N is the number of dimensions, and M is the number of coordinates.

scalenumpy.ndarray (N,)

The scaling factor to apply to each dimension.

offsetnumpy.ndarray (N,)

The offset to apply to each dimension.

Returns:
numpy.ndarray of numpy.float64 (N, M)

The scaled coordinates array.

grig.resample_utils.scale_reverse_scalar(coordinate, scale, offset)[source]

Applies the function f(x) = (x * scale) + offset to a single coordinate.

This is a numba jit compiled function.

Parameters:
coordinatenumpy.ndarray (N,)

An array where N is the number of dimensions.

scalenumpy.ndarray (N,)

The scaling factor to apply to each dimension.

offsetnumpy.ndarray (N,)

The offset to apply to each dimension.

Returns:
numpy.ndarray of numpy.float64 (N,)

The scaled coordinates array.

grig.resample_utils.scale_reverse_vector(coordinates, scale, offset)[source]

Applies the function f(x) = (x * scale) + offset to a coordinate array.

This is a numba jit compiled function.

Parameters:
coordinatesnumpy.ndarray (N, M)

An array where N is the number of dimensions, and M is the number of coordinates.

scalenumpy.ndarray (N,)

The scaling factor to apply to each dimension.

offsetnumpy.ndarray (N,)

The offset to apply to each dimension.

Returns:
numpy.ndarray of numpy.float64 (N, M)

The scaled coordinates array.

grig.resample_utils.scaled_adaptive_weight_matrices(sigma, rchi2_values, fixed=None)[source]

Wrapper for scaled_adaptive_weight_matrix over multiple values.

Please see scaled_adaptive_weight_matrix() for details on how the weighting kernel is modified using a single scaling factor. This function performs the calculation for multiple scaling factors (\(\chi_r^2\)).

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.

rchi2_valuesnumpy.ndarray (n_data_sets, fit_shape)

The reduced chi-squared statistics of the fit for each data set. Here, fit_shape is an arbitrary array shape which depends upon the shape of the output fit coordinates defined by the user.

fixednumpy.ndarray of bool (n_dimensions,), optional

If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.

Returns:
scaled_matricesnumpy.ndarray

The scaled weighting kernel with shape (n_data_sets, fit_shape, 1, n_dimensions) where fit_shape is determined by the shape of the output fit coordinates supplied by the user, and n_data_sets is the number of data sets to be fit. The third axis (of size 1) is a dummy dimension required for Numba to compile successfully. The last dimension contains the new scaled inverse \(\alpha_{scaled,k}^{-1}\) values as described in scaled_adaptive_weight_matrix().

grig.resample_utils.scaled_adaptive_weight_matrix(sigma, rchi2, fixed=None)[source]

Scales a Gaussian weighting kernel based on a prior fit.

In the standard resampling algorithm, a polynomial fit may weight each sample (\(x\)) according to its distance from the reference position at which the fit is derived (\(x_{ref}\)) such that samples closer to the reference position have more influence on the fit than those that are farther. The weighting function used is:

\[w(x) = exp \left( -\sum_{k=1}^{K}{\frac{(x_{ref, k} - x_k)^2}{2 \sigma_k^2}} \right)\]

in \(K\) dimensions where \(\sigma\) (supplied to this function via sigma) is a scaling factor, equivalent to the standard deviation of a normal distribution. Following a fit, it is also possible to generate a reduced chi-squared statistic (\(\chi_r^2\)) which measures the “goodness” of fit.

With this information we can rescale sigma in an attempt to get \(\chi_r^2 \rightarrow 1\) i.e., get a good fit within noise limitations. This function assumes that if \(\chi_r^2 < 1\), the samples have been over-fit, and therefore, the weighting function should be “widened” to allow more distant samples to have a stronger influence on the fit and subsequent \(\chi_r^2\) calculation. Likewise, if \(\chi_r^2 > 1\), this implies that the weighting function should be truncated so that the fit focuses more strongly on providing a good fit to nearby samples. i.e., there is likely structure away from the fit location that cannot be modelled well by a polynomial of the given order.

To accomplish this, the weighting kernel is rescaled such that:

\[\chi_r \prod_{k=1}^{K}{\sigma_{scaled, k}^2} = \prod_{k=1}^{K}{\sigma_k^2}\]

The reason is that for a multivariate Gaussian:

\[\int_{R^K} exp \left( -\frac{1}{2} (x - x_{ref})^T \Sigma^{-1} (x - x_{ref}) \right) = (2 \pi)^{K/2} |\Sigma|^{1/2}\]

where \(\sigma^2 = diag(\Sigma)\):

\[|\Sigma| \propto \prod_{k=1}^{K}{\sigma_k^2} \propto \chi_r\]

Note that in this specific implementation, the shape of the weighting kernel remains unchanged and only the overall size is allowed to vary. Therefore, a single scaling factor (\(\beta\)) is applied over all dimensions such that:

\[\sigma_{scaled, k} = \frac{\sigma_k}{\sqrt{\beta}}\]

where

\[\beta = \chi_r^{1 / K}\]

To reduce subsequent calculations, a scaled \(\alpha\) value is passed out instead of \(\sigma_{scaled}\) where:

\[\alpha = 2 \sigma^2\]

Therefore, the final output value will be:

\[\alpha_{scaled, k}^{-1} = \frac{\beta}{2 \sigma_k^2}\]

Finally, scaling does not need to occur across all dimensions, and it is possible to fix the shape of the kernel in one or more dimensions by using the fixed parameter. If this is the case:

\[\beta = \chi_r^{\frac{1}{K - K_{fixed}}}\]

where \(K_{fixed}\) is the number of dimensions in which scaling has been disabled.
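
A numeric sketch of the arithmetic above (not the numba implementation), taking \(\chi_r\) as the square root of the reduced chi-squared statistic and assuming no fixed dimensions:

import numpy as np

sigma = np.array([1.0, 0.5])
rchi2 = 4.0                    # reduced chi-squared of the initial fit
chi_r = np.sqrt(rchi2)

beta = chi_r ** (1.0 / sigma.size)
inverse_alpha = beta / (2.0 * sigma ** 2)  # alpha_scaled^-1
sigma_scaled = sigma / np.sqrt(beta)       # equivalent scaled widths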

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviations of the Gaussian for each dimensional component used for distance weighting of each sample in the initial fit.

rchi2float

The reduced chi-squared statistic of the fit.

fixednumpy.ndarray of bool (n_dimensions,), optional

If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.

Returns:
inverse_alphanumpy.ndarray (n_dimensions,)

The scaled sigma values converted to the inverse alpha array, required by calculate_adaptive_distance_weights_scaled() to create a set of weighting factors.

grig.resample_utils.shaped_adaptive_weight_matrices(sigma, rchi2_values, gradient_mscp, density=None, variance_offsets=None, fixed=None)[source]

Wrapper for shaped_adaptive_weight_matrix over multiple values.

Please see shaped_adaptive_weight_matrix() for details on how the weighting kernel is modified using a scale factor and measure of the derivatives of the fitting function. This function performs the calculation for multiple scaling factors (\(\chi_r^2\)) and derivative measures.

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.

rchi2_valuesnumpy.ndarray (n_sets, shape)

The reduced chi-squared statistics of the fit for each data set. Here, shape is an arbitrary array shape which depends upon the shape of the output fit coordinates defined by the user.

gradient_mscpnumpy.ndarray (n_sets, shape, n_dimensions, n_dimensions)

An array where gradient_mscp[i, j] = derivative[i] * derivative[j] in dimensions i and j. Please see derivative_mscp() for further information. The last two dimensions must be Hermitian and real-valued (symmetric) for each fit set/coordinate.

densitynumpy.ndarray (n_sets, shape)

The local relative density of the samples around the fit coordinate. A value of 1 represents uniform distribution. Values greater than 1 indicate clustering around the fitting point, and values less than 1 indicate that samples are sparsely distributed around the fitting point. Please see relative_density() for further information.

variance_offsetsnumpy.ndarray (n_sets, shape)

The variance at the fit coordinate determined from the sample coordinate distribution. i.e., if a fit is performed at the center of the sample distribution, the variance is zero. If done at 2-sigma from the sample distribution center, the variance is 4.

fixednumpy.ndarray of bool (n_dimensions,), optional

If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.

Returns:
shape_matricesnumpy.ndarray (n_sets, shape, n_dimensions, n_dimensions)

Shape matrices defined for each set/coordinate.

grig.resample_utils.shaped_adaptive_weight_matrix(sigma, rchi2, gradient_mscp, density=1.0, variance_offset=0.0, fixed=None, tolerance=None)[source]

Shape and scale the weighting kernel based on a prior fit.

In the standard resampling algorithm, a polynomial fit may weight each sample coordinate (\(x\)) according to its distance from the reference position at which the fit is derived (\(x_{ref}\)) such that samples closer to the reference position have more influence on the fit than those that are farther. The weighting function used is:

\[W = exp(-\Delta x^T A^{-1} \Delta x)\]

where \({\Delta x}_k = x_{ref, k} - x_k\) for dimension \(k\), and \(A\) is a symmetric positive definite matrix defining the “shape” of the weighting kernel. This effectively defines a multivariate Gaussian centered on the reference point where the overall size, rotation, and stretch of each principal axis may be altered. The goal of shaping the weighting kernel \(A\) is to produce a fit where the reduced chi-squared statistic of the fit is equal to one (\(\chi_r^2 = 1\)). To derive the shape \(A\), an initial fit must first have been performed using a square diagonal matrix \(A_0\), whose diagonal elements are twice the square of the sigma parameter (\(diag(A_0) = 2 \sigma^2\)). It is then easy to define \(A_0^{-1}\) in \(K\) dimensions as:

\[\begin{split}A_0^{-1} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sigma_0^2} & 0 & \dots & 0 \\ 0 & \frac{1}{\sigma_1^2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & \frac{1}{\sigma_K^2} \end{bmatrix}\end{split}\]

Following (or during) the initial fit with \(A_0\), the mean square cross products of the derivatives evaluated at the sample coordinates should be calculated using derivative_mscp(), where the returned matrix has values:

\[\bar{g}_{ij}^2 = \frac{\partial \bar{f}}{\partial X_i} \frac{\partial \bar{f}}{\partial X_j}\]

for dimensions \(X_i\) and \(X_j\) in K-dimensions, where \(\partial \bar{f} / \partial X_i\) is the weighted mean of the partial derivatives over all samples in the fit with respect to \(X_i\), with weighting defined by \(W\) (above) using \(A_0^{-1}\).

The matrix \(\bar{g}^2\) is only used to define the shape of new weighting kernel, not the overall size. Therefore, it is normalized such that \(|\bar{g}^2| = 1\).

We can then use singular value decomposition to factorize \(\bar{g}^2\) into:

\[\bar{g}^2 = U S V^T\]

Here, the matrices \(U\) and \(V^T\) represent rotations (since \(|\bar{g}^2| > 0\)), and the singular values (\(S\)) can be thought of as the magnitudes of the semi-axes of an ellipsoid in K-dimensional space. Naively, this provides us with a basis from which to determine the final “shape” matrix where:

\[A^{-1} = \beta\, \bar{g}^2\]

and \(\beta\) is an as yet undetermined scaling factor representing the overall size of the new kernel. The resulting weighting kernel has the shape of an ellipsoid with the smallest semi-axis oriented parallel to the gradient. In other words, the new weighting will result in a fit that is less sensitive to distant samples in directions where the gradient is high, and more sensitive to distant samples in directions where the gradient is low.

However, we must still keep in mind that our overall goal is to get \(\chi_r^2 \rightarrow 1\) when a fit is performed using the new kernel \(A\). For example, if \(\chi_r=1\) in the initial fit, there is no need to modify the kernel, and if \(\chi_r < 1\), then we do not want to get an even better fit.

Another factor to consider is that we cannot be completely confident in this new shape due to the distribution of samples. If we are fitting at a point that is away from the center of the sample distribution, it is unadvisable to use a highly shaped kernel due to increased uncertainty in the mean partial derivatives. Furthermore, even if we are fitting close to the center of the distribution, that does not mean that the derivative calculations were not skewed by a few nearby samples when fitting in a local depression of the sample density (for example, near the center of a donut-like distribution).

We model our confidence (\(\gamma\)) in the “shape” using a logistic function (see stretch_correction()) that factors in \(\chi_r^2\), a measure of the sample density profile (\(\rho\)) at the fit coordinate, and from the deviation of the fit coordinate from the center of the sample distribution (\(\sigma_d\)):

\[\gamma = \frac{2} {\left( {1 + (2^{\exp(\sigma_d)} - 1)e^{\rho (1 - \chi_r^2)}} \right)^{1/\exp(\sigma_d)}} - 1\]

The density \(\rho\) is calculated using relative_density() which sets \(\rho=1\) when the samples are uniformly distributed, \(\rho > 1\) when the distribution is concentrated on the fit coordinate, and \(0 < \rho < 1\) when fitting in a local depression of the distribution density. The deviation (\(\sigma_d\)) is calculated using offset_variance().

The confidence parameter has asymptotes at \(\gamma = \pm 1\), is equal to zero at \(\chi_r^2=1\), is positive for \(\chi_r^2 > 1\), and is negative for \(\chi_r^2 < 1\). Also, the magnitude increases with \(\rho\), and decreases with \(\sigma_d\). The final shape matrix is then defined as:

\[A^{-1} = \beta\, U S^{\gamma} V^T\]

Note that as \(\gamma \rightarrow 0\), the kernel approaches that of a spheroid. For \(\chi_r^2 > 1\), the kernel approaches \(\bar{g}^2\). When \(\chi_r^2 < 1\), the kernel effectively rotates so that the fit becomes increasingly sensitive to samples along the direction of the derivative. The overall size of the kernel remains constant since \(|S^{\gamma}| = |\bar{g}^2| = 1\).

Finally, the only remaining factor to calculate is the scaling factor \(\beta\) which is given as:

\[\beta = \left( \frac{\chi_r}{|A_0|} \right)^{1/K}\]

Note that this scaling has the effect of setting

\[\frac{|A_0|}{|A|} = \chi_r\]

The user has the option of fixing the kernel in certain dimensions such that \({A_0}_{k, k} = A_{k, k}\). If this is the case, for any fixed dimension \(k\) we set:

\[ \begin{align}\begin{aligned}{U S^{\gamma} V^T}_{k, k} = 1\\{U S^{\gamma} V^T}_{k, i \neq k} = 0\\{U S^{\gamma} V^T}_{i \neq k, k} = 0\end{aligned}\end{align} \]

meaning the scaling is unaltered for each fixed dimension \(k\), and no rotation will be applied to any other dimension with respect to \(k\). Since the overall size must be controlled through fewer dimensions, \(\beta\) must take the form of a diagonal matrix:

\[diag(\beta)_{i \in fixed} = \frac{1}{2 \sigma_i^2} ,\,\, diag(\beta)_{i \notin fixed} = \left( \frac{\chi_r |A_0|}{|{U S^{\gamma} V^T}|} \prod_{i \in fixed}{2 \sigma_i^2} \right)^{1 / (K - K_{fixed})}\]

Once again \(A^{-1} = \beta\, U S^{\gamma} V^T\).
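
An illustrative numpy sketch of the unfixed case, assuming example values for the gradient matrix and the confidence \(\gamma\) (the normalization step below is specific to K = 2):

import numpy as np

sigma = np.array([1.0, 1.0])
chi_r = np.sqrt(4.0)  # square root of the reduced chi-squared statistic
gamma = 0.5           # assumed shape confidence for this sketch

g2 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
g2 = g2 / np.sqrt(np.linalg.det(g2))  # normalize so that |g^2| = 1

u, s, vt = np.linalg.svd(g2)
a0 = np.diag(2.0 * sigma ** 2)
beta = (chi_r / np.linalg.det(a0)) ** (1.0 / sigma.size)
a_inv = beta * (u @ np.diag(s ** gamma) @ vt)  # A^-1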

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.

rchi2float

The reduced chi-squared statistic of the fit.

gradient_mscpnumpy.ndarray (n_dimensions, n_dimensions)

An array where gradient_mscp[i, j] = derivative[i] * derivative[j] in dimensions i and j. Please see derivative_mscp() for further information. Must be Hermitian and real-valued (symmetric).

densityfloat, optional

The local relative density of the samples around the fit coordinate. A value of 1 represents uniform distribution. Values greater than 1 indicate clustering around the fitting point, and values less than 1 indicate that samples are sparsely distributed around the fitting point. Please see relative_density() for further information.

variance_offsetfloat, optional

The variance at the fit coordinate determined from the sample coordinate distribution. i.e., if a fit is performed at the center of the sample distribution, the variance is zero. If done at 2-sigma from the sample distribution center, the variance is 4.

fixednumpy.ndarray of bool (n_dimensions,), optional

If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.

tolerancefloat, optional

The threshold below which SVD values are considered zero when determining the matrix rank of derivative_mscp. Please see numpy.linalg.matrix_rank() for further information.

Returns:
shape_matrixnumpy.ndarray (n_dimensions, n_dimensions)

A matrix defining the shape of the weighting kernel required by calculate_adaptive_distance_weights_shaped() to create a set of weighting factors.

grig.resample_utils.sigmoid(x, factor=1.0, offset=0.0)[source]

Evaluate a scaled and shifted logistic function.

The sigmoid function has the form:

\[f(x) = \frac{1}{1 + e^{\beta (x - \alpha)}}\]

where \(\beta\) is the scaling factor, and \(\alpha\) is an offset applied to x.

Parameters:
xint or float or numpy.ndarray (shape)

The independent variable. If an array is supplied, must be the same shape as factor and offset (if both/either are also arrays).

factorint or float or numpy.ndarray (shape)

The scaling factor applied to x. If an array is supplied, must be the same shape as x and offset (if both/either are also arrays).

offsetint or float or numpy.ndarray (shape)

The offset applied to x. If an array is supplied, must be the same shape as x and factor (if both/either are also arrays).

Returns:
resultfloat or numpy.ndarray (shape)

The sigmoid function evaluated at x.

grig.resample_utils.single_polynomial_terms(coordinate, exponents)[source]

Derive polynomial terms for a single coordinate given polynomial exponents.

Raises a single coordinate by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).

Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. The return value will be a 1-dimensional array with size equal to the number of exponent sets provided.

Parameters:
coordinatenumpy.ndarray (N,)

The coordinate in each dimension.

exponentsnumpy.ndarray (n_exponents, N)

Sets of exponents to apply to the coordinate.

Returns:
numpy.ndarray of numpy.float64 (n_exponents,)

The polynomial terms for the coordinate.

grig.resample_utils.solve_amat_beta(phi, data, weights)[source]

Convenience function returning matrices suitable for linear algebra.

Given independent variables \(\Phi\), data \(y\), and weights \(W\), returns matrices \(A\) and \(B\) where:

\[ \begin{align}\begin{aligned}A = \Phi W \Phi^T\\B = \Phi W y\end{aligned}\end{align} \]
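
A minimal numpy sketch of these matrices for a 1-dimensional linear fit with terms [1, x] and uniform weights:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
phi = np.array([np.ones_like(x), x])  # (n_terms, n_samples)
data = np.array([0.9, 3.1, 5.0, 7.2])
weights = np.ones(x.size)             # squared weights

amat = (phi * weights) @ phi.T        # A = Phi W Phi^T
beta = (phi * weights) @ data         # B = Phi W y
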
Parameters:
phinumpy.ndarray (n_terms, n_samples)

Polynomial terms of independent variables for each sample in the fit.

datanumpy.ndarray (n_samples,)

Sample data values.

weightsnumpy.ndarray (n_samples,)

Squared weights.

Returns:
A, Bnumpy.ndarray (n_terms, n_terms), numpy.ndarray (n_terms,)

The \(A\) and \(B\) terms described above.

grig.resample_utils.solve_coefficients(amat, beta)[source]

Find least squares solution of Ax=B and rank of A.

Parameters:
amatnumpy.ndarray (ncoeffs, ndata)

The coefficient array.

betanumpy.ndarray (ncoeffs,) or (ncoeffs, N)

Dependent values. If 2-dimensional, the least-squares solution is calculated for each column.

Returns:
rank, xint, numpy.ndarray (min(M, N),)

The rank of amat and least-squares solution of x.
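
A minimal numpy sketch of the equivalent least-squares solve, reusing the A and B matrices produced in the solve_amat_beta() example above:

import numpy as np

amat = np.array([[4.0, 6.0],
                 [6.0, 14.0]])
beta = np.array([16.2, 34.7])

x, _, rank, _ = np.linalg.lstsq(amat, beta, rcond=None)
# x ~ [0.93, 2.08] (intercept and slope), rank = 2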

grig.resample_utils.solve_fit(window_coordinates, window_phi, window_values, window_error, window_mask, window_distance_weights, fit_coordinate, fit_phi, order, is_covar=False, fit_threshold=0.0, mean_fit=False, cval=nan, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]

Solve for a fit at a single coordinate.

Solves a polynomial fit of the form:

\[f(\Phi) = \hat{c} \cdot \Phi\]

where \(\hat{c}\) are the derived polynomial coefficients for the \(\Phi\) terms. The \(\Phi\) terms are derived from the independent values of the samples within the window region of the fit coordinate, and from the fit coordinates themselves (see polynomial_terms() for further details).

The \(\Phi\) terms are pre-calculated early in the resampling algorithm as this is a relatively cheap calculation, and we do not want to repeat the same calculation multiple times. For example, if sample[1] is within the window region of point[1] and point[2], there should be no need to repeat the polynomial term calculation twice. Initially, one might think that the actual coordinates could then be discarded, but there are a number of calculations that depend on the sample coordinates relative to the fitting points, which must therefore be dealt with “on-the-fly”.

EDGE CHECKING

The first of the on-the-fly calculations is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see check_edges() for details on available algorithms.

ORDER CHECKING

The next step is to determine if it is possible to perform a polynomial fit of the given order. For example, a 1-dimensional 2nd order polynomial fit can only be derived from a minimum of 3 samples. Additionally, if some samples share the same coordinate, the fit becomes underdetermined, or if dealing with multidimensional data, one needs to ensure that the samples are not collinear. If we also wish to propagate or derive valid errors, we should ensure the system is overdetermined. There are a number of order checking algorithms which vary in robustness and speed. Please see check_orders() for details on the available algorithms.

There are two available actions if the samples fail the order check. The first (default) is to simply abort fitting. The second option is to lower the order of fit until the samples meet the order check requirements. This is only possible if the fit order is equal across all dimensions. To allow for variable orders, set term_indices (see parameter descriptions) to a valid value, and update window_phi and fit_phi accordingly.

FITTING

If the above checks pass, a fit can be attempted. There are actually three types of fit that may be performed. The first is the standard polynomial fit described above. The second is a weighted mean which may explicitly be performed by setting mean_fit to True, or may be performed on-the-fly if the order was lowered to zero during the order check. Finally, if is_covar was set to True, the window_values are considered covariances to propagate, and a fit will be derived by propagating a weighted variance.

FINAL VALIDATION

If a polynomial fit was performed, a final check may be performed to confirm that the solution does not deviate too significantly from the expected values. This is done by evaluating the reduced chi-squared statistic of the fit (\(\chi_r^2\)). If \(\sqrt{\chi_r^2} > | \text{fit\_threshold} |\), the fit is not accepted, and is aborted if fit_threshold < 0, or set to the weighted mean of the samples if fit_threshold > 0. No validation will be performed if fit_threshold is set to zero (default). Note that window_error must be supplied in order for validation to be meaningful.

Parameters:
window_coordinatesnumpy.ndarray (n_dimensions, n_samples)

The independent coordinates within the window region of the fitting coordinate.

window_phinumpy.ndarray (n_terms, n_samples)

The polynomial terms of window_coordinates. Please see polynomial_terms() for further details.

window_valuesnumpy.ndarray (n_samples,)

The dependent values of the samples.

window_errornumpy.ndarray (n_samples,)

The associated 1-sigma error values for each sample. The user may also supply an array of shape (1,) in which case all samples will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.

window_masknumpy.ndarray (n_samples,)

A mask where False indicates that the associated sample should be excluded from the fit.

window_distance_weightsnumpy.ndarray (n_samples,)

The distance weighting factors applied to each sample in the fit.

fit_coordinatenumpy.ndarray (n_dimensions,)

The coordinate of the fitting point.

fit_phinumpy.ndarray (n_terms,)

The polynomial terms of fit_coordinate. Please see polynomial_terms() for further details.

ordernumpy.ndarray

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.

is_covarbool, optional

If True, indicates that window_values contains covariance values that should be propagated through the algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.

fit_thresholdfloat, optional

If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If fit_threshold < 0, failed fit values will be set to cval. If fit_threshold > 0, failed fit values will be replaced by the weighted mean.

mean_fitbool, optional

If True, a weighted mean is performed instead of calculating a polynomial fit.

cvalfloat, optional

In cases where a fit cannot be calculated at a certain location, cval determines the returned fit value.

error_weightingbool, optional

If True, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.

estimate_covariancebool, optional

If True, when determining the error on the fit and reduced chi-squared, calculate the covariance of the fit coefficients using estimated_covariance_matrix_inverse(). Otherwise, use covariance_matrix_inverse().

order_algorithm_idxint, optional

An integer specifying which polynomial order validation algorithm to use. The default (1) will always be the most robust of all available options. For further information, please see check_orders().

term_indicesnumpy.ndarray (> max(order) + 1,), optional

A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case, term_indices is used to select the correct window_phi and fit_phi for a given order (k), where terms are extracted via phi[term_indices[k]:term_indices[k + 1]].

derivative_term_mapnumpy.ndarray, optional

A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see polynomial_derivative_map().

derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional

If the fit order is allowed to vary, gives the indices in derivative_term_map for a given symmetrical order. The correct derivative_term_map mapping for order k is given as derivative_term_map[:, :, indices[k]:indices[k + 2]].

edge_algorithm_idxint, optional

Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see check_edges() for further information. The default (1), is always the most robust of the available algorithms.

edge_thresholdnumpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution during check_edges(). Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

minimum_pointsint, optional

Certain order validation algorithms check the number of available samples as a means to determine what order of fit is appropriate. If pre-calculated for the base order, it may be passed in here for a slight speed advantage.

get_errorbool, optional

If True, return the error on the fit.

get_weightsbool, optional

If True, return the sum of all sample weights used in determining the fit.

get_distance_weightsbool, optional

If True, return the sum of only the distance weights used in determining the fit.

get_rchi2bool, optional

If True, return the reduced chi-squared statistic for each of the fits.

get_cross_derivativesbool, optional

If True, return the derivative mean-squared-cross-products of the samples in the fit. See derivative_mscp() for further information.

get_offset_variancebool, optional

If True, return the offset of the fitting point from the sample distribution. See offset_variance() for further information.

Returns:
fit_result8-tuple
fit_result[0]: Fitted value (float).

Set to cval on fit failure.

fit_result[1]: Error on the fit (float).

Set to NaN on fit failure.

fit_result[2]: Number of samples included in the fit (int).

Set to 0 on fit failure.

fit_result[3]: Weight sum (float).

Set to 0.0 on fit failure.

fit_result[4]: Distance weight sum (float).

Set to 0.0 on fit failure.

fit_result[5]: Reduced chi-squared statistic (float).

Set to NaN on fit failure.

fit_result[6]: Derivative mean-squared-cross-product (numpy.ndarray).

Set to shape (0, 0) on fit failure, and (n_dimensions, n_dimensions) otherwise.

fit_result[7]: Offset variance from the distribution center (float).

Set to NaN on fit failure.

grig.resample_utils.solve_fits(sample_indices, sample_coordinates, sample_phi_terms, sample_data, sample_error, sample_mask, fit_coordinates, fit_phi_terms, order, alpha, adaptive_alpha, is_covar=False, mean_fit=False, cval=nan, fit_threshold=0.0, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, order_term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]

Solve all fits within one intersection block.

This function is a wrapper for solve_fit() over all data sets and fit points. The main computations here involve:

  1. Creating and populating the output arrays.

  2. Selecting the correct samples within the region of each fitting window.

  3. Calculating the full weighting factors for the fits.

For further details on the actual fitting, please see solve_fit().

Parameters:
sample_indicesnumba.typed.List

A list of 1-dimensional numpy.ndarray (dtype=int) of length n_fits. Each list element sample_indices[i] contains the indices of samples within the “window” region of fit_indices[i].

sample_coordinatesnumpy.ndarray (n_dimensions, n_samples)

The independent coordinates for each sample in n_dimensions

sample_phi_termsnumpy.ndarray (n_terms, n_samples)

The polynomial terms of sample_coordinates. Please see polynomial_terms() for further details.

sample_datanumpy.ndarray (n_sets, n_samples)

The dependent values of the samples for n_sets, each containing n_samples.

sample_errornumpy.ndarray (n_sets, n_samples)

The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (n_sets, 1) in which case all samples in a set will share the same associated error value. If the shape is set to (n_sets, 0), this indicates that no error values are available for the samples.

sample_masknumpy.ndarray (n_sets, n_samples)

A mask where False indicates that the associated sample should be excluded from all fits.

fit_coordinatesnumpy.ndarray (n_dimensions, n_fits)

The independent variables at each fit coordinate in n_dimensions.

fit_phi_termsnumpy.ndarray (n_terms, n_fits)

The polynomial terms of fit_coordinates. Please see polynomial_terms() for further details.

ordernumpy.ndarray

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.

alphanumpy.ndarray (n_dimensions,)

A distance weighting scaling factor per dimension. The weighting kernel is applied equally to all sets and samples. For further details, please see calculate_distance_weights(). Will be overridden by adaptive_alpha if adaptive_alpha.size > 0.

adaptive_alphanumpy.ndarray

Shape = (n_samples, n_sets, [1 or n_dimensions], n_dimensions). Defines a weighting kernel for each sample in each set. The function calculate_adaptive_distance_weights_scaled() will be used for kernels of shape (1, n_dimensions), and calculate_adaptive_distance_weights_shaped() will be used for kernels of shape (n_dimensions, n_dimensions). adaptive_alpha is a required parameter due to Numba constraints, and will override the alpha parameter unless it has a size of 0. Therefore, to disable, please set the size of any dimension to zero.

is_covarbool, optional

If True, indicates that sample_data contains covariance values that should be propagated through the algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.

mean_fitbool, optional

If True, a weighted mean is performed instead of calculating a polynomial fit.

cvalfloat, optional

In cases where a fit cannot be calculated at a certain location, cval determines the fill value for the output fit array at those locations.

fit_thresholdfloat, optional

If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If fit_threshold < 0, failed fit values will be set to cval. If fit_threshold > 0, failed fit values will be replaced by the weighted mean.

error_weightingbool, optional

If True, weight the samples in the fit by the inverse variance (1 / sample_error^2) in addition to distance weighting.

estimate_covariancebool, optional

If True, calculate the covariance of the fit coefficients using estimated_covariance_matrix_inverse(). Otherwise, use covariance_matrix_inverse().

order_algorithm_idxint, optional

An integer specifying which polynomial order validation algorithm to use. The default (1) will always be the most robust of all available options. For further information, please see check_orders().

order_term_indicesnumpy.ndarray (> max(order) + 1,), optional

A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case, order_term_indices is used to select the correct sample_phi_terms and fit_phi_terms for a given order (k), where terms are extracted via phi[order_term_indices[k]:order_term_indices[k + 1]].

derivative_term_mapnumpy.ndarray, optional

A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see polynomial_derivative_map().

derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional

If the fit order is allowed to vary, gives the indices in derivative_term_map for a given symmetrical order. The correct derivative_term_map mapping for order k is given as derivative_term_map[:, :, indices[k]:indices[k + 2]].

edge_algorithm_idxint, optional

Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see check_edges() for further information. The default (1) is always the most robust of the available algorithms.

edge_thresholdnumpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution during check_edges(). Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

minimum_pointsint, optional

Certain order validation algorithms check the number of available samples when determining what order of fit is appropriate. If the minimum number of points has been pre-calculated for the base order, it may be passed in here for a slight speed advantage.

get_errorbool, optional

If True, return the error on the fit.

get_countsbool, optional

If True, return the number of samples used when determining the fit at each fitting point.

get_weightsbool, optional

If True, return the sum of all sample weights used in determining the fit at each point.

get_distance_weightsbool, optional

If True, return the sum of only the distance weights used in determining the fit at each point.

get_rchi2bool, optional

If True, return the reduced chi-squared statistic for each of the fitted points.

get_cross_derivativesbool, optional

If True, return the derivative mean-squared-cross-products of the samples for each of the fitted points. See derivative_mscp() for further information.

get_offset_variancebool, optional

If True, return the offset of the fitting point from the sample distribution. See offset_variance() for further information.

Returns:
fit_results8-tuple of numpy.ndarray

fit_results[0]: Fitted values.
fit_results[1]: Error on the fit.
fit_results[2]: Number of samples in each fit.
fit_results[3]: Weight sums.
fit_results[4]: Distance weight sums.
fit_results[5]: Reduced chi-squared statistic.
fit_results[6]: Derivative mean-squared-cross-products.
fit_results[7]: Offset variances from the sample distribution center.

All arrays except for fit_results[6] have the shape (n_sets, n_fits) or (0, 0) depending on whether get_<name> is True or False respectively. The derivative MSCP is of shape (n_sets, n_fits, n_dimensions, n_dimensions) if requested, and (1, 0, 0, 0) otherwise.
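
For orientation, a minimal sketch of unpacking the returned tuple, assuming all get_* flags were enabled; the placeholder tuple and the unpacked variable names are illustrative, not part of the API:

    import numpy as np

    # Hypothetical placeholder matching the documented shapes for
    # n_sets=1, n_fits=5, n_dimensions=2 with every get_* flag enabled.
    fit_results = (np.zeros((1, 5)),) * 6 \
        + (np.zeros((1, 5, 2, 2)), np.zeros((1, 5)))

    # Unpack the documented 8-tuple in order.
    (fit_values, fit_errors, counts, weight_sums, distance_weight_sums,
     rchi2, derivative_mscp, offset_variances) = fit_results

    # Each array is (n_sets, n_fits) when requested and (0, 0) otherwise;
    # derivative_mscp is (n_sets, n_fits, n_dimensions, n_dimensions).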

grig.resample_utils.solve_inverse_covariance_matrices(phi, error, residuals, weights, error_weighted_amat=None, rank=None, calculate_error=True, calculate_residual=True, estimate_covariance=False)[source]

Inverse covariance matrices on fit coefficients from errors and residuals.

A utility function to calculate the inverse covariance matrices of the fit coefficients (\(c\)) based on the \(1\sigma\) error values of the sample measurements and/or the residuals of the fit \(y - c \cdot \Phi\).

The function used to calculate the error covariance may be either estimated_covariance_matrix_inverse() or covariance_matrix_inverse(). However, estimated_covariance_matrix_inverse() will always be used to calculate the covariance matrix derived from residuals.

Parameters:
phinumpy.ndarray (nterms, N)

The polynomial terms for each of the N samples.

errornumpy.ndarray (N,)

The 1-sigma error values for each sample.

residualsnumpy.ndarray (N,)

The residuals of the fit y - c.phi.

weightsnumpy.ndarray (N,)

The weighting of each sample in the fit.

error_weighted_amatnumpy.ndarray (nterms, nterms), optional

The matrix \(A = \Phi^T W Var(y) W \Phi\), optionally passed in for speed if pre-calculated.

rankint or float, optional

The rank of error_weighted_amat, optionally passed in for speed if it was pre-calculated. Otherwise, it will be solved for.

calculate_errorbool, optional

If True, calculate the covariance of the fit coefficients based upon the error values.

calculate_residualbool, optional

If True, calculate the covariance of the fit coefficients based upon residuals of the fit.

estimate_covariancebool, optional

If True, calculate the covariance of the fit coefficients from the error values using estimated_covariance_matrix_inverse(). Otherwise, use covariance_matrix_inverse().

Returns:
e_inv, r_invnumpy.ndarray, numpy.ndarray

The inverse covariance calculated from error, and the inverse covariance calculated from residuals. If calculate_error is True, the shape of e_inv will be (nterms, nterms), or (0, 0) otherwise. The same applies to calculate_residual and r_inv.
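
A usage sketch assuming only the signature documented above; the underlying function is numba-compiled, so float64 arrays are used throughout, and the data values here are purely illustrative:

    import numpy as np
    from grig.resample_utils import solve_inverse_covariance_matrices

    rng = np.random.default_rng(0)
    n_terms, n = 3, 50
    phi = rng.normal(size=(n_terms, n))        # polynomial terms per sample
    error = np.full(n, 0.1)                    # 1-sigma sample errors
    residuals = rng.normal(scale=0.1, size=n)  # y - c.phi
    weights = np.ones(n)

    e_inv, r_inv = solve_inverse_covariance_matrices(
        phi, error, residuals, weights,
        calculate_error=True, calculate_residual=True)
    # Each returned matrix is (n_terms, n_terms) when its flag is True.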

grig.resample_utils.solve_mean_fit(data, error, weight, weightsum=None, calculate_variance=True, calculate_rchi2=True)[source]

Return the weighted mean of data, variance, and reduced chi-squared.

For data (\(y\)), error (\(\sigma\)), and weights (\(w\)), the weighted mean is given as:

\[\bar{y} = \frac{\sum_{i=1}^{N}{w_i y_i}} {\sum_{i=1}^{N}{w_i}}\]

The returned variance (\(V\)) will depend on whether valid errors are available (use_error). If use_error is True:

\[V = \frac{\sum_{i=1}^{N}{(w_i\sigma_i)^2}}{(\sum_{i=1}^{N}{w_i})^2}\]

If use_error is False:

\[V = \frac{1}{N - 1} \frac{\sum_{i=1}^{N}{w_i (y_i - \bar{y})^2}} {\sum_{i=1}^{N}{w_i}}\]

Finally, the reduced chi-squared statistic is given as:

\[\chi_r^2 = \frac{N}{N - 1} \frac{\sum_{i=1}^{N}{w_i (y_i - \bar{y})^2 / \sigma_i^2}} {\sum_{i=1}^{N}{w_i}}\]

Note that \(\chi_r^2 = 1\) if use_error is False.

Parameters:
datanumpy.ndarray (N,)

The data array consisting of N samples.

errornumpy.ndarray (N,)

The associated 1-sigma error values for each of the N data samples.

weightnumpy.ndarray (N,)

The weighting applied to each of the N data samples.

weightsumint or float, optional

The sum of the weights, optionally passed in for speed if pre-calculated.

calculate_variancebool, optional

If True, calculate the variance. Otherwise, variance will be returned as a float value of zero.

calculate_rchi2bool, optional

If True, calculate the reduced chi-squared statistic. Otherwise, it will be returned as a float value of zero.

Returns:
mean, variance, rchi2float, float, float

The weighted mean, variance and reduced chi-squared statistic.
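
A short usage sketch against the documented signature (the input values are illustrative):

    import numpy as np
    from grig.resample_utils import solve_mean_fit

    data = np.array([1.0, 2.0, 3.0, 4.0])
    error = np.array([0.1, 0.1, 0.2, 0.2])  # 1-sigma errors
    weight = np.ones(4)

    mean, variance, rchi2 = solve_mean_fit(data, error, weight)
    # The mean should match the documented weighted mean formula:
    assert np.isclose(mean, np.sum(weight * data) / np.sum(weight))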

grig.resample_utils.solve_polynomial_fit(phi_samples, phi_point, data, error, distance_weight, weight, derivative_term_map=None, calculate_variance=True, calculate_rchi2=True, calculate_derivative_mscp=True, error_weighting=True, estimate_covariance=False)[source]

Derive a polynomial fit from samples, then calculate fit at single point.

The fit to the sample distribution is given as

\[f(\Phi) = \hat{c} \cdot \Phi\]

where \(\Phi\) contains products of the sample coordinates for each coefficient term. The coefficients \(\hat{c}\) are solved for using least-squares fitting and then applied to calculate the fitted value at a single point \(f(\Phi_{fit})\).

It is also possible to return an error on the fit as a variance. If a valid error is supplied, it will be propagated. If no valid errors are available, the error will be calculated from the residuals of the fit.

The reduced chi-squared (\(\chi_r^2\)) statistic may also be calculated, but is only really meaningful if valid errors were supplied. Otherwise, \(\chi_r^2 \equiv 1\).

Finally, the covariance of gradients between dimensions may also be returned. Note that these are the weighted mean of all sample gradients.

Parameters:
phi_samplesnumpy.ndarray (n_terms, n_samples)

The array of independent terms for each sample.

phi_pointnumpy.ndarray (n_terms,)

The array containing the independent terms at the fitting point.

datanumpy.ndarray (n_samples,)

The array of sample values.

errornumpy.ndarray (n_samples,)

The array of error values for each sample. Note that if errors are unavailable, an array of size 0 may be supplied. If this is the case, and an error on the fit is required, it will be derived from the residuals of the fit from the data. In addition, the reduced chi-squared statistic will always be 1.0 if derived from residuals.

distance_weightnumpy.ndarray (n_samples,)

The distance weighting factor (not including any error weighting) applied to each sample in the fit.

weightnumpy.ndarray (n_samples,)

The full weighting factor applied to each sample in the fit.

derivative_term_mapnumpy.ndarray, optional

A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see polynomial_derivative_map().

calculate_variancebool, optional

If True, calculate the variance on the fit. Note that the variance will be calculated regardless of this setting if a valid error was supplied and the reduced chi-squared statistic is required as a return value.

calculate_rchi2bool, optional

If True, calculate the reduced chi-squared statistic of the fit. Note that valid errors must be supplied for this to be meaningful.

calculate_derivative_mscpbool, optional

If True, calculate the covariance of the derivatives at the fit point.

error_weightingbool, optional

If True, indicates that weights includes an error weighting factor of 1/sigma^2. This allows for a slight speed increase when performing the fit as some mathematical terms will not need to be recalculated.

estimate_covariancebool, optional

If True, uses estimated_covariance_matrix_inverse() instead of covariance_matrix_inverse() when determining the variance. This is suggested if the errors are not well-behaved.

Returns:
fit_value, variance, rchi2, gradientsfloat, float, float, numpy.ndarray

The value of the fit at the fit point. The variance and reduced chi-squared will only be calculated if calculate_variance and calculate_rchi2 are respectively set to True. The gradients matrix is an (n_dimensions, n_dimensions) array where gradients[i, j] = dx_i * dx_j.
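
A hedged usage sketch fitting a straight line in one dimension. The ordering of terms in phi is assumed here to be [1, x]; the true ordering is defined by polynomial_terms(), so treat that as an assumption:

    import numpy as np
    from grig.resample_utils import solve_polynomial_fit

    x = np.linspace(0.0, 1.0, 11)
    data = 2.0 * x + 1.0                          # noiseless line y = 2x + 1
    phi_samples = np.stack((np.ones_like(x), x))  # assumed term order [1, x]
    phi_point = np.array([1.0, 0.5])              # evaluate the fit at x = 0.5
    error = np.full(x.size, 0.05)
    distance_weight = np.ones(x.size)
    weight = distance_weight / error ** 2         # include 1/sigma^2 weighting

    fit_value, variance, rchi2, gradients = solve_polynomial_fit(
        phi_samples, phi_point, data, error, distance_weight, weight,
        error_weighting=True)
    # fit_value should be close to 2.0 (the line evaluated at x = 0.5).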

grig.resample_utils.solve_rchi2_from_error(residuals, weights, errors, weightsum=None, rank=1)[source]

Return the reduced chi-squared given residuals and sample errors.

For weights \(w\), errors \(\sigma\), and residuals \(r\) where \(r = y - f(x)\), the reduced chi-squared is given as:

\[\chi_r^2 = \frac{N}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2 / \sigma_i^2}} {\sum_{i=1}^{N}{w_i}}\]

where \(M\) is given by rank.

Parameters:
residualsnumpy.ndarray (N,)

The residuals to the fit, or y - f(x).

weightsnumpy.ndarray (N,)

The weights to each sample in the fit.

errorsnumpy.ndarray (N,)

The 1-sigma measurement errors for each sample in the fit.

weightsumint or float, optional

The sum of the sample weights, optionally passed in for speed.

rankint or float, optional

The number of degrees of freedom used in the reduced chi-squared value is taken as N - rank. The default is 1 and applies the Bessel correction. If N < rank, rank is automatically set to N - 1.

Returns:
rchi2float

The reduced chi-squared value for the fit.
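
The formula above reduces to a few lines of numpy; a minimal reference sketch (not the jit-compiled implementation):

    import numpy as np

    def rchi2_from_error(residuals, weights, errors, rank=1):
        # chi_r^2 = (N / (N - M)) * sum(w r^2 / sigma^2) / sum(w)
        n = residuals.size
        m = min(rank, n - 1)  # documented guard, conservatively applied
        chisq = np.sum(weights * residuals ** 2 / errors ** 2)
        return n / (n - m) * chisq / np.sum(weights)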

grig.resample_utils.solve_rchi2_from_variance(residuals, weights, variance, weightsum=None, rank=1)[source]

Return the reduced chi-squared given residuals and constant variance.

For weights \(w\), variance \(V\), and residuals \(r\) where \(r = y - f(x)\), the reduced chi-squared is given as:

\[\chi_r^2 = \frac{1}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2}} {V \sum_{i=1}^{N}{w_i}}\]

where \(M\) is given by rank.

Parameters:
residualsnumpy.ndarray (N,)

The residuals to the fit, or y - f(x).

weightsnumpy.ndarray (N,)

The weights to each sample in the fit.

varianceint or float

The constant variance of the fit.

weightsumint or float, optional

The sum of the sample weights, optionally passed in for speed.

rankint or float, optional

The number of degrees of freedom used in the reduced chi-squared value is taken as N - rank. The default is 1 and applies the Bessel correction. If N < rank, rank is automatically set to N - 1.

Returns:
rchi2float

The reduced chi-squared value for the fit.
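
The constant-variance variant as a comparable sketch; only the denominator changes:

    import numpy as np

    def rchi2_from_variance(residuals, weights, variance, rank=1):
        # chi_r^2 = (1 / (N - M)) * sum(w r^2) / (V sum(w))
        n = residuals.size
        m = min(rank, n - 1)  # documented guard, conservatively applied
        return np.sum(weights * residuals ** 2) \
            / ((n - m) * variance * np.sum(weights))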

grig.resample_utils.sscp(matrix, weight=None, normalize=False)[source]

Calculate the sum-of-squares-and-cross-products of a matrix.

For the matrix \(A\) of shape (M, N), calculates \(A A^T\). If weights (\(W\)) are provided, then:

\[sscp = (AW)(AW)^T\]

Note that the weight should only contain the diagonal elements of \(W\), and as such should be a 1-dimensional array.

If normalize=True:

\[sscp = \frac{(AW)(AW)^T}{trace(W^TW)}\]

where \(W = I\), the identity matrix, if weight is not supplied.

Parameters:
matrixnumpy.ndarray (M, N)

Input matrix.

weightnumpy.ndarray (N,), optional

Weights to be applied during sscp.

normalizebool, optional

If True, scales result as described above. Default is False.

Returns:
numpy.ndarray of numpy.float64 (M, M)

Square output array containing the sum-of-squares-and-cross-products matrix.
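
A numpy sketch consistent with the documented (M, M) output shape, treating weight as the diagonal of \(W\); this is a reference implementation under those assumptions, not the numba function itself:

    import numpy as np

    def sscp_sketch(matrix, weight=None, normalize=False):
        # matrix: (M, N); weight: (N,), the diagonal elements of W.
        wa = matrix if weight is None else matrix * weight  # A W, broadcast over columns
        result = wa @ wa.T                                  # (A W)(A W)^T -> (M, M)
        if normalize:
            # trace(W^T W) = sum(w^2), which equals N when W = I.
            scale = matrix.shape[1] if weight is None else np.sum(weight ** 2)
            result = result / scale
        return result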

grig.resample_utils.stretch_correction(rchi2, density, variance_offset)[source]

A sigmoid function used by the “shaped” adaptive resampling algorithm.

This sigmoid function is applied when determining the severity of the stretch (\(s\)) applied to the principal axes of a rotated weighting kernel. The correction term (\(\gamma\)) is applied as:

\[s_{corrected} = s^\gamma\]

Since the stretch values are determined from the singular values of a normalized Hermitian matrix \(A\) (see shaped_adaptive_weight_matrix()), where \(|A| \equiv 1\), then:

\[\prod_{k=1}^{K}{s_k} = \prod_{k=1}^{K}{s_{corrected, k}} = 1\]

in \(K\) dimensions. In other words, this does not affect the overall size (or volume) of the weighting kernel.

The correction factor is calculated via half_max_sigmoid() with \(c=1\) and lower and upper asymptotes of -1 and 1, such that the midpoint is fixed at zero when \(x=0\). After making the necessary substitutions, the correction factor is given as:

\[\gamma = \frac{2} {\left( {1 + (2^{\nu} - 1)e^{B(1 - x)}} \right)^{1/\nu}} - 1\]

We then set the rate as \(B = \rho\) where \(\rho\) is the density as determined by relative_density(), and the point of inflection as \(exp(\sigma)\) where \(\sigma^2\) is the variance_offset as determined by offset_variance(). Finally, setting \(x = \chi_r^2\) we arrive at:

\[\gamma = \frac{2} {\left( {1 + (2^{exp(\sigma)} - 1)e^{\rho (1 - \chi_r^2)}} \right)^{1/exp(\sigma)}} - 1\]

As \(\gamma \rightarrow 0\), the resulting shape becomes more symmetrical. This will be the case when the fitting point is away from the center of the sample distribution (high \(\sigma^2\)), the fit occurs in a low density area (low \(\rho\)), or \(\chi_r^2 \rightarrow 1\).

It should be noted that \(s_{corrected} \rightarrow s\) as \(\chi_r^2 \rightarrow \infty\). However, in the range \(0 < \chi_r^2 < 1\), the correction factor is actually negative, with \(\gamma \rightarrow -1\) as \(\chi_r^2 \rightarrow 0\). This effectively rotates the shape so that its major axis is perpendicular to the mean sample gradient rather than parallel to it.

Parameters:
rchi2int or float or numpy.ndarray (shape)

The reduced chi-squared statistic of the initial fit.

densityint or float or numpy.ndarray (shape)

The local relative density at the fitting point (see relative_density()).

variance_offsetint or float or numpy.ndarray (shape)

The variance of the fitting point with respect to the coordinate distribution of the samples used in the fit. Please see offset_variance() for further information.

Returns:
correctionnumpy.ndarray

The correction factor.

Notes

For high-valued inflection points, numpy will set values in the denominator of the above equation to infinity or 1, resulting in a misleading correction factor of \(\pm 1\). To counter this, half_max_sigmoid() could not be used, and the calculation is done “by hand”, with spurious values resulting in a correction factor of zero. This also requires some numba hackery, with the result that the final output value is a numpy.ndarray in all cases. When single-valued inputs are supplied, the output will be a single-valued array of zero dimensions, which should be suitable for subsequent calculations.
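
The documented closed form is simple to evaluate directly; a minimal numpy sketch, without the overflow guards described in the notes above:

    import numpy as np

    def stretch_correction_sketch(rchi2, density, variance_offset):
        # nu = exp(sigma), where sigma^2 is the variance offset.
        nu = np.exp(np.sqrt(variance_offset))
        # gamma = 2 / (1 + (2^nu - 1) exp(rho (1 - rchi2)))^(1/nu) - 1
        denom = (1.0 + (2.0 ** nu - 1.0)
                 * np.exp(density * (1.0 - rchi2))) ** (1.0 / nu)
        return 2.0 / denom - 1.0

    # At rchi2 = 1 the denominator is exactly 2, so gamma = 0
    # (a symmetric kernel), regardless of density.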

grig.resample_utils.update_mask(weights, mask)[source]

Updates a mask, setting False values where weights are zero or non-finite.

Utility function to update a boolean mask in place given weights. Mask values where weights are zero or non-finite will be set to False, and the total number of remaining True values is returned as output.

Parameters:
weightsnumpy.ndarray (N,)

The weight values.

masknumpy.ndarray of bool (N,)

The mask array to update in place.

Returns:
countsint

The number of True mask values following the update.
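
Behaviorally, this is equivalent to the following in-place update (a sketch of the documented behavior, not the numba function):

    import numpy as np

    def update_mask_sketch(weights, mask):
        # Set mask to False wherever weights are zero or non-finite, in place.
        mask &= np.isfinite(weights) & (weights != 0)
        return int(mask.sum())  # remaining number of True values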

grig.resample_utils.variance_from_offsets(offsets, covariance, sigma_inv=None)[source]

Determine the variance given offsets from the expected value.

The output variance is:

\[V = M^T \Sigma^{-1} M\]

where offsets (\(M\)) are the deviations \(X - E[X]\) and \(\Sigma\) is the covariance.

Parameters:
offsetsnumpy.ndarray (n_dimensions,)

The observational offsets from the expected value.

covariancenumpy.ndarray (n_dimensions, n_dimensions)

The covariance matrix of observations.

sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional

The matrix inverse of the covariance matrix, optionally passed in for speed.

Returns:
variancefloat

The variance as described above.
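
A direct numpy sketch of the quadratic form above:

    import numpy as np

    def variance_from_offsets_sketch(offsets, covariance, sigma_inv=None):
        # V = M^T Sigma^{-1} M, with offsets M = X - E[X].
        if sigma_inv is None:
            sigma_inv = np.linalg.inv(covariance)
        return float(offsets @ sigma_inv @ offsets)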

grig.resample_utils.weighted_fit_variance(residuals, weights, weightsum=None, rank=1)[source]

Calculate variance of a fit from the residuals of the fit to data.

For data \(y\), weights \(w\), and fitted function \(f(x) = fit(x, y, w)\), the residual is given as \(r = y - f(x)\). The variance is then given as:

\[V = \frac{1}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2}} {\sum_{i=1}^{N}{w_i}}\]

where \(M = rank\) if \(rank < N\), and \(M = N - 1\) otherwise.

Parameters:
residualsnumpy.ndarray (ndata,)

The residuals given as data - fit.

weightsnumpy.ndarray (ndata,)

The weights.

weightsumint or float, optional

The sum of weights optionally passed in for speed if pre-calculated.

rankint, optional

The number of degrees of freedom used in the variance calculation is taken as ndata - rank. The default is 1 and applies the Bessel correction. If ndata < rank, rank is automatically set to ndata - 1.

Returns:
variancefloat

Variance calculated from residuals.
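
As a numpy reference sketch of the formula above:

    import numpy as np

    def weighted_fit_variance_sketch(residuals, weights, rank=1):
        # V = (1 / (N - M)) * sum(w r^2) / sum(w)
        n = residuals.size
        m = rank if rank < n else n - 1  # documented guard when ndata < rank
        return np.sum(weights * residuals ** 2) / ((n - m) * np.sum(weights))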

grig.resample_utils.weighted_mean(data, weights, weightsum=None)[source]

Calculate the weighted mean of a data set.

The weighted mean of data \(y\) with weights \(w\) is given as:

\[\bar{y} = \frac{\sum_{i=1}^{N}{w_i y_i}} {\sum_{i=1}^{N}{w_i}}\]

This is a jit-compiled numba function for use within other functions in sofia_redux.toolkit.resampling.

Parameters:
datanumpy.ndarray (ndata,)

Data.

weightsnumpy.ndarray (ndata,)

Weights.

weightsumint or float, optional

Sum of weights, optionally passed in for speed if pre-calculated.

Returns:
weighted_meanfloat

The weighted mean.

grig.resample_utils.weighted_mean_variance(variance, weights, weightsum=None)[source]

Calculate the mean weighted variance.

Propagate variance as:

\[\bar{V} = \frac{\sum_{i=1}^{N}{w_i^2 V_i}} {(\sum_{i=1}^{N}{w_i})^2}\]
Parameters:
variancenumpy.ndarray (ndata,)

Variance array.

weightsnumpy.ndarray (ndata,)

Weights.

weightsumint or float, optional

Sum of weights. Passed in for speed if pre-calculated.

Returns:
mean_variancefloat

The propagated variance.

grig.resample_utils.weighted_variance(error, weights, weightsum=None)[source]

Utility function to calculate the biased weighted variance.

Calculates the biased weighted variance from data errors as:

\[V = \frac{\sum{(w\sigma)^2}}{(\sum{w})^2}\]
Parameters:
errornumpy.ndarray (ndata,)

1-sigma error values.

weightsnumpy.ndarray (ndata,)

Data weights.

weightsumint or float, optional

Sum of weights. Optionally passed in for speed if pre-calculated.

Returns:
weighted_variancefloat

The weighted variance.
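
The three weighted helpers above (weighted_mean, weighted_mean_variance, and weighted_variance) each reduce to a one-line numpy expression; a combined reference sketch of the documented formulas:

    import numpy as np

    def weighted_mean_sketch(data, weights):
        # ybar = sum(w y) / sum(w)
        return np.sum(weights * data) / np.sum(weights)

    def weighted_mean_variance_sketch(variance, weights):
        # Vbar = sum(w^2 V) / (sum(w))^2
        return np.sum(weights ** 2 * variance) / np.sum(weights) ** 2

    def weighted_variance_sketch(error, weights):
        # V = sum((w sigma)^2) / (sum(w))^2
        return np.sum((weights * error) ** 2) / np.sum(weights) ** 2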