grig package¶
Subpackages¶
Submodules¶
grig.clean_image module¶
- grig.clean_image.clean_image(image, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, mode=None, leaf_size=None, **kwargs)[source]¶
Uses ResamplePolynomial to correct NaNs in image and/or supplied in mask.
- Parameters:
- imagearray_like of float
(M, N) array of data values. NaN values will automatically be considered ‘bad’ and replaced.
- errorarray_like of float, optional
(M, N) array of error (1-sigma) values associated with the data array. error will be used to weight fits and also be propagated to the output error values.
- maskarray_like of bool, optional
(M, N) array of bool where True indicates a valid data point that can be included the fitting and False indicates data points that should be excluded from the fit and replaced. Masked points will be reflected in the output counts array.
- windowarray_like or float or int, optional
(2,) array or single float value specifying the maximum euclidean distance of a data sample from a resampling point such that it can be included in a local fit. window may be declared for each feature. For example, when fitting 2-dimensional (x, y) data, a window of 1.0 would create a circular fitting window around each resampling point, whereas a window of (1.0, 0.5) would create an eliptical fitting window with a semi-major axis of 1.0 in x and semi-minor axis of 0.5 in y. If not supplied, window is calculated based on an estimate of the median population density of the data for each feature.
- orderarray_like or int, optional
(2,) array or single integer value specifying the polynomial fit order for each feature.
- fix_orderbool, optional
In order for local polynomial fitting to occur, the default requirement is that nsamples >= (order + 1) ** 2, where nsamples is the number of data samples within window. If fix_order is True and this condition is not met, then local fitting will be aborted for that point and a value of cval will be returned instead. If fix_order is False, then order will be reduced to the maximum value where this condition can be met. NOTE: this is only available if order is symmetrical. i.e. it was passed in as a single integer to be applied across both x and y. Otherwise, it is unclear as to which dimension order should be reduced to meet the condition.
- robustfloat, optional
Specifies an outlier rejection threshold for data. A data point is identified as an outlier if:
|x_i - x_med|/MAD > robust,
where x_med is the median, and MAD is the Median Absolute Deviation defined as:
1.482 * median(|x_i - x_med|).
- negthreshfloat, optional
Specifies a negative value rejection threshold such that data < (-stddev(data) * negthresh) will be excluded from the fit.
- modestr, optional
The type of check to perform on whether the sample distribution for each resampling point is adequate to derive a polynomial fit. Depending on order and fix_order, if the distribution does not meet the criteria for mode, either the fit will be aborted, returning a value of cval or the fit order will be reduced. Available modes are:
‘edges’: Require that there are order samples in both the negative and positive directions of each feature from the resampling point.
‘counts’: Require that there are (order + 1) ** 2 samples within the window of each resampling point.
‘extrapolate’: Attempt to fit regardless of the sample distribution.
Note that ‘edges’ is the most robust mode as it ensures that no singular values will be encountered during the least-squares fitting of polynomial coefficients.
- leaf_sizeint, optional
Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbours.BallTree for further details.
- Returns:
- cleaned_image, [variance_out], [counts]n_tuple of numpy.ndarray (M, N)
See grig.ResamplePolynomial
grig.resample module¶
- grig.resample.Resample¶
alias of
ResamplePolynomial
- grig.resample.resampler(coordinates, data, *locations, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=False, smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', error_weighting=True, estimate_covariance=False, is_covar=False, jobs=None, get_error=False, get_counts=False, get_weights=False, get_distance_weights=False, get_rchi2=False, get_cross_derivatives=False, get_offset_variance=False, **distance_kwargs)¶
ResamplePolynomial data using local polynomial fitting.
Initializes and then calls the
ResamplePolynomial
class. For further details on all available parameters, please seeResamplePolynomial.__init__()
andResamplePolynomial.__call__()
.- Parameters:
- coordinates
- data
- locations
- error
- mask
- window
- order
- fix_order
- robust
- negthresh
- window_estimate_bins
- window_estimate_percentile
- window_estimate_oversample
- leaf_size
- large_data
- smoothing
- relative_smooth
- adaptive_threshold
- adaptive_algorithm
- fit_threshold
- cval
- edge_threshold
- edge_algorithm
- order_algorithm
- error_weighting
- estimate_covariance
- is_covar
- jobs
- get_error
- get_counts
- get_weights
- get_distance_weights
- get_rchi2
- get_cross_derivatives
- get_offset_variance
- distance_kwargs
- Returns:
- resultsfloat or numpy.ndarray or n-tuple of (float or numpy.ndarray)
If a fit is performed at a single location, the output will consist of int or float scalar values. Multiple fits result in numpy arrays. The exact output shape depends on the number of data sets, number of fitted points, dimensions of the fit locations. Assuming that all get_* keywords are set to True, the output order is:
results[0] = fitted values results[1] = error on the fit results[2] = sample counts for each fit results[3] = total weight of all samples in fit results[4] = total distance weight sum of all samples in fit results[5] = reduced chi-squared statistic of the fit results[6] = derivative mean squared cross products results[7] = offset variance of fit from sample distribution
grig.resample_base module¶
- class grig.resample_base.ResampleBase(coordinates, data, error=None, mask=None, window=None, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=None, **distance_kwargs)[source]¶
Bases:
object
- Attributes:
features
int : number of data features (dimensions)
fit_settings
dict : Fit reduction settings applied during last call
fit_tree
Return the fitting tree representative of points to fit.
grid_class
Return the grid class of the resampler
multi_set
bool : True if solving for multiple data sets
n_samples
int : The number of samples in each data set.
n_sets
int : The number of data sets to fit.
window
numpy.ndarray (n_features,) : Window radius in each dimension.
Methods
__call__
(*args[, fit_threshold, cval, ...])Resample data defined during initialization onto new coordinates.
block_loop
(sample_values, error, mask, ...)Perform resampling reduction in parallel or series.
Return the minimum number of points for a fit.
combine_blocks
(blocks, n_sets, n_fits, ...)Combines the results from multiple reductions into one set.
estimate_feature_windows
(coordinates[, ...])Estimates the radius of the fitting window for each feature.
estimate_max_bytes
(coordinates[, window, ...])Estimate the maximum number of bytes required by a reduction.
Return the appropriate grid class for the resampler.
Return the global resampling values.
pre_fit
(settings, *args)Perform pre-fitting steps and build the fitting tree.
process_block
(args, block)Run
solve_fits()
on each block.process_blocks
(args, kwargs, settings, ...)Wrapper for handling block resampling in a multiprocessing environment.
reduction_settings
([error_weighting, ...])Define a set of reduction instructions based on user input.
set_sample_tree
(coordinates[, radius, ...])Build the sample tree from input coordinates.
sufficient_memory_for
(memory)Return whether there is sufficient available memory for the task.
- classmethod block_loop(sample_values, error, mask, fit_tree, sample_tree, settings, iteration, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True, jobs=None)[source]¶
Perform resampling reduction in parallel or series.
Utility function to allow the resampling algorithm to process blocks of data in series or parallel, recombining the data once complete. Please see
ResamplePolynomial.__call__()
for descriptions of the arguments.- Parameters:
- sample_valuesnumpy.ndarray
- errornumpy.ndarray
- masknumpy.ndarray
- fit_treeBaseTree
- sample_treeBaseTree
- settingsdict
- iterationint
- get_errorbool, optional
- get_countsbool, optional
- get_weightsbool, optional
- get_distance_weightsbool, optional
- get_rchi2bool, optional
- get_cross_derivativesbool, optional
- get_offset_variancebool, optional
- jobsint, optional
- Returns:
- combined_results8-tuple of numpy.ndarray
- In order: fit, error, counts, weights, distance weights,
reduced chi-squared, MSCP derivatives, distribution offset.
- calculate_minimum_points()[source]¶
Return the minimum number of points for a fit.
- Parameters:
- argsn-tuple
Input arguments used to determine the number of points required for a fit.
- Returns:
- minimum_pointsint
The minimum number of points for a fit
- static combine_blocks(blocks, n_sets, n_fits, n_dimensions, cval, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]¶
Combines the results from multiple reductions into one set.
The resampling reduction may be performed in serial or parallel over multiple “blocks”, where each block contains a set of spatially close fit coordinates and all samples necessary to perform a fit over all points.
- Parameters:
- blocksn-tuple of processed reductions for n blocks.
- n_setsint
The number of data sets in the reduction. Each set is contains the same sample coordinates as all other sets, but the sample values may vary.
- n_fitsint
The number of fitting points over all blocks.
- n_dimensionsint
The number of coordinate dimensions.
- cvalfloat
The fill value for missing data in the output fit value arrays.
- get_errorbool, optional
If True, indicates that errors on the fit were calculated.
- get_countsbool, optional
If True, indicates that the number of samples used for each fit should be returned.
- get_weightsbool, optional
If True, indicates that the total weight sum of all samples used in each fit should be returned.
- get_distance_weightsbool, optional
If True, indicates that the distance weight sum of all samples used in each fit should be returned.
- get_rchi2bool, optional
If True, indicates that the reduced chi-squared statistic for each fit should be returned.
- get_cross_derivativesbool, optional
If True, indicates that the derivative MSCP should be returned.
- get_offset_variancebool, optional
If True, indicates that the offset variance of the fit from the sample distribution should be returned.
- Returns:
- results8-tuple of numpy.ndarray
results[0] = fitted values results[1] = error on the fit results[2] = counts results[3] = total weight sums results[4] = total distance weight sums results[5] = reduced chi-squared statistic results[6] = derivative MSCP results[7] = offset variance
Notes
The return value is always an 8-tuple, and the get_* keywords indicate whether the calculated values in the block reductions are valid. If False, the corresponding output array will have the correct number of axes, but be of zero size.
- estimate_feature_windows(coordinates, feature_bins=10, percentile=50, oversample=2.0)[source]¶
Estimates the radius of the fitting window for each feature.
The estimate of the window is given as the minimum required to theoretically allow for a polynomial fit based on the number of samples within such a window. Since the window is a constant of the resampling algorithm, it is difficult to calculate a precise value that will allow all fits to occur at every point within (or close to) the sample distribution.
Therefore, the sample distribution is divided up into feature_bins n-dimensional boxes over each feature. For example, if feature_bins=10 for 2-dimensional data, we would divide the sample distribution into a total of 100 (10 * 10) equal boxes before counting the number of samples inside each box.
The number of samples used to then calculate the final window radius is determined from a given percentile (default = 50) of the box counts.
For a fit of polynomial order \(p\), the minimum number of samples required for fitting is:
\[N = (p + 1)^K\]for \(K\) features if the order is “symmetrical” (i.e., the same for each feature), or
\[N_{required} = \prod_{k=1}^{K}{(p_k + 1)}\]if the fit order varies by feature. Coordinates are then scaled so that:
\[x_k^{\prime} = \text{feature\_bins} \times \frac{x_k - min(x_k)} {\beta_k}\]where the scaling factor \(\beta_k = max(x_k) - min(x_k)\). In this scheme, the coordinates have been normalized such that the width of the bin in each dimension is 1, and has a volume equal to one. Therefore, the sample density (\(\rho\)) of the bin is equal to the number of samples it contains. The volume of a unit radius spheroid is then calculated as:
\[V_{r=1} = \frac{\pi^{K/2}}{\Gamma(\frac{n}{2} + 1)}\]and the expected number of samples expected to fall inside the spheroid is given as:
\[N_{r=1} = \rho V_{r=1}\]We can then set the radius of the spheroid to give the required number of points as:
\[r_{scaled} = \left( \frac{N_{required} \nu}{N_{r=1}} \right)^{\frac{1}{K}} + \epsilon\]where \(\nu\) is the oversample factor and \(\epsilon\) is added to ensure that if resampling on a uniform grid of samples, fitting at a point between two samples will always result in enough samples available to perform a fit. \(\epsilon\) is given as:
\[\epsilon = 0.5 \rho^{\frac{-1}{K}}\]or half the average spacing between samples. We can then define the final window radius for dimension \(k\) as:
\[\Omega_k = \frac{\beta_k r_{scaled}}{\text{feature\_bins}}\]- Parameters:
- coordinatesnumpy.ndarray (n_features, n_coordinates)
The coordinates of all samples to be fit.
- feature_binsint, optional
The number of bins to divide the sample coordinates into, per feature, when determining the sample density (default = 10).
- percentileint or float, optional
The percentile used to define a representative value for samples per bin. The default (50), gives the median of all bin populations.
- oversampleint or float, optional
The oversampling factor for the window region. A value of one will result in a window that should provide the exact number of samples required for a polynomial fit of the given order assuming uniform density of the samples.
- Returns:
- windownumpy.ndarray (n_features,)
The principle axes of an ellipsoid used to create a fitting region around each resampling point.
- classmethod estimate_max_bytes(coordinates, window=1, n_sets=1, leaf_size=40, full_tree=True, **kwargs)[source]¶
Estimate the maximum number of bytes required by a reduction.
Assumes that many factors are precalculated for processing time reduction and that the ball-tree is constructed in the least efficient configuration.
- Parameters:
- coordinatesnumpy.ndarray
The coordinates for the tree of shape (n_dimensions, n)
- windowint or float, optional
The size of the tree windows
- n_setsint, optional
The number of data sets to reduce sharing coordinates.
- leaf_sizeint, optional
The number of leaves used to construct the ball-tree.
- full_treebool, optional
Calculate the maximum number of bytes if the full ball-tree is pre-calculated. Otherwise, calculates the size of a single neighborhood sized ball-tree.
- kwargsdict, optional
Optional keyword arguments to pass into the calculation for subclasses of the Reduction. Will include order (int) for polynomial resampling.
- Returns:
- max_sizeint
An upper limit for the maximum number of bytes used to construct the reduction including pre-calculation and all possible inputs.
- property features¶
int : number of data features (dimensions)
- property fit_settings¶
dict : Fit reduction settings applied during last call
- property fit_tree¶
Return the fitting tree representative of points to fit.
- Returns:
- BaseTree
- get_grid_class()[source]¶
Return the appropriate grid class for the resampler.
- Returns:
- BaseGrid subclass
- classmethod global_resampling_values()[source]¶
Return the global resampling values.
The global resampling values are of main importance when performing multiprocessing, and allows each process to gain fast access to the necessary data.
- Returns:
- dict
- property grid_class¶
Return the grid class of the resampler
- Returns:
- BaseGrid subclass
- property multi_set¶
bool : True if solving for multiple data sets
- property n_samples¶
int : The number of samples in each data set.
- property n_sets¶
int : The number of data sets to fit.
- pre_fit(settings, *args)[source]¶
Perform pre-fitting steps and build the fitting tree.
- Parameters:
- settingsdict
Settings calculated via reduction_settings to be applied if necessary.
- argsn-tuple
The call input arguments.
- Returns:
- None
- classmethod process_block(args, block)[source]¶
Run
solve_fits()
on each block.Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.
- Parameters:
- args2-tuple
A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.
- blockint
The block index to process.
- Returns:
- results9-tuple of numpy.ndarray
The first element contains the fit point indices to be fit. For the remaining elements, please see
solve_fits()
return values.
- classmethod process_blocks(args, kwargs, settings, sample_tree, fit_tree, jobs, iteration)[source]¶
Wrapper for handling block resampling in a multiprocessing environment.
- Parameters:
- argsn-tuple
The arguments to pass into
multitask()
.- kwargsdict or None
The keyword arguments to pass into
multitask()
.- settingsdict
Reduction settings.
- sample_treeBaseTree
The resampling tree in sample space.
- fit_treeBaseTree
The fitting tree.
- jobsint
The number of jobs to perform in parallel.
- iterationint
The current resampling iteration. This is simply used as a marker to distinguish pickled files on each resampling run.
- Returns:
- blockslist
A list of the return values from the process_block method for each block.
- reduction_settings(error_weighting=True, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', jobs=None, use_threading=None, use_processes=None, **kwargs)[source]¶
Define a set of reduction instructions based on user input.
This method is responsible for determining, formatting, and checking a number variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see
ResamplePolynomial.__call__()
.- Parameters:
- error_weightingbool, optional
- fit_thresholdfloat, optional
- cvalfloat, optional
- edge_thresholdfloat or array_like (n_features,), optional
- edge_algorithmstr, optional
- jobsint, optional
- use_threadingbool, optional
If True, force use of threads during multiprocessing.
- use_processesbool, optional
If True, force use of sub-processes during multiprocessing.
- kwargsdict
Optional keyword arguments to the reduction settings.
- Returns:
- settingsdict
The reduction settings. Also, stored as
ResamplePolynomial.fit_settings()
.
- set_sample_tree(coordinates, radius=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=None, memory_kwargs=None, **distance_kwargs)[source]¶
Build the sample tree from input coordinates.
- Parameters:
- coordinatesnumpy.ndarray (float)
The input coordinates of shape (n_features, n_samples).
- radiusfloat or sequence (float), optional
The radius of the window around each fitting point used to determine sample selection for fit. If not supplied, will be estimated using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_binsint, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_percentileint or float, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_oversampleint or float, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- leaf_sizeint, optional
Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbours.BallTree for further details.
- large_databool or None, optional
If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.
- check_memorybool, optional
If True, check the memory requirements for resampling the supplied data.
- memory_bufferfloat, optional
A fraction (positive or negative) with which to modify the memory estimates for the process memory requirements. Memory estimates are scaled by the factor 1 + memory_buffer.
- memory_kwargsdict, optional
- distance_kwargsdict, optional
Optional keyword arguments passed into
sklearn.neighbors.DistanceMetric()
. The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.
- Returns:
- None
- classmethod sufficient_memory_for(memory)[source]¶
Return whether there is sufficient available memory for the task.
- Parameters:
- memoryint or None
The memory required for the task in bytes
- Returns:
- perform_taskbool
- property window¶
numpy.ndarray (n_features,) : Window radius in each dimension.
grig.resample_kernel module¶
- class grig.resample_kernel.ResampleKernel(coordinates, data, kernel, error=None, mask=None, robust=None, negthresh=None, kernel_spacing=None, kernel_offsets=None, kernel_weights=None, limits=None, degrees=3, smoothing=0.0, knots=None, knot_estimate=None, eps=1e-08, fix_knots=None, tolerance=0.001, max_iteration=20, exact=False, reduce_degrees=False, imperfect=False, leaf_size=40, large_data=False, **distance_kwargs)[source]¶
Bases:
ResampleBase
- Attributes:
degrees
Return the degrees of the spline fit to the kernel.
exit_code
Return the exit code of the spline fit.
exit_message
Return the spline exit message.
features
int : number of data features (dimensions)
fit_settings
dict : Fit reduction settings applied during last call
fit_tree
Return the fitting tree representative of points to fit.
grid_class
Return the grid class of the resampler
kernel
Return the resampling kernel.
kernel_offsets
Return the coordinate offsets for the kernel.
kernel_spacing
Return the spacing between kernel grid points.
multi_set
bool : True if solving for multiple data sets
n_samples
int : The number of samples in each data set.
n_sets
int : The number of data sets to fit.
spline
Return the spline object for the kernel fit.
window
numpy.ndarray (n_features,) : Window radius in each dimension.
Methods
__call__
(*args[, cval, edge_threshold, ...])Resample data defined during initialization onto new coordinates.
block_loop
(sample_values, error, mask, ...)Perform resampling reduction in parallel or series.
calculate_minimum_points
()Return the minimum number of points for a fit.
combine_blocks
(blocks, n_sets, n_fits, ...)Combines the results from multiple reductions into one set.
estimate_feature_windows
(*args, **kwargs)Estimates the radius of the fitting window for each feature.
estimate_max_bytes
(coordinates[, window, ...])Estimate the maximum number of bytes required by a reduction.
get_grid_class
()Return the appropriate grid class for the resampler.
global_resampling_values
()Return the global resampling values.
pre_fit
(settings, *args)Perform pre-fitting steps and build the fitting tree.
process_block
(args, block)Run
solve_fits()
on each block.process_blocks
(args, kwargs, settings, ...)Wrapper for handling block resampling in a multiprocessing environment.
reduction_settings
([error_weighting, ...])Define a set of reduction instructions based on user input.
set_kernel
(kernel[, kernel_spacing, ...])Set the kernel for subsequent fitting.
set_sample_tree
(coordinates[, leaf_size, ...])Build the sample tree from input coordinates.
sufficient_memory_for
(memory)Return whether there is sufficient available memory for the task.
- classmethod block_loop(sample_values, error, mask, fit_tree, sample_tree, settings, iteration, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, jobs=None, **kwargs)[source]¶
Perform resampling reduction in parallel or series.
Utility function to allow the resampling algorithm to process blocks of data in series or parallel, recombining the data once complete. Please see
ResamplePolynomial.__call__()
for descriptions of the arguments.- Parameters:
- sample_valuesnumpy.ndarray
- errornumpy.ndarray
- masknumpy.ndarray
- fit_treeresampling.tree.PolynomialTree object
- sample_treeresampling.tree.PolynomialTree object
- settingsdict
- iterationint
- get_errorbool, optional
- get_countsbool, optional
- get_weightsbool, optional
- get_distance_weightsbool, optional
- get_rchi2bool, optional
- get_offset_variancebool, optional
- jobsint, optional
- kwargsdict, optional
For consistency with the resampler base only.
- Returns:
- combined_results8-tuple of numpy.ndarray
- In order: fit, error, counts, weights, distance weights,
reduced chi-squared, MSCP derivatives, distribution offset.
- static combine_blocks(blocks, n_sets, n_fits, n_dimensions, cval, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, **kwargs)[source]¶
Combines the results from multiple reductions into one set.
The resampling reduction may be performed in serial or parallel over multiple “blocks”, where each block contains a set of spatially close fit coordinates and all samples necessary to perform a fit over all points.
- Parameters:
- blocksn-tuple of processed reductions for n blocks.
- n_setsint
The number of data sets in the reduction. Each set is contains the same sample coordinates as all other sets, but the sample values may vary.
- n_fitsint
The number of fitting points over all blocks.
- n_dimensionsint
The number of coordinate dimensions.
- cvalfloat
The fill value for missing data in the output fit value arrays.
- get_errorbool, optional
If True, indicates that errors on the fit were calculated.
- get_countsbool, optional
If True, indicates that the number of samples used for each fit should be returned.
- get_weightsbool, optional
If True, indicates that the total weight sum of all samples used in each fit should be returned.
- get_distance_weightsbool, optional
If True, indicates that the distance weight sum of all samples used in each fit should be returned.
- get_rchi2bool, optional
If True, indicates that the reduced chi-squared statistic for each fit should be returned.
- get_offset_variancebool, optional
If True, indicates that the offset variance of the fit from the sample distribution should be returned.
- kwargsdict, optional
For consistency with ResampleBase only (not used).
- Returns:
- results7-tuple of numpy.ndarray
results[0] = fitted values results[1] = error on the fit results[2] = counts results[3] = total weight sums results[4] = total distance weight sums results[5] = reduced chi-squared statistic results[6] = offset variance
Notes
The return value is always an 8-tuple, and the get_* keywords indicate whether the calculated values in the block reductions are valid. If False, the corresponding output array will have the correct number of axes, but be of zero size.
- property degrees¶
Return the degrees of the spline fit to the kernel.
- Returns:
- numpy.ndarray (int)
- estimate_feature_windows(*args, **kwargs)[source]¶
Estimates the radius of the fitting window for each feature.
The window for the resampling algorithm will be set to encompass the kernel extent over all dimensions. Unlike the standard resampling algorithm, the coordinates of the data samples are irrelevant.
The window radius is based on the kernel extent in this implementation.
Since the resampling algorithm uses an ellipsoid window to determine possible candidates for fitting, the window is determined as an ellipsoid which circumscribes the cuboid kernel array. Although there are an infinite number of possible ellipsoids, this particular method uses the constraints that principle axes of the ellipsoid and cuboid widths are at a constant ratio. For example, in two dimensions where a and b are the principle axes of an ellipsoid, and w and h are the widths of the kernel:
a/w = b/h; (w / 2a)^2 + (h / 2b)^2 = 1.
This leads to us setting the principle axes as:
a_i = sqrt(n / 4) * w_i + delta
where w_i is the width of the kernel in dimension i in n-dimensions, and delta = spacing/1e6 so that any edge cases can be included safely. This is also valid in one dimension (a_i = w_i / 2). Note that kernels are assumed to be exactly centered, so the width of a kernel in any dimension will be:
w_i = spacing_i * (kernel.shape[i] - 1)
Note that this implementation takes the width as 2 * the maximum absolute offset of the kernel for each dimension, so that irregular kernels may be safely used.
- Returns:
- windownumpy.ndarray (n_features,)
The principle axes of an ellipsoid used to create a fitting region around each resampling point.
- property exit_code¶
Return the exit code of the spline fit.
Please see the
Spline
class for further details on the meanings of each code.- Returns:
- int
- property exit_message¶
Return the spline exit message.
- Returns:
- str
- property kernel¶
Return the resampling kernel.
- Returns:
- numpy.ndarray (float)
- property kernel_offsets¶
Return the coordinate offsets for the kernel.
- Returns:
- numpy.ndarray (float)
- property kernel_spacing¶
Return the spacing between kernel grid points.
- Returns:
- numpy.ndarray (float)
- classmethod process_block(args, block)[source]¶
Run
solve_fits()
on each block.Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.
- Parameters:
- args2-tuple
A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.
- blockint
The block index to process.
- Returns:
- results9-tuple of numpy.ndarray
The first element contains the fit point indices to be fit. For the remaining elements, please see
solve_fits()
return values.
- reduction_settings(error_weighting=True, absolute_weight=None, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', is_covar=False, jobs=None, use_threading=None, use_processes=None, **kwargs)[source]¶
Define a set of reduction instructions based on user input.
This method is responsible for determining, formatting, and checking a number variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see
ResamplePolynomial.__call__()
.- Parameters:
- error_weightingbool, optional
If True (default), weight polynomial fitting by the error values of each sample.
- absolute_weightbool, optional
If the kernel weights are negative, can lead to almost zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights are used for normalization.
- fit_thresholdfloat, optional
Not implemented for kernel fitting.
- cvalfloat, optional
During fitting, any fit that does not meet the order_algorithm requirement will be set to this value. This will be NaN by default.
- edge_thresholdfloat or array_like or float
If set to a value > 0, edges of the fit will be masked out according to edge_algorithm. Values close to zero will result in a low degree of edge clipping, while values close to 1 clip edges to a greater extent. The exact definition of edge_threshold depends on the algorithm. For further details, please see
resampling.resample_utils.check_edges()
.- edge_algorithmstr, optional
Describes how to clip edges if edge_threshold is non-zero. The available algorithms are:
‘distribution’ (default): Statistics on sample distributions are calculated, and if the resampling point is > 1/threshold standard deviations away from the sample mean, it will be clipped.
‘ellipsoid’: If the samples used to fit a resampling point deviate from the resampling point location by more than this amount, it will be clipped.
‘box’: If the flattened 1-dimensional distribution of samples center-of-mass deviates from the resampling point location in any dimension, it will be clipped.
‘range’: Over each dimension, check the distribution of points is greater than edge_threshold to the “left” and “right” of the resampling point.
- is_covarbool, optional
If True, the input data is treated as a covariance instead of a flux, and is propagated as if through a weighted mean.
- jobsint, optional
Specifies the maximum number of concurrently running jobs. An attempt will be made to parallel process using a thread-pool if available, but will otherwise revert to the “loky” backend. Values of 0 or 1 will result in serial processing. A negative value sets jobs to n_cpus + 1 + jobs such that -1 would use all cpus, and -2 would use all but one cpu.
- use_threadingbool, optional
If True, force use of threads during multiprocessing.
- use_processesbool, optional
If True, force use of sub-processes during multiprocessing.
- kwargsdict
Optional keyword arguments to the reduction settings.
- Returns:
- settingsdict
The reduction settings. Also, stored as
ResamplePolynomial.fit_settings()
.
- set_kernel(kernel, kernel_spacing=None, kernel_offsets=None, weights=None, limits=None, degrees=3, smoothing=0.0, knots=None, knot_estimate=None, eps=1e-08, fix_knots=None, tolerance=0.001, max_iteration=20, exact=False, reduce_degrees=False, imperfect=False)[source]¶
Set the kernel for subsequent fitting.
During this process, a spline will be fit to describe the kernel at all intermediate points.
- Parameters:
- kernelnumpy.ndarray (float)
The kernel to set. Must have n_features dimensions.
- kernel_spacingfloat or numpy.ndarray (float), optional
The spacing between each kernel element in units of the coordinates. Either supplied as a single value for all features, or as an array of shape (n_features,) giving the kernel spacing for each feature.
- kernel_offsetstuple or array_like, optional
If the kernel is regular, should be an n-dimensional tuple containing the grid indices in each dimension. Otherwise, should be an array of shape (n_dimensions, kernel.size).
- weightsnumpy.ndarray, optional
Optional weights to supply to the spline fit for each data point. Should be the same shape as the supplied kernel.
- limitsnumpy.ndarray (float), optional
An array of shape (n_dimensions, 2) that may be supplied to set the minimum and maximum coordinate values used during the spline fit. For example, limits[1, 0] sets the minimum knot value in the second dimensions and limits[1, 1] sets the maximum knot value in the second dimension. By default this is set to the minimum and maximum values of the coordinates in each dimension.
- degreesint or numpy.ndarray (int), optional
The degree of spline to fit in each dimension. Either a scalar can be supplied pertaining to all dimensions, or an array of shape (n_dimensions,) can be used.
- smoothingfloat, optional
Used to specify the smoothing factor. If set to None, the smoothing will be determined based on user settings or input data. If exact is True, smoothing will be disabled (zero). If exact is False, smoothing will be set to n - sqrt(2 * n) where n is the number of data values. If supplied, smoothing must be greater than zero. See above for further details. Note that if smoothing is zero, and the degrees are not equal over each dimension, smoothing will be set to eps due to numerical instabilities.
- knotslist or tuple or numpy.ndarray, optional
A set of starting knot coordinates for each dimension. If a list or tuple is supplied it should be of length n_dimensions where element i is an array of shape (n_knots[i]) for dimension i. If an array is supplied, it should be of shape (n_dimension, max(n_knots)). Note that there must be at least 2 * (degree + 1) knots for each dimension. Unused or invalid knots may be set to NaN, at the end of each array. Knot coordinates must also be monotonically increasing in each dimension.
- knot_estimatenumpy.ndarray (int), optional
The maximum number of knots required for the spline fit in each dimension and of shape (n_dimensions,). If not supplied, the knot estimate will be set to int((n / n_dimensions) ** n_dimensions^(-1)) or n_knots if knots were supplied and fixed.
- epsfloat, optional
A value where 0 < eps < 1. This defines the magnitude used to identify singular values in the spline observation matrix (A). If any row of A[:, 0] < (eps * max(A[:,0])) it will be considered singular.
- fix_knotsbool, optional
If True, do not attempt to modify or add knots to the spline fit. Only the initial supplied user knots will be used.
- tolerancefloat, optional
A value in the range 0 < tolerance < 1 used to determine the exit criteria for the spline fit. See above for further details.
- max_iterationint, optional
The maximum number of iterations to perform when solving for the spline fit.
- exactbool, optional
If True, the initial knots used will coincide with the actual input coordinates and smoothing will be set to zero. No knots should be supplied by the user in this instance.
- reduce_degreesbool, optional
Only relevant if exact is True. If set to True, the maximum allowable degree in each dimension will be limited to (len(unique(x)) // 2) - 1 where x are the coordinate values in any dimension.
- imperfectbool, optional
If a spline fit to the kernel is allowed to be imperfect (True), will only raise an error on spline fitting if a major error was encountered. Otherwise, fits will be permitted so long as a solution was reached, even if that solution did not meet expectations.
- Returns:
- None
- set_sample_tree(coordinates, leaf_size=40, large_data=False, **distance_kwargs)[source]¶
Build the sample tree from input coordinates.
- Parameters:
- coordinatesnumpy.ndarray (float)
The input coordinates of shape (n_features, n_samples).
- leaf_sizeint, optional
Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbours.BallTree for further details.
- large_databool, optional
If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.
- distance_kwargsdict, optional
Optional keyword arguments passed into
sklearn.neighbors.DistanceMetric()
. The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.
- Returns:
- None
- property spline¶
Return the spline object for the kernel fit.
- Returns:
- Spline
grig.resample_kernel_utils module¶
- grig.resample_kernel_utils.apply_mask_to_kernel_set_arrays(mask, data, error, weights, counts=None)[source]¶
Set certain arrays to a fixed size based on a mask array.
- Parameters:
- masknumpy.ndarray of bool (N,)
Mask where True values indicate the associated element should be kept, and False will result in exclusion from the output arrays.
- datanumpy.ndarray (N,)
The data array.
- errornumpy.ndarray
An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.
- weightsnumpy.ndarray
An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.
- countsint, optional
The number of True values in the mask, optionally passed in for speed. Determines the output size of all arrays.
- Returns:
- data_out, error_out, weight_out3-tuple of numpy.ndarray.
Resized arrays in which the last axis is of size counts.
- grig.resample_kernel_utils.calculate_kernel_weights(coordinates, reference, knots, n_knots, coefficients, degrees, panel_mapping, panel_steps, knot_steps, nk1, spline_mapping, eps=1e-08)[source]¶
Calculate values of a kernel centered on a reference for given coordinates.
The kernel must have been transformed into a spline representation. A value of zero on the spline knots represents the center of the kernel.
- Parameters:
- coordinatesnumpy.ndarray (float)
The coordinates for which to calculate weights of shape (n_dimensions, n).
- referencenumpy.ndarray (float)
The location of the kernel center of shape (n_dimensions,).
- knotsnumpy.ndarray (float)
The kernel spline knots of shape (n_dimensions, >max(n_knots)).
- n_knotsnumpy.ndarray (int)
The number of spline knots in each dimension of shape (n_dimensions,).
- coefficientsnumpy.ndarray (float)
The spline coefficients of shape (n_coefficients,).
- degreesnumpy.ndarray (int)
The spline degrees in each dimension of shape (n_dimensions,).
- panel_mappingnumpy.ndarray (int)
An array containing the panel mapping (flat to n-D) indices. This is created by passing the panel shape (n_knots - (2 * degrees) - 1) into
flat_index_mapping()
. Should be an array of shape (n_dimensions, n_panels).- panel_stepsnumpy.ndarray (int)
The flat index mapping steps in panel-space of shape (n_dimensions,). These are returned by passing the shape Spline.panel_shape into
flat_index_mapping()
.- knot_stepsnumpy.ndarray (int)
The flat index mapping steps in knot-space of shape (n_dimensions,). These are returned by passing the shape (n_knots - degrees - 1) into
flat_index_mapping()
.- nk1numpy.ndarray (int)
An array of shape (n_dimensions,) containing the values n_knots - k1 where n_knots are the number of knots in each dimension, and k1 are the spline degrees + 1 in each dimension.
- spline_mappingnumpy.ndarray (int)
An array containing the spline mapping (flat to n-D) indices. This is created by passing the spline shape (degrees + 1) into
flat_index_mapping()
. Should be an array of shape (n_dimensions, n_spline_coefficients).- epsfloat, optional
Due to rounding errors, sometimes a value is flagged as invalid if it is exactly on the edge of the kernel. This value allows a certain tolerance to those incorrect results.
- Returns:
- weightsnumpy.ndarray (float)
The interpolated weights of shape (n,).
- grig.resample_kernel_utils.solve_kernel_fit(window_coordinates, window_values, window_error, window_mask, kernel_weights, fit_coordinate, is_covar=False, cval=nan, error_weighting=True, absolute_weight=False, edge_algorithm_idx=1, edge_threshold=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True)[source]¶
Solve for a kernel convolution at a single coordinate.
Generally, the kernel convolution value is of the form:
\[f(x) = \sum_{j}{d_j k_j} / \sum_{j}{k_j}\]where \(x\) is the coordinate at the fit point, \(j\) are the indices of all samples within the fitting window, \(d\) are the sample values, and \(k\) are the kernel weights.
EDGE CHECKING
The first of the on-the-fly calculation is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see
check_edges()
for details on available algorithms.FITTING
If the above checks pass, a fit can be attempted. There are two types of fit that may be performed. The first is the standard kernel fit described above. However, if is_covar was set to True, the window_values are considered covariances to propagate, and a fit will derived by propagating a weighted variance.
- Parameters:
- window_coordinatesnumpy.ndarray (n_dimensions, n_samples)
The independent coordinates within the window region of the fitting coordinate.
- window_valuesnumpy.ndarray (n_samples,)
The dependent values of the samples.
- window_errornumpy.ndarray (n_samples,)
The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (1,) in which case all samples in a set will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.
- window_masknumpy.ndarray (n_samples,)
A mask where False indicates that the associated sample should be excluded from the fit.
- kernel_weightsnumpy.ndarray (n_samples,)
The kernel weighting factors applied to each sample in the fit.
- fit_coordinatenumpy.ndarray (n_dimensions,)
The coordinate of the fitting point.
- is_covarbool, optional
If True, indicates that window_values contains covariance values that should be propagated through algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.
- cvalfloat, optional
In a case that a fit is unable to be calculated at certain location, cval determines the returned fit value.
- error_weightingbool, optional
If True, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.
- absolute_weightbool, optional
If the kernel weights are negative, can lead to almost zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights are used for normalization.
- edge_algorithm_idxint, optional
Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see
check_edges()
for further information. The default (1), is always the most robust of the available algorithms.- edge_thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution during
check_edges()
. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.- get_errorbool, optional
If True, return the error on the fit.
- get_weightsbool, optional
If True, return the sum of all sample weights used in determining the fit.
- get_distance_weightsbool, optional
If True, return the sum of only the distance weights used in determining the fit.
- get_rchi2bool, optional
If True, return the reduced chi-squared statistic for each of the fits.
- get_offset_variancebool optional
If True, return the offset of the fitting point from the sample distribution. See
offset_variance()
for further information.
- Returns:
- fit_result7-tuple
- fit_result[0]: Fitted value (float).
Set to cval on fit failure.
- fit_result[1]: Error on the fit (float).
Set to NaN on fit failure.
- fit_result[2]: Number of samples included in the fit (int).
Set to 0 on fit failure.
- fit_result[3]: Weight sum (float).
Set to 0.0 on fit failure.
- fit_result[4]: Distance weight sum (float).
Set to 0.0 on fit failure.
- fit_result[5]: Reduced chi-squared statistic (float).
Set to NaN on fit failure.
- fit_result[6]: Offset variance from the distribution center (float).
Set to NaN on fit failure.
- grig.resample_kernel_utils.solve_kernel_fits(sample_indices, sample_coordinates, sample_data, sample_error, sample_mask, fit_coordinates, knots, coefficients, degrees, panel_mapping, panel_steps, knot_steps, nk1, spline_mapping, n_knots, is_covar=False, cval=nan, error_weighting=True, absolute_weight=False, edge_algorithm_idx=1, edge_threshold=None, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True)[source]¶
Solve all fits within one intersection block.
This function is a wrapper for
solve_kernel_fit()
over all data sets and fit points. The main computations here involve:Creating and populating the output arrays.
Selecting the correct samples within the region of each fitting window.
Calculating the full weighting factors for the fits.
For further details on the actual fitting, please see
solve_kernel_fit()
.- Parameters:
- sample_indicesnumba.typed.List
A list of 1-dimensional numpy.ndarray (dtype=int) of length n_fits. Each list element sample_indices[i], contains the indices of samples within the “window” region of fit_indices[i].
- sample_coordinatesnumpy.ndarray (n_dimensions, n_samples)
The independent coordinates for each sample in n_dimensions
- sample_datanumpy.ndarray (n_sets, n_samples)
The dependent values of the samples for n_sets, each containing n_samples.
- sample_errornumpy.ndarray (n_sets, n_samples)
The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (n_sets, 1) in which case all samples in a set will share the same associated error value. If the shape is set to (n_sets, 0), this indicates that no error values are available for the samples.
- sample_masknumpy.ndarray (n_sets, n_samples)
A mask where False indicates that the associated sample should be excluded from all fits.
- fit_coordinatesnumpy.ndarray (n_dimensions, n_fits)
The independent variables at each fit coordinate in d_dimensions.
- knotslist or tuple or numpy.ndarray, optional
A set of starting knot coordinates for each dimension. If a list or tuple is supplied it should be of length n_dimensions where element i is an array of shape (n_knots[i]) for dimension i. If an array is supplied, it should be of shape (n_dimension, max(n_knots)). Note that there must be at least 2 * (degree + 1) knots for each dimension. Unused or invalid knots may be set to NaN, at the end of each array. Knot coordinates must also be monotonically increasing in each dimension.
- coefficientsnumpy.ndarray (float)
The spline coefficients of shape (n_coefficients,).
- degreesnumpy.ndarray (int)
The degrees of the spline in each dimension (n_dimensions,).
- panel_mappingnumpy.ndarray (int)
An array containing the panel mapping (flat to n-D) indices. This is created by passing the panel shape (n_knots - (2 * degrees) - 1) into
flat_index_mapping()
. Should be an array of shape (n_dimensions, n_panels).- panel_stepsnumpy.ndarray (int)
The flat index mapping steps in panel-space of shape (n_dimensions,). These are returned by passing the shape Spline.panel_shape into
flat_index_mapping()
.- knot_stepsnumpy.ndarray (int)
The flat index mapping steps in knot-space of shape (n_dimensions,). These are returned by passing the shape (n_knots - degrees - 1) into
flat_index_mapping()
.- nk1numpy.ndarray (int)
An array of shape (n_dimensions,) containing the values n_knots - k1 where n_knots are the number of knots in each dimension, and k1 are the spline degrees + 1 in each dimension.
- spline_mappingnumpy.ndarray (int)
An array containing the spline mapping (flat to n-D) indices. This is created by passing the spline shape (degrees + 1) into
flat_index_mapping()
. Should be an array of shape (n_dimensions, n_spline_coefficients).- n_knotsnumpy.ndarray (int)
The number of knots in each dimension (n_dimensions,).
- is_covarbool, optional
If True, indicates that sample_data contains covariance values that should be propagated through algorithm.
- cvalfloat, optional
In a case that a fit is unable to be calculated at certain location, cval determines the fill value for the output fit array at those locations.
- error_weightingbool, optional
If True, weight the samples in the fit by the inverse variance (1 / sample_error^2) in addition to distance weighting.
- absolute_weightbool, optional
If the kernel weights are negative, can lead to almost zero-like divisions in many of the algorithms. If set to True, the sum of the absolute weights are used for normalization.
- edge_algorithm_idxint, optional
Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see
check_edges()
for further information. The default (1), is always the most robust of the available algorithms.- edge_thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution during
check_edges()
. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.- get_errorbool, optional
If True, return the error on the fit.
- get_countsbool, optional
If True, return the number of samples used when determining the fit at each fitting point.
- get_weightsbool, optional
If True, return the sum of all sample weights used in determining the fit at each point.
- get_distance_weightsbool, optional
If True, return the sum of only the distance weights used in determining the fit at each point.
- get_rchi2bool, optional
If True, return the reduced chi-squared statistic for each of the fitted points.
- get_offset_variancebool optional
If True, return the offset of the fitting point from the sample distribution. See
offset_variance()
for further information.
- Returns:
- fit_results7-tuple of numpy.ndarray
fit_results[0]: Fitted values. fit_results[1]: Error on the fit. fit_results[2]: Number of samples in each fit. fit_results[3]: Weight sums. fit_results[4]: Distance weight sums. fit_results[5]: Reduced chi-squared statistic. fit_results[6]: Offset variances from the sample distribution center.
All arrays have the shape (n_sets, n_fits) or (0, 0) depending on whether get_<name> is True or False respectively.
grig.resample_polynomial module¶
- class grig.resample_polynomial.ResamplePolynomial(coordinates, data, error=None, mask=None, window=None, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, order=1, fix_order=True, large_data=None, check_memory=True, memory_buffer=0.25, **distance_kwargs)[source]¶
Bases:
ResampleBase
- Attributes:
features
int : number of data features (dimensions)
fit_settings
dict : Fit reduction settings applied during last call
fit_tree
Return the fitting tree representative of points to fit.
grid_class
Return the grid class of the resampler
multi_set
bool : True if solving for multiple data sets
n_samples
int : The number of samples in each data set.
n_sets
int : The number of data sets to fit.
order
Return the order of polynomial fit.
window
numpy.ndarray (n_features,) : Window radius in each dimension.
Methods
__call__
(*args[, smoothing, ...])Resample data defined during initialization to new coordinates.
block_loop
(sample_values, error, mask, ...)Perform resampling reduction in parallel or series.
calculate_adaptive_smoothing
(settings)Calculate the adaptive distance weighting kernel.
Return the minimum number of points for a polynomial fit.
check_order
(order, n_features, n_samples)Check the order is of correct format and enough samples exist.
combine_blocks
(blocks, n_sets, n_fits, ...)Combines the results from multiple reductions into one set.
estimate_feature_windows
(coordinates[, ...])Estimates the radius of the fitting window for each feature.
estimate_max_bytes
(coordinates[, window, ...])Estimate the maximum number of bytes required by a reduction.
get_grid_class
()Return the appropriate grid class for the resampler.
global_resampling_values
()Return the global resampling values.
pre_fit
(settings, *args[, ...])Perform pre-fitting steps and build the fitting tree.
process_block
(args, block)Run
solve_fits()
on each block.process_blocks
(args, kwargs, settings, ...)Wrapper for handling block resampling in a multiprocessing environment.
reduction_settings
([smoothing, ...])Define a set of reduction instructions based on user input.
set_sample_tree
(coordinates[, radius, ...])Build the sample tree from input coordinates.
sufficient_memory_for
(memory)Return whether there is sufficient available memory for the task.
- calculate_adaptive_smoothing(settings)[source]¶
Calculate the adaptive distance weighting kernel.
Calculates a weighting kernel for each sample in the reduction. i.e, each sample will have a different weighting factor based on its distance (and optionally, direction) from each point at which a fit is required. This is done by performing a fit centered on each sample, and measuring how that fit deviates from the actual observed sample value. The first step is to decide upon the distance weighting factor used to perform the initial fit.
It is assumed that the coordinate error is known (or approximately known), and supplied as a \(1 \sigma\) error as the ‘alpha’ keyword in settings. For example, the error in astronomical observations taken using a Gaussian beam with known FWHM is:
\[\sigma = \frac{\text{FWHM}}{2 \sqrt{2 \ln{2}}}\]However, adaptive weighting may be calculated over select dimensions, leaving some fixed. In this case, the alpha keyword will always apply a weighting factor in a “fixed” dimension \(k\) as:
\[w_k = exp \left( \frac{-\Delta x_k^2}{\alpha_{fixed, k} \Omega_k^2} \right)\]where \(\Delta x_k\) is the distance from the resampling point to the sample in question, and \(\Omega_k\) is the window principle axis in dimension \(k\). Note that in the weighting function, \(\sigma\) is related to \(\alpha\) by:
\[\alpha_k = 2 \sigma_k^2\]and weighting is applied using the definition of \(w_k\) above. To signify that \(\alpha_{fixed, k}\) should be used instead of \(\sigma\) for dimension \(k\), the “adaptive_threshold” array in settings should be set to zero for the corresponding dimension. For example, if we had:
settings['alpha'] = [0.3, 0.3] settings['adaptive_threshold'] = [1, 0]
the first dimension would have \(\alpha_0=0.18\) (\(2 \times 0.3^2\)), and the second would have \(\alpha_1=0.3\). In this example, \(\alpha_0\) would be allowed to vary per sample, while \(\alpha_1\) would be fixed for each sample at 0.3. An initial fit is then performed at each sample coordinate using a test distance weighting parameter:
\[ \begin{align}\begin{aligned}\sigma_{test, k} = \frac{\pi \sigma_k}{\sqrt{2 ln{2}}}\\\alpha_{test, k} = 2 \sigma_{test, k}^2\end{aligned}\end{align} \]Using this initial weighting parameter, the adaptive kernels are derived using either
resample_utils.shaped_adaptive_weight_matrix()
orresample_utils.scaled_adaptive_weight_matrix()
depending on whether the “shaped” keyword in settings is set to True or False respectively.The weighting kernels are stored in the “adaptive_alpha” keyword value in settings.
- Parameters:
- settingsdict
Reduction settings, as returned by
ResamplePolynomial.reduction_settings()
.
- Returns:
- None
- calculate_minimum_points()[source]¶
Return the minimum number of points for a polynomial fit.
- Returns:
- minimum_pointsint
The minimum number of points for a polynomial fit
- static check_order(order, n_features, n_samples)[source]¶
Check the order is of correct format and enough samples exist.
- Parameters:
- orderint or array_like of int (n_features,)
The polynomial order to fit, either supplied as an integer to be applied over all dimensions, or as an array to give the order in each dimension.
- n_featuresint
The number of features (dimensions) of the sample coordinates.
- n_samplesint
The number of samples.
- Returns:
- orderint or numpy.ndarray of int (n_features,)
The formatted polynomial order.
- Raises:
- ValueError
If there are too few samples for the desired order, or order is not formatted correctly.
- property fit_tree¶
Return the fitting tree representative of points to fit.
- Returns:
- PolynomialTree
- property order¶
Return the order of polynomial fit.
- Returns:
- orderint or numpy.ndarray (int)
A symmetrical order, or the order for each feature.
- pre_fit(settings, *args, adaptive_region_coordinates=None)[source]¶
Perform pre-fitting steps and build the fitting tree.
- Parameters:
- settingsdict
Settings calculated via reduction_settings to be applied if necessary.
- argsn-tuple
The call input arguments.
- adaptive_region_coordinatesnumpy.ndarray (float), optional
The coordinates determined from a previous adaptive smoothing algorithm.
- Returns:
- None
- classmethod process_block(args, block)[source]¶
Run
solve_fits()
on each block.Utility function that parses the settings and tree objects to something usable by the numba JIT compiled resampling functions. This is not meant to be called directly.
- Parameters:
- args2-tuple
A tuple of form (filename, iteration) where the filename is a string pointing towards a previously saved pickle file containing the relevant information for the reduction if required. If set to None, the arguments are retrieved from the _global_resampling_values global parameter.
- blockint
The block index to process.
- Returns:
- results9-tuple of numpy.ndarray
The first element contains the fit point indices to be fit. For the remaining elements, please see
solve_fits()
return values.
- reduction_settings(smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', error_weighting=True, fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', is_covar=False, estimate_covariance=False, jobs=None, adaptive_region_coordinates=None, use_threading=None, use_processes=None)[source]¶
Define a set of reduction instructions based on user input.
This method is responsible for determining, formatting, and checking a number variables required for the resampling algorithm based on user input. For detailed descriptions of user options, please see
ResamplePolynomial.__call__()
.- Parameters:
- smoothingfloat or array_like (n_features,), optional
- relative_smoothbool, optional
- adaptive_thresholdfloat or array_like (n_features,), optional
- adaptive_algorithmstr, optional
- error_weightingbool, optional
- fit_thresholdfloat, optional
- cvalfloat, optional
- edge_thresholdfloat or array_like (n_features,), optional
- edge_algorithmstr, optional
- order_algorithmstr, optional
- is_covarbool, optional
- estimate_covariancebool, optional
- jobsint, optional
- adaptive_region_coordinatesarray_like, optional
- use_threadingbool, optional
If True, force use of threads during multiprocessing.
- use_processesbool, optional
If True, force use of sub-processes during multiprocessing.
- Returns:
- settingsdict
The reduction settings. Also, stored as
ResamplePolynomial.fit_settings()
.
- set_sample_tree(coordinates, radius=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=None, check_memory=True, memory_buffer=0.25, **distance_kwargs)[source]¶
Build the sample tree from input coordinates.
- Parameters:
- coordinatesnumpy.ndarray (float)
The input coordinates of shape (n_features, n_samples).
- radiusfloat or sequence (float), optional
The radius of the window around each fitting point used to determine sample selection for fit. If not supplied, will be estimated using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_binsint, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_percentileint or float, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- window_estimate_oversampleint or float, optional
Used to estimate the window if not supplied using
ResamplePolynomial.estimate_feature_windows()
.- leaf_sizeint, optional
Number of points at which to switch to brute-force during the ball tree query algorithm. See sklearn.neighbours.BallTree for further details.
- large_databool, optional
If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.
- check_memorybool, optional
If True, check the memory requirements for resampling the supplied data.
- memory_bufferfloat, optional
A fraction (positive or negative) with which to modify the memory estimates for the process memory requirements. Memory estimates are scaled by the factor 1 + memory_buffer.
- distance_kwargsdict, optional
Optional keyword arguments passed into
sklearn.neighbors.DistanceMetric()
. The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition. This is important in determining which samples lie inside the window region of a resampling point, and when deriving distance weighting factors.
- Returns:
- None
- grig.resample_polynomial.resamp(coordinates, data, *locations, error=None, mask=None, window=None, order=1, fix_order=True, robust=None, negthresh=None, window_estimate_bins=10, window_estimate_percentile=50, window_estimate_oversample=2.0, leaf_size=40, large_data=False, smoothing=0.0, relative_smooth=False, adaptive_threshold=None, adaptive_algorithm='scaled', fit_threshold=0.0, cval=nan, edge_threshold=0.0, edge_algorithm='distribution', order_algorithm='bounded', error_weighting=True, estimate_covariance=False, is_covar=False, jobs=None, get_error=False, get_counts=False, get_weights=False, get_distance_weights=False, get_rchi2=False, get_cross_derivatives=False, get_offset_variance=False, **distance_kwargs)[source]¶
ResamplePolynomial data using local polynomial fitting.
Initializes and then calls the
ResamplePolynomial
class. For further details on all available parameters, please seeResamplePolynomial.__init__()
andResamplePolynomial.__call__()
.- Parameters:
- coordinates
- data
- locations
- error
- mask
- window
- order
- fix_order
- robust
- negthresh
- window_estimate_bins
- window_estimate_percentile
- window_estimate_oversample
- leaf_size
- large_data
- smoothing
- relative_smooth
- adaptive_threshold
- adaptive_algorithm
- fit_threshold
- cval
- edge_threshold
- edge_algorithm
- order_algorithm
- error_weighting
- estimate_covariance
- is_covar
- jobs
- get_error
- get_counts
- get_weights
- get_distance_weights
- get_rchi2
- get_cross_derivatives
- get_offset_variance
- distance_kwargs
- Returns:
- resultsfloat or numpy.ndarray or n-tuple of (float or numpy.ndarray)
If a fit is performed at a single location, the output will consist of int or float scalar values. Multiple fits result in numpy arrays. The exact output shape depends on the number of data sets, number of fitted points, dimensions of the fit locations. Assuming that all get_* keywords are set to True, the output order is:
results[0] = fitted values results[1] = error on the fit results[2] = sample counts for each fit results[3] = total weight of all samples in fit results[4] = total distance weight sum of all samples in fit results[5] = reduced chi-squared statistic of the fit results[6] = derivative mean squared cross products results[7] = offset variance of fit from sample distribution
grig.resample_utils module¶
- grig.resample_utils.apply_mask_to_set_arrays(mask, data, phi, error, weights, counts)[source]¶
Set certain arrays to a fixed size based on a mask array.
- Parameters:
- masknumpy.ndarray of bool (N,)
Mask where True values indicate the associated element should be kept, and False will result in exclusion from the output arrays.
- datanumpy.ndarray (N,)
The data array.
- phinumpy.ndarray (n_terms, N)
The polynomial terms of the fit equation.
- errornumpy.ndarray
An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.
- weightsnumpy.ndarray
An array of shape (1,) or (N,). If an array of size 1 is supplied, it will be expanded to an array of counts size.
- countsint
The number of True values in the mask. Determines the output size of all arrays.
- Returns:
- data_out, phi_out, error_out, weight_out4-tuple of numpy.ndarray.
Resized arrays in which the last axis is of size counts.
- grig.resample_utils.array_sum(mask)[source]¶
Return the sum of an array.
Utility function for fast
numba
calculation of the sum of a 1-D numpy array.- Parameters:
- masknumpy.ndarray (N,)
The input array.
- Returns:
- countint or float
The sum of values.
- grig.resample_utils.calculate_adaptive_distance_weights_scaled(coordinates, reference, adaptive_alpha)[source]¶
Returns distance weights based on offsets and scaled adaptive weighting.
Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(x_{ref}\)), and scaling factors adaptive_alpha (\(A\)), returns the weighting factor:
\[w(x) = exp \left( -\sum_{i=1}^{K} \sum_{j=1}^{K} {{\Delta x}_i A_{i,j} {\Delta x}_j} \right)\]where \({\Delta x}_k = x_{ref, k} - x_k\) for dimension \(k\).
Unlike
calculate_distance_weights_from_matrix()
, this function applies the function over multiple sets. In this context, a single set will contain the same independent values (coordinates) as all other sets in a reduction, but the dependent data values may vary between sets. Therefore, it is necessary for the adaptive weighting values to also vary between sets.The algorithm is equivalent to
calculate_adaptive_distance_weights_shaped()
except when no rotation is applied, and therefore only a 1-dimensional array, rather than a matrix, is required to perform the transform. The third axis of adaptive_alpha is set to one to allownumba
to compile the function successfully, also indicating that there is no need to store the off-diagonal elements since they are all zero. All stretching will therefore occur along the dimensional axes.- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
- referencenumpy.ndarray (n_dimensions,)
- adaptive_alphanumpy.ndarray
(n_coordinates, n_sets, 1, n_dimensions) array containing the scaled adaptive weighting factors.
- Returns:
- weightsnumpy.ndarray (n_sets, n_coordinates)
The distance weighting factors.
- grig.resample_utils.calculate_adaptive_distance_weights_shaped(coordinates, reference, shape_matrices)[source]¶
Returns distance weights based on offsets and shaped adaptive weighting.
Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(o\)), and the symmetric matrix shape_matrices (\(A\)), returns the weighting factor:
\[w(x) = exp(-\Delta x^T A^{-1} \Delta x)\]where \({\Delta x}_k = o_k - x_k\) for dimension \(k\).
This function is applied to multiple coordinates over multiple sets. In this context, a single set will contain the same independent values (coordinates) as all other sets in a reduction, but the dependent data values may vary between sets. Therefore, it is necessary for the adaptive weighting values to also vary between sets.
Unlike
calculate_adaptive_distance_weights_scaled()
, the matrix \(A\) allows for the kernel weighting function to be stretched along arbitrarily rotated orthogonal axes.- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
- referencenumpy.ndarray (n_dimensions,)
- shape_matricesnumpy.ndarray
(n_coordinates, n_sets, n_dimensions, n_dimensions) array containing the shaped adaptive weighting matrix for all coordinates and sets.
- Returns:
- weightsnumpy.ndarray (n_sets, n_coordinates)
The distance weighting factors.
- grig.resample_utils.calculate_distance_weights(coordinates, reference, alpha)[source]¶
Returns a distance weighting based on coordinate offsets.
Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(x_{ref}\)), and the scaling factor alpha (\(\alpha\)), returns the weighting factor:
\[w(x) = exp \left( \sum_{k=1}^{K}{\frac{-(x_{ref, k} - x_k)^2}{\alpha_k}} \right)\]- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, N)
An array of N coordinates in n_dimensions.
- referencenumpy.ndarray (n_dimensions,)
The reference position from which to determine distance offsets for the weighting function.
- alphanumpy.ndarray (1 or n_dimensions,)
The distance scaling factor. If an array of size 1 is supplied, it will be applied over all dimensions. Otherwise, a value must be provided for each dimension.
- Returns:
- weightsnumpy.ndarray (N,)
The distance weighting factors.
Notes
alpha relates to the standard deviation (\(\sigma\)) in a normal distribution as \(\alpha = 2\sigma^2\).
- grig.resample_utils.calculate_distance_weights_from_matrix(coordinates, reference, alpha_matrix)[source]¶
Returns distance weights based on coordinate offsets and matrix operation.
Given a set of \(K\) dimensional coordinates (\(x\)), a single reference position (\(o\)), and the symmetric matrix alpha_matrix (\(A\)), returns the weighting factor:
\[w(x) = exp(-\Delta x^T A \Delta x)\]where \({\Delta x}_k = o_k - x_k\) for dimension \(k\).
- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
An array of N coordinates in n_dimensions.
- referencenumpy.ndarray (n_dimensions,)
The reference position from which to determine distance offsets for the weighting function.
- alpha_matrixnumpy.ndarray (n_dimensions, n_dimensions)
Defines the matrix operation to perform on the coordinate offsets. Note that in this implementation, it should be a symmetric matrix such that \(A = A^T\). As such, only the upper triangle is iterated over and any off-diagonal elements in the lower triangle are ignored.
- Returns:
- weightsnumpy.ndarray (N,)
The distance weighting factors.
- grig.resample_utils.calculate_fitting_weights(errors, weights, error_weighting=True)[source]¶
Calculate the final weighting factor based on errors and other weights.
If error_weighting is applied, the return value is weights / error`^2. Otherwise, simply returns `weights.
- Parameters:
- errorsnumpy.ndarray (N,)
1-sigma error values to apply to weighting.
- weightsnumpy.ndarray (N,)
Other weighting factors, not including any type of error weighting.
- error_weightingbool
If False, returns weights, otherwise returns weights / errors^2.
- Returns:
- fit_weightingnumpy.ndarray (N,)
The final fitting weights.
Notes
The square root of the weight is used in the polynomial system of equations rather than the actual weight due to how the least-squares solution is derived.
For the linear system of equations A.x = B, we are solving (A^T.W.A).C = A^T.W.B, where C are the coefficients we wish to find. If we set X = sqrt(W).A, and Y = sqrt(W).B, this is the same as (X^T.X).C = X^T.Y, which can be solved easily. Another way of thinking about it is that we are minimizing the squared residuals (y - f(x))^2.
- grig.resample_utils.check_edge_with_box(coordinates, reference, mask, threshold)[source]¶
Defines a hyperrectangle edge around a coordinate distribution.
Given a distribution (\(X\)) of \(N\) samples in \(K\) dimensions, the center of mass for dimension \(k\) is:
\[\bar{X}_{k} = \frac{1}{N} \sum_{i=1}^{N}{X_{ik}}\]The hypercube center is at \(\bar{X}\) and its width in each dimension \(k\) is \(2\beta_k\) where \(\beta = 1 - \text{threshold}\). Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] = 0 or 1, then no edge will be defined for dimension \(k\).
If this definition the “edge” for dimension \(k\) is at \(\bar{X}_k \pm \beta_k\) and reference (\(X_{ref}\)) is considered inside the edge if:
\[| \bar{X}_k - X_{ref, k} | \leq \beta_k, \, \forall k\]- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_samples,)
A mask where False values exclude the corresponding sample from the center-of-mass calculation.
- thresholdnumpy.ndarray (n_dimensions,)
Defines the half-width dimensions (1 - threshold) of the hyperrectangle centered on the coordinate center of mass in units of the resampling window parameter. Must be in the range 0 < threshold < 1.
- Returns:
- insidebool
True if reference is inside the distribution “edge” and False otherwise.
- grig.resample_utils.check_edge_with_distribution(coordinates, reference, mask, threshold)[source]¶
Defines an edge based on statistical deviation from a sample distribution.
Given a sample distribution (\(X\)) of coordinates, the deviation of a reference coordinate \(X_{ref}\) from the mean sample distribution \(\bar{X}\) is given as:
\[\sigma_{ref} = \sqrt{(X_{ref} - \bar{X})^{T} \Sigma^{-1} (X_{ref} - \bar{X})}\]where \(\Sigma\) is the sample covariance of \(X\). In this definition, the “edge” of the distribution is defined at \(\beta = 1 / threshold\) so that reference locations where \(\sigma_{ref} \leq \beta\) are considered inside the distribution edge. For example, setting threshold = 2 will return a False value if \(\sigma_{ref} > 0.5\).
- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_samples,)
A mask where False values exclude the corresponding sample from any distribution statistics.
- thresholdint or float
The “edge” of the distribution is given by 1 / threshold. If the deviation of the reference coordinate from the distribution mean is greater than the edge, a False value will be returned. Setting threshold to zero results in an edge at infinity, i.e., all reference coordinates will be considered inside the distribution edge.
- Returns:
- insidebool
True if reference is inside the distribution “edge” and False otherwise.
- grig.resample_utils.check_edge_with_ellipsoid(coordinates, reference, mask, threshold)[source]¶
Defines an ellipsoid edge around a coordinate distribution.
Given a distribution (\(X\)) of \(N\) samples in \(K\) dimensions, the center of mass for dimension \(k\) is:
\[\bar{X}_{k} = \frac{1}{N} \sum_{i=1}^{N}{X_{ik}}\]The ellipsoid center is at \(\bar{X}\) with principle axes given by \(\beta\), where \(\beta = 1 - \text{threshold}\). Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] = 0 or 1, then no edge will be defined for dimension \(k\), and the ellipsoid definition will only apply over remaining dimensions (dimension \(k\) will be ignored in all calculations).
A reference coordinate (\(X_{ref}\)) is considered inside the ellipsoid edge if:
\[\sum_{k=1}^{K}{\frac{(X_{ref, k} - \bar{X}_k)^2}{\beta_k^2}} \leq 1\]- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_samples,)
A mask where False values exclude the corresponding sample from the center-of-mass calculation.
- thresholdnumpy.ndarray (n_dimensions,)
Defines the principle axes (1 - threshold) of an ellipsoid centered on the coordinate center of mass in units of the resampling window parameter. Must be in the range 0 < threshold < 1.
- Returns:
- insidebool
True if reference is inside the distribution “edge” and False otherwise.
- grig.resample_utils.check_edge_with_range(coordinates, reference, mask, threshold)[source]¶
Defines an edge based on the range of coordinates in each dimension.
Given a distribution of sample coordinates (\(X\)), and a reference position (\(X_{ref}\)) in \(K\) dimensions, check for each dimension \(k\):
\[\{-\beta_k, \beta_k\} \in [min(X_k - X_{ref, k}), max(X_k - X_{ref, k})] , \, \forall k\]where \(\beta = 1 - \text{threshold}\). In order words, in each dimension \(k\), there must be at least one member of \(X_k\) at a distance of \(\beta_k\) from \(X_{ref, k}\) for both the positive and negative directions along \(k\).
Note that the resampling algorithm scales all coordinates to a window parameter such that \(|X_k| \leq 1\), and the threshold parameter therefore defines a fraction of the window in the range \(0 < \text{threshold} < 1\). If threshold[k] < 0 or theshold > 1, then no check will be performed for dimension \(k\).
- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_samples,)
A mask where False values exclude the corresponding sample from the range check.
- thresholdnumpy.ndarray (n_dimensions,)
Defines the threshold. Must be in the range 0 < threshold < 1.
- Returns:
- insidebool
True if reference is inside the distribution “edge” and False otherwise.
- grig.resample_utils.check_edges(coordinates, reference, mask, threshold, algorithm=1)[source]¶
Determine whether a reference position is within a distribution “edge”.
The purpose of this function is to allow the resampling algorithm to determine whether a fit should be performed at a reference location given a sample distribution of coordinates. If a fit is attempted too far from the mean sample distribution, it will lead to a misleading result which becomes more pronounced at higher fit orders.
Therefore, the sample distribution is assigned an “edge” based on one of four definitions. If the reference position is outside of this edge, a False value will be returned, and no fitting will occur.
For all edge definition algorithms, the threshold parameter will determine the distance of the edge from the sample mean. As threshold is increased, the edge approaches the sample mean resulting in a more severe clipping of fit locations away from the center of a distribution. If threshold = 0, no edge clipping will occur.
Since the main engine of the resampling algorithm relies on
numba
, the edge algorithm should be supplied as an integer. Please see the relevant function listed below for further details on how the “edge” is defined.algorithm
Function
1
2
3
4
Generally, algorithms are ordered from most to least robust, and slowest to fastest, so, the default (1) is considered the most robust (although slowest) of the available algorithms.
When dealing with more than one dimension, the
check_edge_with_distribution()
algorithm is recommended as it accounts for the shape of the distribution. If the sample distribution is unknown (as opposed to a set of uniformly spaced coordinates), there is a chance for some samples to be (or to approach) a collinear distribution. Attempting a fit at any location in a tangential direction away from the distribution would likely result in a very poor fit.- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate to test.
- masknumpy.ndarray of bool (n_samples,)
A mask where True values indicate a sample should be included in the edge determination.
- thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.
- algorithmint, optional
Integer specifying which edge definition to use. Please see above for the associated functions. Invalid choices will disable edge checking.
- Returns:
- insidebool
True if the reference coordinate is inside the edge of the sample distribution, and False otherwise.
- grig.resample_utils.check_orders(orders, coordinates, reference, algorithm=1, mask=None, minimum_points=None, required=False, counts=-1)[source]¶
Checks the sample distribution is suitable for a polynomial fit order.
For a polynomial fit to be successful at a given order, one needs to ensure that the sample distribution is suitable for such a fit. At a minimum, there need to be enough samples to derive a fit. However, unless the samples are known to be distributed in such a way where a fit is always possible (such as regularly spaced samples), there is the possibility that the fit may be under-determined. For example, it is not possible to perform any other fit than the mean of sample values (order 0) if the samples all share the same coordinate.
This problem is compounded by the use of
numba
which does not allow any exception handling. If a fit fails at a single reference position, the whole algorithm will fail.There are several algorithms available, ordered in decreasing robustness, but increasing speed. The chosen algorithm must be supplied using an integer label (because
numba
). Please see the relevant function listed in the table below for a more detailed description of each. The order_algorithm parameter is supplied to the main resampling algorithm during __init__ and is used to select the relevant algorithm.algorithm
Function
order_algorithm
1
‘bounded’
2
‘extrapolate’
3
‘counts’
Generally,
check_orders_with_bounds()
is the most robust and ensures that a fit will not only succeed, but is not likely to deviate widely from the expected result. Thecheck_orders_without_bounds()
function should also allow fits to succeed, but allow fits to be generated away from the sample distribution. This may be desirable, for example, if one is resampling an image and wishes to perform fits close to the edge. Finally,check_orders_with_counts()
is fast, but there is a decent possibility that the fit will fail if the user cannot guarantee that the samples are distributed appropriately.In rare cases, a distribution of collinear samples, not aligned along any dimensional axis may pass the check_orders test causing resampling algorithm to fail. In this case, please consider using
check_edge_with_distribution()
to perform such a check. This may be invoked by supplying edge_algorithm=’distribution’ during the initialization of the main resampling algorithm.- Parameters:
- algorithmint
An integer specifying the order checking algorithm to use. If an invalid option is supplied (not listed in the table above), the return value of -1 will abort any subsequent fitting.
- ordersnumpy.ndarray of int
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution. Not used by the counts algorithm, but must still be supplied as an array with the correct dimensions.
- referencenumpy.ndarray (n_dimensions,)
The coordinates of the point at which to perform a fit. Only required by the ‘bounded’ algorithm, but a value of the correct array shape must still be supplied.
- masknumpy.ndarray of bool (n_samples,), optional
An optional mask where False values indicate the associated sample should not be included in determining the maximum order.
- minimum_pointsint, optional
The minimum number of points required to perform a fit of the desired order, optionally passed in for speed if pre-calculated. Only used by the ‘counts’ algorithm.
- requiredbool, optional
If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.
- countsint, optional
This is required by the ‘counts’ algorithm. If counts < 0, it will be determined by sum(mask). If counts is less than zero, and a mask is not supplied, not fit will be performed.
- Returns:
- maximum_ordersnumpy.ndarray
An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.
- grig.resample_utils.check_orders_with_bounds(orders, coordinates, reference, mask=None, required=False)[source]¶
Checks maximum order for sample coordinates bounding a reference.
Given the coordinates of a sample distribution (\(X\)), a reference position (\(X_{ref}\)), and the desired orders of fit (\(o\)), returns the maximum available order of fit.
For dimension \(k\) define the sets of unique values:
\[\begin{split}s_k^- = \{ x \in X_k |\, x < X_{ref, k} \} \\ s_k^+ = \{ x \in X_k |\, x > X_{ref, k} \}\end{split}\]The maximum order is then given as:
\[o_k^{max} = min\{ |s_k^-|, |s_k^+|, o_k \}\]where \(|.|\) represents the cardinality (size) of the set.
For example, consider a 1-dimensional set of coordinates:
\[X = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 6]\]and we wish to perform an order=3 polynomial fit at \(X_{ref}=2.5\). There are 4 unique values of \(X > X_{ref}\) (\(\{3, 4, 5, 6\}\)), but only 2 unique values of \(X < X_{ref}\) (\(\{1, 2\}\)). The return value will be 2, indicating that only a 2nd order polynomial fit should be attempted. If a 1st order polynomial fit was requested, the return value would be 1 since there are enough points less than and greater than the reference.
- Parameters:
- ordersnumpy.ndarray of int
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinates of the sample distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_coordinates,), optional
An optional mask where False values indicate the associated sample should not be included in determining the maximum order.
- requiredbool, optional
If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.
- Returns:
- maximum_ordersnumpy.ndarray
An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.
- grig.resample_utils.check_orders_with_counts(orders, counts, mask=None, minimum_points=None, n_dimensions=None, required=False)[source]¶
Checks maximum order based only on the number of samples.
For \(N\) samples of \(K\) dimensional data, the minimum number of samples required to perform a polynomial fit with orders \(o\) is:
\[N_{min} = \prod_{k=1}^{K}{(o_k + 1)}\]if \(o_k = o_0, \, \forall k\), then it is possible to suggest a lower order over all dimensions in the case where \(N < N_{min}\). This is given as:
\[o_k^{max} = min\{ floor(N ^ {1 / K} - 1), o_k \}\]The suggested maximum order is returned by setting the required keyword to False. If the orders vary between dimensions or required is True, the value of \(o_0\) is set to -1 indicating a polynomial fit of the desired order is not possible.
- Parameters:
- ordersnumpy.ndarray of int
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.
- countsint
The number of samples available for the fit.
- masknumpy.ndarray of bool (n_coordinates,), optional
An optional mask where False values indicate the associated sample should not be included in determining the maximum order. The counts parameter will be ignored in favor of sum(mask).
- minimum_pointsint, optional
The minimum number of points required to perform a fit of the desired order, optionally passed in for speed.
- n_dimensionsint, optional
If orders was supplied as an array of size 1, but should be applied over multiple dimensions, the number of dimensions should be supplied. Otherwise, the number of dimensions is taken to be equal to the number of orders.
- requiredbool, optional
If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.
- Returns:
- maximum_ordersnumpy.ndarray
An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Unlike
check_orders_with_bounds()
orcheck_orders_without_bounds()
, a suggested maximum order can only be returned by setting required to False if orders are equal in all dimensions. Otherwise, it is impossible to know which dimension the order should be reduced for. In this case, the first element of maximum_orders will be set to -1.
- grig.resample_utils.check_orders_without_bounds(orders, coordinates, mask=None, required=False)[source]¶
Checks maximum order based on unique samples, irrespective of reference.
Given the coordinates of a sample distribution (\(X\)), and the desired orders of fit (\(o\)), returns the maximum available order of fit. Unlike
check_orders_with_bounds()
, the location of the fit is unimportant. All that is required is for enough unique sample coordinates to be available for fitting.For dimension \(k\), the maximum fit order is given as:
\[o_k^{max} = min\{ |X_k| - 1, o_k \}\]where \(|.|\) represents the cardinality (size) of the set.
For example, consider a 1-dimensional set of coordinates:
\[X = [1, 1, 1, 2, 3, 4]\]The maximum order of fit would be 3 since there are 4 unique values (1, 2, 3, 4). Therefore, when orders >= 3, the return value would be 3. For orders < 3, orders would be returned.
If we had a 2-dimensional set of data:
\[X = [(1, 0), (1, 0), (1, 0), (2, 1), (3, 1), (4, 1)]\]The maximum order of fit would be 3 in the first dimension, and 1 in the second.
- Parameters:
- ordersnumpy.ndarray of int
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions. This serves as an upper limit for the check. If the samples are distributed in a way that allows for a fit to be performed using orders, the return value will also be orders.
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinates of the sample distribution.
- masknumpy.ndarray of bool (n_coordinates,), optional
An optional mask where False values indicate the associated sample should not be included in determining the maximum order.
- requiredbool, optional
If required is False, the maximum available order given the distribution will be returned (up to a maximum of orders). If required is True, and the maximum available order is less than orders, the first element of the return value will be set to -1, indicating the criteria was not met.
- Returns:
- maximum_ordersnumpy.ndarray
An array of shape (1,) or (n_dimensions,) based on whether a single orders was passed in for all dimensions, or each dimension has a separate order requirement. If required was set to True, and the sample distribution did not allow for the requested order, the first element will be set to -1. Otherwise, if required was False, the maximum order for each dimension will be returned. If a single orders was to be applied over all dimensions, the return value will also be of size 1, but contains the min(maximum_order) over all dimensions.
- grig.resample_utils.convert_to_numba_list(thing)[source]¶
Converts a Python iterable to a Numba list for use in jitted functions.
- Parameters:
- thingiterable
- Returns:
- numba.typed.List()
- grig.resample_utils.coordinate_covariance(coordinates, mean=None, mask=None, dof=1)[source]¶
Calculate the covariance of a distribution.
Given the sample distribution of \(N\) coordinates (\(X\)) in \(K\) dimensions, the sample covariance is given as:
\[\Sigma = E[(X - E[X])(X - E[X])^T]\]where \(\Sigma\) is a \(K \times K\) matrix and \(E\) denotes the expected value. In the general case where the expected value of \(X\) is unknown and derived from the distribution itself, the covariance of the samples between dimension \(i\) and \(j\) is:
\[\Sigma_{ij} = \frac{1}{N - M} \sum_{k=1}^{N} {(X_{ki} - \bar{X}_i)(X_{kj} - \bar{X}_j)}\]where \(M\) is the number of degrees of freedom lost (dof) in determining the mean (\(\bar{X}\)). If the mean is not provided, it will be calculated using
coordinate_mean()
in which case the default dof of 1 is appropriate.- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinates of the distribution.
- meannumpy.ndarray (n_dimensions,), optional
The mean of the coordinate distribution in each dimension. If not provided, the expected value in each dimension will be calculated using
coordinate_mean()
.- masknumpy.ndarray (n_coordinates,), optional
An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.
- dofint or float, optional
The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.
- Returns:
- covariancenumpy.ndarray of numpy.float64 (n_dimensions, n_dimensions)
The covariance of the sample distribution.
- grig.resample_utils.coordinate_mean(coordinates, mask=None)[source]¶
Returns the mean coordinate of a distribution.
Given a distribution of \(N\) coordinates in \(K\) dimensions, the mean coordinate in dimension \(k\) is given as:
\[\bar{x_k} = \frac{\sum_{i=1}^{N}{w_i x_{k,i}}}{\sum_{i=1}^{N}{w_i}}\]where \(w_i = 0\) if mask[i] is False and \(w_i = 1\) if mask[i] is True.
- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinate distribution.
- masknumpy.ndarray of bool (n_coordinates,), optional
An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.
- Returns:
- mean_coordinatenumpy.ndarray (n_dimensions,)
The mean coordinate of the distribution.
- grig.resample_utils.covariance_matrix_inverse(amat, phi, error, weights, rank=None)[source]¶
Calculates the inverse covariance matrix inverse of the fit coefficients.
If the least-squares solution to a fit is given as \(y = \hat{c} \cdot \Phi\) when \(y = c \cdot \Phi + \epsilon\), the inverse covariance on the estimated fit coefficients \(\hat{c}\) is given as:
\[\Sigma^{-1} = \frac{N}{N - M} (\Phi^T W Var(y) \Phi)^{-1}\]where \(N\) are the number of samples fit, \(M\) are the lost degrees of freedom, W are the fit weights, and \(Var(y)\) is related to the error (\(\sigma\)) by:
\[Var(y) = diag(1 / \sigma^2)\]Note that during the fitting process, it is common for the inverse of the covariance matrix \(\Sigma\) to have already been calculated (not factoring in lost degrees of freedom) if inverse variance was included as a weighting factor for the fit. If so, it should be passed in as amat (\(A\)), and the final covariance is simply given by:
\[\Sigma^{-1} = \frac{N}{N - M}A^{-1}\]If amat was not calculated, it should be supplied as an array of shape (0, 0) (because
numba
).- Parameters:
- amatnumpy.ndarray (n_terms, n_terms)
The matrix A as described above. If the shape of amat is set to (0, 0), it will be calculated using the error, weights, and phi terms. This should only be done if A was weighted by both error and weights.
- phinumpy.ndarray (n_terms, N)
The polynomial terms for each of the N data samples.
- errornumpy.ndarray (N,)
The 1-sigma errors.
- weightsnumpy.ndarray (N,)
The weighting factor applied to each sample when determining the least squares solution of the fit. Note that this must not factor in any error based weighting. Therefore, in the case of the resampling algorithm, it should refer to the distance weighting factor of each sample from the resampling point, or an array of ones (np.ones(N)) if distance weighting was not applied.
- rankint or float, optional
The matrix rank of amat, optionally passed in for speed if pre-calculated.
- Returns:
- inverse_covariancenumpy.ndarray (nterms, nterms)
The covariance matrix inverse of fit coefficients.
- grig.resample_utils.derivative_mscp(coefficients, phi_samples, derivative_map, sample_weights)[source]¶
Return the weighted mean-square-cross-product (mscp) of sample derivatives.
Given a polynomial equation of the form:
\[f(\Phi) = c \cdot \Phi\]The derivative is calculated as:
\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]for an equation of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative_map and \(c\) are the coefficients. Please see
polynomial_derivative_map()
for a more complete description of the derivative calculation.One the derivatives (\(g = \frac{df}{dX}\)) are calculated for all samples, they are averaged, and the cross-product is returned as:
\[\bar{g}^2 = \frac{1}{tr(W W^T)} g^T W W^T g\]where \(W = diag(\text{weights})\).
For example, for polynomial fit of 2-dimensional data \(f(x, y)\), the returned matrix will be:
\[\begin{split}\bar{g}^2 = \begin{bmatrix} \frac{\partial f}{\partial x} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial y} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \frac{\partial f}{\partial y} \end{bmatrix}\end{split}\]- Parameters:
- coefficientsnumpy.ndarray (n_terms,)
The coefficients of a polynomial fit for each term.
- phi_samplesnumpy.ndarray (n_terms, n_samples)
The polynomial terms of the sample coordinates. Please see
polynomial_exponents()
for a description of this variable.- derivative_mapnumpy.ndarray
An array of shape (n_dimensions, 3, n_valid_terms). Please see
polynomial_derivative_map()
for an explanation of this variable.- sample_weightsnumpy.ndarray (n_samples,)
The weighting to apply to each sample when determining the weighted mean (as a multiplier).
- Returns:
- mscpnumpy.ndarray (n_dimensions, n_dimensions)
An array where sscp[i, j] = derivative[i] * derivative[j]. where derivative is the weighted mean derivatives over all samples.
- grig.resample_utils.distribution_variances(coordinates, mean=None, covariance=None, mask=None, sigma_inv=None, dof=1)[source]¶
Return variance at each coordinate based on coordinate distribution.
Given a normal sample distribution \(X\), returns the variance at each sample coordinate. For example, consider a population of zero mean (\(\bar{X} = 0\)) and a standard deviation of one (\(\sigma = 1\)). Samples located at \(\bar{X} \pm \sigma\) will return a variance of 1, while samples located at \(\bar{X} \pm 2\sigma\) will return a variance of 4.
By default, the distribution variance is derived using
coordinate_covariance()
, and the sample mean is derived usingcoordinate_mean()
assuming the loss of 1 degree of freedom. However, the expected value (\(E[X]\)) may be supplied with the mean optional parameter along with the covariance, and the number of lost degrees of freedom (dof).- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_samples)
The coordinates of the sample distribution.
- meannumpy.ndarray (n_dimensions,), optional
The expected mean value of the distribution.
- covariancenumpy.ndarray (n_dimensions, n_dimensions), optional
The covariance matrix (if known) for the sample distribution.
- masknumpy.ndarray of bool (n_samples,), optional
An array of bool values where True indicates a sample should be included when calculating the mean and covariance, and False indicates that a sample should be ignored. By default, all samples are included. The output variance will still be calculated for all samples.
- sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional
The inverse of the covariance matrix, optionally passed in for speed if the covariance is known, and it’s inverse has been pre-calculated.
- dofint or float, optional
The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.
- Returns:
- variancenumpy.ndarray (n_samples,)
The variance at each sample coordinate based on the sample distribution.
- grig.resample_utils.estimated_covariance_matrix_inverse(phi, error, weights, rank=None)[source]¶
Calculates covariance matrix inverse of fit coefficients from mean error.
An estimate to the covariance of fit parameters is given by:
\[\Sigma = \frac{N}{N - M} \bar{\sigma}^2 (\Phi^T W \Phi)^{-1}\]where N are the number of samples used in the fit, \(M\) are the number of lost degrees of freedom, \(W\) are the weights applied to the samples during the fit, and the estimated coefficients \(\hat{c}\) are used to fit \(y = \hat{c} \cdot \Phi + \sigma\) to the sample population \(y = c \cdot \Phi + \epsilon\). The weighted mean of the squared error (\(\bar{\sigma}^2\)) is given by:
\[\bar{\sigma}^2 = \frac{\sum_{i=1}^{N}{w_i e_i^2}} {\sum_{i=1}^{N}{w_i}}\]The final returned matrix is \(\Sigma^{-1}\).
- Parameters:
- phinumpy.ndarray (nterms, N)
The polynomial terms for each of the N data samples.
- errornumpy.ndarray (N,)
The 1-sigma errors or residuals to the fit.
- weightsnumpy.ndarray (N,)
The weighting of each sample in the fit, not including any error weighting.
- rankint or float, optional
The matrix rank of \(\Phi^T W \Phi\), optionally passed in for speed if pre-calculated.
- Returns:
- covariance_inversenumpy.ndarray (nterms, nterms)
The inverse covariance matrix.
- grig.resample_utils.evaluate_derivative(coefficients, phi_point, derivative_map)[source]¶
Calculates the derivative of a polynomial at a single point.
Please see
polynomial_derivative_map()
for a full description of how the derivatives are calculated from a polynomial equation defined bypolynomial_exponents()
. These also explain how one should transform the independent variables to the “phi” (\(\Phi\)) terms (which may be done usingpolynomial_terms()
).The derivative at a point is calculated by:
\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]for a polynomial equation consisting of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative map.
- Parameters:
- coefficientsnumpy.ndarray (n_terms,)
The coefficients of the polynomial equation for each term.
- phi_pointnumpy.ndarray (n_terms,)
The polynomial terms of the fittin equation at a single coordinate.
- derivative_mapnumpy.ndarray
An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the second element. The third dimension will generally be of a smaller size than the number of terms in the polynomial equation as not all are required to calculate the derivative. Due to the fact that some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms still remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.
- Returns:
- derivativenumpy.ndarray (n_dimensions,)
The partial derivative of the polynomial equation with respect to each dimension.
- grig.resample_utils.evaluate_derivatives(coefficients, phi_points, derivative_map)[source]¶
Calculates the derivative of a polynomial at multiple points.
Please see
polynomial_derivative_map()
for a full description of how the derivatives are calculated from a polynomial equation defined bypolynomial_exponents()
. These also explain how one should transform the independent variables to the “phi” (\(\Phi\)) terms (which may be done usingpolynomial_terms()
).The derivative at a point is calculated by:
\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]for a polynomial equation consisting of \(M\) terms at the coordinate \(X\) in dimension \(k\), where \(h\) is the derivative map.
- Parameters:
- coefficientsnumpy.ndarray (n_terms,)
The coefficients of the polynomial equation for each term.
- phi_pointsnumpy.ndarray (n_terms, n_points)
The polynomial terms of the fitting equation at a multiple points.
- derivative_mapnumpy.ndarray
An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the second element. The third dimension will generally be of a smaller size than the number of terms in the polynomial equation as not all are required to calculate the derivative. Due to the fact that some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms still remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.
- Returns:
- derivativesnumpy.ndarray (n_dimensions, n_points)
The partial derivative of the polynomial equation with respect to each dimension at each point.
- grig.resample_utils.fasttrapz(y, x)[source]¶
Fast 1-D integration using Trapezium method.
Approximates the integration of a 1-D discrete valued function \(y_i = f(x_i)\) with \(N\) measurements as:
\[\int_a^b f(x) \approx \frac{1}{2} \sum_{i=1}^{N}{ \left( y_{i - 1} + y_i \right) \left( x_i - x_{i - 1} \right) }\]- Parameters:
- ynumpy.ndarray (N,)
Dependent variable
- xnumpy.ndarray (N,)
Independent variable
- Returns:
- areafloat
The integrated area
- grig.resample_utils.fit_phi_value(phi, coefficients)[source]¶
Returns the dot product of phi and coefficients.
A utility function for use in calculating the polynomial fit based on the polynomial terms of the independent values (phi), and a set of calculated coefficients.
The return value for phi (\(\Phi\)) terms and coefficients (\(c\)) each consisting of \(L\) terms is:
\[y = \sum_{l=1}^{L}{c_l \Phi_l}\]The polynomial terms \(\Phi\) are pre-calculated and used in place of regular independent values \(x\) in the resampling algorithm to avoid the unnecessary recalculation of terms in a polynomial equation. For example, if fitting
\[y = 5 x_0 x_1 + 6 x_0 x_3^2 + 7 x_1^3 x_4^2 + 8 x_0 x_1 x_2 x_3\]we set
\[ \begin{align}\begin{aligned}c = [5,\, 6,\, 7,\, 8]\\\Phi = [x_0 x_1,\, x_0 x_3^2,\, x_1^3 x_4^2,\, x_0 x_1 x_2 x_3]\end{aligned}\end{align} \]and then only need to perform the simple fast calculation
\[y = c \cdot \Phi\]- Parameters:
- phinumpy.ndarray (n_coefficients,)
Polynomial terms of independent values.
- coefficientsnumpy.ndarray (n_coefficients,)
Coefficients used to determine fit.
- Returns:
- fitfloat
The fitted value.
- grig.resample_utils.fit_phi_variance(phi, inv_covariance)[source]¶
Calculates variance given the polynomial terms of a coordinate.
The output variance for given polynomial terms phi (\(\Phi\)) is given as:
\[V = \Phi^T Var(\hat{c}) \Phi\]where \(Var(\hat{c})\) is the covariance matrix inverse of the fit coefficients (inv_covariance) such that \(Var(\hat{c})_{i, j}\) gives the covariance between the coefficients for terms \(i\) and \(j\), and the coefficients \(\hat{c}\) define the fit:
\[y = \hat{c} \cdot \Phi\]- Parameters:
- phinumpy.ndarray (ncoeffs,)
The polynomial terms.
- inv_covariancenumpy.ndarray (ncoeffs, ncoeffs)
The covariance matrix inverse of the fit coefficients.
- Returns:
- variancefloat
The calculated variance.
- grig.resample_utils.fit_residual(data, phi, coefficients)[source]¶
Calculates the residual of a polynomial fit to data.
The residual is calculated using the matrix operation Y - CX where Y is the dataset, C are the coefficients and X is the phi polynomial terms.
- Parameters:
- datanumpy.ndarray (n_samples,)
Data from which coefficients were derived.
- phinumpy.ndarray (n_terms, n_samples)
Polynomial terms of independent values of each sample.
- coefficientsnumpy.ndarray (n_terms,)
Coefficient values.
- Returns:
- residualnumpy.ndarray (n_samples,)
The residual
- grig.resample_utils.half_max_sigmoid(x, x_half=0.0, k=1.0, a=0.0, c=1.0, q=1.0, b=1.0, v=1.0)[source]¶
Evaluate a special case of the logistic function where f(x0) = 0.5.
The generalized logistic function is given as:
\[f(x) = A + \frac{K - A} {\left( C + Q e^{-B(x - x_0)} \right)^{1/\nu}}\]and may be evaluated with
logistic_curve()
.We can manipulate this function so that \(f(x_{half}) = (K + A) / 2\) (the midpoint of the function) by setting the location of maximum growth (\(x_0\)) to occur at:
\[x_0 = x_{half} + \frac{1}{B} \ln{\left( \frac{2^\nu - C}{Q} \right)}\]Since a logarithm is required, it is incumbent on the user to ensure that no logarithms are taken of any quantity \(\leq 0\), i.e., \((2^\nu - C) / Q > 0\).
- Parameters:
- xint or float or numpy.ndarray (shape)
The independent variable.
- x_halfint or float or numpy.ndarray (shape), optional
The x value for which f(x) = 0.5.
- kint or float or numpy.ndarray (shape), optional
The upper asymptote when c is one.
- aint or float or numpy.ndarray (shape), optional
The lower asymptote.
- cint or float or numpy.ndarray (shape), optional
Typically takes a value of 1. Otherwise, the upper asymptote is a + ((k - a) / c^(1/v)).
- qint or float or numpy.ndarray (shape), optional
Related to the value of f(0). Fixes the point of inflection. In this implementation, q is completely factored out after simplifying and does not have any affec
- bint or float or numpy.ndarray (shape), optional
The growth rate.
- vint or float or numpy.ndarray (shape), optional
Must be greater than zero. Affects near which asymptote the maximum growth occurs (point of inflection).
- Returns:
- resultfloat or numpy.ndarray
The half-max sigmoid evaluated at x.
- grig.resample_utils.logistic_curve(x, x0=0.0, k=1.0, a=0.0, c=1.0, q=1.0, b=1.0, v=1.0)[source]¶
Evaluate the generalized logistic function.
The generalized logistic function is given as:
\[f(x) = A + \frac{K - A} {\left( C + Q e^{-B(x - x_0)} \right)^{1/\nu}}\]Taken from Wikipedia contributors. (2020, June 11). Generalised logistic function. In Wikipedia, The Free Encyclopedia. Retrieved 23:51, July 6, 2020, from https://en.wikipedia.org/w/index.php?title=Generalised_logistic_function&oldid=961965809
- Parameters:
- xint or float or numpy.ndarray (shape)
The independent variable.
- x0int or float or numpy.ndarray (shape), optional
An offset applied to x.
- kint or float or numpy.ndarray (shape), optional
The upper asymptote when c is one.
- aint or float or numpy.ndarray (shape), optional
The lower asymptote.
- cint or float or numpy.ndarray (shape), optional
Typically takes a value of 1. Otherwise, the upper asymptote is a + ((k - a) / c^(1/v)).
- qint or float or numpy.ndarray (shape), optional
Related to the value of f(0).
- bint or float or numpy.ndarray (shape), optional
The growth rate.
- vint or float or numpy.ndarray (shape), optional
Must be greater than zero. Affects near which asymptote the maximum growth occurs.
- Returns:
- resultfloat or numpy.ndarray (shape)
The logistic function evaluated at x.
- grig.resample_utils.multiple_polynomial_terms(coordinates, exponents)[source]¶
Derive polynomial terms for a coordinate set given polynomial exponents.
Raises multiple coordinates by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).
Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. The return value will be a 2-dimensional array with the size of the first dimension equal to the number of exponent sets, and the size of the second dimension equal to the number of vector sets.
- Parameters:
- coordinatesnumpy.ndarray (N, n_vectors)
Sets of vectors in N-dimensions.
- exponentsnumpy.ndarray (n_exponents, N)
Sets of exponents by which to raise the vector.
- Returns:
- numpy.ndarray of numpy.float64 (n_exponents, n_vectors)
The product of the exponentiation of the vectors.
- grig.resample_utils.multivariate_gaussian(covariance, coordinates, center=None, normalize=False)[source]¶
Return values of a multivariate Gaussian in K-dimensional coordinates.
The density of a multivariate Gaussian is given as:
\[f_{\mathbf X}(x_1, \ldots, x_k) = \frac {\exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1}(x - \mu) \right)} {\sqrt{(2 \pi)^K |\Sigma|}}\]where the coordinates \(X\) are real \(K\) dimensional vectors, \(\Sigma\) is the covariance matrix with determinant \(|\Sigma|\), and the center of the distribution is given by \(\mu\).
Note that by default, the factor \({\sqrt{(2 \pi)^K |\Sigma|}}\) is not applied unless normalize is True.
- Parameters:
- covariancenumpy.ndarray (n_dimensions, n_dimensions)
The covariance matrix (\(\Sigma\)). Should be symmetric and positive definite.
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinates at which to evaluate the multivariate Gaussian.
- centernumpy.ndarray (n_dimensions,), optional
The center of the distribution. If not supplied, the center is assumed to be zero in all dimensions.
- normalizebool, optional
If True, normalize by dividing by \(\sqrt{(2 \pi)^k |\Sigma|}\).
- Returns:
- densitynumpy.ndarray (n_coordinates,)
The density evaluated at each coordinate.
- grig.resample_utils.no_fit_solution(set_index, point_index, fit_out, error_out, counts_out, weights_out, distance_weights_out, rchi2_out, offset_variance_out, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_offset_variance=True, cval=nan)[source]¶
Fill output arrays with set values on fit failure.
On fit failure, the output arrays are filled with certain values indicating that a fit is not possible. Count, and weight arrays contain zeros; the error, reduced chi-squared and offset variance are set to NaN; finally, the data array is set to cval, a user set float value.
- Parameters:
- set_indexint
An integer representing the data set for which a fit cannot be performed.
- point_indexint
An integer representing the index of the fit coordinate at which the fit cannot be performed.
- fit_outnumpy.ndarray (n_sets, n_coordinates)
The output fit values.
- error_outnumpy.ndarray (n_sets, n_coordinates)
The output error values on the fit.
- counts_outnumpy.ndarray (n_sets, n_coordinates)
The number of samples used to create the fit.
- weights_outnumpy.ndarray (n_sets, n_coordinates)
The sum of full weights applied to samples in the fit.
- distance_weights_outnumpy.ndarray (n_sets, n_coordinates)
The sum of only the distance weights applied to samples in the fit.
- rchi2_outnumpy.ndarray (n_sets, n_coordinates)
The reduced chi-squared statistic of the fit.
- offset_variance_outnumpy.ndarray (n_sets, n_coordinates)
The variance as derived from the offset of the fit coordinate from the sample distribution.
- get_errorbool, optional
If False do not update error_out.
- get_countsbool, optional
If ‘False’ do not update counts_out.
- get_weightsbool, optional
If False do not update weights_out.
- get_distance_weightsbool, optional
If False do not update distance_weights_out.
- get_rchi2bool, optional
If False do not update rchi2_out.
- get_offset_variancebool, optional
If False do not update offset_variance_out.
- cvalfloat, optional
The fill value for data_out on fit failure.
- Returns:
- None
All arrays are updated in-place.
- grig.resample_utils.offset_variance(coordinates, reference, mask=None, mean=None, sigma_inv=None, scale=1.0, dof=1)[source]¶
Variance at reference coordinate derived from distribution uncertainty.
Given a distribution of coordinates (\(X\)), calculate the variance at a reference coordinate (\(X_{ref}\)) based upon the uncertainty in the coordinate distribution. Firstly, the distribution covariance matrix (\(\Sigma\)) is calculated by
coordinate_covariance()
enabling the variance at the reference position to be given as:\[V(X_{ref}) = (X_{ref} - E[X])^{T} \Sigma^{-1} (X_{ref} - E[X])\]If the expected value of the distribution is known (\(E[X]\)) or pre-calculated, it can be passed in to the function using the mean optional parameter along with the lost degrees of freedom (dof) spent in determining \(E[X]\). If not, the default is to use \(\bar{X}\) and dof = 1.
The user may optionally specify a scale factor (\(\beta\)) such that:
\[V(X_{ref}) = (\beta(X_{ref} - E[X]))^{T} \Sigma^{-1} (\beta(X_{ref} - E[X]))\]or:
\[V(X_{ref}) = \beta^2 (X_{ref} - E[X])^{T} \Sigma^{-1} (X_{ref} - E[X])\]- Parameters:
- coordinatesnumpy.ndarray (n_dimensions, n_coordinates)
The coordinates of the distribution.
- referencenumpy.ndarray (n_dimensions,)
The reference coordinate.
- masknumpy.ndarray of bool (n_coordinates,), optional
An array of bool values where True indicates a coordinate should be included in the calculation, and False indicates that a coordinate should be ignored. By default, all coordinates are included.
- meannumpy.ndarray (n_dimensions,), optional
The mean of the coordinate distribution in each dimension. If not provided, the expected value in each dimension will be calculated using
coordinate_mean()
.- scaleint or float, optional
The scaling factor described above.
- dofint or float, optional
The lost degrees of freedom, typically 1 to indicate that the population mean is not known and is replaced by the sample mean.
- sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional
If the covariance matrix of the coordinate distribution has already been calculated, the matrix inverse may be passed in as sigma_inv for speed.
- Returns:
- variancefloat
The variance at the reference coordinate determined from the coordinate distribution.
- grig.resample_utils.polynomial_derivative_map(exponents)[source]¶
Creates a mapping from polynomial exponents to derivatives.
Please see
polynomial_exponents()
for details on how a polynomial equation is defined within the resampling algorithm, and the use of the exponents array in defining the polynomial terms (\(\Phi\)).Within the confines of the resampling algorithm, the polynomial exponents should have always been defined in a way that will always allow the derivative of a polynomial fit to be calculated from existing, pre-calculated \(\Phi\) terms.
For example, consider the 2-dimensional 2nd order polynomial equation and its derivatives in each dimension:
\[f(x, y) = c_1 + c_2 x + c_3 x^2 + c_4 y + c_5 x y + c_6 y^2\]\[\frac{\partial f}{\partial x} = c_2 + 2 c_3 x + c_5 y\]\[\frac{\partial f}{\partial y} = c_4 + c_5 x + 2 c_6 y\]Converting \(f(x, y) \rightarrow f(\Phi)\) we get:
\[f(\Phi) = c_1 \Phi_1 + c_2 \Phi_2 + c_3 \Phi_3 + c_4 \Phi_4 + c_5 \Phi_5 + c_6 \Phi_6\]It can then be seen that
\[\frac{\partial f}{\partial x} = c_2 \Phi_1 + 2 c_3 \Phi_2 + c_5 \Phi_4\]\[\frac{\partial f}{\partial y} = c_4 \Phi_1 + c_5 \Phi_2 + 2 c_6 \Phi_4\]Generalizing for a polynomial equation consisting of \(M\) terms of the independent variable \(X\) in \(K-\text{dimensions}\), a mapping (\(h\)) can be devised enabling calculation of the derivatives from pre-existing terms. For dimension \(k\):
\[\frac{\partial f}{\partial X_k} = \sum_{m=1}^{M} {h_{k, 0, m} \cdot c_{h_{k, 1, m}} \cdot \Phi_{h_{k, 2, m}}}\]This allows the derivative to be calculated from the existing polynomial terms (\(\Phi\)) and coefficients (\(c\)). In addition, the mapping can be calculated prior to reduction and will therefore only need to be calculated once along with \(\Phi\). Once the coefficients are known, the derivatives can then be calculated using \(h\).
Derivatives may be evaluated using
evaluate_derivative()
andevaluate_derivatives()
.- Parameters:
- exponentsnumpy.ndarray (n_terms, n_dimensions)
The exponents defining a polynomial equation.
- Returns:
- derivative_mapnumpy.ndarray of int
An array of shape (n_dimensions, 3, n_valid_terms). The second dimension (of size 3) gives a constant multiplier in the first element, the coefficient index in the second element, and the phi index in the second element. The third dimension will generally be of a smaller size than the number of terms in the polynomial equation as not all are required to calculate the derivative. Due to the fact that some dimensions may contain more valid terms than others, n_valid_terms is set to the maximum number of valid terms over all dimensions. Any invalid terms still remaining in the mapping array will have multipliers set to zero, and index pointers set to -1.
- grig.resample_utils.polynomial_exponents(order, ndim=None, use_max_order=False)[source]¶
Define a set of polynomial exponents.
The resampling algorithm uses defines a set of polynomial exponents as an array of shape (dimensions, terms) for an equation of the form:
\[f( \Phi ) = \sum_{m=1}^{M}{c_m \Phi_m}\]for \(M\) terms. Here, \(\Phi_m\) represents the product of independent variables, each raised to an appropriate power as defined by exponents. For example, consider the equation for 2-dimensional data with independent variables \(x\) and \(y\):
\[f(x, y) = c_1 + c_2 x + c_3 x^2 + c_4 y + c_5 x y + c_6 y^2\]In this case:
exponents = [[0, 0], # represents a constant or x^0 y^0 [1, 0], # represents x [2, 0], # represents x^2 [0, 1], # represents y [1, 1], # represents xy [0, 2]] # represents y^2
The resampling algorithm solves for the coefficients (\(c\)) by converting \(f(X) \rightarrow f(\Phi)\) for \(K-\text{dimensional}\) independent variables (\(X\)) and exponents (\(p\)) by setting:
\[\Phi_m = \prod_{k=1}^{K}{X_{k}^{p_{m, k}}}\]In most of the code, the \(\Phi\) terms are interchangable with “polynomial terms”, and in the above example \(\Phi_5 = xy\) since exponents[4] = [1, 1] representing \(x^1 y^1\).
Note that for all terms (\(m\)) in each dimension \(k\), \(\sum_{k=1}^{K}{p_{m, k}} \leq max(\text{order})\). In addition, if use_max_order is False (default), \(p_{m,k} \leq \text{order}[k]\).
- Parameters:
- orderint or array_like of int
Polynomial order for which to generate exponents. If an array will create full polynomial exponents over all len(order) dimensions.
- ndimint, optional
If set, return Taylor expansion for ndim dimensions for the given order if order is not an array.
- use_max_orderbool, optional
This keyword is only applicable for multi-dimensional data when orders are unequal across dimensions. When True, the maximum exponent for each dimension is equal to max(order). If False, the maximum available exponent for dimension k is equal to order[k].
- Returns:
- exponentsnumpy.ndarray
(n_terms, n_dimensions) array of polynomial exponents.
Examples
>>> polynomial_exponents(3) array([[0], [1], [2], [3]])
>>> polynomial_exponents([1, 2]) array([[0, 0], [1, 0], [0, 1], [1, 1], [0, 2]])
>>> polynomial_exponents(3, ndim=2) array([[0, 0], [1, 0], [2, 0], [3, 0], [0, 1], [1, 1], [2, 1], [0, 2], [1, 2], [0, 3]])
- grig.resample_utils.polynomial_terms(coordinates, exponents)[source]¶
Derive polynomial terms given coordinates and polynomial exponents.
Raises a single coordinate or multiple coordinates by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).
Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. If a single N-dimensional vector is provided, the output will be a 1-dimensional array with a single value for each exponent set. If multiple vectors are provided, the output will be of shape (number of exponent sets, number of vectors).
- Parameters:
- coordinatesnumpy.ndarray (N, n_vectors) or (N,)
Sets of coordinates in N-dimensions or a single coordinate of N-dimensions.
- exponentsnumpy.ndarray (n_exponents, N)
Sets of polynomial exponents to apply to coordinates.
- Returns:
- numpy.ndarray of numpy.float64 (n_exponents, n_vectors) or (n_exponents,)
The polynomial terms.
- grig.resample_utils.relative_density(sigma, counts, weight_sum, tolerance=None, max_dim=4)[source]¶
Returns the relative density of samples compared to a uniform distribution.
The relative local density is defined as 1 for a uniform distribution, < 1 for a distribution that is sparse near the center, and > 1 when clustered around the center.
The sum of the distance_weights returned from a Gaussian weighting function on the samples is required for this calculation. The weighting function should be of the form:
\[w(\Delta x) = exp \left( -\sum_{k=1}^{K}{\frac{-\Delta x_k^2}{2 \sigma_k^2}} \right)\]over \(K\) dimensions where \(\Delta x_k\) is the offset of a sample in dimension \(k\) from the point of interest, and \(\sigma\) must be supplied to relative_density as sigma, where distance_weights = \(\sum_{i=1}^{N}{w(x_i)}\) and counts = \(N\). Note that \(\sigma\) and \(x\) must be scaled such that the principle axis of an ellipsoid window containing all samples are equal to unity (principle axis in dimension \(k\) is \(\Omega_k = 1\) such that \(\prod_{k=1}^{K}{\Omega_k} = 1\) below).
The local relative density is then given as:
\[\rho = \frac{\rho(\text{measured})}{\rho(\text{uniform})}\]where,
\[\rho(\text{uniform}) = N \frac{\Gamma \left( 1 + \frac{K}{2} \right)} {\pi^{K/2} \prod_{k=1}^{K}{\Omega_k}}\]\[\rho(\text{measured}) = \frac {\sum_{i=1}^{N}{w_i}} {\int \cdots \int_{R} w(\mathbf{\Delta x}) \, {dx}_1 \cdots {dx}_K}\]and region \(R\) satisfies the requirement \(\| \mathbf{\Delta x} \|_2 \leq 1\)
- Parameters:
- sigmanp.ndarray (n_dimensions,)
The standard deviation of the Gaussian weighting function used to calculate the distance_weights for each dimension.
- countsint or float or numpy.ndarray (N,)
The number of data samples included in the sum of distance weights.
- weight_sumint or float or numpy.ndarray (N,)
The sum of weights as returned from a Gaussian weighting function.
- tolerancefloat, optional
Relative error tolerance passed to scipy.integrate.quad when determining the integral of the weighting function. The default of 10^(2*dim - 7) determined by testing, balancing precision with speed and convergence.
- max_dimint, optional
If the number of dimensions is greater than max_dim, do not attempt to calculate the relative density since the integral calculation is unlikely to converge and will take a vast amount of time. The return output will be 1.0 or an array of ones (N,). The maximum recommended number of dimensions is 4 (default).
- Returns:
- float or numpy.ndarray of float64 (N,)
The relative density.
- grig.resample_utils.scale_coordinates(coordinates, scale, offset, reverse=False)[source]¶
Apply scaling factors and offsets to N-dimensional data.
The two available transforms are controlled by the reverse. The transform functions apply the following functions:
Reverse
f(x)
False (default)
(x - offset) / scale
True
(x * scale) + offset
- Parameters:
- coordinatesnumpy.ndarray (N, M) or (N,)
Either a 1 or 2-dimensional array may be supplied. If a 1-dimensional array is supplied, it is assumed that it represents a single coordinates in N-dimensions. If a 2-dimensional array is supplied, it should be of shape (N, M) where N is the number of dimensions, and M is the number of coordinates.
- scalenumpy.ndarray (N,)
The scaling factor to apply to each dimension.
- offsetnumpy.ndarray (N,)
The offset to apply to each dimension.
- reversebool, optional
If True, apply the reverse transform. The default is False.
- Returns:
- numpy.ndarray of numpy.float64 (N, M) or (N,)
The scaled coordinates array.
- grig.resample_utils.scale_forward_scalar(coordinate, scale, offset)[source]¶
Applies the function f(x) = (x - offset) / scale to a single coordinate.
This is a
numba
jit compiled function.- Parameters:
- coordinatenumpy.ndarray (N,)
An array where N is the number of dimensions.
- scalenumpy.ndarray (N,)
The scaling factor to apply to each dimension.
- offsetnumpy.ndarray (N,)
The offset to apply to each dimension.
- Returns:
- numpy.ndarray of numpy.float64 (N,)
The scaled coordinates array.
- grig.resample_utils.scale_forward_vector(coordinates, scale, offset)[source]¶
Applies the function f(x) = (x - offset) / scale to a coordinate array.
This is a
numba
jit compiled function.- Parameters:
- coordinatesnumpy.ndarray (N, M)
An array where N is the number of dimensions, and M is the number of coordinates.
- scalenumpy.ndarray (N,)
The scaling factor to apply to each dimension.
- offsetnumpy.ndarray (N,)
The offset to apply to each dimension.
- Returns:
- numpy.ndarray of numpy.float64 (N, M)
The scaled coordinates array.
- grig.resample_utils.scale_reverse_scalar(coordinate, scale, offset)[source]¶
Applies the function f(x) = (x * scale) + offset to a single coordinate.
This is a
numba
jit compiled function.- Parameters:
- coordinatenumpy.ndarray (N,)
An array where N is the number of dimensions.
- scalenumpy.ndarray (N,)
The scaling factor to apply to each dimension.
- offsetnumpy.ndarray (N,)
The offset to apply to each dimension.
- Returns:
- numpy.ndarray of numpy.float64 (N,)
The scaled coordinates array.
- grig.resample_utils.scale_reverse_vector(coordinates, scale, offset)[source]¶
Applies the function f(x) = (x * scale) + offset to a coordinate array.
This is a
numba
jit compiled function.- Parameters:
- coordinatesnumpy.ndarray (N, M)
An array where N is the number of dimensions, and M is the number of coordinates.
- scalenumpy.ndarray (N,)
The scaling factor to apply to each dimension.
- offsetnumpy.ndarray (N,)
The offset to apply to each dimension.
- Returns:
- numpy.ndarray of numpy.float64 (N, M)
The scaled coordinates array.
- grig.resample_utils.scaled_adaptive_weight_matrices(sigma, rchi2_values, fixed=None)[source]¶
Wrapper for scaled_adaptive_weight_matrix over multiple values.
Please see
scaled_adaptive_weight_matrix()
for details on how the weighting kernel is modified using a single scaling factor. This function performs the calculation for multiple scaling factors (\(\chi_r^2\)).- Parameters:
- sigmanumpy.ndarray (n_dimensions,)
The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.
- rchi2_valuesnumpy.ndarray (n_data_sets, fit_shape)
The reduced chi-squared statistics of the fit for each data set. Here, fit_shape is an arbitrary array shape which depends upon the shape of the output fit coordinates defined by the user.
- fixednumpy.ndarray of bool (n_dimensions,), optional
If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.
- Returns:
- scaled_matricesnumpy.ndarray
The scaled weighting kernel with shape (n_data_sets, fit_shape, 1, n_dimensions) where fit_shape is determined by the shape of the output fit coordinates supplied by the user, and n_data_sets is the number of data sets to be fit. The third axis (of size 1), is a dummy dimension required for Numba to compile successfully. The last dimension contains the new scaled inverse \(\alpha_{scaled,k}^{-1}\) values as described in
scaled_adaptive_weight_matrix()
.
- grig.resample_utils.scaled_adaptive_weight_matrix(sigma, rchi2, fixed=None)[source]¶
Scales a Gaussian weighting kernel based on a prior fit.
In the standard resampling algorithm, a polynomial fit may weight each sample (\(x\)) according to its distance from the reference position at which the fit is derived (\(x_{ref}\)) such that samples closer to the reference position have more influence on the fit than those that are farther. The weighting function used is:
\[w(x) = exp \left( -\sum_{k=1}^{K}{\frac{(x_{ref, k} - x_k)^2}{2 \sigma_k^2}} \right)\]in \(K\) dimensions where \(\sigma\) (supplied to this function via sigma) is a scaling factor, equivalent to the standard deviation of a normal distribution. Following a fit, it is also possible to generate a reduced chi-squared statistic (\(\chi_r^2\)) which measures the “goodness” of fit.
With this information we can rescale sigma in an attempt to get \(\chi_r^2 \rightarrow 1\) i.e., get a good fit within noise limitations. This function assumes that if \(\chi_r^2 < 1\), the samples have been over-fit, and therefore, the weighting function should be “widened” to allow more distant samples to have a stronger influence on the fit and subsequent \(\chi_r^2\) calculation. Likewise, if \(\chi_r^2 > 1\), this implies that the weighting function should be truncated so that the fit focuses more strongly on providing a good fit to nearby samples. i.e., there is likely structure away from the fit location that cannot be modelled well by a polynomial of the given order.
To accomplish this, the weighting kernel is rescaled such that:
\[\chi_r^2 \prod_{k=1}^{K}{\sigma_{scaled, k}^2} = \prod_{k=1}^{K}{\sigma_k^2}\]The reason is that for a multivariate Gaussian:
\[\int_{R^K} exp \left( -\frac{1}{2} (x - x_{ref})^T \Sigma^{-1} (x - x_{ref}) \right) = (2 \pi)^{K/2} |\Sigma|^{1/2}\]where \(\sigma^2 = diag(\Sigma)\):
\[|\Sigma| \propto \prod_{k=1}^{K}{\sigma_k^2} \propto \chi_r\]Note that in this specific implementation, the shape of the weighting kernel remains unchanged and only the overall size is allowed to vary. Therefore, a single scaling factor (\(\beta\)) is applied over all dimensions such that:
\[\sigma_{scaled, k} = \frac{\sigma_k}{\sqrt{\beta}}\]where
\[\beta = \chi_r^{1 / K}\]To reduce subsequent calculations, a scaled \(\alpha\) value is passed out instead of \(\sigma_{scaled}\) where:
\[\alpha = 2 \sigma^2\]Therefore, the final output value will be:
\[\alpha_{scaled, k}^{-1} = \frac{\beta}{2 \sigma_k^2}\]Finally, scaling does not need to occur across all dimensions, and it is possible to fix the shape of the kernel in one or more dimensions by using the fixed parameter. If this is the case:
\[\beta = \chi_r^{\frac{1}{K - K_{fixed}}}\]where \(K_{fixed}\) is the number dimensions in which scaling has been disabled.
- Parameters:
- sigmanumpy.ndarray (n_dimensions,)
The standard deviations of the Gaussian for each dimensional component used for distance weighting of each sample in the initial fit.
- rchi2float
The reduced chi-squared statistic of the fit.
- fixednumpy.ndarray of bool (n_dimensions,), optional
If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.
- Returns:
- inverse_alphanumpy.ndarray (n_dimensions,)
The scaled sigma values converted to the inverse alpha array, required by
calculate_adaptive_distance_weights_scaled()
to create a set of weighting factors.
- grig.resample_utils.shaped_adaptive_weight_matrices(sigma, rchi2_values, gradient_mscp, density=None, variance_offsets=None, fixed=None)[source]¶
Wrapper for shaped_adaptive_weight_matrix over multiple values.
Please see
shaped_adaptive_weight_matrix()
for details on how the weighting kernel is modified using a scale factor and measure of the derivatives of the fitting function. This function performs the calculation for multiple scaling factors (\(\chi_r^2\)) and derivative measures.- Parameters:
- sigmanumpy.ndarray (n_dimensions,)
The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.
- rchi2_valuesnumpy.ndarray (n_sets, shape)
The reduced chi-squared statistics of the fit for each data set. Here, shape is an arbitrary array shape which depends upon the shape of the output fit coordinates defined by the user.
- gradient_mscpnumpy.ndarray (n_sets, shape, n_dimensions, n_dimensions)
An array where gradient_mscp[i, j] = derivative[i] * derivative[j] in dimensions i and j. Please see
derivative_mscp()
for further information. The last two dimensions must be Hermitian and real-valued (symmetric) for each fit set/coordinate.- densitynumpy.ndarray (n_sets, shape)
The local relative density of the samples around the fit coordinate. A value of 1 represents uniform distribution. Values greater than 1 indicate clustering around the fitting point, and values less than 1 indicate that samples are sparsely distributed around the fitting point. Please see
relative_density()
for further information.- variance_offsetsnumpy.ndarray (n_sets, shape)
The variance at the fit coordinate determined from the sample coordinate distribution. i.e., if a fit is performed at the center of the sample distribution, the variance is zero. If done at 2-sigma from the sample distribution center, the variance is 4.
- fixednumpy.ndarray of bool (n_dimensions,), optional
If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.
- Returns:
- shape_matricesnumpy.ndarray (n_sets, shape, n_dimensions, n_dimensions)
Shape matrices defined for each set/coordinate.
- grig.resample_utils.shaped_adaptive_weight_matrix(sigma, rchi2, gradient_mscp, density=1.0, variance_offset=0.0, fixed=None, tolerance=None)[source]¶
Shape and scale the weighting kernel based on a prior fit.
In the standard resampling algorithm, a polynomial fit may weight each sample coordinate (\(x\)) according to its distance from the reference position at which the fit is derived (\(x_{ref}\)) such that samples closer to the reference position have more influence on the fit than those that are farther. The weighting function used is:
\[W = exp(-\Delta x^T A^{-1} \Delta x)\]where \({\Delta x}_k = x_{ref, k} - x_k\) for dimension \(k\), and \(A\) is a symmetric positive definite matrix defining the “shape” of the weighting kernel. This effectively defines a multivariate Gaussian centered on the reference point where overall size, rotation, and stretch of each principle axis may be altered. The goal of shaping the weighting kernel \(A\), is to produce a fit where the reduced chi-squared statistic of the fit equal to one (\(\chi_r^2 = 1\)). To derive the shape \(A\), an initial fit must have first been performed using a square diagonal matrix \(A_0\), whose diagonal elements are the square of the sigma parameter (\(diag(A_0) = 2 \sigma^2\)). It is then easy to define \(A_0^{-1}\) in \(K\) dimensions as:
\[\begin{split}A_0^{-1} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sigma_0^2} & 0 & \dots & 0 \\ 0 & \frac{1}{\sigma_1^2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & \frac{1}{\sigma_K^2} \end{bmatrix}\end{split}\]Following (or during) the initial fit with \(A_0\), the mean square cross products of the derivatives evaluated at the sample coordinates should be calculated using
derivative_mscp()
, where the returned matrix has values:\[\bar{g}_{ij}^2 = \frac{\partial \bar{f}}{\partial X_i} \frac{\partial \bar{f}}{\partial X_j}\]for dimensions \(X_i\) and \(X_j\) in K-dimensions, where \(\partial \bar{f} / \partial X_i\) is the weighted mean of the partial derivatives over all samples in the fit with respect to \(X_i\), with weighting defined by \(W\) (above) using \(A_0^{-1}\).
The matrix \(\bar{g}^2\) is only used to define the shape of new weighting kernel, not the overall size. Therefore, it is normalized such that \(|\bar{g}^2| = 1\).
We can then use singular value decomposition to factorize \(\bar{g}^2\) into:
\[\bar{g}^2 = U S V^T\]Here, the matrices \(U\) and \(V^T\) represent rotations (since \(|\bar{g}^2| > 0\)), and the singular values (\(S\)) can be thought of as the magnitudes of the semi-axes of an ellipsoid in K-dimensional space. Naively, this provides us with a basis from which to determine the final “shape” matrix where:
\[A^{-1} = \beta\, \bar{g}^2\]and \(\beta\) is an as yet undetermined scaling factor representing the overall size of the new kernel. The resulting weighting kernel has the shape of an ellipsoid with the smallest semi-axis oriented parallel to the gradient. In other words, the new weighting will result in a fit that is less sensitive to distant samples in directions where the gradient is high, and more sensitive to distant samples in directions where the gradient is low.
However, we must still keep in mind that our overall goal is to get \(\chi_r^2 \rightarrow 1\) when a fit is performed using the new kernel \(A\). For example, if \(\chi_r=1\) in the initial fit, there is no need to modify the kernel, and if \(\chi_r < 1\), then we do not want to get an even better fit.
Another factor to consider is that we cannot be completely confident in this new shape due to the distribution of samples. If we are fitting at a point that is away from the center of the sample distribution, it is unadvisable to use a highly shaped kernel due to increased uncertainty in the mean partial derivatives. Furthermore, even if we are fitting close to the center of the distribution, that does not mean that the derivative calculations were not skewed by a few nearby samples when fitting in a local depression of the sample density (for example, near the center of a donut-like distribution).
We model our confidence (\(\gamma\)) in the “shape” using a logistic function (see
stretch_correction()
) that factors in \(\chi_r^2\), a measure of the sample density profile (\(\rho\)) at the fit coordinate, and from the deviation of the fit coordinate from the center of the sample distribution (\(\sigma_d\)):\[\gamma = \frac{2} {\left( {1 + (2^{exp(\sigma)} - 1)e^{\rho (1 - \chi_r^2)}} \right)^{1/exp(\sigma)}} - 1\]The density \(\rho\) is calculated using
relative_density()
which sets \(\rho=1\) when the samples are uniformly distributed, \(\rho > 1\) when the distribution is concentrated on the fit coordinate, and \(0 < \rho < 1\) when the fitting in a local depression of the distribution density. The deviation (\(\sigma_d\)) is calculated usingoffset_variance()
.The confidence parameter has asymptotes at \(\gamma = \pm 1\), is equal to zero at \(\chi_r^2=1\), is positive for \(\chi_r^2 > 1\), and is negative for \(\chi_r^2 < 1\). Also, the magnitude increases with \(\rho\), and decreases with \(\sigma_d\). The final shape matrix is then defined as:
\[A^{-1} = \beta\, U S^{\gamma} V^T\]Note that as \(\gamma \rightarrow 0\), the kernel approaches that of a spheroid. For \(\chi_r^2 > 1\), the kernel approaches \(\bar{g}^2\). When \(\chi_r^2 < 1\), the kernel effectively rotates so that the fit becomes increasingly sensitive to samples along the direction of the derivative. The overall size of the kernel remains constant since \(|S^{\gamma}| = |\bar{g}^2| = 1\).
Finally, the only remaining factor to calculate is the scaling factor \(\beta\) which is given as:
\[\beta = \left( \frac{\chi_r}{|A_0|} \right)^{1/K}\]Note that this scaling has the effect of setting
\[\frac{|A_0|}{|A|} = \chi_r\]The user has the option of fixing the kernel in certain dimensions such that \({A_0}_{k, k} = A_{k, k}\). If this is the case, for any fixed dimension \(k\) we set:
\[ \begin{align}\begin{aligned}{U S^{\gamma} V^T}_{k, k} = 1\\{U S^{\gamma} V^T}_{k, i \neq k}^2 = 0\\{U S^{\gamma} V^T}_{i \neq k, k}^2 = 0\end{aligned}\end{align} \]meaning the scaling is unaltered for dimensions \(k\), and no rotation will be applied to any other dimension with respect to \(k\). Since the overall size must be controlled through fewer dimensions, \(\beta\) must take the form of a diagonal matrix:
\[diag(\beta)_{i \in fixed} = \frac{1}{2 \sigma_i^2} ,\,\, diag(\beta)_{i \notin fixed} = \left( \frac{\chi_r |A_0|}{|{U S^{\gamma} V^T}|} \prod_{i \in fixed}{2 \sigma_i^2} \right)^{1 / (K - K_{fixed})}\]Once again \(A^{-1} = \beta\, U S^{\gamma} V^T\).
- Parameters:
- sigmanumpy.ndarray (n_dimensions,)
The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.
- rchi2float
The reduced chi-squared statistic of the fit.
- gradient_mscpnumpy.ndarray (n_dimensions, n_dimensions)
An array where gradient_mscp[i, j] = derivative[i] * derivative[j] in dimensions i and j. Please see
derivative_mscp()
for further information. Must be Hermitian and real-valued (symmetric).- densityfloat, optional
The local relative density of the samples around the fit coordinate. A value of 1 represents uniform distribution. Values greater than 1 indicate clustering around the fitting point, and values less than 1 indicate that samples are sparsely distributed around the fitting point. Please see
relative_density()
for further information.- variance_offsetfloat, optional
The variance at the fit coordinate determined from the sample coordinate distribution. i.e., if a fit is performed at the center of the sample distribution, the variance is zero. If done at 2-sigma from the sample distribution center, the variance is 4.
- fixednumpy.ndarray of bool (n_dimensions,), optional
If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.
- tolerancefloat, optional
The threshold below which SVD values are considered zero when determining the matrix rank of derivative_mscp. Please see
numpy.linalg.matrix_rank()
for further information.
- Returns:
- shape_matrixnumpy.ndarray (n_dimensions, n_dimensions)
A matrix defining the shape of the weighting kernel required by
calculate_adaptive_distance_weights_shaped()
to create a set of weighting factors.
- grig.resample_utils.sigmoid(x, factor=1.0, offset=0.0)[source]¶
Evaluate a scaled and shifted logistic function.
The sigmoid function has the form:
\[f(x) = \frac{1}{1 + e^{\beta (x - \alpha)}}\]where \(\beta\) is the scaling factor, and \(\alpha\) is an offset applied to x.
- Parameters:
- xint or float or numpy.ndarray (shape)
The independent variable. If an array is supplied, must be the same shape as factor and offset (if both/either are also arrays).
- factorint or float or numpy.ndarray (shape)
The scaling factor applied to x. If an array is supplied, must be the same shape as x and offset (if both/either are also arrays).
- offsetint or float or numpy.ndarray (shape)
The offset to applied to x. If an array is supplied, must be the same shape as x and factor (if both/either are also arrays).
- Returns:
- resultfloat or numpy.ndarray (shape)
The sigmoid function evaluated at x.
- grig.resample_utils.single_polynomial_terms(coordinate, exponents)[source]¶
Derive polynomial terms for a single coordinate given polynomial exponents.
Raises a single coordinate by a power and then calculates the product over all dimensions. For example, the output of an (x, y) vector with exponent=[[2, 3]] would be \(x^2y^3\).
Note that multiple sets of exponents are expected to be provided during this operation, so the exponents parameter should be a 2-dimensional array. The return value will be a 1-dimensional array with size equal to the number of exponent sets provided.
- Parameters:
- coordinatenumpy.ndarray (N,)
The coordinate in each dimension.
- exponentsnumpy.ndarray (n_exponents, N)
Sets of exponents to apply to the coordinate.
- Returns:
- numpy.ndarray of numpy.float64 (n_exponents,)
The polynomial terms for the coordinate.
- grig.resample_utils.solve_amat_beta(phi, data, weights)[source]¶
Convenience function returning matrices suitable for linear algebra.
Given independent variables \(\Phi\), data \(y\), and weights \(W\), returns matrices \(A\) and \(B\) where:
\[ \begin{align}\begin{aligned}A = \Phi W \Phi^T\\B = \Phi W y\end{aligned}\end{align} \]- Parameters:
- phinumpy.ndarray (n_terms, n_samples)
Polynomial terms of independent variables for each sample in the fit.
- datanumpy.ndarray (n_samples,)
Sample data values.
- weightsnumpy.ndarray (n_samples,)
Squared weights.
- Returns:
- A, Bnumpy.ndarray (n_terms, n_terms), numpy.ndarray (n_terms,)
The \(A\) and \(B\) terms described above.
- grig.resample_utils.solve_coefficients(amat, beta)[source]¶
Find least squares solution of Ax=B and rank of A.
- Parameters:
- amatnumpy.ndarray (ncoeffs, ndata)
The coefficient array.
- betanumpy.ndarray (ncoeffs,) or (ncoeffs, N)
Dependent values. If 2-dimensional, the least-squares solution is calculated for each column.
- Returns:
- rank, xint, numpy.ndarray (min(M, N),)
The rank of amat and least-squares solution of x.
- grig.resample_utils.solve_fit(window_coordinates, window_phi, window_values, window_error, window_mask, window_distance_weights, fit_coordinate, fit_phi, order, is_covar=False, fit_threshold=0.0, mean_fit=False, cval=nan, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]¶
Solve for a fit at a single coordinate.
Solves a polynomial fit of the form:
\[f(\Phi) = \hat{c} \cdot \Phi\]where \(\hat{c}\) are the derived polynomial coefficients for the \(\Phi\) terms. The \(\Phi\) terms are derived from the independent values of the samples within the window region of the fit coordinate, and from the fit coordinates themselves (see
polynomial_terms()
for further details).The \(\Phi\) terms are pre-calculated early in the resampling algorithm as this is a relatively cheap calculation, and we do not want to repeat the same calculation multiple times. For example, if sample[1] is within the window region of point[1] and point[2], there should be no need to repeat the polynomial term calculation twice. Initially, one might think that the actual coordinates could then be discarded, but there are a number of calculations that depend on the sample coordinates relative to the fitting points, which must therefore be dealt with “on-the-fly”.
EDGE CHECKING
The first of the on-the-fly calculation is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see
check_edges()
for details on available algorithms.ORDER CHECKING
The next step is to determine if it is possible to perform a polynomial fit of the given order. For example, a 1-dimensional 2nd order polynomial fit can only be derived from a minimum of 3 samples. Additionally, if some samples share the same coordinate, the fit becomes underdetermined, or if dealing with multidimensional data, one needs to ensure that the samples are not colinear. If we also wish to propagate or derive valid errors, we should ensure the system is overdetermined. There are a number of order checking algorithms which vary in robustness and speed. Please see
check_orders()
for details on the available algorithms.There are two available actions if the samples fail the order check. The first (default) is to simply abort fitting. The second option is to lower the order of fit until the samples meet the order check requirements. This is only possible if the fit order is equal across all dimensions. To allow for variable orders, set the order_term_indices (see parameter descriptions) to a valid value, and update window_phi and fit_phi accordingly.
FITTING
If the above checks pass, a fit can be attempted. There are actually three types of fit that may be performed. The first is the standard polynomial fit described above. The second is a weighted mean which may explicitly be performed by setting mean_fit to True, or may be performed on-the-fly if the order was lowered to zero during the order check. Finally, if is_covar was set to True, the window_values are considered covariances to propagate, and a fit will derived by propagating a weighted variance.
FINAL VALIDATION
If a polynomial fit was performed, a final check may be performed to confirm that the solution does not deviate to significantly from the expected values. This is done by evaluating the reduced chi-squared statistic of the fit (\(\chi_r^2\)). If \(\sqrt{\chi_r^2} > | \text{fit\_threshold} |\), the fit is not accepted, and is aborted if fit_threshold < 0, or set to the weighted mean of the samples if fit_threshold > 0. No validation will be performed if fit_threshold is set to zero (default). Note that window_error must be supplied in order for validation to be meaningful.
- Parameters:
- window_coordinatesnumpy.ndarray (n_dimensions, n_samples)
The independent coordinates within the window region of the fitting coordinate.
- window_phinumpy.ndarray (n_terms, n_samples)
The polynomial terms of window_coordinates. Please see
polynomial_terms()
for further details.- window_valuesnumpy.ndarray (n_samples,)
The dependent values of the samples.
- window_errornumpy.ndarray (n_samples,)
The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (1,) in which case all samples in a set will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.
- window_masknumpy.ndarray (n_samples,)
A mask where False indicates that the associated sample should be excluded from the fit.
- window_distance_weightsnumpy.ndarray (n_samples,)
The distance weighting factors applied to each sample in the fit.
- fit_coordinatenumpy.ndarray (n_dimensions,)
The coordinate of the fitting point.
- fit_phinumpy.ndarray (n_terms,)
The polynomial of fit_coordinate. Please see
polynomial_terms()
for further details.- ordernumpy.ndarray
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.
- is_covarbool, optional
If True, indicates that window_values contains covariance values that should be propagated through algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.
- fit_thresholdfloat, optional
If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If fit_threshold < 0, failed fit values will be set to cval. If fit_threshold > 0, failed fit values will be replaced by the weighted mean.
- mean_fitbool, optional
If True, a weighted mean is performed instead of calculating a polynomial fit.
- cvalfloat, optional
In a case that a fit is unable to be calculated at certain location, cval determines the returned fit value.
- error_weightingbool, optional
If True, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.
- estimate_covariancebool, optional
If True, when determining the error on the fit and reduced chi-squared, calculate the covariance of the fit coefficients using
estimated_covariance_matrix_inverse()
. Otherwise, usecovariance_matrix_inverse()
.- order_algorithm_idxint, optional
An integer specifying which polynomial order validation algorithm to use. The default (1), will always be the more robust of all available options. For further information, please see
check_edges()
.- term_indicesnumpy.ndarray (> max(order) + 1,), optional
A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case, order_term_indices is used to select the correct window_phi and fit_phi for a given order (k), where terms are extracted via phi[order_term_indices[k]:order_term_indices[k+1]].
- derivative_term_mapnumpy.ndarray, optional
A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see
polynomial_derivative_map()
.- derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional
If the fit order is allowed to vary, gives the indices in derivative_term_map for a given symmetrical order. The correct derivative_term_map mapping for order k is given as derivative_term_map[:, :, indices[k]:indices[k + 2]].
- edge_algorithm_idxint, optional
Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see
check_edges()
for further information. The default (1), is always the most robust of the available algorithms.- edge_thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution during
check_edges()
. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.- minimum_pointsint, optional
Certain order validation algorithms check the number of available samples as a means to determine what order of fit is appropriate. If pre-calculated for the base order, it may be passed in here for a slight speed advantage.
- get_errorbool, optional
If True, return the error on the fit.
- get_weightsbool, optional
If True, return the sum of all sample weights used in determining the fit.
- get_distance_weightsbool, optional
If True, return the sum of only the distance weights used in determining the fit.
- get_rchi2bool, optional
If True, return the reduced chi-squared statistic for each of the fits.
- get_cross_derivativesbool, optional
If True, return the derivative mean-squared-cross-products of the samples in the fit. See
derivative_mscp()
for further information.- get_offset_variancebool optional
If True, return the offset of the fitting point from the sample distribution. See
offset_variance()
for further information.
- Returns:
- fit_result8-tuple
- fit_result[0]: Fitted value (float).
Set to cval on fit failure.
- fit_result[1]: Error on the fit (float).
Set to NaN on fit failure.
- fit_result[2]: Number of samples included in the fit (int).
Set to 0 on fit failure.
- fit_result[3]: Weight sum (float).
Set to 0.0 on fit failure.
- fit_result[4]: Distance weight sum (float).
Set to 0.0 on fit failure.
- fit_result[5]: Reduced chi-squared statistic (float).
Set to NaN on fit failure.
- fit_result[6]: Derivative mean-squared-cross-product (numpy.ndarray).
Set to shape (0, 0) on fit failure, and (n_dimensions, n_dimensions) otherwise.
- fit_result[7]: Offset variance from the distribution center (float).
Set to NaN on fit failure.
- grig.resample_utils.solve_fits(sample_indices, sample_coordinates, sample_phi_terms, sample_data, sample_error, sample_mask, fit_coordinates, fit_phi_terms, order, alpha, adaptive_alpha, is_covar=False, mean_fit=False, cval=nan, fit_threshold=0.0, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, order_term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_counts=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]¶
Solve all fits within one intersection block.
This function is a wrapper for
solve_fit()
over all data sets and fit points. The main computations here involve:Creating and populating the output arrays.
Selecting the correct samples within the region of each fitting window.
Calculating the full weighting factors for the fits.
For further details on the actual fitting, please see
solve_fit()
.- Parameters:
- sample_indicesnumba.typed.List
A list of 1-dimensional numpy.ndarray (dtype=int) of length n_fits. Each list element sample_indices[i], contains the indices of samples within the “window” region of fit_indices[i].
- sample_coordinatesnumpy.ndarray (n_dimensions, n_samples)
The independent coordinates for each sample in n_dimensions
- sample_phi_termsnumpy.ndarray (n_terms, n_samples)
The polynomial terms of sample_coordinates. Please see
polynomial_terms()
for further details.- sample_datanumpy.ndarray (n_sets, n_samples)
The dependent values of the samples for n_sets, each containing n_samples.
- sample_errornumpy.ndarray (n_sets, n_samples)
The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (n_sets, 1) in which case all samples in a set will share the same associated error value. If the shape is set to (n_sets, 0), this indicates that no error values are available for the samples.
- sample_masknumpy.ndarray (n_sets, n_samples)
A mask where False indicates that the associated sample should be excluded from all fits.
- fit_coordinatesnumpy.ndarray (n_dimensions, n_fits)
The independent variables at each fit coordinate in d_dimensions.
- fit_phi_termsnumpy.ndarray (n_terms, n_fits)
The polynomial terms of fit_coordinates. Please see
polynomial_terms()
for further details.- ordernumpy.ndarray
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.
- alphanumpy.ndarray (n_dimensions,)
A distance weighting scaling factor per dimension. The weighting kernel is applied equally to all sets and samples. For further details, please see
calculate_distance_weights()
. Will be overridden by adaptive_alpha if adaptive_alpha.size > 0.- adaptive_alphanumpy.ndarray
Shape = (n_samples, n_sets, [1 or n_dimensions], n_dimensions). Defines a weighting kernel for each sample in each set. The function
calculate_adaptive_distance_weights_scaled()
will be used for kernels of shape (1, n_dimensions), andcalculate_adaptive_distance_weights_shaped()
will be used for kernels of shape (n_dimensions, n_dimensions). adaptive_alpha is a required parameter due to Numba constraints, and will override the alpha parameter unless it has a size of 0. Therefore, to disable, please set the size of any dimension to zero.- is_covarbool, optional
If True, indicates that sample_data contains covariance values that should be propagated through algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.
- mean_fitbool, optional
If True, a weighted mean is performed instead of calculating a polynomial fit.
- cvalfloat, optional
In a case that a fit is unable to be calculated at certain location, cval determines the fill value for the output fit array at those locations.
- fit_thresholdfloat, optional
If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If fit_threshold < 0, failed fit values will be set to cval. If fit_threshold > 0, failed fit values will be replaced by the weighted mean.
- error_weightingbool, optional
If True, weight the samples in the fit by the inverse variance (1 / sample_error^2) in addition to distance weighting.
- estimate_covariancebool, optional
If True, calculate the covariance of the fit coefficients using
estimated_covariance_matrix_inverse()
. Otherwise, usecovariance_matrix_inverse()
.- order_algorithm_idxint, optional
An integer specifying which polynomial order validation algorithm to use. The default (1), will always be the more robust of all available options. For further information, please see
check_edges()
.- order_term_indicesnumpy.ndarray (> max(order) + 1,), optional
A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case, order_term_indices is used to select the correct sample_phi_terms and fit_phi_terms for a given order (k), where terms are extracted via phi[order_term_indices[k]:order_term_indices[k + 2]].
- derivative_term_mapnumpy.ndarray, optional
A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see
polynomial_derivative_map()
.- derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional
If the fit order is allowed to vary, gives the indices in derivative_term_map for a given symmetrical order. The correct derivative_term_map mapping for order k is given as derivative_term_map[:, :, indices[k]:indices[k + 2]].
- edge_algorithm_idxint, optional
Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see
check_edges()
for further information. The default (1), is always the most robust of the available algorithms.- edge_thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution during
check_edges()
. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.- minimum_pointsint, optional
Certain order validation algorithms check the number of available samples as a means to determine what order of fit is appropriate. If pre-calculated for the base order, it may be passed in here for a slight speed advantage.
- get_errorbool, optional
If True, return the error on the fit.
- get_countsbool, optional
If True, return the number of samples used when determining the fit at each fitting point.
- get_weightsbool, optional
If True, return the sum of all sample weights used in determining the fit at each point.
- get_distance_weightsbool, optional
If True, return the sum of only the distance weights used in determining the fit at each point.
- get_rchi2bool, optional
If True, return the reduced chi-squared statistic for each of the fitted points.
- get_cross_derivativesbool, optional
If True, return the derivative mean-squared-cross-products of the samples for each of the fitted points. See
derivative_mscp()
for further information.- get_offset_variancebool optional
If True, return the offset of the fitting point from the sample distribution. See
offset_variance()
for further information.
- Returns:
- fit_results8-tuple of numpy.ndarray
fit_results[0]: Fitted values. fit_results[1]: Error on the fit. fit_results[2]: Number of samples in each fit. fit_results[3]: Weight sums. fit_results[4]: Distance weight sums. fit_results[5]: Reduced chi-squared statistic. fit_results[6]: Derivative mean-squared-cross-products. fit_results[7]: Offset variances from the sample distribution center.
All arrays except for fit_results[6] have the shape (n_sets, n_fits) or (0, 0) depending on whether get_<name> is True or False respectively. The derivative MSCP is of shape (n_sets, n_fits, n_dimensions, n_dimensions) if requested, and (1, 0, 0, 0) otherwise.
- grig.resample_utils.solve_inverse_covariance_matrices(phi, error, residuals, weights, error_weighted_amat=None, rank=None, calculate_error=True, calculate_residual=True, estimate_covariance=False)[source]¶
Inverse covariance matrices on fit coefficients from errors and residuals.
A utility function to calculate the inverse covariance matrices of the fit coefficients (\(c\)) based on the \(1\sigma\) error values of the sample measurements and/or the residuals of the fit \(y - c \cdot \Phi\).
The function used to calculate the error covariance may be either
estimated_covariance_matrix_inverse()
orcovariance_matrix_inverse()
. However,estimated_covariance_matrix_inverse()
will always be used to calculate the covariance matrix derived from residuals.- Parameters:
- phinumpy.ndarray (nterms, N)
The polynomial terms for each of the N samples.
- errornumpy.ndarray (N,)
The 1-sigma error values for each sample.
- residualsnumpy.ndarray (N,)
The residuals of the fit y - c.phi.
- weightsnumpy.ndarray (N,)
The weighting of each sample in the fit.
- error_weighted_amatnumpy.ndarray (nterms, nterms), optional
The matrix \(A = \Phi^T W Var(y) W \Phi\), optionally passed in for speed if pre-calculated.
- rankint or float, optional
The rank of error_weighted_amat, if provided, and it’s rank was pre-calculated. Otherwise, it will be solved for.
- calculate_errorbool, optional
If True, calculate the covariance of the fit coefficients based upon the error values.
- calculate_residualbool, optional
If True, calculate the covariance of the fit coefficients based upon residuals of the fit.
- estimate_covariancebool, optional
If True, calculate the covariance of the fit coefficients from the error values using
estimated_covariance_matrix_inverse()
. Otherwise, usecovariance_matrix_inverse()
.
- Returns:
- e_inv, r_invnumpy.ndarray, numpy.ndarray
The inverse covariance calculated from error, and the inverse covariance calculated from residuals. If calculate_error is True, the shape of e_cov will be (nterms, nterms) or (0, 0) otherwise. The same is true for calculate_residual and r_cov.
- grig.resample_utils.solve_mean_fit(data, error, weight, weightsum=None, calculate_variance=True, calculate_rchi2=True)[source]¶
Return the weighted mean of data, variance, and reduced chi-squared.
For data (\(y\)), error (\(\sigma\)), and weights (\(w\)), the weighted mean is given as:
\[\bar{y} = \frac{\sum_{i=1}^{N}{w_i y_i}} {\sum_{i=1}^{N}{w_i}}\]The returned variance (\(V\)) will depend on use_error. If use_error is True:
\[V = \frac{\sum_{i=1}^{N}{(w_i\sigma_i)^2}}{(\sum_{i=1}^{N}{w_i})^2}\]If use_error is False:
\[V = \frac{1}{N - 1} \frac{\sum_{i=1}^{N}{w_i (y_i - \bar{y})^2}} {\sum_{i=1}^{N}{w_i}}\]Finally, the reduced chi-squared statistic is given as:
\[\chi_r^2 = \frac{N}{N - 1} \frac{\sum_{i=1}^{N}{w_i (y_i - \bar{y})^2 / \sigma_i^2}} {\sum_{i=1}^{N}{w_i}}\]Note that \(\chi_r^2 = 1\) is use_error is False.
- Parameters:
- datanumpy.ndarray (N,)
The data array consisting of N samples.
- errornumpy.ndarray (N,)
The associated 1-sigma error values for each of the N data samples.
- weightnumpy.ndarray (N,)
The weighting applied to each of the N data samples.
- weightsumint or float, optional
The sum of the weights, optionally passed in for speed if pre-calculated.
- calculate_variancebool, optional
If True, calculate the variance. Otherwise, variance will be returned as a float value of zero.
- calculate_rchi2bool, optional
If True, calculate the reduced chi-squared statistic. Otherwise, it will be returned as a float value of zero.
- Returns:
- mean, variance, rchi2float, float, float
The weighted mean, variance and reduced chi-squared statistic.
- grig.resample_utils.solve_polynomial_fit(phi_samples, phi_point, data, error, distance_weight, weight, derivative_term_map=None, calculate_variance=True, calculate_rchi2=True, calculate_derivative_mscp=True, error_weighting=True, estimate_covariance=False)[source]¶
Derive a polynomial fit from samples, then calculate fit at single point.
The fit to the sample distribution is given as
\[f(\Phi) = \hat{c} \cdot \Phi\]where \(\Phi\) contains products of the sample coordinates for each coefficient term. The coefficients \(\hat{c}\) are solved for using least-squares fitting and then applied to calculate the fitted value at a single point \(f(\Phi_{fit})\).
It is also possible to return an on the fit as a variance. If a valid error is supplied, it will be propagated. If no valid errors are available, they will be calculated from residuals on the fit.
The reduced chi-squared (\(\chi^2\)) statistic may also be calculated, but is only really meaningful if valid errors were supplied. Otherwise, \(\chi^2 \equiv 1\).
Finally, the covariance of gradients between dimensions may also be returned. Note that these are the weighted mean of all sample gradients.
- Parameters:
- phi_samplesnumpy.ndarray (n_terms, n_samples)
The array of independent terms for each sample.
- phi_pointnumpy.ndarray (n_terms,)
The array containing the independent terms at the fitting point.
- datanumpy.ndarray (n_samples,)
The array of sample values.
- errornumpy.ndarray (n_samples,)
The array of error values for each sample. Note that if errors are unavailable, an array of size 0 may be supplied. If this is the case, and an error on the fit is required, it will be derived from the residuals of the fit from the data. In addition, the reduced chi-squared statistic will always be 1.0 if derived from residuals.
- distance_weightnumpy.ndarray (n_samples,)
The distance weighting factor (not including any error weighting) applied to each sample in the fit.
- weightnumpy.ndarray (n_samples,)
The full weighting factor applied to each sample in the fit.
- derivative_term_mapnumpy.ndarray, optional
A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see
polynomial_derivative_map()
.- calculate_variancebool, optional
If True, calculate the variance on the fit. The variance will be calculated irrespectively if a valid error was supplied, and the reduced chi-squared statistic is required as a return value.
- calculate_rchi2bool, optional
If True, calculate the reduced chi-squared statistic of the fit. Note that valid errors must be supplied for this to be meaningful.
- calculate_derivative_mscpbool, optional
If True, calculate the covariance of the derivatives at the fit point.
- error_weightingbool, optional
If True, indicates that weights includes an error weighting factor of 1/sigma^2. This allows for a slight speed increase when performing the fit as some mathematical terms will not need to be recalculated.
- estimate_covariancebool, optional
If True, uses
estimated_covariance_matrix_inverse()
instead ofcovariance_matrix_inverse()
when determining the variance. This is suggested if the errors are not well-behaved.
- Returns:
- fit_value, variance, rchi2, gradientsfloat, float, float, numpy.ndarray
The value of the fit at the fit point. The variance and reduced chi-squared will only be calculated if calculate_variance and calculate_rchi2 are respectively set to True. The gradients matrix is an (n_dimensions, n_dimensions) array where gradients[i, j] = dx_i * dx_j.
- grig.resample_utils.solve_rchi2_from_error(residuals, weights, errors, weightsum=None, rank=1)[source]¶
Return the reduced chi-squared given residuals and sample errors.
For weights \(w\), errors \(\sigma\), and residuals \(r\) where \(r = y - f(x)\), the reduced chi-squared is given as:
\[\chi_r^2 = \frac{N}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2 / \sigma_i^2}} {\sum_{i=1}^{N}{w_i}}\]where \(M\) is given by rank.
- Parameters:
- residualsnumpy.ndarray (N,)
The residuals to the fit, or y - f(x).
- weightsnumpy.ndarray (N,)
The weights to each sample in the fit.
- errorsnumpy.ndarray (N,)
The 1-sigma measurement errors for each sample in the fit.
- weightsumint or float, optional
The sum of the sample weights, optionally passed in for speed.
- rankint or float, optional
The degrees of freedom used in the reduced chi-squared value is taken as N - rank. The default is 1 and applies the Bessel correction. If N < rank, rank is automatically set to N - 1.
- Returns:
- rchi2float
The reduced chi-squared value for the fit.
- grig.resample_utils.solve_rchi2_from_variance(residuals, weights, variance, weightsum=None, rank=1)[source]¶
Return the reduced chi-squared given residuals and constant variance.
For weights \(w\), variance \(V\), and residuals \(r\) where \(r = y - f(x)\), the reduced chi-squared is given as:
\[\chi_r^2 = \frac{1}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2}} {V \sum_{i=1}^{N}{w_i}}\]where \(M\) is given by rank.
- Parameters:
- residualsnumpy.ndarray (N,)
The residuals to the fit, or y - f(x).
- weightsnumpy.ndarray (N,)
The weights to each sample in the fit.
- varianceint or float
The constant variance of the fit.
- weightsumint or float, optional
The sum of the sample weights, optionally passed in for speed.
- rankint or float, optional
The degrees of freedom used in the reduced chi-squared value is taken as N - rank. The default is 1 and applies the Bessel correction. If N < rank, rank is automatically set to N - 1.
- Returns:
- rchi2float
The reduced chi-squared value for the fit.
- grig.resample_utils.sscp(matrix, weight=None, normalize=False)[source]¶
Calculate the sum-of-squares-and-cross-products of a matrix.
For the matrix \(A\), calculates \(A^TA\). If weights (\(W\)) are provided.
\[sscp = WA^TAW^T\]Note that the weight should only contain the diagonal elements of \(W\), and as such should be a 1-dimensional array.
If normalize=True:
\[sscp = \frac{WA^TAW^T}{trace(W^TW)}\]where \(W = I\), the identity matrix if weight is not supplied.
- Parameters:
- matrixnumpy.ndarray (M, N)
Input matrix.
- weightnumpy.ndarray (N,), optional
Weights to be applied during sscp.
- normalizebool, optional
If True, scales result as described above. Default is False.
- Returns:
- numpy.ndarray of numpy.float64 (M, M)
Square output array containing the sum-of-squares-and-cross-products matrix.
- grig.resample_utils.stretch_correction(rchi2, density, variance_offset)[source]¶
A sigmoid function used by the “shaped” adaptive resampling algorithm.
This sigmoid function is applied when determining the severity of stretch (\(s\)) applied to principle axes of a rotated weighting kernel. The correction term (\(\gamma\)) is applied as:
\[s_{corrected} = s^\gamma\]Since the stretch values are determined from the singular values of a normalized Hermitian matrix \(A\) (see
shaped_adaptive_weight_matrix()
), where \(|A| \equiv 1\), then:\[\prod_{k=1}^{K}{s_k} = \prod_{k=1}^{K}{s_{corrected, k}} = 1\]in \(K\) dimensions. In other words, this does not affect the overall size (or volume) of the weighting kernel.
The correction factor is calculated using
half_max_sigmoid()
using \(c=1\), lower and upper asymptotes as -1 and 1, such that the midpoint is fixed at zero when \(x=0\). After making the necessary substitutions, the correction factor is given as:\[\gamma = \frac{2} {\left( {1 + (2^{\nu} - 1)e^{B(1 - x)}} \right)^{1/\nu}} - 1\]We then set the rate as \(B = \rho\) where \(\rho\) is the density as determined by
relative_density()
, and the point of inflection as \(exp(\sigma)\) where \(\sigma^2\) is the variance_offset as determined byoffset_variance()
. Finally, setting \(x = \chi_r^2\) we arrive at:\[\gamma = \frac{2} {\left( {1 + (2^{exp(\sigma)} - 1)e^{\rho (1 - \chi_r^2)}} \right)^{1/exp(\sigma)}} - 1\]As \(\gamma \rightarrow 0\), the resulting shape becomes more symmetrical. This will be the case when the fitting point is away from the center of the sample distribution (high \(\sigma^2\)), the fit occurs in a low density area (low \(\beta\)), or \(\chi_r^2 \rightarrow 1\).
It should be noted that \(s_{corrected} \rightarrow s\) as \(\chi_r^2 \rightarrow \infty\). However, in the range \(0 < \chi_r^2 < 1\), the correction factor is actually negative, with \(\gamma \rightarrow -1\) as \(\chi_r^2 \rightarrow 0\). This has the effect of effectively rotating the shape so that its major axis is perpendicular to the mean sample gradient rather than parallel.
- Parameters:
- rchi2int or float or numpy.ndarray (shape)
The reduced chi-squared statistic of the initial fit.
- densityint or float or numpy.ndarray (shape)
The local relative density at the fitting point (see
relative_density()
).- variance_offsetint or float or numpy.ndarray (shape)
The variance at the fitting with respect to the coordinate distribution of the sample used in the fit. Please see
offset_variance()
for further information.
- Returns:
- correctionnumpy.ndarray
The correction factor.
Notes
For high valued inflection points,
numpy
will set values in the denominator of the above equation to infinity, or 1, resulting in a misleading correction factor of \(\pm 1\). In order to counter this,half_max_sigmoid()
could not be used, and the calculation is done “by hand” with spurious values resulting in a correction factor of zero. This also requires somenumba
hackery resulting in the final output value being a numpy.ndarray in all cases. When single valued inputs are supplied, the output will be a single-valued array of zero dimensions which should be suitable for subsequent calculations.
- grig.resample_utils.update_mask(weights, mask)[source]¶
Updates a mask, setting False values where weights are zero or non-finite.
Utility function update a boolean mask in place given weights. mask values where weights are zero or non-finite will be set to False, and the total number of True values is returned as output.
- Parameters:
- weightsnumpy.ndarray (N,)
The weight values.
- masknumpy.ndarray of bool (N,)
The mask array to update in place.
- Returns:
- countsint
The number of True mask values following the update.
- grig.resample_utils.variance_from_offsets(offsets, covariance, sigma_inv=None)[source]¶
Determine the variance given offsets from the expected value.
The output variance is:
\[V = M^T \Sigma^{-1} M\]where offsets (\(M\)) are the deviations \(X - E[X]\) and \(\Sigma\) is the covariance.
- Parameters:
- offsetsnumpy.ndarray (n_dimensions,)
The observational offsets from the expected value.
- covariancenumpy.ndarray (n_dimensions, n_dimensions)
The covariance matrix of observations.
- sigma_invnumpy.ndarray (n_dimensions, n_dimensions), optional
The matrix inverse of the covariance matrix, optionally passed in for speed.
- Returns:
- variancefloat
The variance as described above.
- grig.resample_utils.weighted_fit_variance(residuals, weights, weightsum=None, rank=1)[source]¶
Calculate variance of a fit from the residuals of the fit to data.
For data \(y\), weights \(w\), and fitted function \(f(x) = fit(x, y, w)\), the residual is given as \(r = y - f(x)\). The variance is then given as:
\[V = \frac{1}{N - M} \frac{\sum_{i=1}^{N}{w_i r_i^2}} {\sum_{i=1}^{N}{w_i}}\]where \(M\) = dof if \(M < N\) and \(M = N - 1\) otherwise.
- Parameters:
- residualsnumpy.ndarray (ndata,)
The residuals given as data - fit.
- weightsnumpy.ndarray (ndata,)
The weights.
- weightsumint or float, optional
The sum of weights optionally passed in for speed if pre-calculated.
- rankint, optional
The degrees of freedom used in the variance calculation is taken as ndata - rank. The default is 1 and applies the Bessel correction. If ndata < rank, rank is automatically set to ndata - 1.
- Returns:
- variancefloat
Variance calculated from residuals.
- grig.resample_utils.weighted_mean(data, weights, weightsum=None)[source]¶
Calculate the weighted mean of a data set.
The weighted mean of data \(y\) with weights \(w\) is given as:
\[\bar{y} = \frac{\sum_{i=1}^{N}{w_i y_i}} {\sum_{i=1}^{N}{w_i}}\]This is a jit compiled
numba
function for use within other functions in sofia_redux.toolkit.resampling.- Parameters:
- datanumpy.ndarray (ndata,)
Data.
- weightsnumpy.ndarray (ndata,)
Weights.
- weightsumint or float, optional
Sum of weights, optionally passed in for speed if pre-calculated.
- Returns:
- weighted_meanfloat
The weighted mean.
- grig.resample_utils.weighted_mean_variance(variance, weights, weightsum=None)[source]¶
Calculated mean weighted variance.
Propagate variance as:
\[\bar{V} = \frac{\sum_{i=1}^{N}{w_i^2 V_i}} {(\sum_{i=1}^{N}{w_i})^2}\]- Parameters:
- variancenumpy.ndarray (ndata,)
Variance array.
- weightsnumpy.ndarray (ndata,)
Weights.
- weightsumint or float, optional
Sum of weights. Passed in for speed if pre-calculated.
- Returns:
- mean_variancefloat
The propagated variance.
- grig.resample_utils.weighted_variance(error, weights, weightsum=None)[source]¶
Utility function to calculate the biased weighted variance.
Calculates the biased weighted variance from data errors as:
\[V = \frac{\sum{(w\sigma)^2}}{(\sum{w})^2}\]- Parameters:
- errornumpy.ndarray (ndata,)
1-sigma error values.
- weightsnumpy.ndarray (ndata,)
Data weights.
- weightsumint or float, optional
Sum of weights. Optionally passed in for speed if pre-calculated.
- Returns:
- weighted_variancefloat
The weighted variance.