solve_fit¶
- sofia_redux.toolkit.resampling.solve_fit(window_coordinates, window_phi, window_values, window_error, window_mask, window_distance_weights, fit_coordinate, fit_phi, order, is_covar=False, fit_threshold=0.0, mean_fit=False, cval=nan, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]¶
Solve for a fit at a single coordinate.
Solves a polynomial fit of the form:
\[f(\Phi) = \hat{c} \cdot \Phi\]where \(\hat{c}\) are the derived polynomial coefficients for the \(\Phi\) terms. The \(\Phi\) terms are derived from the independent values of the samples within the window region of the fit coordinate, and from the fit coordinates themselves (see
polynomial_terms()
for further details).The \(\Phi\) terms are pre-calculated early in the resampling algorithm as this is a relatively cheap calculation, and we do not want to repeat the same calculation multiple times. For example, if sample[1] is within the window region of point[1] and point[2], there should be no need to repeat the polynomial term calculation twice. Initially, one might think that the actual coordinates could then be discarded, but there are a number of calculations that depend on the sample coordinates relative to the fitting points, which must therefore be dealt with “on-the-fly”.
EDGE CHECKING
The first of the on-the-fly calculation is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see
check_edges()
for details on available algorithms.ORDER CHECKING
The next step is to determine if it is possible to perform a polynomial fit of the given order. For example, a 1-dimensional 2nd order polynomial fit can only be derived from a minimum of 3 samples. Additionally, if some samples share the same coordinate, the fit becomes underdetermined, or if dealing with multidimensional data, one needs to ensure that the samples are not colinear. If we also wish to propagate or derive valid errors, we should ensure the system is overdetermined. There are a number of order checking algorithms which vary in robustness and speed. Please see
check_orders()
for details on the available algorithms.There are two available actions if the samples fail the order check. The first (default) is to simply abort fitting. The second option is to lower the order of fit until the samples meet the order check requirements. This is only possible if the fit order is equal across all dimensions. To allow for variable orders, set the
order_term_indices
(see parameter descriptions) to a valid value, and updatewindow_phi
andfit_phi
accordingly.FITTING
If the above checks pass, a fit can be attempted. There are actually three types of fit that may be performed. The first is the standard polynomial fit described above. The second is a weighted mean which may explicitly be performed by setting
mean_fit
toTrue
, or may be performed on-the-fly if the order was lowered to zero during the order check. Finally, ifis_covar
was set toTrue
, thewindow_values
are considered covariances to propagate, and a fit will derived by propagating a weighted variance (this was created for the SOFIA HAWC+ pipeline).FINAL VALIDATION
If a polynomial fit was performed, a final check may be performed to confirm that the solution does not deviate to significantly from the expected values. This is done by evaluating the reduced chi-squared statistic of the fit (\(\chi_r^2\)). If \(\sqrt{\chi_r^2} > | \text{fit\_threshold} |\), the fit is not accepted, and is aborted if
fit_threshold
< 0, or set to the weighted mean of the samples iffit_threshold
> 0. No validation will be performed iffit_threshold
is set to zero (default). Note thatwindow_error
must be supplied in order for validation to be meaningful.- Parameters:
- window_coordinatesnumpy.ndarray (n_dimensions, n_samples)
The independent coordinates within the window region of the fitting coordinate.
- window_phinumpy.ndarray (n_terms, n_samples)
The polynomial terms of
window_coordinates
. Please seepolynomial_terms()
for further details.- window_valuesnumpy.ndarray (n_samples,)
The dependent values of the samples.
- window_errornumpy.ndarray (n_samples,)
The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (1,) in which case all samples in a set will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.
- window_masknumpy.ndarray (n_samples,)
A mask where
False
indicates that the associated sample should be excluded from the fit.- window_distance_weightsnumpy.ndarray (n_samples,)
The distance weighting factors applied to each sample in the fit.
- fit_coordinatenumpy.ndarray (n_dimensions,)
The coordinate of the fitting point.
- fit_phinumpy.ndarray (n_terms,)
The polynomial of
fit_coordinate
. Please seepolynomial_terms()
for further details.- ordernumpy.ndarray
The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.
- is_covarbool, optional
If
True
, indicates thatwindow_values
contains covariance values that should be propagated through algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.- fit_thresholdfloat, optional
If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If
fit_threshold
< 0, failed fit values will be set tocval
. Iffit_threshold
> 0, failed fit values will be replaced by the weighted mean.- mean_fitbool, optional
If
True
, a weighted mean is performed instead of calculating a polynomial fit.- cvalfloat, optional
In a case that a fit is unable to be calculated at certain location,
cval
determines the returned fit value.- error_weightingbool, optional
If
True
, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.- estimate_covariancebool, optional
If
True
, when determining the error on the fit and reduced chi-squared, calculate the covariance of the fit coefficients usingestimated_covariance_matrix_inverse()
. Otherwise, usecovariance_matrix_inverse()
.- order_algorithm_idxint, optional
An integer specifying which polynomial order validation algorithm to use. The default (1), will always be the more robust of all available options. For further information, please see
check_edges()
.- term_indicesnumpy.ndarray (> max(order) + 1,), optional
A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case,
order_term_indices
is used to select the correctwindow_phi
andfit_phi
for a given order (k), where terms are extracted viaphi[order_term_indices[k]:order_term_indices[k+1]]
.- derivative_term_mapnumpy.ndarray, optional
A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see
polynomial_derivative_map()
.- derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional
If the fit order is allowed to vary, gives the indices in
derivative_term_map
for a given symmetrical order. The correctderivative_term_map
mapping for order k is given asderivative_term_map[:, :, indices[k]:indices[k + 2]]
.- edge_algorithm_idxint, optional
Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see
check_edges()
for further information. The default (1), is always the most robust of the available algorithms.- edge_thresholdnumpy.ndarray (n_dimensions,)
A threshold parameter determining how close an edge should be to the center of the distribution during
check_edges()
. Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.- minimum_pointsint, optional
Certain order validation algorithms check the number of available samples as a means to determine what order of fit is appropriate. If pre-calculated for the base
order
, it may be passed in here for a slight speed advantage.- get_errorbool, optional
If
True
, return the error on the fit.- get_weightsbool, optional
If
True
, return the sum of all sample weights used in determining the fit.- get_distance_weightsbool, optional
If
True
, return the sum of only the distance weights used in determining the fit.- get_rchi2bool, optional
If
True
, return the reduced chi-squared statistic for each of the fits.- get_cross_derivativesbool, optional
If
True
, return the derivative mean-squared-cross-products of the samples in the fit. Seederivative_mscp()
for further information.- get_offset_variancebool optional
If
True
, return the offset of the fitting point from the sample distribution. Seeoffset_variance()
for further information.
- Returns:
- fit_result8-tuple
- fit_result[0]: Fitted value (float).
Set to
cval
on fit failure.- fit_result[1]: Error on the fit (float).
Set to NaN on fit failure.
- fit_result[2]: Number of samples included in the fit (int).
Set to 0 on fit failure.
- fit_result[3]: Weight sum (float).
Set to 0.0 on fit failure.
- fit_result[4]: Distance weight sum (float).
Set to 0.0 on fit failure.
- fit_result[5]: Reduced chi-squared statistic (float).
Set to NaN on fit failure.
- fit_result[6]: Derivative mean-squared-cross-product (numpy.ndarray).
Set to shape (0, 0) on fit failure, and (n_dimensions, n_dimensions) otherwise.
- fit_result[7]: Offset variance from the distribution center (float).
Set to NaN on fit failure.