solve_fit

sofia_redux.toolkit.resampling.solve_fit(window_coordinates, window_phi, window_values, window_error, window_mask, window_distance_weights, fit_coordinate, fit_phi, order, is_covar=False, fit_threshold=0.0, mean_fit=False, cval=nan, error_weighting=True, estimate_covariance=False, order_algorithm_idx=1, term_indices=None, derivative_term_map=None, derivative_term_indices=None, edge_algorithm_idx=1, edge_threshold=None, minimum_points=None, get_error=True, get_weights=True, get_distance_weights=True, get_rchi2=True, get_cross_derivatives=True, get_offset_variance=True)[source]

Solve for a fit at a single coordinate.

Solves a polynomial fit of the form:

\[f(\Phi) = \hat{c} \cdot \Phi\]

where \(\hat{c}\) are the derived polynomial coefficients for the \(\Phi\) terms. The \(\Phi\) terms are derived from the independent values of the samples within the window region of the fit coordinate, and from the fit coordinates themselves (see polynomial_terms() for further details).

The \(\Phi\) terms are pre-calculated early in the resampling algorithm as this is a relatively cheap calculation, and we do not want to repeat the same calculation multiple times. For example, if sample[1] is within the window region of point[1] and point[2], there should be no need to repeat the polynomial term calculation twice. Initially, one might think that the actual coordinates could then be discarded, but there are a number of calculations that depend on the sample coordinates relative to the fitting points, which must therefore be dealt with “on-the-fly”.

EDGE CHECKING

The first of the on-the-fly calculation is the “edge check”. Generally, polynomial fits are not well-defined away from the sample distribution from which they were derived. This is especially true for higher order fits that may fit the sample distribution well, but start to deviate wildly when evaluated outside of the distribution. The edge check step defines a border around the distribution, outside of which the fit will be aborted. There are a number of algorithms available which vary in robustness and speed. Please see check_edges() for details on available algorithms.

ORDER CHECKING

The next step is to determine if it is possible to perform a polynomial fit of the given order. For example, a 1-dimensional 2nd order polynomial fit can only be derived from a minimum of 3 samples. Additionally, if some samples share the same coordinate, the fit becomes underdetermined, or if dealing with multidimensional data, one needs to ensure that the samples are not colinear. If we also wish to propagate or derive valid errors, we should ensure the system is overdetermined. There are a number of order checking algorithms which vary in robustness and speed. Please see check_orders() for details on the available algorithms.

There are two available actions if the samples fail the order check. The first (default) is to simply abort fitting. The second option is to lower the order of fit until the samples meet the order check requirements. This is only possible if the fit order is equal across all dimensions. To allow for variable orders, set the order_term_indices (see parameter descriptions) to a valid value, and update window_phi and fit_phi accordingly.

FITTING

If the above checks pass, a fit can be attempted. There are actually three types of fit that may be performed. The first is the standard polynomial fit described above. The second is a weighted mean which may explicitly be performed by setting mean_fit to True, or may be performed on-the-fly if the order was lowered to zero during the order check. Finally, if is_covar was set to True, the window_values are considered covariances to propagate, and a fit will derived by propagating a weighted variance (this was created for the SOFIA HAWC+ pipeline).

FINAL VALIDATION

If a polynomial fit was performed, a final check may be performed to confirm that the solution does not deviate to significantly from the expected values. This is done by evaluating the reduced chi-squared statistic of the fit (\(\chi_r^2\)). If \(\sqrt{\chi_r^2} > | \text{fit\_threshold} |\), the fit is not accepted, and is aborted if fit_threshold < 0, or set to the weighted mean of the samples if fit_threshold > 0. No validation will be performed if fit_threshold is set to zero (default). Note that window_error must be supplied in order for validation to be meaningful.

Parameters:
window_coordinatesnumpy.ndarray (n_dimensions, n_samples)

The independent coordinates within the window region of the fitting coordinate.

window_phinumpy.ndarray (n_terms, n_samples)

The polynomial terms of window_coordinates. Please see polynomial_terms() for further details.

window_valuesnumpy.ndarray (n_samples,)

The dependent values of the samples.

window_errornumpy.ndarray (n_samples,)

The associated 1-sigma error values for each sample in each set. The user may also supply an array of shape (1,) in which case all samples in a set will share the same associated error value. If the shape is set to (0,) this indicates that no error values are available for the samples.

window_masknumpy.ndarray (n_samples,)

A mask where False indicates that the associated sample should be excluded from the fit.

window_distance_weightsnumpy.ndarray (n_samples,)

The distance weighting factors applied to each sample in the fit.

fit_coordinatenumpy.ndarray (n_dimensions,)

The coordinate of the fitting point.

fit_phinumpy.ndarray (n_terms,)

The polynomial of fit_coordinate. Please see polynomial_terms() for further details.

ordernumpy.ndarray

The desired order of the fit as a (1,) or (n_dimensions,) array. If only a single value is supplied, it will be applied over all dimensions.

is_covarbool, optional

If True, indicates that window_values contains covariance values that should be propagated through algorithm. If this is the case, polynomial fitting is disabled, and a weighted variance is calculated instead.

fit_thresholdfloat, optional

If fit_threshold is non-zero, perform a check on the goodness of the fit. When the reduced-chi statistic is greater than abs(fit_threshold), the fit is determined to be a failure, and a replacement value is used. If fit_threshold < 0, failed fit values will be set to cval. If fit_threshold > 0, failed fit values will be replaced by the weighted mean.

mean_fitbool, optional

If True, a weighted mean is performed instead of calculating a polynomial fit.

cvalfloat, optional

In a case that a fit is unable to be calculated at certain location, cval determines the returned fit value.

error_weightingbool, optional

If True, weight the samples in the fit by the inverse variance (1 / window_error^2) in addition to distance weighting.

estimate_covariancebool, optional

If True, when determining the error on the fit and reduced chi-squared, calculate the covariance of the fit coefficients using estimated_covariance_matrix_inverse(). Otherwise, use covariance_matrix_inverse().

order_algorithm_idxint, optional

An integer specifying which polynomial order validation algorithm to use. The default (1), will always be the more robust of all available options. For further information, please see check_edges().

term_indicesnumpy.ndarray (> max(order) + 1,), optional

A 1-dimensional lookup array for use in determining the correct phi terms to use for a given polynomial order. The order validation algorithm ensures a fit of the requested order is possible. If not, and the orders are equal in all dimensions, it may also optionally return a suggested order. In this case, order_term_indices is used to select the correct window_phi and fit_phi for a given order (k), where terms are extracted via phi[order_term_indices[k]:order_term_indices[k+1]].

derivative_term_mapnumpy.ndarray, optional

A mapping array for the determination of derivatives from the coefficients of the fit, and available terms in “phi”. The shape of the array is (n_dimensions, 3, n_derivative_terms). This is only required if the gradient is required as an output. For a full description of the derivative map, please see polynomial_derivative_map().

derivative_term_indicesnumpy.ndarray (max(order) + 1,), optional

If the fit order is allowed to vary, gives the indices in derivative_term_map for a given symmetrical order. The correct derivative_term_map mapping for order k is given as derivative_term_map[:, :, indices[k]:indices[k + 2]].

edge_algorithm_idxint, optional

Integer specifying the algorithm used to determine whether a fit should be attempted with respect to the sample distribution. Please see check_edges() for further information. The default (1), is always the most robust of the available algorithms.

edge_thresholdnumpy.ndarray (n_dimensions,)

A threshold parameter determining how close an edge should be to the center of the distribution during check_edges(). Higher values result in an edge closer to the sample mean. A value should be provided for each dimension. A zero value in any dimension will result in an infinite edge for that dimension.

minimum_pointsint, optional

Certain order validation algorithms check the number of available samples as a means to determine what order of fit is appropriate. If pre-calculated for the base order, it may be passed in here for a slight speed advantage.

get_errorbool, optional

If True, return the error on the fit.

get_weightsbool, optional

If True, return the sum of all sample weights used in determining the fit.

get_distance_weightsbool, optional

If True, return the sum of only the distance weights used in determining the fit.

get_rchi2bool, optional

If True, return the reduced chi-squared statistic for each of the fits.

get_cross_derivativesbool, optional

If True, return the derivative mean-squared-cross-products of the samples in the fit. See derivative_mscp() for further information.

get_offset_variancebool optional

If True, return the offset of the fitting point from the sample distribution. See offset_variance() for further information.

Returns:
fit_result8-tuple
fit_result[0]: Fitted value (float).

Set to cval on fit failure.

fit_result[1]: Error on the fit (float).

Set to NaN on fit failure.

fit_result[2]: Number of samples included in the fit (int).

Set to 0 on fit failure.

fit_result[3]: Weight sum (float).

Set to 0.0 on fit failure.

fit_result[4]: Distance weight sum (float).

Set to 0.0 on fit failure.

fit_result[5]: Reduced chi-squared statistic (float).

Set to NaN on fit failure.

fit_result[6]: Derivative mean-squared-cross-product (numpy.ndarray).

Set to shape (0, 0) on fit failure, and (n_dimensions, n_dimensions) otherwise.

fit_result[7]: Offset variance from the distribution center (float).

Set to NaN on fit failure.