shaped_adaptive_weight_matrix

sofia_redux.toolkit.resampling.shaped_adaptive_weight_matrix(sigma, rchi2, gradient_mscp, density=1.0, variance_offset=0.0, fixed=None, tolerance=None)[source]

Shape and scale the weighting kernel based on a prior fit.

In the standard resampling algorithm, a polynomial fit may weight each sample coordinate (\(x\)) according to its distance from the reference position at which the fit is derived (\(x_{ref}\)) such that samples closer to the reference position have more influence on the fit than those that are farther. The weighting function used is:

\[W = exp(-\Delta x^T A^{-1} \Delta x)\]

where \({\Delta x}_k = x_{ref, k} - x_k\) for dimension \(k\), and \(A\) is a symmetric positive definite matrix defining the “shape” of the weighting kernel. This effectively defines a multivariate Gaussian centered on the reference point where overall size, rotation, and stretch of each principle axis may be altered. The goal of shaping the weighting kernel \(A\), is to produce a fit where the reduced chi-squared statistic of the fit equal to one (\(\chi_r^2 = 1\)). To derive the shape \(A\), an initial fit must have first been performed using a square diagonal matrix \(A_0\), whose diagonal elements are the square of the sigma parameter (\(diag(A_0) = 2 \sigma^2\)). It is then easy to define \(A_0^{-1}\) in \(K\) dimensions as:

\[\begin{split}A_0^{-1} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sigma_0^2} & 0 & \dots & 0 \\ 0 & \frac{1}{\sigma_1^2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \dots & 0 & \frac{1}{\sigma_K^2} \end{bmatrix}\end{split}\]

Following (or during) the initial fit with \(A_0\), the mean square cross products of the derivatives evaluated at the sample coordinates should be calculated using derivative_mscp(), where the returned matrix has values:

\[\bar{g}_{ij}^2 = \frac{\partial \bar{f}}{\partial X_i} \frac{\partial \bar{f}}{\partial X_j}\]

for dimensions \(X_i\) and \(X_j\) in K-dimensions, where \(\partial \bar{f} / \partial X_i\) is the weighted mean of the partial derivatives over all samples in the fit with respect to \(X_i\), with weighting defined by \(W\) (above) using \(A_0^{-1}\).

The matrix \(\bar{g}^2\) is only used to define the shape of new weighting kernel, not the overall size. Therefore, it is normalized such that \(|\bar{g}^2| = 1\).

We can then use singular value decomposition to factorize \(\bar{g}^2\) into:

\[\bar{g}^2 = U S V^T\]

Here, the matrices \(U\) and \(V^T\) represent rotations (since \(|\bar{g}^2| > 0\)), and the singular values (\(S\)) can be thought of as the magnitudes of the semi-axes of an ellipsoid in K-dimensional space. Naively, this provides us with a basis from which to determine the final “shape” matrix where:

\[A^{-1} = \beta\, \bar{g}^2\]

and \(\beta\) is an as yet undetermined scaling factor representing the overall size of the new kernel. The resulting weighting kernel has the shape of an ellipsoid with the smallest semi-axis oriented parallel to the gradient. In other words, the new weighting will result in a fit that is less sensitive to distant samples in directions where the gradient is high, and more sensitive to distant samples in directions where the gradient is low.

However, we must still keep in mind that our overall goal is to get \(\chi_r^2 \rightarrow 1\) when a fit is performed using the new kernel \(A\). For example, if \(\chi_r=1\) in the initial fit, there is no need to modify the kernel, and if \(\chi_r < 1\), then we do not want to get an even better fit.

Another factor to consider is that we cannot be completely confident in this new shape due to the distribution of samples. If we are fitting at a point that is away from the center of the sample distribution, it is unadvisable to use a highly shaped kernel due to increased uncertainty in the mean partial derivatives. Furthermore, even if we are fitting close to the center of the distribution, that does not mean that the derivative calculations were not skewed by a few nearby samples when fitting in a local depression of the sample density (for example, near the center of a donut-like distribution).

We model our confidence (\(\gamma\)) in the “shape” using a logistic function (see stretch_correction()) that factors in \(\chi_r^2\), a measure of the sample density profile (\(\rho\)) at the fit coordinate, and from the deviation of the fit coordinate from the center of the sample distribution (\(\sigma_d\)):

\[\gamma = \frac{2} {\left( {1 + (2^{exp(\sigma)} - 1)e^{\rho (1 - \chi_r^2)}} \right)^{1/exp(\sigma)}} - 1\]

The density \(\rho\) is calculated using relative_density() which sets \(\rho=1\) when the samples are uniformly distributed, \(\rho > 1\) when the distribution is concentrated on the fit coordinate, and \(0 < \rho < 1\) when the fitting in a local depression of the distribution density. The deviation (\(\sigma_d\)) is calculated using offset_variance().

The confidence parameter has asymptotes at \(\gamma = \pm 1\), is equal to zero at \(\chi_r^2=1\), is positive for \(\chi_r^2 > 1\), and is negative for \(\chi_r^2 < 1\). Also, the magnitude increases with \(\rho\), and decreases with \(\sigma_d\). The final shape matrix is then defined as:

\[A^{-1} = \beta\, U S^{\gamma} V^T\]

Note that as \(\gamma \rightarrow 0\), the kernel approaches that of a spheroid. For \(\chi_r^2 > 1\), the kernel approaches \(\bar{g}^2\). When \(\chi_r^2 < 1\), the kernel effectively rotates so that the fit becomes increasingly sensitive to samples along the direction of the derivative. The overall size of the kernel remains constant since \(|S^{\gamma}| = |\bar{g}^2| = 1\).

Finally, the only remaining factor to calculate is the scaling factor \(\beta\) which is given as:

\[\beta = \left( \frac{\chi_r}{|A_0|} \right)^{1/K}\]

Note that this scaling has the effect of setting

\[\frac{|A_0|}{|A|} = \chi_r\]

The user has the option of fixing the kernel in certain dimensions such that \({A_0}_{k, k} = A_{k, k}\). If this is the case, for any fixed dimension \(k\) we set:

\[ \begin{align}\begin{aligned}{U S^{\gamma} V^T}_{k, k} = 1\\{U S^{\gamma} V^T}_{k, i \neq k}^2 = 0\\{U S^{\gamma} V^T}_{i \neq k, k}^2 = 0\end{aligned}\end{align} \]

meaning the scaling is unaltered for dimensions \(k\), and no rotation will be applied to any other dimension with respect to \(k\). Since the overall size must be controlled through fewer dimensions, \(\beta\) must take the form of a diagonal matrix:

\[diag(\beta)_{i \in fixed} = \frac{1}{2 \sigma_i^2} ,\,\, diag(\beta)_{i \notin fixed} = \left( \frac{\chi_r |A_0|}{|{U S^{\gamma} V^T}|} \prod_{i \in fixed}{2 \sigma_i^2} \right)^{1 / (K - K_{fixed})}\]

Once again \(A^{-1} = \beta\, U S^{\gamma} V^T\).

Parameters:
sigmanumpy.ndarray (n_dimensions,)

The standard deviations of the Gaussian for each dimensional component used for the distance weighting of each sample in the initial fit.

rchi2float

The reduced chi-squared statistic of the fit.

gradient_mscpnumpy.ndarray (n_dimensions, n_dimensions)

An array where gradient_mscp[i, j] = derivative[i] * derivative[j] in dimensions i and j. Please see derivative_mscp() for further information. Must be Hermitian and real-valued (symmetric).

densityfloat, optional

The local relative density of the samples around the fit coordinate. A value of 1 represents uniform distribution. Values greater than 1 indicate clustering around the fitting point, and values less than 1 indicate that samples are sparsely distributed around the fitting point. Please see relative_density() for further information.

variance_offsetfloat, optional

The variance at the fit coordinate determined from the sample coordinate distribution. i.e., if a fit is performed at the center of the sample distribution, the variance is zero. If done at 2-sigma from the sample distribution center, the variance is 4.

fixednumpy.ndarray of bool (n_dimensions,), optional

If supplied, True values indicate that the width of the Gaussian along the corresponding axis should not be altered in the output result.

tolerancefloat, optional

The threshold below which SVD values are considered zero when determining the matrix rank of derivative_mscp. Please see numpy.linalg.matrix_rank() for further information.

Returns:
shape_matrixnumpy.ndarray (n_dimensions, n_dimensions)

A matrix defining the shape of the weighting kernel required by calculate_adaptive_distance_weights_shaped() to create a set of weighting factors.