PolynomialTree

class sofia_redux.toolkit.resampling.PolynomialTree(argument, shape=None, build_type='all', order=None, fix_order=True, leaf_size=40, large_data=False, **distance_kwargs)[source]

Bases: BaseTree

Create a tree structure for use with the resampling algorithm.

The resampling tree is primarily responsible for deriving and storing all independent variables necessary for polynomial fitting, as well as allowing fast access to those variables that belong to coordinates within a certain radius of a given point.

TREE STRUCTURE AND ACCESS

The tree itself is divided into N-dimensional blocks, each of which is allocated a set of coordinates. The width of these blocks should correspond to the window (\(\Omega\)) defined in the resampling algorithm, and coordinates should be scaled accordingly. For example, if the window radius is set to \(\Omega=4\) in (arbitrary) units for the purposes of resampling 1-dimensional data, and the independent values are:

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

They should be supplied to the tree as \(x^\prime = x / \Omega\).

\[x^\prime = \frac{x}{\Omega} = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5]\]

The tree defines blocks by grouping all coordinates with the same floored values into a single block. Therefore, in this case the tree will contain 3 blocks. The first contains [0.25, 0.5, 0.75], the second contains [1, 1.25, 1.5, 1.75], and the third contains [2, 2.25, 2.5].

The reasoning behind the whole tree structure is to allow for easy extraction of all coordinates within range of a user supplied coordinate. This is done in two stages: The first is to find out which block the user supplied coordinate belongs to. We can then quickly narrow down the search by recognizing that coordinates in the tree population inside the window region of the supplied coordinate must either belong to the same block, or to immediately neighboring blocks since each block of the tree is the same width as the window radius.

Once all candidates have been identified, the next step is to keep only those that are within a radius \(\Omega\) of the user supplied coordinate. This can be accomplished quickly using the ball-tree algorithm (see sklearn.neighbors.BallTree().

In practice, the resampling algorithm loops through each block of the tree in parallel. For each block, all user supplied coordinates (points at which a fit is required) within that block, and all tree members within the neighborhood (the block and all adjacent blocks including diagonals) are evaluated in one step by the ball-tree algorithm so that for each point, we quickly get all tree members within that point’s window region.

POLYNOMIAL ORDER AND TERMS

The resampling tree is also responsible for deriving all polynomial terms for the fit as well as mapping these terms to the derivative of the function. Please see power_product(), polynomial_exponents(), and polynomial_derivative_map() for further details. It is also possible to calculate terms for a range of orders in the case where order may vary to accommodate fits where insufficient samples are present. To allow order to vary, set fix_order to False.

Parameters:
argumentnumpy.ndarray (n_features, n_samples) or n-tuple

Either the independent coordinates of samples in n_features-space, or the shape defining the skeleton of the tree.

shapen-tuple, optional

If coordinates were supplied with argument, the shape of the tree to build. Otherwise, the shape will be determined from the coordinate values in each dimension as floor(max(coordinates[i])) + 1 for dimension i.

build_typestr, optional

Must be one of {‘hood’, ‘balltree’, ‘all’, None}. Defines the type of tree structures to create.

orderint or array_like (n_features,), optional

The symmetrical or asymmetrical orders respectively. Symmetrical orders are selected by supplying an integer to this parameter.

fix_orderbool, optional

If order is symmetrical, allow for a varying order in an attempt to pass the order validation algorithm (the order can only decrease).

balltree_metricstr or sklearn.neighbors.DistanceMetric object

The distance metric to use for the tree. Default=’minkowski’ with p=2 (that is, a euclidean metric). See the documentation of the sklearn.neighbors.DistanceMetric() class for a list of available metrics. ball_tree.valid_metrics gives a list of the metrics which are valid for BallTree.

leaf_sizeint, optional

If build_type was set to ‘all’ or ‘balltree’, defines the leaf size of the BallTree. Please see sklearn.neighbors.BallTree() for further details.

large_databool, optional

If True, indicates that this resampling algorithm will run on a large set of data, and the ball tree should be created on subsets of the data.

distance_kwargsdict, optional

Optional keyword arguments passed into sklearn.neighbors.DistanceMetric(). The default is to use the “minkowski” definition with p=2, i.e., the Euclidean definition.

Attributes Summary

derivative_term_map

Returns mapping necessary to describe the 1st derivative of the tree.

exponents

Return the polynomial exponents for the tree.

order

Return the order of the polynomial for the tree.

order_symmetry

Return whether the orders are symmetrical.

order_varies

Return whether the order may be decreased.

phi_terms_precalculated

Return whether the polynomial terms have been calculated.

Methods Summary

block_members(block[, get_locations, get_terms])

Find all members within a single block.

create_phi_terms_for([block])

Create the phi terms on demand.

estimate_max_bytes(coordinates[, window, ...])

Estimate the maximum number of bytes required by the tree.

hood_members(center_block[, get_locations, ...])

Find all members within the neighborhood of a single block.

precalculate_phi_terms()

Calculates polynomial terms for the tree coordinates and order.

set_order(order[, fix_order])

Set the order(s) of polynomial fit for the resampling algorithm tree.

Attributes Documentation

derivative_term_map

Returns mapping necessary to describe the 1st derivative of the tree.

The terms describing the derivative of a polynomial function should all be present in the exponent terms of the tree, but need to be reordered in a specific way and may also contain an additional coefficient. Once the coefficients of the fit are known (c), the derivative of feature k may be be calculated by:

sum(h[k, 0] * c[h[k, 1]] * p[h[k, 2], k])

where p are the exponent terms, and h is the map returned by this property.

Returns:
mappingnumpy.ndarray (int)

An array of shape (n_features, 3, n_terms).

exponents

Return the polynomial exponents for the tree.

The polynomial exponents describe how the polynomial terms should be calculated. For example, the exponents [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [0, 2]] represent the equation:

f(x, y) = c0 + c1.x + c2.x^2 + c3.y + c4.xy + c5.y^2

Returns:
numpy.ndarray (int)

An array of shape (n_exponents, n_features)

order

Return the order of the polynomial for the tree.

Returns:
int or numpy.ndarray (int)

A scalar value if orders for each feature are identical, or an array of values of shape (n_features,) if values are not identical.

order_symmetry

Return whether the orders are symmetrical.

Orders are considered symmetrical when the orders of each feature are identical.

Returns:
bool
order_varies

Return whether the order may be decreased.

The order of the polynomial may be reduced if specifically set by the user (fix_order=False) and the order is symmetrical. In this case, a subset of polynomial terms can be extracted from the original terms to describe a lower order polynomial.

Returns:
bool
phi_terms_precalculated

Return whether the polynomial terms have been calculated.

Returns:
bool

Methods Documentation

block_members(block, get_locations=False, get_terms=False)[source]

Find all members within a single block.

Parameters:
blockint

The index of the block.

get_locationsbool, optional

If True, return the coordinates of the hood members.

get_termsbool, optional

If True, return the calculated “phi” terms of hood members. Note that a polynomial order must have been set, and terms calculated. See PolynomialTree.set_order(), and PolynomialTree.precalculate_phi_terms() for further information.

Returns:
members, [coordinates, terms]

members is a numpy.ndarray of int and shape (found_members,). If get_locations was set to True, coordinates is of shape (n_features, found_members). If get_terms was set to True, terms is of shape (n_terms, found_members).

create_phi_terms_for(block=None)[source]

Create the phi terms on demand.

If block is None, creates the phi terms over all samples. Otherwise, creates phi terms for all samples in the neighborhood of block.

Parameters:
blockint, optional

The block for which to create the phi terms. If None is supplied, will calculate for all coordinates.

Returns:
hood_membersnumpy.ndarray (int) or slice or None

The samples within the neighborhood of block, all indices or None if not calculated.

classmethod estimate_max_bytes(coordinates, window=1, leaf_size=40, full_tree=True, order=2)[source]

Estimate the maximum number of bytes required by the tree.

This includes pre-calculation of the phi terms for polynomial representation.

Parameters:
coordinatesnumpy.ndarray

The coordinates for the tree of shape (n_dimensions, n)

windowint or float, optional

The size of the tree windows

leaf_sizeint, optional

The number of leaves used to construct the ball-tree

full_treebool, optional

Calculate the maximum number of bytes if the full ball-tree is pre-calculated. Otherwise, calculates the size of a single neighborhood sized ball-tree.

orderint or numpy.ndarray (int), optional

The order of polynomial to fit as an integer for all dimensions or an array of shape (n_dimensions,).

Returns:
max_sizeint

An upper limit for the maximum number of bytes used to construct the tree.

hood_members(center_block, get_locations=False, get_terms=False)[source]

Find all members within the neighborhood of a single block.

Parameters:
center_blockint

The index of the block at the center of the neighborhood.

get_locationsbool, optional

If True, return the coordinates of the hood members.

get_termsbool, optional

If True, return the calculated “phi” terms of hood members. Note that a polynomial order must have been set, and terms calculated. See PolynomialTree.set_order(), and PolynomialTree.precalculate_phi_terms() for further information.

Returns:
members, [coordinates, terms]

members is a numpy.ndarray of int and shape (found_members,). If get_locations was set to True, coordinates is of shape (n_features, found_members). If get_terms was set to True, terms is of shape (n_terms, found_members).

precalculate_phi_terms()[source]

Calculates polynomial terms for the tree coordinates and order.

Calculates the \(\Phi\) terms for the equation:

\[f(\Phi) = \hat{c} \cdot \Phi\]

The polynomial terms are dependent on the tree coordinates, and the order of polynomial fit. Please see polynomial_exponents() for a description on how terms are derived.

If the order is not fixed (i.e., is allowed to vary as marked by the order_varies attribute), a set of terms will be derived for all orders up to the order defined by the user. Please note that orders may only vary if the order is “symmetrical” as defined in PolynomialTree.set_order. In addition, a mapping relating terms to derivatives of the fit is also created in the derivative_term_map attribute, details of which can be found in polynomial_derivative_map(). When the order does vary, the terms for order k are given as:

tree.phi_terms[tree.order_term_indices[k]:

tree.order_term_indices[k+1]]

Returns:
None
set_order(order, fix_order=True)[source]

Set the order(s) of polynomial fit for the resampling algorithm tree.

In addition to declaring the desired order of polynomial fit, the user may also opt to select an algorithm to validate the desired order of fit is possible given a distribution of samples. Please see resample_utils.check_orders() for a full description of available algorithms.

Throughout the resampling algorithm, orders are classified as either “symmetrical” or “asymmetrical”. An order is termed as symmetrical if it is the same for all features of the polynomial fit. i.e., the maximum exponent for each feature in a polynomial equation will be equal to the order. For example, a 2-dimensional 2nd order polynomial is of the form:

\[f(x, y) = c_0 + c_1 x + c_2 x^2 + c_3 y + c_4 x y + c_5 y^2\]

where the maximum exponent for \(x\) and \(y\) in any term is 2.

An order is considered “asymmetrical” if orders are not equal across features or dimensions. For example, a 2-dimensional polynomial with an order of 1 in \(x\), and 2 in \(y\) would have the form:

\[f(x, y) = c_0 + c_1 x + c_2 y + c_3 x y + c_4 y^2\]

where the maximum exponent of \(x\) is 1, and the maximum exponent of \(y\) is 2.

If orders are symmetrical, the user can allow the order to vary such that it passes the order validation algorithm. For instance, if only 2 samples exist for a 1-dimensional polynomial fit, and the user requested a 2nd order polynomial fit, the resampling algorithm will lower the order to 1, so that a fit can be performed (if the algorithm worked by counting the number of available samples). If orders are asymmetrical, this option is not available and fits will be aborted if they fail the order validation algorithm.

Parameters:
orderint or array_like (n_features,)

The symmetrical or asymmetrical orders respectively.

fix_orderbool, optional

If order is symmetrical, allow for a varying order in an attempt to pass the order validation algorithm (the order can only decrease).

Returns:
None