fitpeaks1d

sofia_redux.toolkit.fitting.fitpeaks1d.fitpeaks1d(x, y, npeaks=1, xrange=None, min_width=(None, None), peak_class=<class 'astropy.modeling.functional_models.Gaussian1D'> Name: Gaussian1D N_inputs: 1 N_outputs: 1 Fittable parameters: ('amplitude', 'mean', 'stddev'), box_class=<class 'astropy.modeling.functional_models.Box1D'> Name: Box1D N_inputs: 1 N_outputs: 1 Fittable parameters: ('amplitude', 'x_0', 'width'), box_width=(None, 6), background_class=<class 'astropy.modeling.functional_models.Const1D'> Name: Const1D N_inputs: 1 N_outputs: 1 Fittable parameters: ('amplitude', ), binit=None, bg_args=None, bg_kwargs=None, fitting_class=<class 'astropy.modeling.fitting.LevMarLSQFitter'>, maxiter=None, outlier_func=<function robust_masking>, robust=None, outlier_iter=3, baseline_func=<function medabs_baseline>, guess=None, guess_func=<function guess_xy_mad>, optional_func=None, fitter_kwargs=None, fitopts=None, search_kwargs=None, **kwargs)[source]

Fit peaks (and optionally background) to a 1D set of data.

The user may fit for 1 or multiple peaks in the input data using using an array of different peak models and also fit for a model of the background. There are a large number of standard models available for both, or the user may define their own models if necessary. By default, we fit for Gaussian peaks and a constant background.

The intention of fitpeaks1d is to provide a highly customizable peak fitting utility for one dimensional data. The basic procedure is split into two phases: the search phase and the refinement phase.

During the search phase an attempt is made to get a good first estimate of where the most prominent peaks lie in the data along with estimates of their properties. Initial positions can be supplied by the user via guess, but may also be derived via the guess_func algorithm which is also definable by the user. By default, candidate positions are determined as max(abs(y - median(y))). The basic procedure is as follows:

  1. Remove a baseline from the y data via the baseline_func function. At this early stage, we are only really interested in peak characteristics or even just where they kind-of are. The default baseline removal function is to subtract the median although it is completely reasonable to use many other methods depending on your data. This baseline_func may also transform the initial data set to something a little more suitable for finding peaks. Users may wish to smooth or filter the data, or perform some other kind of transform. By default the data is set to y = abs(y - median(y)) as we do not know if we’re searching for positive or negative peaks.

  2. Guess where peaks may lie along the x-axis. These estimates can be provided directly by the user, or derived via the guess_func algorithm. This algorithm should take the (x, y) data set and return a single (x0, y0) coordinate giving the x position and amplitude of the most prominent peak candidate in the data. In the default case this is simply y0 = max(y) and x0 = where(y0).

  3. Fit a peak using (x0, y0) as an initial condition. Store the peak properties and then subtract a model of the peak from the data and go back to step 2. Only break this loop once the cycle has completed npeaks times. This will hopefully yield the npeaks most prominent peaks in the data. After fitting we should have a more exact (x1, y1) position for each peak.

  4. Take the x1 position of each peak derived in step 3 and interpolate it back onto the original data set before we changed it in step 1. Subtract the derived baseline from this value and we should then have a good estimate of the (x, y) position for each peak in relation to the real dataset.

  5. If the user wishes to fit a background to the data, we get an estimate of the background fit parameters by subtracting models of each peak from the original dataset and then fitting the desired background model to the residual.

During the refinement phase, a final model is created from all initial peak fits and an optional background. If a background is requested, an initial estimate of its parameters is derived by fitting the model on y - fit(peak_estimates).

Parameters:
x: array_like of float

(N,) array of independent values.

yarray_like of float

(N,) array of independent values.

npeaksint, optional

The number of peaks to find. Set to None to find a single (most prominent) peak. Peaks are returned in the order most prominent -> least prominent.

xrange2-tuple of (int or float), optional

The (minimum, maximum) range of values in x where a fitted peak will be valid. Set to None to use the full range of valid (finite) data.

peak_classastropy.modeling.Fittable1DModel, optional

Class of peak model. The default is astropy.modeling.models.Gaussian1D.

box_widthfloat or int or (2-tuple of (str, float)), optional

If int or float, specifies the width of box_model. If specifed as a 2-tuple of (str, float), the first specifies a parameter of the model that will be taken and multiplied by the second element of the 2-tuple (float) to define the width of the box. For example (‘stddev’, 3) for the Gaussian1D model would limit the range of the model to 3 * FWHM. This is highly recommended when fitting continuous or wide functions. Note that this will not be applied when the refined model is fitted in parallel, since that kind of negates the whole point of fitting in parallel in the first place.

min_width2-tuple of (str or None, float or None), optional

Set a lower limit on the width of each peak. Set to None for no lower limit. min_width[0] is the name of the parameter in the peak model governing width (str). Set this to None to autodetect the parameter name if using a standard model. min_width[1] is the value to set. If set to None, half of the median separation of sorted unique x samples will be used.

box_classastropy.modeling.Fittable1DModel, optional

Class for filtering the peak model. Will be applied as peak_model * box_model. This is only applied during the initial search stage. The width of the box is defined by max_region. The default is astropy.modeling.models.Box1D.

background_classastropy.modeling.Fittable1DModel, optional

Class for describing the background. Will be applied as peak_model + background in the final refined fitting. Note that it will only be fitted to the area of interest around each peak (max_region). Disable background fitting by supplying None. The default is to fit a constant background using astropy.modeling.models.Const1D. NOTE: you can supply a model instance rather than class in order to fit polynomial backgrounds as the number of polynomial parameters are dependent on the polynomial order. For example, in order to fit a 3rd order polynomial, set background_class=Polynomial1D(3).

binitarray_like of float, optional

A user guess at background parameters

bg_argstuple, optional

If the background class requires arguments, they should be supplied here as a tuple.

bg_kwargsdict, optional

Optional keyword arguments to pass into the background model during initialization.

fitting_classclass of fitting object

Typically a “solver” from scipy.optimize or astropy.modeling that when instatiated will create a solver such that:

fitted_model = solver(model, x, y)

maxiterint, optional

Maximum number of iterations for the solver to attempt before quitting. Set to None to use the default for each solver class.

outlier_funcfunction, optional

A function of the form data_out = outlier_func(data_in). You may do anything at all to the data such as clipping, but perhaps the simplest option is to convert the data to a np.ma.MaskedArray and set the mask (True=bad) while returning the original data untouched. The default is to use robust_masking which is a basic wrapper for mc.find_outliers.

robustdict or anything, optional

Supplying anything other than None will turn on outlier rejection for the solver. If robust is a dict, then it should contain optional keyword arguments that would be supplied to outlier_func'. For example, the default `robust_masking outlier rejection function takes the threshold keyword. To turn on outlier rejection you could just set robust=True, but if you wanted to change threshold to 10, then set robust={‘threshold’: 10}. Setting robust=None turns off outlier rejection (default).

outlier_iterint, optional

The maximum number of iterations for the solver when rejecting outlers. Defaults to 3.

guessarray_like of float, optional

An array where each element gives a guess at an initial x position for the peak. If there are less guesses than npeaks, guess_func will be used after the inital guesses have been taken. If there are more guesses than npeaks, then only guess[:npeaks] will be used.

baseline_funcfunction, optional

A function of the form used during the initial search phase:

ysearch, baseline = baseline_func(x, y)

This function should remove a baseline from the data so that we mitigate effects from the background as much as possible while keeping the structure of the peaks. In the default case, the baseline is taken as the median in y. This baseline should be subtracted from y and then the user may transform y as they see fit to allow guess_func and the solver to get the most reliable guess possible at each peaks model parameters. The default baseline_func returns ysearch as ysearch = abs(y - median(y)). Note that the most important parameter to guess at during this stage is the x center position of the peak. Amplitude is not important, but we do want to get fairly accurate representations of the rest of the fit parameters such as width or FWHM.

guess_funcfunction

A function of the form:

(x0, y0) = guess_func(x, y_modified)

(x0, y0) is the peak (x_center, y_amplitude) of the most prominent peak in the data after baseline_func has been applied. These results are used to initialize peak parameters during the search phase. The default function returns the (x, y) coordinate of the datum that satisfies y0 = max(abs(y - median(y))).

optional_funcfunction, optional

An optional function applied to the data prior to the final fit. If fitting a background, then no attempt should be made to remove the background unless it is being accounted for in some way. If you are not fitting a background then feel free to go to town on removing said background. Preserve whatever properties of the peak you wish to have appear in the output results. Good choices may involve filtering the data. The default is to do nothing and fit the final model on the original data.

fitter_kwargsdict, optional

Additional keywords to pass into the solver during initialization.

fitoptsdict, optional

Optional keywords to pass into solver at runtime.

search_kwargsdict, optional

Additional keyword arguments to supply during model instantiation during the initial search stage (peak * optional(box)). Note that the model parameter names are suffixed with “_0” and box parameter names are suffixed with “_1”, but only if a box is being applied. So for example, to say that you want parameter “p” of the model to remain fixed:

kwargs = {“fixed”: {“p_0”: True}}

As another example, to set parameter “a” of the box equal to parameter “b” of the model:

kwargs = {“tied”: {“b_1”: lamba x: getattr(x, “a_0”)}}

There are lots of other things you could do here, and I’ve tried to include as many hooks as possible. You don’t even need to treat this thing as a box function. It could be any kind of filtering function etc. and you could use kwargs to supply it with the necessary parameters or behaviours as you wish.

kwargsdict, optional

Additional keyword arguments to supply during model instantation during the final model refinement stage. Exactly the same type of thing as search_kwargs. However, the following suffix rules apply for parallel fitting (default):

  1. If fitting only a single peak and no background, do not suffix any parameters. “amplitude” remains “amplitude”.

  2. If fitting multiple peaks, begin suffixing at _0 for the first peak upto _(npeaks-1) for the last. i.e. the amplitude of the 5th peak should be referred to as “amplitude_4”.

  3. If fitting a single peak with a background then peak parameters should be suffixed with “_0” and background parameters with “_1”.

  4. If fitting multiple peaks with a single background then peak parameters should be suffixed according to step 2, while background parameters should be suffixed with _(npeaks). i.e. if there were 5 peaks, the background “intercept” for a linear background fit should be referred to as “intercept_5”.

and for serial fitting:

  1. If fitting a single peak with no background, do no suffix any parameters. “amplitude” remains “amplitude”.

  2. If fitting one or more peaks with a background then there is no distinction available between peaks. Each peak parameter rule must be applied to all peaks. For example, setting {‘bounds’: {‘amplitude_0’: (0, 5)} will limit the amplitude of all peaks to within the range 0->5. The background parameters are suffixed with “_1”.

Returns:
modelastropy.modeling.Fittable1DModel instance

The final fitted model. If more than one peak or a background was fitted for, then the returned model will be a compound model. All peaks are located in model[:npeaks]. If a background was included it can be found at model[npeaks]. If a single peak and no background was fitted, then an instance of the peak_class will be returned containing the fit parameters. No indexing will be possible.

Examples

Create compound peaks on a sloped background >>> import numpy as np >>> from astropy.modeling import models >>> from sofia_redux.toolkit.fitting.fitpeaks1d import fitpeaks1d >>> x = np.linspace(0, 10, 1001) >>> model1 = models.Gaussian1D(amplitude=3, mean=5, stddev=0.5) >>> model2 = models.Gaussian1D(amplitude=4, mean=4, stddev=0.2) >>> y = model1(x) + model2(x) + 0.05 * x + 100 >>> rand = np.random.RandomState(42) >>> y += rand.normal(0, 0.01, y.size) >>> fit = fitpeaks1d(x, y, background_class=models.Linear1D, … npeaks=2, min_width=0.1) >>> rmse = np.sqrt(np.mean(fit.fit_info[‘fvec’] ** 2)) >>> assert np.allclose(fit[0].parameters, [3, 5, 0.5], atol=0.02) # peak 1 >>> assert np.allclose(fit[1].parameters, [4, 4, 0.2], atol=0.02) # peak 2 >>> assert np.allclose(fit[2].parameters, [0.1, 100], atol=0.2) # bg >>> assert np.isclose(rmse, 0.01, rtol=1) # residuals