grig.toolkit package¶
Subpackages¶
- grig.toolkit.splines package
- Submodules
- grig.toolkit.splines.spline module
- grig.toolkit.splines.spline_utils module
add_knot()
back_substitute()
build_observation()
calculate_minimum_bandwidth()
check_input_arrays()
create_ordering()
determine_smoothing_spline()
discontinuity_jumps()
evaluate_bspline()
find_knot()
find_knots()
fit_point()
flat_index_mapping()
givens_parameters()
givens_rotate()
knot_fit()
perform_fit()
rational_interp_zero()
single_fit()
solve_observation()
solve_rank_deficiency()
Submodules¶
grig.toolkit.func module¶
- grig.toolkit.func.byte_size_of_object(obj)[source]¶
Return the size of a Python object in bytes.
- Parameters:
- obj : object
- Returns:
- byte_size : int
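For example (the exact size is platform and interpreter dependent):
>>> from grig.toolkit.func import byte_size_of_object
>>> byte_size_of_object([1, 2, 3]) > 0
True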
- grig.toolkit.func.julia_fractal(sy, sx, c0=-0.4, c1=0.6, iterations=256, xrange=(-1, 1), yrange=(-1, 1), normalize=True)[source]¶
Generate a 2-D Julia fractal image.
- Parameters:
- sy : int
y dimension size.
- sx : int
x dimension size.
- c0 : float, optional
The c0 coefficient.
- c1 : float, optional
The c1 coefficient.
- iterations : int, optional
The number of steps.
- xrange : array_like of int or float, optional
The range of x values.
- yrange : array_like of int or float, optional
The range of y values.
- normalize : bool, optional
- Returns:
- numpy.ndarray
The generated fractal image.
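A minimal usage sketch, assuming the returned image has shape (sy, sx):
>>> from grig.toolkit.func import julia_fractal
>>> image = julia_fractal(128, 128, iterations=64)
>>> image.shape
(128, 128)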
- grig.toolkit.func.moments(data, threshold=None, mask=None, axis=None, get_mask=False)[source]¶
Computes statistics on a data set, avoiding deviant points if requested.
Moments are calculated for a given set of data. If a value is passed to threshold, then the dataset is searched for outliers. A data point is identified as an outlier if abs(x_i - x_med)/MAD > threshold, where x_med is the median and MAD is the median absolute deviation, defined as 1.482 * median(abs(x_i - x_med)).
- Parameters:
- data : array_like of float
(shape1) Data on which to calculate moments.
- mask : array_like of bool
(shape1) Mask to apply to data.
- threshold : float, optional
Sigma threshold over which values are identified as outliers.
- axis : int, optional
Axis over which to calculate statistics.
- get_mask : bool, optional
If True, only return the output mask.
- Returns:
- dict or numpy.ndarray
If get_mask is False, returns a dictionary containing the following statistics: mean, var, stddev, skew, kurt, stderr, mask. Otherwise, returns the output mask.
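For example, a sketch of the outlier rejection described above (the 100.0 point deviates by hundreds of MADs and should be excluded from the statistics):
>>> import numpy as np
>>> from grig.toolkit.func import moments
>>> data = np.array([1.0, 1.1, 0.9, 1.0, 100.0])
>>> stats = moments(data, threshold=3)
>>> round(float(stats['mean']), 3)  # mean of the four remaining points
1.0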
- grig.toolkit.func.robust_mask(data, threshold, mask=None, axis=None, mask_data=False, cval=nan)[source]¶
Computes a mask derived from the data Median Absolute Deviation (MAD).
Calculates a robust mask based on the input data and optional input mask. If \(threshold > 0\), the dataset is searched for outliers. Outliers are identified for point \(i\) if
\[\frac{|y_i - median[y]|}{MAD} > threshold\]
where \(MAD\) is the Median Absolute Deviation, defined as
\[MAD = 1.482 * median[|y_i - median[y]|]\]
- Parameters:
- data : array_like of float
The data on which to derive a robust mask.
- threshold : float
Threshold as described above.
- mask : array_like of bool, optional
If supplied, must be the same shape as data. Any masked (False) data values will not be included in the \(MAD\) calculation. Additionally, masked elements will also be masked (False) in the output mask.
- axis : int, optional
Axis over which to calculate the \(MAD\). The default (None) derives the \(MAD\) from the entire set of data.
- mask_data : bool, optional
If True, return a copy of data with masked values replaced by cval in addition to the output mask. The default is False. Note that the output type will be float, since the default cval (NaN) cannot be represented in an integer array.
- cval : int or float, optional
If mask_data is set to True, masked values will be replaced by cval. The default is numpy.nan.
- Returns:
- numpy.ndarray of bool, [numpy.ndarray of numpy.float64]
The output mask where False indicates a masked value, while True indicates that associated data deviation is below the threshold limit. If mask_data was True, also returns a copy of data with masked values replaced by cval.
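For example, under the definitions above (median 0.0, MAD = 1.482 * 0.1), the last point below deviates by roughly 337 MADs and should be masked:
>>> import numpy as np
>>> from grig.toolkit.func import robust_mask
>>> y = np.array([0.0, 0.1, -0.1, 0.0, 50.0])
>>> robust_mask(y, threshold=5)
array([ True,  True,  True,  True, False])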
- grig.toolkit.func.slicer(array, axis, index, ind=False)[source]¶
Returns a slice of an array in arbitrary dimension.
- Parameters:
- array : numpy.ndarray
The array to slice.
- axis : int or array_like
The axis to slice on.
- index : int or array_like of int
The index to retrieve.
- ind : bool, optional
If True, return the slices rather than the sliced array.
- Returns:
- numpy.ndarray or tuple of slice
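An illustrative sketch, assuming slicer(array, axis, index) is equivalent to indexing index along axis:
>>> import numpy as np
>>> from grig.toolkit.func import slicer
>>> a = np.arange(24).reshape(2, 3, 4)
>>> np.array_equal(slicer(a, 1, 2), a[:, 2, :])
True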
grig.toolkit.multiproc module¶
- class grig.toolkit.multiproc.MultitaskHandler[source]¶
Bases: Handler
A log handler for multitask.
- Attributes:
- name
Methods
- acquire(): Acquire the I/O thread lock.
- addFilter(filter): Add the specified filter to this handler.
- close(): Tidy up any resources used by the handler.
- createLock(): Acquire a thread lock for serializing access to the underlying I/O.
- emit(record): Emit a log record.
- filter(record): Determine if a record is loggable by consulting all the filters.
- flush(): Ensure all logging output has been flushed.
- format(record): Format the specified record.
- handle(record): Conditionally emit the specified logging record.
- handleError(record): Handle errors which occur during an emit() call.
- release(): Release the I/O thread lock.
- removeFilter(filter): Remove the specified filter from this handler.
- reorder_records(): Re-order the records in a sensible order.
- setFormatter(fmt): Set the formatter for this handler.
- setLevel(level): Set the logging level of this handler.
- get_name()
- set_name()
- emit(record)[source]¶
Emit a log record.
Stores the record in the lookup dictionary for the given process/thread. Each message is stored in the received order for later retrieval once whatever multiprocessing job is complete.
- Parameters:
- recordlogging.LogRecord
The record to emit.
- Returns:
- None
- reorder_records()[source]¶
Re-order the records in a sensible order.
The records are sorted by process and then by thread, in chronological order. I.e., records are grouped by process, starting with the first process that appears in the logs, and within each process group a similar grouping is performed for each child thread. Each process-thread grouping contains a list of log records in the order that they were emitted.
Note that each record is a tuple of the form (time, process, thread, log_record).
- Returns:
- None
- grig.toolkit.multiproc.get_core_number(cores=True)[source]¶
Returns the maximum number of CPU cores available.
- Parameters:
- cores : bool or int, optional
If False, returns 1. If True, returns the maximum number of cores available. An integer specifies an upper limit on the maximum number of cores to return.
- Returns:
- cores : int
The maximum number of cores to use for parallel processing.
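For example (the second result depends on the host machine):
>>> from grig.toolkit.multiproc import get_core_number
>>> get_core_number(cores=False)
1
>>> get_core_number(cores=2) <= 2  # an integer acts as an upper limit
True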
- grig.toolkit.multiproc.in_main_thread()[source]¶
Return whether the process is running in the main thread.
- Returns:
- main_thread : bool
True if this process is running in the main thread, and False if it is running in a child thread.
- grig.toolkit.multiproc.log_for_multitask(logger)[source]¶
Context manager to output log messages during multiprocessing.
Stores all log messages during multiprocessing, and emits them using the given logger once complete.
- Parameters:
- logger : logging.Logger
- Yields:
- None
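A minimal usage sketch; messages logged inside the block are stored and then emitted through logger once the block exits:
>>> import logging
>>> from grig.toolkit.multiproc import log_for_multitask
>>> logger = logging.getLogger('multitask_example')
>>> with log_for_multitask(logger):
...     logger.error('stored now, emitted on exit')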
- grig.toolkit.multiproc.log_records_to_pickle_file(logger, pickle_file)[source]¶
Store the log records in a pickle file rather than emitting.
- Parameters:
- logger : logging.Logger
- pickle_file : str
The path to the pickle file that will contain the log records.
- Yields:
- None
- grig.toolkit.multiproc.log_with_multi_handler(logger)[source]¶
Context manager to temporarily log messages for unique processes/threads.
Temporarily disables all log handlers and outputs the results to a dictionary of the form {(process, thread): list(records)}, where process is returned by multiprocessing.current_process() and thread is returned by threading.current_thread().
- Parameters:
- logger : logging.Logger
- Yields:
- multi_handler : MultitaskHandler
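A usage sketch; inside the block, records are collected by the yielded MultitaskHandler rather than the logger's usual handlers:
>>> import logging
>>> from grig.toolkit.multiproc import log_with_multi_handler
>>> logger = logging.getLogger('capture_example')
>>> with log_with_multi_handler(logger) as multi_handler:
...     logger.error('captured by the MultitaskHandler')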
- grig.toolkit.multiproc.multitask(func, iterable, args, kwargs, jobs=None, skip=None, max_nbytes='1M', force_threading=False, force_processes=False, logger=None)[source]¶
Process a series of tasks in serial, or in parallel using joblib.
multitask is used to run a function multiple times on a series of arguments. Tasks may be run in series (default), or in parallel using multi-processing via the joblib package.
If an error is encountered while attempting to process in parallel with joblib, an attempt will be made to process the tasks in series.
The function to process multiple times (func) must take one of the following forms:
1. result[i] = func(args, iterable[i])
2. result[i] = func(args, kwargs, iterable[i])
Here, “args” is the same as the args argument, i.e., the full argument list. Setting the kwargs argument to None implies that func takes form 1, while anything else causes multitask to assume the function is of form 2.
Since this is a non-standard method of specifying a function, it is highly likely the user will have to define their own func. For example, to use multitask to add ten to a series of numbers:
>>> from grig.toolkit.multiproc import multitask
>>> numbers = list(range(5))
>>> numbers
[0, 1, 2, 3, 4]
>>> def add_ten(args, i):
...     return args[i] + 10
>>> multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]
In the above example, iterable has been set to range(len(numbers)) indicating that multitask should supply i to add_ten multiple times (0, 1, 2, 3, 4). Note that kwargs is explicitly set to None indicating that add_ten is of form 1. While multitask may seem like overkill in this example, it can be highly adaptable for complex procedures.
The skip parameter can be used to skip processing of certain tasks. For example:
>>> skip = [False] * len(numbers)
>>> skip[2] = True
>>> multitask(add_ten, range(len(numbers)), numbers, None, skip=skip)
[10, 11, 13, 14]
By default, parallelization is managed with the loky backend. If calling code is known to perform better with the threading backend, it should be called within the joblib parallel_backend context manager:
>>> from joblib import parallel_backend
>>> with parallel_backend('threading', n_jobs=2):
...     multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]
- Parameters:
- func : function
The function to repeat multiple times on different sets of arguments. Must be of the form func(args, i) if kwargs is None, or func(args, kwargs, i) otherwise, where i is a member of iterable.
- iterable : iterable
A Python object that can be iterated through such that each member can be passed to func as the final argument.
- args
Anything that should be passed to func as the first argument. The intended use is such that the output of func is result[i] = func(args, iterable[i]).
- kwargs : None or anything
If set to None, multitask assumes func is of the form func(args, i). Otherwise, multitask assumes func is of the form func(args, kwargs, i).
- jobs : int, optional
If set to a positive integer, processes tasks in parallel using jobs threads. If negative, sets the number of threads to the number of CPUs + jobs + 1. I.e., if jobs = -1, multitask will use all available CPUs.
- skip : array_like of bool, optional
Should be of length len(iterable), where a value of True signifies that processing should be skipped for that task and omitted from the final result.
- max_nbytes : int or str or None, optional
Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in Bytes, or a human-readable string, e.g., '1M' for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing". The default is currently set to '1M' for consistency with the joblib library. Note that memmapping disallows in-place modification of data, so if this functionality is required, set max_nbytes to None.
- force_threading : bool, optional
If True, force joblib to run parallel jobs using threads so that shared memory can be used. Otherwise, threading will only occur when parallel processing is spawned from a child process of the main thread.
- force_processes : bool, optional
If True, force joblib to run parallel jobs using CPUs rather than threads. This can sometimes lead to unexpected outcomes when the multiprocessing is launched from a non-main thread. Pickling arguments prior to processing, and pickling return values during processing, is recommended in this case.
- logger : logging.Logger, optional
If supplied, will attempt to produce sensibly ordered logs for the multiprocessing tasks for all handlers.
- Returns:
- result : list
The final output where result[i] = func(args, iterable[i]). Will be of length len(iterable) if skip is None, otherwise len(iterable) - sum(skip).
- grig.toolkit.multiproc.pickle_list(object_list, prefix=None, naming_attribute=None, class_type=None)[source]¶
Pickle a list of objects to a temporary directory.
The list will be updated in-place, with each element being replaced by the on-disk file path to the pickle file in which it is saved.
- Parameters:
- object_list : list (object)
A list of things to pickle.
- prefix : str, optional
The prefix for the temporary directory in which to store the pickle files. See tempfile.mkdtemp() for further information.
- naming_attribute : str, optional
The attribute used to name the pickle file. If not supplied, defaults to id(object).
- class_type : class, optional
If supplied, only objects of this class type will be pickled.
- Returns:
- temporary_directory : str
The temporary directory in which the objects are saved as pickle files.
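A round-trip sketch using unpickle_list() (documented below); the file paths and temporary directory vary per run:
>>> from grig.toolkit.multiproc import pickle_list, unpickle_list
>>> objects = [{'a': 1}, {'b': 2}]
>>> tmpdir = pickle_list(objects)  # elements are now file path strings
>>> all(isinstance(f, str) for f in objects)
True
>>> unpickle_list(objects)  # restore in-place, deleting the pickle files
>>> objects[0]
{'a': 1}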
- grig.toolkit.multiproc.pickle_object(obj, filename)[source]¶
Pickle an object and save it to the given filename.
- Parameters:
- obj : object
The object to pickle.
- filename : str or None
If filename points to a writeable on-disk location, obj will be pickled and saved to that location. If None, nothing will happen.
- Returns:
- output : str or object
Either the filename if the object was pickled, or obj if it wasn’t.
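A round-trip sketch using unpickle_file() (documented below):
>>> import os, tempfile
>>> from grig.toolkit.multiproc import pickle_object, unpickle_file
>>> path = os.path.join(tempfile.mkdtemp(), 'numbers.p')
>>> saved = pickle_object([1, 2, 3], path)  # returns the filename
>>> obj, pickle_file = unpickle_file(saved)
>>> obj
[1, 2, 3]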
- grig.toolkit.multiproc.purge_multitask_logs(log_directory, log_pickle_file, use_logger=None)[source]¶
Remove all temporary logging files/directories and handle log records.
The user must supply a log_directory containing pickle files of the log records to handle. The log_pickle_file contains the logger used to handle these records. Following completion, both the log_directory and log_pickle_file will be deleted from the file system.
- Parameters:
- log_directory : str
The directory in which the log records for each run were stored. This directory will be removed.
- log_pickle_file : str
The pickle file containing the logger for multitask. This will be removed.
- use_logger : Logger, optional
The logger to handle any log records. If not supplied, defaults to that found in the log_pickle_file.
- Returns:
- None
- grig.toolkit.multiproc.relative_cores(jobs)[source]¶
Return the actual number of cores to use for a given number of jobs.
Returns 1 in cases where jobs is None or 0. If jobs is less than zero, the returned value will be max_available_cores + jobs + 1. i.e., -1 will use all available cores.
- Parameters:
- jobs : int or float or None
- Returns:
- n_cores : int
The number of cores to use which will always be in the range 1 -> max_available_cores.
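For example (the -1 case depends on the number of available cores):
>>> from grig.toolkit.multiproc import relative_cores
>>> relative_cores(None)
1
>>> relative_cores(-1) >= 1  # all available cores
True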
- grig.toolkit.multiproc.unpickle_file(filename)[source]¶
Unpickle a string argument if it is a file, and return the result.
- Parameters:
- filename : object or str
If the argument is a string and a valid file path, it will be unpickled and the result returned.
- Returns:
- obj, pickle_file : object, str
If the argument passed in was not a string or was an invalid file path, the resulting output obj will be the argument and pickle_file will be None. If the argument was a valid file path to a pickle file, obj will be the unpickled result and pickle_file will be the argument.
- grig.toolkit.multiproc.unpickle_list(pickle_files, delete=True)[source]¶
Restore pickle files to objects in-place.
- Parameters:
- pickle_files : list (str)
A list of on-disk pickle files to restore. The restored objects will replace the file path for each element in the list.
- delete : bool, optional
If True, delete each pickle file once it has been restored.
- Returns:
- None
- grig.toolkit.multiproc.valid_relative_jobs(jobs)[source]¶
Return a valid number of jobs in the range 1 <= jobs <= max_cores.
- Parameters:
- jobs : int
A positive or negative integer. Negative values are processed as max_cores + jobs + 1.
- Returns:
- valid_jobs : int
The number of jobs available to process.
- grig.toolkit.multiproc.wrap_function(func, args, kwargs=None, logger=None, log_directory=None)[source]¶
Wrap a function for use with multitask().
- Parameters:
- func : function
The function to wrap.
- args : tuple
The function arguments.
- kwargs : dict, optional
Any function keyword arguments.
- logger : logging.Logger or str, optional
A logger used to output any log messages once complete. If supplied, a valid log_directory must also be supplied. A path to a pickle file containing the logger may also be supplied.
- log_directory : str, optional
If supplied together with a logger, will store all log records to a pickle file in the given directory.
- Returns:
- wrapped_function, log_pickle_file : function, str
The wrapped function and the file location of the pickle file for any supplied logger. If no logger was supplied, this value will be None.
- grig.toolkit.multiproc.wrapped_with_logger(func, logger_pickle_file, log_directory, run_arg_and_identifier)[source]¶
Return the results of the function in multitask and save log records.
- Parameters:
- func : function
The function to wrap.
- logger_pickle_file : str
The file path to the pickled logger (logging.Logger).
- log_directory : str
The directory in which to store the log records for each run.
- run_arg_and_identifier : 2-tuple
Any run-time argument that the wrapped function returned by _wrap_function() requires (object), and an integer identifier signifying its position in a list of run arguments.
- Returns:
- results : object
The results of running func on a given run time argument.