grig.toolkit package

Subpackages

Submodules

grig.toolkit.func module

grig.toolkit.func.byte_size_of_object(obj)[source]

Return the size of a Python object in bytes.

Parameters:
obj : object
Returns:
byte_size : int
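One plausible way to measure an arbitrary object's footprint is to serialize it and count the resulting bytes. This is only an illustrative sketch under that assumption; the library's actual implementation may measure size differently:

```python
import pickle


def byte_size_of_object(obj):
    """Sketch: approximate the size of a Python object in bytes by
    measuring its pickled representation.  Illustrative only; the
    real grig implementation may differ."""
    return len(pickle.dumps(obj))


print(byte_size_of_object(list(range(10))) > 0)  # True for any picklable object
```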
grig.toolkit.func.julia_fractal(sy, sx, c0=-0.4, c1=0.6, iterations=256, xrange=(-1, 1), yrange=(-1, 1), normalize=True)[source]

Generate a 2-D Julia fractal image

Parameters:
sy : int

y dimension size.

sx : int

x dimension size.

c0 : float, optional

The c0 coefficient.

c1 : float, optional

The c1 coefficient.

iterations : int, optional

The number of steps.

xrange : array_like of int or float, optional

The range of x values.

yrange : array_like of int or float, optional

The range of y values.

normalize : bool, optional
Returns:
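The escape-time iteration behind a Julia image can be sketched as follows. This is a minimal NumPy version using the classic recurrence z → z² + c with c = c0 + c1·i; the library's exact iteration, normalization, and output details may differ:

```python
import numpy as np


def julia_fractal(sy, sx, c0=-0.4, c1=0.6, iterations=256,
                  xrange=(-1, 1), yrange=(-1, 1), normalize=True):
    """Sketch: record, for each pixel, the last iteration at which
    z -> z**2 + c remained bounded (|z| <= 2)."""
    y, x = np.meshgrid(np.linspace(*yrange, sy),
                       np.linspace(*xrange, sx), indexing='ij')
    z = x + 1j * y
    c = c0 + 1j * c1
    image = np.zeros((sy, sx))
    for i in range(iterations):
        alive = np.abs(z) <= 2
        z[alive] = z[alive] ** 2 + c
        image[alive] = i
    if normalize:
        image /= max(image.max(), 1)
    return image


img = julia_fractal(64, 64)
print(img.shape)  # (64, 64)
```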
grig.toolkit.func.moments(data, threshold=None, mask=None, axis=None, get_mask=False)[source]

Computes statistics on a data set, avoiding deviant points if requested.

Moments are calculated for a given set of data. If a value is passed to threshold, then the dataset is searched for outliers. A data point is identified as an outlier if abs(x_i - x_med)/MAD > threshold, where x_med is the median and MAD is the median absolute deviation, defined as 1.482 * median(abs(x_i - x_med)).

Parameters:
data : array_like of float

(shape1) Data on which to calculate moments.

mask : array_like of bool, optional

(shape1) Mask to apply to data.

threshold : float, optional

Sigma threshold over which values are identified as outliers.

axis : int, optional

Axis over which to calculate statistics.

get_mask : bool, optional

If True, only return the output mask.

Returns:
dict or numpy.ndarray

If get_mask is False, returns a dictionary containing the following statistics: mean, var, stddev, skew, kurt, stderr, mask. Otherwise, returns the output mask.
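The MAD-based rejection described above can be sketched as follows. This simplified version handles only the threshold argument over the full data set; mask, axis, and get_mask are omitted, and the returned statistics are abbreviated:

```python
import numpy as np


def moments(data, threshold=None):
    """Sketch of MAD-based outlier rejection: reject points where
    abs(x - median) / MAD exceeds threshold, then compute statistics
    on the surviving data."""
    data = np.asarray(data, dtype=float)
    mask = np.isfinite(data)
    if threshold is not None:
        med = np.median(data[mask])
        mad = 1.482 * np.median(np.abs(data[mask] - med))
        if mad > 0:
            mask &= np.abs(data - med) / mad <= threshold
    d = data[mask]
    return {'mean': d.mean(), 'var': d.var(), 'stddev': d.std(),
            'mask': mask}


print(moments([1, 2, 3, 4, 100], threshold=3)['mean'])  # 2.5 (100 rejected)
```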

grig.toolkit.func.robust_mask(data, threshold, mask=None, axis=None, mask_data=False, cval=nan)[source]

Computes a mask derived from the data's Median Absolute Deviation (MAD).

Calculates a robust mask based on the input data and optional input mask. If \(threshold > 0\), the dataset is searched for outliers. Outliers are identified for point \(i\) if

\[\frac{|y_i - median[y]|}{MAD} > threshold\]

where \(MAD\) is the Median Absolute Deviation defined as

\[MAD = 1.482 * median[|y_i - median[y]|]\]
Parameters:
data : array_like of float

The data on which to derive a robust mask.

threshold : float

Threshold as described above.

mask : array_like of bool, optional

If supplied, must be the same shape as data. Any masked (False) data values will not be included in the \(MAD\) calculation. Additionally, masked elements will also be masked (False) in the output mask.

axis : int, optional

Axis over which to calculate the \(MAD\). The default (None) derives the \(MAD\) from the entire set of data.

mask_data : bool, optional

If True, return a copy of data with masked values replaced by cval in addition to the output mask. The default is False. Note that the output type may be promoted to float in order to accommodate cval.

cval : int or float, optional

If mask_data is set to True, masked values will be replaced by cval. The default is numpy.nan.

Returns:
numpy.ndarray of bool, [numpy.ndarray of numpy.float64]

The output mask where False indicates a masked value, while True indicates that associated data deviation is below the threshold limit. If mask_data was True, also returns a copy of data with masked values replaced by cval.
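A simplified version of the behavior described above, for axis=None only, can be sketched as follows (an illustrative sketch, not the library implementation):

```python
import numpy as np


def robust_mask(data, threshold, mask=None, mask_data=False, cval=np.nan):
    """Sketch: mark values False where abs(x - median) / MAD exceeds
    threshold, computed over the full (optionally pre-masked) data."""
    data = np.asarray(data, dtype=float)
    if mask is None:
        out = np.ones(data.shape, dtype=bool)
    else:
        out = np.asarray(mask, dtype=bool).copy()
    if threshold is not None and threshold > 0:
        med = np.median(data[out])
        mad = 1.482 * np.median(np.abs(data[out] - med))
        if mad > 0:
            out &= np.abs(data - med) / mad <= threshold
    if mask_data:
        masked = data.copy()
        masked[~out] = cval
        return out, masked
    return out


print(robust_mask([1, 2, 3, 4, 100], 3))  # [ True  True  True  True False]
```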

grig.toolkit.func.slicer(array, axis, index, ind=False)[source]

Returns a slice of an array in arbitrary dimension.

Parameters:
array : numpy.ndarray

The array to slice.

axis : int or array_like

The axis (or axes) on which to slice.

index : int or array_like of int

The index (or indices) to retrieve.

ind : bool, optional

If True, return the slices rather than the sliced array.

Returns:
numpy.ndarray or tuple of slice
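The core idea can be sketched as follows. This minimal version handles a single int axis and index only (the documented function also accepts array_like values):

```python
import numpy as np


def slicer(array, axis, index, ind=False):
    """Sketch: build a tuple of slice objects that selects `index`
    along `axis` and takes everything along the other dimensions."""
    slices = [slice(None)] * array.ndim
    slices[axis] = index
    slices = tuple(slices)
    return slices if ind else array[slices]


a = np.arange(24).reshape(2, 3, 4)
print(slicer(a, 1, 2).shape)  # (2, 4), equivalent to a[:, 2, :]
```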
grig.toolkit.func.taylor(order, n)[source]

Taylor expansion generator for polynomial exponents.

Parameters:
order : int

Order of the polynomial.

n : int

Number of variables to solve for.

Yields:
n-tuple of int

The next polynomial exponent
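A generator of this kind can be sketched recursively: yield every n-tuple of non-negative integer exponents whose sum does not exceed the polynomial order. The exact yield order of the library version may differ:

```python
def taylor(order, n):
    """Sketch: recursively yield every n-tuple of non-negative
    integer exponents with sum(exponents) <= order."""
    if n == 0:
        yield ()
        return
    for i in range(order + 1):
        for exponent in taylor(order - i, n - 1):
            yield (i,) + exponent


print(list(taylor(2, 2)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
```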

grig.toolkit.multiproc module

class grig.toolkit.multiproc.MultitaskHandler[source]

Bases: Handler

A log handler for multitask.

Attributes:
name

Methods

acquire()

Acquire the I/O thread lock.

addFilter(filter)

Add the specified filter to this handler.

close()

Tidy up any resources used by the handler.

createLock()

Acquire a thread lock for serializing access to the underlying I/O.

emit(record)

Emit a log record.

filter(record)

Determine if a record is loggable by consulting all the filters.

flush()

Ensure all logging output has been flushed.

format(record)

Format the specified record.

handle(record)

Conditionally emit the specified logging record.

handleError(record)

Handle errors which occur during an emit() call.

release()

Release the I/O thread lock.

removeFilter(filter)

Remove the specified filter from this handler.

reorder_records()

Re-order the records in a sensible order.

setFormatter(fmt)

Set the formatter for this handler.

setLevel(level)

Set the logging level of this handler.

get_name

set_name

emit(record)[source]

Emit a log record.

Stores the record in the lookup dictionary for the given process/thread. Each message is stored in the received order for later retrieval once whatever multiprocessing job is complete.

Parameters:
record : logging.LogRecord

The record to emit.

Returns:
None
reorder_records()[source]

Re-order the records in a sensible order.

The records are sorted by process and then thread in chronological order. That is, records are grouped by process, starting with the first process that appears in the logs; within each process group, a similar grouping is performed for each child thread. Each process-thread grouping contains a list of log records in the order that they were emitted.

Note that each record is a tuple of the form (time, process, thread, log_record).

Returns:
None
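The ordering described above can be sketched with plain tuples. The times, process names, and thread names here are made up for illustration; only the sort logic reflects the documented behavior:

```python
# Each record is (time, process, thread, log_record), per the note above.
records = [(3, 'p1', 't1', 'c'), (1, 'p0', 't0', 'a'),
           (2, 'p1', 't1', 'b'), (4, 'p0', 't0', 'd')]

# Rank processes by first appearance, then threads within each process,
# then sort chronologically inside each (process, thread) group.
proc_order = {}
thread_order = {}
for t, p, th, _ in sorted(records):
    proc_order.setdefault(p, len(proc_order))
    thread_order.setdefault((p, th), len(thread_order))

records.sort(key=lambda r: (proc_order[r[1]],
                            thread_order[(r[1], r[2])], r[0]))
print([r[3] for r in records])  # ['a', 'd', 'b', 'c']
```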
grig.toolkit.multiproc.get_core_number(cores=True)[source]

Returns the maximum number of CPU cores available

Parameters:
cores : bool or int, optional

If False, returns 1. If True, returns the maximum number of cores available. An integer specifies an upper limit on the number of cores to return.

Returns:
cores : int

The maximum number of cores to use for parallel processing
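The documented behavior can be sketched directly from the parameter description (an illustrative version; the library may detect available cores differently):

```python
import multiprocessing


def get_core_number(cores=True):
    """Sketch of the documented rules: False -> 1, True -> all
    available cores, int -> upper limit on available cores."""
    max_cores = multiprocessing.cpu_count()
    if cores is False:
        return 1
    if cores is True:
        return max_cores
    return min(int(cores), max_cores)


print(get_core_number(False))  # 1
```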

grig.toolkit.multiproc.in_main_thread()[source]

Return whether the process is running in the main thread.

Returns:
main_thread : bool

True if this process is running in the main thread, and False if it is running in a child thread.

grig.toolkit.multiproc.log_for_multitask(logger)[source]

Context manager to output log messages during multiprocessing.

Stores all log messages during multiprocessing, and emits them using the given logger once complete.

Parameters:
logger : logging.Logger
Yields:
None
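The idea can be sketched with a buffering handler: suspend the logger's existing handlers while the block runs, collect records, then replay them once complete. This is a simplified stand-in, not the library implementation:

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_for_multitask(logger):
    """Sketch: buffer records emitted inside the block, then replay
    them through the logger's normal handlers afterwards."""
    buffer = []

    class _BufferHandler(logging.Handler):
        def emit(self, record):
            buffer.append(record)

    old_handlers = logger.handlers[:]
    for h in old_handlers:
        logger.removeHandler(h)
    handler = _BufferHandler()
    logger.addHandler(handler)
    try:
        yield
    finally:
        logger.removeHandler(handler)
        for h in old_handlers:
            logger.addHandler(h)
        for record in buffer:
            logger.handle(record)
```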
grig.toolkit.multiproc.log_records_to_pickle_file(logger, pickle_file)[source]

Store the log records in a pickle file rather than emitting.

Parameters:
logger : logging.Logger
pickle_file : str

The path to the pickle file that will contain the log records.

Yields:
None
grig.toolkit.multiproc.log_with_multi_handler(logger)[source]

Context manager to temporarily log messages for unique processes/threads

Temporarily disables all log handlers and outputs the results to a dictionary of the form {(process, thread): list(records)} where process is returned by multiprocessing.current_process() and thread is returned by threading.current_thread().

Parameters:
logger : logging.Logger
Yields:
multi_handler : MultitaskHandler
grig.toolkit.multiproc.multitask(func, iterable, args, kwargs, jobs=None, skip=None, max_nbytes='1M', force_threading=False, force_processes=False, logger=None)[source]

Process a series of tasks in serial, or in parallel using joblib.

multitask is used to run a function multiple times on a series of arguments. Tasks may be run in series (default), or in parallel using multi-processing via the joblib package.

If an error is encountered while attempting to process in parallel with joblib, an attempt will be made to process the tasks in series.

The function to process multiple times (func) must take one of the following forms:

1. result[i] = func(args, iterable[i])
2. result[i] = func(args, kwargs, iterable[i])

Here, args is the same as the args argument passed to multitask, i.e., the full argument list. Setting the argument kwargs to None implies that func takes form 1, while anything else causes multitask to assume the function is of form 2.

Since this is a non-standard method of specifying a function, it is highly likely the user will have to define their own func. For example, to use multitask to add ten to a series of numbers:

>>> from grig.toolkit.multiproc import multitask
>>> numbers = list(range(5))
>>> numbers
[0, 1, 2, 3, 4]
>>> def add_ten(args, i):
...     return args[i] + 10
>>> multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]

In the above example, iterable has been set to range(len(numbers)) indicating that multitask should supply i to add_ten multiple times (0, 1, 2, 3, 4). Note that kwargs is explicitly set to None indicating that add_ten is of form 1. While multitask may seem like overkill in this example, it can be highly adaptable for complex procedures.

The skip parameter can be used to skip processing of certain tasks. For example:

>>> skip = [False] * len(numbers)
>>> skip[2] = True
>>> multitask(add_ten, range(len(numbers)), numbers, None, skip=skip)
[10, 11, 13, 14]

By default, parallelization is managed with the loky backend. If calling code is known to perform better with the threading backend, it should be called within the joblib parallel_backend context manager:

>>> from joblib import parallel_backend
>>> with parallel_backend('threading', n_jobs=2):
...    multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]
Parameters:
func : function

The function to repeat multiple times on different sets of arguments. Must be of the form func(args, i) if kwargs is None, or func(args, kwargs, i) otherwise, where i is a member of iterable.

iterable : iterable

A Python object that can be iterated through such that each member can be passed to func as the final argument.

args

Anything that should be passed to func as the first argument. The intended use is such that the output of func is result[i] = func(args, iterable[i]).

kwargs : None or anything

If set to None, multitask assumes func is of the form func(args, i). Otherwise, multitask assumes func is of the form func(args, kwargs, i).

jobs : int, optional

If set to a positive integer, processes tasks in parallel using jobs threads. If negative, sets the number of threads to the number of CPUs + jobs + 1; i.e., if jobs = -1, multitask will use all available CPUs.

skip : array_like of bool, optional

Should be of len(iterable), where a value of True signifies that processing should be skipped for that task and omitted from the final result.

max_nbytes : int or str or None, optional

Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in bytes, or a human-readable string, e.g., '1M' for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing". The default is currently set to '1M' for consistency with the joblib library. Note that memmapping disallows in-place modification of data, so if this functionality is required, set max_nbytes to None.

force_threading : bool, optional

If True, force joblib to run parallel jobs using threads so that shared memory can be used. Otherwise, threading will only occur when parallel processing is spawned from a child process of the main thread.

force_processes : bool, optional

If True, force joblib to run parallel jobs using CPUs rather than threads. This can sometimes lead to unexpected outcomes when the multiprocessing is launched from a non-main thread. Pickling arguments prior to, and return values during, processing is recommended in this case.

logger : logging.Logger, optional

If supplied, will attempt to produce sensibly ordered logs for the multiprocessing tasks for all handlers.

Returns:
result : list

The final output where result[i] = func(args, iterable[i]). Will be of length len(iterable) if skip is None, otherwise len(iterable) - sum(skip).
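The serial behavior described above can be sketched as a plain loop; multitask_serial is a hypothetical name for this sketch, and the real multitask dispatches the same calls through joblib when running in parallel:

```python
def multitask_serial(func, iterable, args, kwargs=None, skip=None):
    """Sketch of multitask's serial path: call func once per member
    of iterable, choosing form 1 or form 2 based on kwargs, and
    omitting skipped tasks from the result."""
    result = []
    for i, item in enumerate(iterable):
        if skip is not None and skip[i]:
            continue
        if kwargs is None:
            result.append(func(args, item))       # form 1
        else:
            result.append(func(args, kwargs, item))  # form 2
    return result


def add_ten(args, i):
    return args[i] + 10


print(multitask_serial(add_ten, range(4), [0, 1, 2, 3]))  # [10, 11, 12, 13]
```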

grig.toolkit.multiproc.pickle_list(object_list, prefix=None, naming_attribute=None, class_type=None)[source]

Pickle a list of objects to a temporary directory.

The list will be updated in-place, with each element being replaced by the on-disk file path to the pickle file in which it is saved.

Parameters:
object_list : list (object)

A list of things to pickle.

prefix : str, optional

The prefix for the temporary directory in which to store the pickle files. See tempfile.mkdtemp() for further information.

naming_attribute : str, optional

The attribute used to name the pickle file. If not supplied, defaults to id(object).

class_type : class, optional

If supplied, only objects of this class type will be pickled.

Returns:
temporary_directory : str

The temporary directory in which the objects are saved as pickle files.

grig.toolkit.multiproc.pickle_object(obj, filename)[source]

Pickle an object and save to the given filename.

Parameters:
obj : object

The object to pickle.

filename : str or None

If filename points to a writeable on-disk location, obj will be pickled and saved to that location. If None, nothing will happen.

Returns:
output : str or object

Either the filename if the object was pickled, or obj if it wasn’t.
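The documented behavior can be sketched in a few lines (an illustrative version; the real function may perform additional validation):

```python
import os
import pickle
import tempfile


def pickle_object(obj, filename):
    """Sketch of the documented behavior: pickle obj to filename and
    return the filename, or return obj unchanged when filename is None."""
    if filename is None:
        return obj
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)
    return filename


path = os.path.join(tempfile.mkdtemp(), 'obj.p')
print(pickle_object([1, 2, 3], path) == path)  # True
print(pickle_object([1, 2, 3], None))          # [1, 2, 3]
```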

grig.toolkit.multiproc.purge_multitask_logs(log_directory, log_pickle_file, use_logger=None)[source]

Remove all temporary logging files/directories and handle log records.

The user must supply a log_directory containing pickle files of the log records to handle. The log_pickle_file contains the logger used to handle these records. Following completion, both the log_directory and log_pickle_file will be deleted from the file system.

Parameters:
log_directory : str

The directory in which the log records for each run were stored. This directory will be removed.

log_pickle_file : str

The pickle file containing the logger for multitask. This will be removed.

use_logger : Logger, optional

The logger to handle any log records. If not supplied, defaults to that found in the log_pickle_file.

Returns:
None
grig.toolkit.multiproc.relative_cores(jobs)[source]

Return the actual number of cores to use for a given number of jobs.

Returns 1 in cases where jobs is None or 0. If jobs is less than zero, the returned value will be max_available_cores + jobs + 1. i.e., -1 will use all available cores.

Parameters:
jobs : int or float or None
Returns:
n_cores : int

The number of cores to use, which will always be in the range 1 to max_available_cores.
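The documented rules can be sketched directly (an illustrative version; the actual core detection may differ):

```python
import multiprocessing


def relative_cores(jobs):
    """Sketch of the documented rules: None/0 -> 1, negative ->
    max_available_cores + jobs + 1, always clipped to [1, max]."""
    max_cores = multiprocessing.cpu_count()
    if jobs is None or jobs == 0:
        return 1
    jobs = int(jobs)
    if jobs < 0:
        jobs = max_cores + jobs + 1
    return min(max(jobs, 1), max_cores)


print(relative_cores(None))  # 1
```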

grig.toolkit.multiproc.unpickle_file(filename)[source]

Unpickle a string argument if it is a file, and return the result.

Parameters:
filename : object or str

If the argument is a string and a valid file path, it will be unpickled and the unpickled object returned.

Returns:
obj, pickle_file : object, str

If the argument passed in was not a string, or was an invalid file path, the resulting output obj will be the argument and pickle_file will be None. If the argument was a valid file path to a pickle file, obj will be the unpickled result, and pickle_file will be the argument.

grig.toolkit.multiproc.unpickle_list(pickle_files, delete=True)[source]

Restore pickle files to objects in-place.

Parameters:
pickle_files : list (str)

A list of on-disk pickle files to restore. The restored objects will replace the file path for each element in the list.

delete : bool, optional

If True, delete each pickle file once it has been restored.

Returns:
None
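The pickle_list / unpickle_list round trip described above can be sketched as follows: each element is replaced in-place by its pickle file path, then restored (and the file deleted, matching delete=True). This is illustrative only; the real functions also support naming attributes and class filtering:

```python
import os
import pickle
import tempfile

# Pickle each object to a temp directory, replacing it with its path.
objects = [{'a': 1}, [2, 3]]
tmp_dir = tempfile.mkdtemp(prefix='pickle_list_')
for i, obj in enumerate(objects):
    path = os.path.join(tmp_dir, f'{id(obj)}.p')
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    objects[i] = path

# Restore in-place, deleting each pickle file as we go (delete=True).
for i, path in enumerate(objects):
    with open(path, 'rb') as f:
        objects[i] = pickle.load(f)
    os.remove(path)

print(objects)  # [{'a': 1}, [2, 3]]
```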
grig.toolkit.multiproc.valid_relative_jobs(jobs)[source]

Return a valid number of jobs in the range 1 <= jobs <= max_cores.

Parameters:
jobs : int

A positive or negative integer. Negative values are processed as max_cores + jobs + 1.

Returns:
valid_jobs : int

The number of jobs available to process.

grig.toolkit.multiproc.wrap_function(func, args, kwargs=None, logger=None, log_directory=None)[source]

Wrap a function for use with multitask().

Parameters:
func : function

The function to wrap.

args : tuple

The function arguments.

kwargs : dict, optional

Any function keyword arguments.

logger : logging.Logger or str, optional

A logger used to output any log messages once complete. If supplied, a valid log_directory must also be supplied. A path to a pickle file containing the logger may also be supplied.

log_directory : str, optional

If supplied together with a logger, will store all log records to a pickle file in the given directory.

Returns:
wrapped_function, log_pickle_file : function, str

The wrapped function and the file location of the pickle file for any supplied logger. If no logger was supplied, this value will be None.

grig.toolkit.multiproc.wrapped_with_logger(func, logger_pickle_file, log_directory, run_arg_and_identifier)[source]

Return the results of the function in multitask and save log records.

Parameters:
func : function

The function to wrap.

logger_pickle_file : str

The file path to the pickled logger (logging.Logger).

log_directory : str

The directory in which to store the log records for each run.

run_arg_and_identifier : 2-tuple

Any run time argument that the wrapped function returned by _wrap_function() requires (object), and an integer identifier signifying its position in a list of run arguments.

Returns:
results : object

The results of running func on a given run time argument.