grig.toolkit package¶
Subpackages¶
- grig.toolkit.splines package
- Submodules
- grig.toolkit.splines.spline module
- grig.toolkit.splines.spline_utils module
add_knot()
back_substitute()
build_observation()
calculate_minimum_bandwidth()
check_input_arrays()
create_ordering()
determine_smoothing_spline()
discontinuity_jumps()
evaluate_bspline()
find_knot()
find_knots()
fit_point()
flat_index_mapping()
givens_parameters()
givens_rotate()
knot_fit()
perform_fit()
rational_interp_zero()
single_fit()
solve_observation()
solve_rank_deficiency()
Submodules¶
grig.toolkit.func module¶
- grig.toolkit.func.byte_size_of_object(obj)[source]¶
Return the size of a Python object in bytes.
- Parameters:
- obj : object
- Returns:
- byte_size : int
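For example (the exact size is platform and interpreter dependent):
>>> from grig.toolkit.func import byte_size_of_object
>>> byte_size_of_object([1, 2, 3]) > 0
True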
- grig.toolkit.func.julia_fractal(sy, sx, c0=-0.4, c1=0.6, iterations=256, xrange=(-1, 1), yrange=(-1, 1), normalize=True)[source]¶
Generate a 2-D Julia fractal image.
- Parameters:
- sy : int
y dimension size.
- sx : int
x dimension size.
- c0 : float, optional
The c0 coefficient.
- c1 : float, optional
The c1 coefficient.
- iterations : int, optional
The number of steps.
- xrange : array_like of int or float, optional
The range of x values.
- yrange : array_like of int or float, optional
The range of y values.
- normalize : bool, optional
- Returns:
- numpy.ndarray
The generated fractal image.
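A minimal usage sketch, assuming the returned image has shape (sy, sx):
>>> from grig.toolkit.func import julia_fractal
>>> image = julia_fractal(128, 128, iterations=64)
>>> image.shape
(128, 128)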
- grig.toolkit.func.moments(data, threshold=None, mask=None, axis=None, get_mask=False)[source]¶
Computes statistics on a data set, avoiding deviant points if requested.
Moments are calculated for a given set of data. If a value is passed to threshold, then the dataset is searched for outliers. A data point is identified as an outlier if abs(x_i - x_med)/MAD > threshold, where x_med is the median and MAD is the median absolute deviation, defined as 1.482 * median(abs(x_i - x_med)).
- Parameters:
- data : array_like of float
(shape1) Data on which to calculate moments.
- mask : array_like of bool
(shape1) Mask to apply to data.
- threshold : float, optional
Sigma threshold over which values are identified as outliers.
- axis : int, optional
Axis over which to calculate statistics.
- get_mask : bool, optional
If True, only return the output mask.
- Returns:
- dict or numpy.ndarray
If get_mask is False, returns a dictionary containing the following statistics: mean, var, stddev, skew, kurt, stderr, mask. Otherwise, returns the output mask.
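For example, a sketch of the outlier rejection described above (the 100.0 point deviates by hundreds of MADs and should be excluded from the statistics):
>>> import numpy as np
>>> from grig.toolkit.func import moments
>>> data = np.array([1.0, 1.1, 0.9, 1.0, 100.0])
>>> stats = moments(data, threshold=3)
>>> round(float(stats['mean']), 3)  # mean of the four remaining points
1.0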
- grig.toolkit.func.robust_mask(data, threshold, mask=None, axis=None, mask_data=False, cval=nan)[source]¶
Computes a mask derived from the data Median Absolute Deviation (MAD).
Calculates a robust mask based on the input data and optional input mask. If \(threshold > 0\), the dataset is searched for outliers. Outliers are identified for point \(i\) if
\[\frac{|y_i - median[y]|}{MAD} > threshold\]
where \(MAD\) is the Median Absolute Deviation, defined as
\[MAD = 1.482 * median[|y_i - median[y]|]\]
- Parameters:
- data : array_like of float
The data on which to derive a robust mask.
- threshold : float
Threshold as described above.
- mask : array_like of bool, optional
If supplied, must be the same shape as data. Any masked (False) data values will not be included in the \(MAD\) calculation. Additionally, masked elements will also be masked (False) in the output mask.
- axis : int, optional
Axis over which to calculate the \(MAD\). The default (None) derives the \(MAD\) from the entire set of data.
- mask_data : bool, optional
If True, return a copy of data with masked values replaced by cval in addition to the output mask. The default is False. Note that the output type will be float, since the default cval (NaN) cannot be represented in an integer array.
- cval : int or float, optional
If mask_data is set to True, masked values will be replaced by cval. The default is numpy.nan.
- Returns:
- numpy.ndarray of bool, [numpy.ndarray of numpy.float64]
The output mask where False indicates a masked value, while True indicates that associated data deviation is below the threshold limit. If mask_data was True, also returns a copy of data with masked values replaced by cval.
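For example, under the definitions above (median 0.0, MAD = 1.482 * 0.1), the last point below deviates by roughly 337 MADs and should be masked:
>>> import numpy as np
>>> from grig.toolkit.func import robust_mask
>>> y = np.array([0.0, 0.1, -0.1, 0.0, 50.0])
>>> robust_mask(y, threshold=5)
array([ True,  True,  True,  True, False])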
- grig.toolkit.func.slicer(array, axis, index, ind=False)[source]¶
Returns a slice of an array in arbitrary dimension.
- Parameters:
- array : numpy.ndarray
The array to slice.
- axis : int or array_like
The axis to slice on.
- index : int or array_like of int
The index to retrieve.
- ind : bool, optional
If True, return the slices rather than the sliced array.
- Returns:
- numpy.ndarray or tuple of slice
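An illustrative sketch, assuming slicer(array, axis, index) is equivalent to indexing index along axis:
>>> import numpy as np
>>> from grig.toolkit.func import slicer
>>> a = np.arange(24).reshape(2, 3, 4)
>>> np.array_equal(slicer(a, 1, 2), a[:, 2, :])
True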
grig.toolkit.multiproc module¶
- class grig.toolkit.multiproc.MultitaskHandler[source]¶
Bases: Handler
A log handler for multitask.
- Attributes:
- name
Methods
- acquire(): Acquire the I/O thread lock.
- addFilter(filter): Add the specified filter to this handler.
- close(): Tidy up any resources used by the handler.
- createLock(): Acquire a thread lock for serializing access to the underlying I/O.
- emit(record): Emit a log record.
- filter(record): Determine if a record is loggable by consulting all the filters.
- flush(): Ensure all logging output has been flushed.
- format(record): Format the specified record.
- handle(record): Conditionally emit the specified logging record.
- handleError(record): Handle errors which occur during an emit() call.
- release(): Release the I/O thread lock.
- removeFilter(filter): Remove the specified filter from this handler.
- reorder_records(): Re-order the records in a sensible order.
- setFormatter(fmt): Set the formatter for this handler.
- setLevel(level): Set the logging level of this handler.
- get_name()
- set_name()
- emit(record)[source]¶
Emit a log record.
Stores the record in the lookup dictionary for the given process/thread. Each message is stored in the received order for later retrieval once whatever multiprocessing job is complete.
- Parameters:
- recordlogging.LogRecord
The record to emit.
- Returns:
- None
- reorder_records()[source]¶
Re-order the records in a sensible order.
The records are sorted by process and then by thread, in chronological order. I.e., records are grouped by process, starting with the first process that appears in the logs, and within each process group a similar grouping is performed for each child thread. Each process-thread grouping contains a list of log records in the order that they were emitted.
Note that each record is a tuple of the form (time, process, thread, log_record).
- Returns:
- None
- grig.toolkit.multiproc.get_core_number(cores=True)[source]¶
Returns the maximum number of CPU cores available.
- Parameters:
- cores : bool or int, optional
If False, returns 1. If True, returns the maximum number of cores available. An integer specifies an upper limit on the maximum number of cores to return.
- Returns:
- cores : int
The maximum number of cores to use for parallel processing.
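For example (the second result depends on the host machine):
>>> from grig.toolkit.multiproc import get_core_number
>>> get_core_number(cores=False)
1
>>> get_core_number(cores=2) <= 2  # an integer acts as an upper limit
True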
- grig.toolkit.multiproc.in_main_thread()[source]¶
Return whether the process is running in the main thread.
- Returns:
- main_thread : bool
True if this process is running in the main thread, and False if it is running in a child thread.
- grig.toolkit.multiproc.log_for_multitask(logger)[source]¶
Context manager to output log messages during multiprocessing.
Stores all log messages during multiprocessing, and emits them using the given logger once complete.
- Parameters:
- logger : logging.Logger
- Yields:
- None
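A minimal usage sketch; messages logged inside the block are stored and then emitted through logger once the block exits:
>>> import logging
>>> from grig.toolkit.multiproc import log_for_multitask
>>> logger = logging.getLogger('multitask_example')
>>> with log_for_multitask(logger):
...     logger.error('stored now, emitted on exit')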
- grig.toolkit.multiproc.log_records_to_pickle_file(logger, pickle_file)[source]¶
Store the log records in a pickle file rather than emitting.
- Parameters:
- logger : logging.Logger
- pickle_file : str
The path to the pickle file that will contain the log records.
- Yields:
- None
- grig.toolkit.multiproc.log_with_multi_handler(logger)[source]¶
Context manager to temporarily log messages for unique processes/threads.
Temporarily disables all log handlers and outputs the results to a dictionary of the form {(process, thread): list(records)}, where process is returned by multiprocessing.current_process() and thread is returned by threading.current_thread().
- Parameters:
- logger : logging.Logger
- Yields:
- multi_handler : MultitaskHandler
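A usage sketch; inside the block, records are collected by the yielded MultitaskHandler rather than the logger's usual handlers:
>>> import logging
>>> from grig.toolkit.multiproc import log_with_multi_handler
>>> logger = logging.getLogger('capture_example')
>>> with log_with_multi_handler(logger) as multi_handler:
...     logger.error('captured by the MultitaskHandler')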
- grig.toolkit.multiproc.multitask(func, iterable, args, kwargs, jobs=None, skip=None, max_nbytes='1M', force_threading=False, force_processes=False, logger=None)[source]¶
Process a series of tasks in serial, or in parallel using joblib.
multitask is used to run a function multiple times on a series of arguments. Tasks may be run in series (default), or in parallel using multi-processing via the joblib package.
If an error is encountered while attempting to process in parallel with joblib, an attempt will be made to process the tasks in series.
The function to process multiple times (func) must take one of the following forms:
1. result[i] = func(args, iterable[i])
2. result[i] = func(args, kwargs, iterable[i])
Here, “args” is the same as the args argument, i.e., the full argument list. Setting the kwargs argument to None implies that func takes form 1, while anything else causes multitask to assume the function is of form 2.
Since this is a non-standard method of specifying a function, it is highly likely the user will have to define their own func. For example, to use multitask to add ten to a series of numbers:
>>> from grig.toolkit.multiproc import multitask
>>> numbers = list(range(5))
>>> numbers
[0, 1, 2, 3, 4]
>>> def add_ten(args, i):
...     return args[i] + 10
>>> multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]
In the above example, iterable has been set to range(len(numbers)) indicating that multitask should supply i to add_ten multiple times (0, 1, 2, 3, 4). Note that kwargs is explicitly set to None indicating that add_ten is of form 1. While multitask may seem like overkill in this example, it can be highly adaptable for complex procedures.
The skip parameter can be used to skip processing of certain tasks. For example:
>>> skip = [False] * len(numbers)
>>> skip[2] = True
>>> multitask(add_ten, range(len(numbers)), numbers, None, skip=skip)
[10, 11, 13, 14]
By default, parallelization is managed with the loky backend. If calling code is known to perform better with the threading backend, it should be called within the joblib parallel_backend context manager:
>>> from joblib import parallel_backend
>>> with parallel_backend('threading', n_jobs=2):
...     multitask(add_ten, range(len(numbers)), numbers, None)
[10, 11, 12, 13, 14]
- Parameters:
- func : function
The function to repeat multiple times on different sets of arguments. Must be of the form func(args, i) if kwargs is None, or func(args, kwargs, i) otherwise, where i is a member of iterable.
- iterable : iterable
A Python object that can be iterated through such that each member can be passed to func as the final argument.
- args
Anything that should be passed to func as the first argument. The intended use is such that the output of func is result[i] = func(args, iterable[i]).
- kwargs : None or anything
If set to None, multitask assumes func is of the form func(args, i). Otherwise, multitask assumes func is of the form func(args, kwargs, i).
- jobs : int, optional
If set to a positive integer, processes tasks in parallel using jobs threads. If negative, sets the number of threads to the number of CPUs + jobs + 1. I.e., if jobs = -1, multitask will use all available CPUs.
- skip : array_like of bool, optional
Should be of length len(iterable), where a value of True signifies that processing should be skipped for that task and omitted from the final result.
- max_nbytes : int or str or None, optional
Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder. Can be an int in Bytes, or a human-readable string, e.g., '1M' for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing". The default is currently set to '1M' for consistency with the joblib library. Note that memmapping disallows in-place modification of data, so if this functionality is required, set max_nbytes to None.
- force_threading : bool, optional
If True, force joblib to run parallel jobs using threads so that shared memory can be used. Otherwise, threading will only occur when parallel processing is spawned from a child process of the main thread.
- force_processes : bool, optional
If True, force joblib to run parallel jobs using CPUs rather than threads. This can sometimes lead to unexpected outcomes when the multiprocessing is launched from a non-main thread. Pickling arguments prior to processing, and pickling return values during processing, is recommended in this case.
- logger : logging.Logger, optional
If supplied, will attempt to produce sensibly ordered logs for the multiprocessing tasks for all handlers.
- Returns:
- result : list
The final output where result[i] = func(args, iterable[i]). Will be of length len(iterable) if skip is None, otherwise len(iterable) - sum(skip).
- grig.toolkit.multiproc.pickle_list(object_list, prefix=None, naming_attribute=None, class_type=None)[source]¶
Pickle a list of objects to a temporary directory.
The list will be updated in-place, with each element being replaced by the on-disk file path to the pickle file in which it is saved.
- Parameters:
- object_list : list (object)
A list of things to pickle.
- prefix : str, optional
The prefix for the temporary directory in which to store the pickle files. See tempfile.mkdtemp() for further information.
- naming_attribute : str, optional
The attribute used to name the pickle file. If not supplied, defaults to id(object).
- class_type : class, optional
If supplied, only objects of this class type will be pickled.
- Returns:
- temporary_directory : str
The temporary directory in which the objects are saved as pickle files.
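A round-trip sketch using unpickle_list() (documented below); the file paths and temporary directory vary per run:
>>> from grig.toolkit.multiproc import pickle_list, unpickle_list
>>> objects = [{'a': 1}, {'b': 2}]
>>> tmpdir = pickle_list(objects)  # elements are now file path strings
>>> all(isinstance(f, str) for f in objects)
True
>>> unpickle_list(objects)  # restore in-place, deleting the pickle files
>>> objects[0]
{'a': 1}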
- grig.toolkit.multiproc.pickle_object(obj, filename)[source]¶
Pickle an object and save it to the given filename.
- Parameters:
- obj : object
The object to pickle.
- filename : str or None
If filename points to a writeable on-disk location, obj will be pickled and saved to that location. If None, nothing will happen.
- Returns:
- output : str or object
Either the filename if the object was pickled, or obj if it wasn’t.
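A round-trip sketch using unpickle_file() (documented below):
>>> import os, tempfile
>>> from grig.toolkit.multiproc import pickle_object, unpickle_file
>>> path = os.path.join(tempfile.mkdtemp(), 'numbers.p')
>>> saved = pickle_object([1, 2, 3], path)  # returns the filename
>>> obj, pickle_file = unpickle_file(saved)
>>> obj
[1, 2, 3]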
- grig.toolkit.multiproc.purge_multitask_logs(log_directory, log_pickle_file, use_logger=None)[source]¶
Remove all temporary logging files/directories and handle log records.
The user must supply a log_directory containing pickle files of the log records to handle. The log_pickle_file contains the logger used to handle these records. Following completion, both the log_directory and log_pickle_file will be deleted from the file system.
- Parameters:
- log_directory : str
The directory in which the log records for each run were stored. This directory will be removed.
- log_pickle_file : str
The pickle file containing the logger for multitask. This will be removed.
- use_logger : Logger, optional
The logger to handle any log records. If not supplied, defaults to that found in the log_pickle_file.
- Returns:
- None
- grig.toolkit.multiproc.relative_cores(jobs)[source]¶
Return the actual number of cores to use for a given number of jobs.
Returns 1 in cases where jobs is None or 0. If jobs is less than zero, the returned value will be max_available_cores + jobs + 1. i.e., -1 will use all available cores.
- Parameters:
- jobs : int or float or None
- Returns:
- n_cores : int
The number of cores to use which will always be in the range 1 -> max_available_cores.
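For example (the -1 case depends on the number of available cores):
>>> from grig.toolkit.multiproc import relative_cores
>>> relative_cores(None)
1
>>> relative_cores(-1) >= 1  # all available cores
True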
- grig.toolkit.multiproc.unpickle_file(filename)[source]¶
Unpickle a string argument if it is a file, and return the result.
- Parameters:
- filename : object or str
If the argument is a string and a valid file path, it will be unpickled and the result returned.
- Returns:
- obj, pickle_file : object, str
If the argument passed in was not a string or was an invalid file path, the resulting output obj will be the argument and pickle_file will be None. If the argument was a valid file path to a pickle file, obj will be the unpickled result and pickle_file will be the argument.
- grig.toolkit.multiproc.unpickle_list(pickle_files, delete=True)[source]¶
Restore pickle files to objects in-place.
- Parameters:
- pickle_files : list (str)
A list of on-disk pickle files to restore. The restored objects will replace the file path for each element in the list.
- delete : bool, optional
If True, delete each pickle file once it has been restored.
- Returns:
- None
- grig.toolkit.multiproc.valid_relative_jobs(jobs)[source]¶
Return a valid number of jobs in the range 1 <= jobs <= max_cores.
- Parameters:
- jobs : int
A positive or negative integer. Negative values are processed as max_cores + jobs + 1.
- Returns:
- valid_jobs : int
The number of jobs available to process.
- grig.toolkit.multiproc.wrap_function(func, args, kwargs=None, logger=None, log_directory=None)[source]¶
Wrap a function for use with multitask().
- Parameters:
- func : function
The function to wrap.
- args : tuple
The function arguments.
- kwargs : dict, optional
Any function keyword arguments.
- logger : logging.Logger or str, optional
A logger used to output any log messages once complete. If supplied, a valid log_directory must also be supplied. A path to a pickle file containing the logger may also be supplied.
- log_directory : str, optional
If supplied together with a logger, will store all log records to a pickle file in the given directory.
- Returns:
- wrapped_function, log_pickle_file : function, str
The wrapped function and the file location of the pickle file for any supplied logger. If no logger was supplied, this value will be None.
- grig.toolkit.multiproc.wrapped_with_logger(func, logger_pickle_file, log_directory, run_arg_and_identifier)[source]¶
Return the results of the function in multitask and save log records.
- Parameters:
- func : function
The function to wrap.
- logger_pickle_file : str
The file path to the pickled logger (logging.Logger).
- log_directory : str
The directory in which to store the log records for each run.
- run_arg_and_identifier : 2-tuple
Any run-time argument that the wrapped function returned by _wrap_function() requires (object), and an integer identifier signifying its position in a list of run arguments.
- Returns:
- results : object
The results of running func on a given run time argument.