serpentine package

Submodules

serpentine.serpentine module

Serpentine binning

An implementation of the so-called ‘serpentine binning’ procedure described in Scolari et al.

Command line:

Usage:
    serpentine.py [<matrixA>] [<matrixB>] [--threshold=auto] [--verbose]
                  [--min-threshold=auto] [--trend=high] [--triangular]
                  [--limit=3] [--demo] [--demo-size=500]

Arguments:
    matrixA                         The first input matrix, in plain text
                                    CSV format. Optional in demo mode.
    matrixB                         The second input matrix, in plain text
                                    CSV format. Optional in demo mode or
                                    single binning mode.

Options:
    -h, --help                      Display this help message.
    --version                       Display the program's current version.
    -t auto, --threshold auto       Threshold value to trigger binning.
                                    [default: auto]
    -m auto, --min-threshold auto   Minimum value to force trigger binning
                                    in either matrix. [default: auto]
    --trend high                    Trend to subtract to the differential
                                    matrix, possible values are "mean":
                                    equal amount of positive and negative
                                    differences, and "high": normalize
                                    at the regions with higher coverage.
                                    [default: high]
    --triangular                    Treat the matrix as triangular,
                                    useful when plotting matrices adjacent
                                    to the diagonal. [default: False]
    --limit 3                       Set the z-axis limit on the
                                    plot of the differential matrix.
                                    [default: 3]
    --demo                          Run a demo on randomly generated
                                    matrices. [default: False]
    --demo-size 500                 Size of the test matrix for the demo.
                                    [default: 500]
    -v, --verbose                   Show verbose output. [default: False]
serpentine.serpentine.MDafter(XA, XB, XD, s=10, xlim=None, ylim=None, triangular=False, show=True)

MD plot after binning

The MD plot is the main metric provided by the serpentin binning package, it visualized the variability in function of the mean coverage of a couple of matrices to be compared. This version is optimized to be run after the serpentin binning. The return values of this function should be generally ignored.

Parameters:
  • XB (XA,) – The input matrices
  • XD (array_like) – The differential matrix obtained by serpentin binnning
  • s (int, optional) – How many point use for the trend lines, depends on statistics and range
  • xlim (float, optional) – Limits for the x-axis
  • ylim (float, optional) – Limits for the y-axis
  • triangular (bool, optional) – Set triangular if you are interested in rebin only half of the matrix (for instance in the case of matrices which are already triangular, default is false)
  • show (bool, optional) – Set it to false if you are not interested in the graph but only in the return values of this function.
Returns:

trend, threshold – Normally something which should not bother you

Return type:

float

serpentine.serpentine.MDbefore(XA, XB, s=10, xlim=None, ylim=None, triangular=False, show=True)

MD plot before binning

The MD plot is the main metric provided by the serpentin binning package, it visualized the variability in function of the mean coverage of a couple of matrices to be compared. This version is optimized to be run before the serpentin binning. The return values of this funciton will be hints on the values to use to normalize the differential matrix and as a threshold for the serpentin binning algorithm.

Parameters:
  • XB (XA,) – The input matrices
  • s (int, optional) – How many point use for the trend lines, depends on statistics and range
  • xlim (float, optional) – Limits for the x-axis
  • ylim (float, optional) – Limits for the y-axis
  • triangular (bool, optional) – Set triangular if you are interested in rebin only half of the matrix (for instance in the case of matrices which are already triangular, default is false)
  • show (bool, optional) – Set it to false if you are not interested in the graph but only in the return values of this function.
Returns:

  • trend (float) – This is the extrapolated trend of the two matrices, it is measured by the ratio of the most covered part of the two matrices, use it to set the zero to the output of the differential analysis if you think this works better than the np.mean() function.
  • threshold (float) – If the statistics permits the automatical extimation of the threshold, this will be a good parameter to use as input to the serpentin binning algorithm

serpentine.serpentine.all_barycenters(M_serp, weights=None)

Compute all serpentine barycenters

Extract all serpentines from a serpentinized matrix, then compute all serpentine barycenters, optionally weigthed by the non-serpentinized matrix.

Parameters:
  • M_serp (numpy.ndarray) – The serpentinized matrix
  • weights (numpy.ndarray or None, optional) – The non-serpentinized (original) matrix acting as weights. Default is None.
serpentine.serpentine.barycenter(serp, weights=None)

Compute weighted serpentine barycenter

Compute the (optionally weighted) barycenter of a serpentine, where the weights would be equal to the values of the map itself.

Parameters:
  • serp (iterable) – An iterable of serpentine coordinates
  • weights (numpy.ndarray or None, optional) – If None, the barycenter is unweighted. Otherwise, if it is a contact map, the barycenter is weighted by the values of the map at the serpentine’s coordinates. Default is None.
Returns:

bary – The barycenter coordinates

Return type:

tuple

serpentine.serpentine.dshow(dif, trend, limit=3, triangular=False, colorbar=True, cmap=None, ax=<module 'matplotlib.pyplot' from '/home/docs/checkouts/readthedocs.org/user_builds/serpentine/envs/latest/lib/python3.7/site-packages/matplotlib-3.0.3-py3.7-linux-x86_64.egg/matplotlib/pyplot.py'>)

Show differential matrix

A boilerplate around the imshow matplotlib function to show the differential matrix

Parameters:
  • dif (array_like) – The differential matrix
  • trend (float) – The value of the zero, please use either the output of MDbefore function or the value of md.mean(dif)
  • limit (float, optional) – The colorscale limit of the log-ratio, setting it to 2 or 3 seems like a sensible choice. Defaults to 3
  • triangular (bool, optional) – Set triangular if you are interested in rebin only half of the matrix (for instance in the case of matrices which are already triangular, default is false)
  • cmap (str, optional) – Color map of the plotted matrix. Should be ideally diverging, default is sesismic.
  • ax (optional) – Set axis, defaults to matplotlib library
Returns:

Return type:

The plot

serpentine.serpentine.extract_serpentines(M)

Extract serpentine structure

Isolate serpentines based on shared pixel values in a contact map.

Parameters:M (numpy.ndarray) – The (serpentinized) input contact map
serpentine.serpentine.fltmatr(X: numpy.ndarray, flt: numpy.ndarray) → numpy.ndarray

Filter a 2D matrix in both dimensions according to an boolean vector.

Parameters:
  • X (array_like) – The input matrix
  • flr (array_like) – The boolean filter
Returns:

The filtered matrix

Return type:

array_like

Example

>>> import numpy as np
>>> M = np.ones((5, 5))
>>> M[2:4, 2:4] = 2
>>> print(M)
[[ 1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.]
 [ 1.  1.  2.  2.  1.]
 [ 1.  1.  2.  2.  1.]
 [ 1.  1.  1.  1.  1.]]
>>> flt = M.sum(axis=1) > 5
>>> X = fltmatr(M, flt)
>>> print(X)
[[ 2.  2.]
 [ 2.  2.]]
serpentine.serpentine.fromupdiag(filename)

Load a DADE matrix into a numpy array

serpentine.serpentine.mad(data: numpy.ndarray, axis: Optional[int] = None) → numpy.float64

Median absolute deviation

Calculates the median absolute deviation of data

Parameters:
  • data (array_like) – The dataset
  • axis (int, optional) – The axis over which perform the numpy.median function
Returns:

The median absolute deviation

Return type:

float

serpentine.serpentine.mshow(XX, subplot=<module 'matplotlib.pyplot' from '/home/docs/checkouts/readthedocs.org/user_builds/serpentine/envs/latest/lib/python3.7/site-packages/matplotlib-3.0.3-py3.7-linux-x86_64.egg/matplotlib/pyplot.py'>, colorbar=True, triangular=False)

Boilerplate around the imshow function to show a matrix.

serpentine.serpentine.outstanding_filter(X: numpy.ndarray) → numpy.ndarray

Generate filtering index that removes outstanding values (three standard MADs above or below the median).

Parameters:X (array_like) – The dataset
Returns:The boolean filter
Return type:array_like

Example

>>> import numpy as np
>>> X = np.arange(25).reshape((5,5))
>>> X += X.T
>>> X[2,2] += 10000
>>> print(X)
[[    0     6    12    18    24]
 [    6    12    18    24    30]
 [   12    18 10024    30    36]
 [   18    24    30    36    42]
 [   24    30    36    42    48]]
>>> O = outstanding_filter(X)
>>> print(O)
[False False  True False False]
serpentine.serpentine.serpentin_binning(A: numpy.ndarray, B: numpy.ndarray, threshold: float = 50.0, minthreshold: float = 10.0, iterations: float = 10.0, triangular: bool = False, verbose: bool = True, parallel: int = 4, sizes: bool = False) → Tuple

Perform the serpentin binning

The function will perform the algorithm to serpentin bin two matrices, iterations can be done in series or in parallel, convinient for multi-processor machines.

Parameters:
  • B (A,) – The matrices to be compared.
  • threshold (float, optional) – The threshold of rebinning for the highest coverage matrix. Default is set by the DEFAULT_THRESHOLD parameter, which is 50 if unchanged.
  • minthreshold (float, optional) – The threshold for both matrices. Default is set by the DEFAULT_MIN_THRESHOLD parameter, which is 10 if unchanged.
  • iterations (int, optional) – The number of iterations requested, more iterations will consume more time, but also will result in better and smoother results. Default is set by the DEFAULT_ITERATIONS parameter, which is 10 if unchanged.
  • triangular (bool, optional) – Set triangular if you are interested in rebinning only half of the matrix (for instance in the case of matrices which are already triangular, default is False).
  • verbose (bool, optional) – Whether to print additional output during the computation. Default is False.
  • parallel (int, optional) – Set it to the number of your processor if you want to attain maximum speeds. Default is 4.
  • sizes (bool, optional) – Whether to keep track of the serpentine size distribution, in which case it will be returned as a Counter. Default is False.
Returns:

  • sA, sB (array_like) – The rebinned matrices
  • sK (array_like) – The log-ratio matrix, expressend in base 2. Attention, the matrix needs to be normalized by subtracting an appropriate value for the zero (MDbefore or numpy.mean functions are there to help you in this task).
  • serp_size_distribution (collections.Counter, optional) – A counter keeping track of the serpentine size distribution. Only returned if the supplied ‘sizes’ parameter is True.

serpentine.serpentine.serpentin_iteration(A: numpy.ndarray, B: numpy.ndarray, threshold: float = 50.0, minthreshold: float = 10.0, triangular: bool = False, verbose: bool = True) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Perform a single iteration of serpentin binning

Each serpentin binning is generally executed in multiple iterations in order to smooth the random variability in the bin aggregation. This funciton performs a single iteration.

Parameters:
  • B (A,) – The matrices to be compared.
  • threshold (float, optional) – The threshold of rebinning for the highest coverage matrix.
  • minthreshold (float, optional) – The threshold for both matrices
  • triangular (bool, optional) – Set triangular if you are interested in rebin only half of the matrix (for instance in the case of matrices which are already triangular, default is false)
  • verbose (bool, optional) – Set it false if you are annoyed by the printed output.
Returns:

  • Amod, Bmod (array_like) – The rebinned matrices
  • D (array_like) – The log-ratio matrix, expressend in base 2. Attention, the matrix need to be normalized by subtractiong an appropriate value for the zero (MDbefore or numpy.mean functions are there to help you in this task).

serpentine.version module

Module contents

Serpentine binning for Hi-C contact maps

Provides:
-An implementation of the serpentine binning procedure described in Scolari et al., usable on single or multiple contact maps -A set of functions for quickly visualizing the effects of serpentine binning in terms of contact distribution (median absolute deviation, etc.)
serpentine.gaussian_blurring(input, *, sigma=1, order=0, output=None, mode='reflect', cval=0.0, truncate=4.0)

Multidimensional Gaussian filter.

Parameters:
  • input (array_like) – The input array.
  • sigma (scalar or sequence of scalars) – Standard deviation for Gaussian kernel. The standard deviations of the Gaussian filter are given for each axis as a sequence, or as a single number, in which case it is equal for all axes.
  • order (int or sequence of ints, optional) – The order of the filter along each axis is given as a sequence of integers, or as a single number. An order of 0 corresponds to convolution with a Gaussian kernel. A positive order corresponds to convolution with that derivative of a Gaussian.
  • output (array or dtype, optional) – The array in which to place the output, or the dtype of the returned array. By default an array of the same dtype as input will be created.
  • mode (str or sequence, optional) –

    The mode parameter determines how the input array is extended when the filter overlaps a border. By passing a sequence of modes with length equal to the number of dimensions of the input array, different modes can be specified along each axis. Default value is ‘reflect’. The valid values and their behavior is as follows:

    ’reflect’ (d c b a | a b c d | d c b a)
    The input is extended by reflecting about the edge of the last pixel.
    ’constant’ (k k k k | a b c d | k k k k)
    The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter.
    ’nearest’ (a a a a | a b c d | d d d d)
    The input is extended by replicating the last pixel.
    ’mirror’ (d c b | a b c d | c b a)
    The input is extended by reflecting about the center of the last pixel.
    ’wrap’ (a b c d | a b c d | a b c d)
    The input is extended by wrapping around to the opposite edge.
  • cval (scalar, optional) – Value to fill past edges of input if mode is ‘constant’. Default is 0.0.
  • truncate (float) – Truncate the filter at this many standard deviations. Default is 4.0.
Returns:

gaussian_filter – Returned array of same shape as input.

Return type:

ndarray

Notes

The multidimensional filter is implemented as a sequence of one-dimensional convolution filters. The intermediate arrays are stored in the same data type as the output. Therefore, for output types with a limited precision, the results may be imprecise because intermediate results may be stored with insufficient precision.

Examples

>>> from scipy.ndimage import gaussian_filter
>>> a = np.arange(50, step=2).reshape((5,5))
>>> a
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28],
       [30, 32, 34, 36, 38],
       [40, 42, 44, 46, 48]])
>>> gaussian_filter(a, sigma=1)
array([[ 4,  6,  8,  9, 11],
       [10, 12, 14, 15, 17],
       [20, 22, 24, 25, 27],
       [29, 31, 33, 34, 36],
       [35, 37, 39, 40, 42]])
>>> from scipy import misc
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> plt.gray()  # show the filtered result in grayscale
>>> ax1 = fig.add_subplot(121)  # left side
>>> ax2 = fig.add_subplot(122)  # right side
>>> ascent = misc.ascent()
>>> result = gaussian_filter(ascent, sigma=5)
>>> ax1.imshow(ascent)
>>> ax2.imshow(result)
>>> plt.show()