Serpentine binning

This tutorial aims at demonstrating use cases and for improving Hi-C contact maps with distribution-aware binning, helping readers reproduce the steps in our papers and documenting readers with the implementation.

Loading the library

You can directly use the library’s functions, if you are already manipulating numpy contact maps in your analysis. First, you need to import the library with the following:

[28]:
%matplotlib notebook
[29]:
import numpy as np
from matplotlib import pyplot as plt
import serpentine as sp

After, you need to load your datasets with numpy, we provide a couple of demo datasets in the form of tables, corresponding to Yeast chromosome 7 in two different mutants, in the repository:

[30]:
# Load Yeast data
A = np.loadtxt('../../demos/A.csv')
B = np.loadtxt('../../demos/B.csv')

At this point you are working with raw Hi-C data, serpentine provides convenient functions to visualize your material:

The first dataset:

[31]:
fig = plt.figure()
sp.mshow(A)
[31]:
<matplotlib.image.AxesImage at 0x7fea6aad2320>

The second dataset:

[32]:
fig = plt.figure()
sp.mshow(B)
[32]:
<matplotlib.image.AxesImage at 0x7fea6ac87cc0>

Filtering of the data

The raw data needs to be filtered in order to clean the unmappable rows and columns, this kind of artifacts shows up in the distribution of reads per bin as outliers:

[33]:
plt.figure()
norm = np.log10(np.sum(A + B, axis=0)[np.sum(A + B, axis=0) > 0])
norm = norm[np.isnan(norm) == False]
norm = norm[np.isinf(np.abs(norm)) == False]
plt.hist(norm, bins=50)
plt.axvline(x=np.median(norm), color='g')
plt.axvline(x=np.median(norm) - 3 * 1.4826 * sp.mad(norm), color='r')
plt.axvline(x=np.median(norm) + 3 * 1.4826 * sp.mad(norm), color='r')
[33]:
<matplotlib.lines.Line2D at 0x7fea6a852c50>

with the serpentine library this is easily done achievable using the two included functions

[34]:
flt = sp.outstanding_filter(A) + sp.outstanding_filter(B)
flt = flt == False
A = sp.fltmatr(A, flt)
B = sp.fltmatr(B, flt)

resulting in:

[35]:
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1); sp.mshow(A, subplot=ax1)
ax2 = fig.add_subplot(1, 2, 2); sp.mshow(B, subplot=ax2)
[35]:
<matplotlib.image.AxesImage at 0x7fea6a796780>

at this point, additional processing can be done before proceeding, such as iterative normalizations, removing speckles and other such operations.

Serpenting binning the data

Finally you can use the function to bin the data. The function takes two parameters: a threshold that constrains the coverage of the bin in at least one matrix, and the minthreshold that constrain it in both. The function uses multiple processors and can be configured by the optional parameters:

[37]:
sA, sB, sK = sp.serpentin_binning(A, B, threshold, threshold / 5)
Starting 10 binning processes in batches of 4...
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
1        Total serpentines: 28227 (69.1770414665229 %)
1        Total serpentines: 28188 (69.08146260170571 %)
1        Total serpentines: 28216 (69.1500833251642 %)
1        Total serpentines: 28315 (69.39270659739242 %)
2        Total serpentines: 12232 (29.977453190863642 %)
2        Total serpentines: 12248 (30.016665032839917 %)
2        Total serpentines: 12181 (29.85246544456426 %)
2        Total serpentines: 12288 (30.11469463778061 %)
3        Total serpentines: 7527 (18.446720909714735 %)
3        Total serpentines: 7475 (18.319282423291835 %)
3        Total serpentines: 7522 (18.434467209097146 %)
4        Total serpentines: 6530 (16.003333006567985 %)
4        Total serpentines: 6534 (16.013135967062052 %)
4        Total serpentines: 6533 (16.010685226938534 %)
3        Total serpentines: 7527 (18.446720909714735 %)
5        Total serpentines: 6459 (15.829330457798255 %)
5        Over: 2018-10-31 13:57:24.042602
5        Total serpentines: 6454 (15.817076757180669 %)
5        Total serpentines: 6450 (15.8072737966866 %)
6        Total serpentines: 6452 (15.812175276933633 %)
6        Over: 2018-10-31 13:57:24.099372
4        Total serpentines: 6522 (15.983727085579845 %)
6        Total serpentines: 6450 (15.8072737966866 %)
6        Over: 2018-10-31 13:57:24.135747
5        Total serpentines: 6447 (15.799921576316047 %)
6        Total serpentines: 6447 (15.799921576316047 %)
6        Over: 2018-10-31 13:57:24.195925
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
1        Total serpentines: 28225 (69.17213998627585 %)
1        Total serpentines: 28226 (69.17459072639937 %)
1        Total serpentines: 28246 (69.22360552886971 %)
1        Total serpentines: 28194 (69.09616704244682 %)
2        Total serpentines: 12231 (29.975002450740124 %)
3        Total serpentines: 7515 (18.417312028232526 %)
2        Total serpentines: 12236 (29.98725615135771 %)
2        Total serpentines: 12225 (29.96029800999902 %)
4        Total serpentines: 6509 (15.95186746397412 %)
5        Total serpentines: 6426 (15.748456033722183 %)
5        Over: 2018-10-31 13:57:24.859513
3        Total serpentines: 7526 (18.444270169591217 %)
2        Total serpentines: 12245 (30.009312812469364 %)
3        Total serpentines: 7542 (18.483482011567492 %)
4        Total serpentines: 6525 (15.991079305950397 %)
4        Total serpentines: 6522 (15.983727085579845 %)
5        Total serpentines: 6453 (15.81462601705715 %)
5        Total serpentines: 6444 (15.792569355945496 %)
3        Total serpentines: 7550 (18.503087932555633 %)
6        Total serpentines: 6453 (15.81462601705715 %)
6        Over: 2018-10-31 13:57:25.030012
6        Total serpentines: 6444 (15.792569355945496 %)
6        Over: 2018-10-31 13:57:25.035744
4        Total serpentines: 6525 (15.991079305950397 %)
5        Total serpentines: 6448 (15.802372316439564 %)
6        Total serpentines: 6447 (15.799921576316047 %)
6        Over: 2018-10-31 13:57:25.130998
0        Total serpentines: 40804 (100.0 %)
0        Total serpentines: 40804 (100.0 %)
1        Total serpentines: 28313 (69.38780511714538 %)
1        Total serpentines: 28130 (68.93931967454171 %)
2        Total serpentines: 12176 (29.84021174394667 %)
3        Total serpentines: 7532 (18.45897461033232 %)
4        Total serpentines: 6509 (15.95186746397412 %)
2        Total serpentines: 12253 (30.028918733457505 %)
5        Total serpentines: 6452 (15.812175276933633 %)
6        Total serpentines: 6450 (15.8072737966866 %)
6        Over: 2018-10-31 13:57:25.767884
3        Total serpentines: 7567 (18.544750514655426 %)
4        Total serpentines: 6581 (16.128320752867367 %)
5        Total serpentines: 6493 (15.912655621997843 %)
6        Total serpentines: 6490 (15.905303401627291 %)
6        Over: 2018-10-31 13:57:25.921796

Checking the results

The quality of the binning can be verified with an MD plot. This time, use the MDafter function: if the process was successful you would expect the effect of sampling to be reduced by binning, and an almost constant signal-to-noise value at all coverage values, similar to the one at large coverage:

[38]:
plt.figure()
sp.MDafter(sA, sB, sK, ylim=[-4, 4])
[38]:
(0.5107324222723643, 49.999999999999986)

Matrices have been rebinned, and the characteristic sampling noise present at small coverages has now been smoothed, while the crisp signal at large coverages that conveys the precious biological variations is preserved:

[39]:
fig = plt.figure();
ax1 = fig.add_subplot(1, 2, 1)
sp.mshow(sA, subplot=ax1)
ax2 = fig.add_subplot(1, 2, 2)
sp.mshow(sB, subplot=ax2)
[39]:
<matplotlib.image.AxesImage at 0x7fea6adc91d0>

Checking the differential analysis

Similarly, we improved the differential analysis, before the binning, we could have obtained this kind of results:

Before binning:

[40]:
plt.figure()
np.warnings.filterwarnings('ignore')
D = np.log2(B/A)
sp.dshow(D, trend)

Now,

After binning:

[41]:
plt.figure()
sp.dshow(sK, trend)
[27]:
from scipy.ndimage import gaussian_filter
sA, sB, sK = sp.serpentin_binning(A, B, threshold, threshold / 5)
[ ]:

[ ]: