Serpentine binning tutorial
This tutorial demonstrates use cases for improving Hi-C contact maps with distribution-aware binning and familiarizes readers with the implementation. For a detailed step-by-step analysis, see the serpentine notebook.
Testing
Run the following:
serpentine --test --threshold 30 --min-threshold 3 --size 100
This randomly generates two 100x100 matrices with pixel values between 0 and 10, then bins both so that no serpentine falls below an average value of 3 in either matrix or below a total value of 30. It then plots both matrices and their log-ratio, binned and unbinned. Tweak the parameters to see how the matrices evolve.
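To get a feel for what the test mode starts from, the unbinned inputs can be approximated with plain numpy. This is only a sketch: the integer pixel values, the fixed seed, and the pseudocount used to avoid dividing by zero are assumptions made for illustration, not details taken from serpentine itself.

```python
import numpy as np

# Two random 100x100 "contact maps" with integer pixels in [0, 10]
# (integer values and the fixed seed are assumptions of this sketch).
rng = np.random.default_rng(seed=0)
size = 100
matrix1 = rng.integers(0, 11, size=(size, size)).astype(np.float64)
matrix2 = rng.integers(0, 11, size=(size, size)).astype(np.float64)

# Unbinned log-ratio; the pseudocount is a hypothetical choice that keeps
# empty pixels from producing infinities.
pseudocount = 1.0
raw_log_ratio = np.log2((matrix1 + pseudocount) / (matrix2 + pseudocount))

# Pixels whose value is below 3 in at least one matrix are the ones a
# minimum threshold of 3 would leave to be merged with their neighbours.
low_coverage = (matrix1 < 3) | (matrix2 < 3)
print(f"Pixels below 3 in at least one matrix: {low_coverage.mean():.1%}")
print(f"Spread of the unbinned log-ratio: {raw_log_ratio.std():.2f}")
```

The large share of low-coverage pixels is what makes the unbinned log-ratio noisy, and it is this noise that the binning is meant to reduce.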
Binning prepared Hi-C datasets
Run the following:
wget https://raw.githubusercontent.com/koszullab/serpentine/master/demos/A.csv
wget https://raw.githubusercontent.com/koszullab/serpentine/master/demos/B.csv
serpentine --inputs A.csv B.csv
This performs the same binning on prepared datasets from Escherichia coli. The currently supported format is that of Hi-C datasets generated with DADE: essentially triangular, dense matrix files whose first row and first column indicate genomic positions.
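The layout described above can be illustrated with a small reader. This is a minimal sketch, not serpentine's own loader: the function name `load_dade_like`, the tab delimiter, and the assumption that contacts sit in the upper triangle are all hypothetical choices made for the example.

```python
import io

import numpy as np


def load_dade_like(handle, delimiter="\t"):
    """Read a dense matrix whose first row and column hold genomic positions.

    Hypothetical helper: assumes a tab-delimited file with contacts stored
    in the upper triangle (a full symmetric matrix is handled as well).
    """
    rows = [line.rstrip("\n").split(delimiter) for line in handle if line.strip()]
    positions = rows[0][1:]  # header row, minus the corner cell
    values = np.array([[float(x) for x in row[1:]] for row in rows[1:]])
    upper = np.triu(values)  # keep the upper triangle only
    # Mirror the upper triangle without counting the diagonal twice.
    full = upper + upper.T - np.diag(np.diag(upper))
    return positions, full


# Tiny made-up file with three bins, for illustration only.
demo = io.StringIO(
    "pos\t0\t5000\t10000\n"
    "0\t10\t4\t1\n"
    "5000\t0\t12\t3\n"
    "10000\t0\t0\t9\n"
)
positions, contact_map = load_dade_like(demo)
print(positions)
print(contact_map)
```

Mirroring the triangle gives a symmetric dense array, which is the shape most downstream numpy code expects from a contact map.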
Using the library
You can directly use the library’s functions if you are already manipulating numpy contact maps in your analysis. Open a Python 3 console, and run the following:
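import numpy as np
from matplotlib import pyplot as plt
from serpentine import serpentine_binning
# Load the demo contact maps from their raw-file URLs (the "blob" pages serve HTML, not data)
matrix3 = np.loadtxt("https://raw.githubusercontent.com/koszullab/serpentine/master/demos/A.csv", dtype=np.float64)
matrix4 = np.loadtxt("https://raw.githubusercontent.com/koszullab/serpentine/master/demos/B.csv", dtype=np.float64)
# Bin both maps together; only the binned maps and their log-ratio are kept here
_, _, binned_matrix3, binned_matrix4, log_ratio, _ = serpentine_binning(matrix3, matrix4)
# Display the log-ratio with a diverging colormap
plt.imshow(log_ratio, cmap='seismic')
plt.show()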
This is useful if your contact maps come from other sources and aren’t in a DADE format.
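As one example of such another source, contacts are sometimes stored as sparse (row, column, count) triplets rather than as a dense file. The sketch below converts triplets into a dense, symmetric numpy array; the helper name `triplets_to_dense` and the triplet layout are assumptions made for illustration and are not part of serpentine.

```python
import numpy as np


def triplets_to_dense(triplets, size):
    """Build a dense symmetric contact map from (i, j, count) triplets.

    Hypothetical helper: assumes 0-based bin indices and that each pair
    of bins is listed once.
    """
    dense = np.zeros((size, size), dtype=np.float64)
    for i, j, count in triplets:
        dense[i, j] += count
        if i != j:
            dense[j, i] += count  # mirror off-diagonal contacts
    return dense


# Made-up contacts between three bins, for illustration only.
pairs = [(0, 0, 8), (0, 1, 3), (1, 2, 5), (2, 2, 7)]
dense_map = triplets_to_dense(pairs, size=3)
print(dense_map)
```

Two arrays built this way, one per condition and of equal shape, could then take the place of matrix3 and matrix4 in the call to serpentine_binning.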