Serpentine binning tutorial

This tutorial demonstrates use cases for improving Hi-C contact maps with distribution-aware binning and familiarizes readers with the implementation. For a detailed step-by-step analysis, see the serpentine notebook.

Dependencies

Python 3 with the following libraries:

  • numpy
  • matplotlib

Testing

Run the following:

serpentine --test --threshold 30 --min-threshold 3 --size 100

This randomly generates two 100x100 matrices with pixel values between 0 and 10, then bins both so that no serpentine has an average value below 3 in either matrix (--min-threshold) or a total value below 30 (--threshold). It then plots both matrices as well as their log-ratio, binned and unbinned. Tweak the parameters to see how the matrices evolve.
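
To get a feel for the unbinned side of this test, the input can be approximated with plain numpy. This is only a sketch: the helper names, the uniform integer distribution, the base-2 logarithm and the NaN handling for empty pixels are all assumptions, not a description of what serpentine does internally.

```python
import numpy as np


def make_test_matrices(size=100, max_value=10, seed=0):
    """Return two random size x size count matrices (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: integer pixel values drawn uniformly from 0..max_value.
    first = rng.integers(0, max_value + 1, size=(size, size)).astype(np.float64)
    second = rng.integers(0, max_value + 1, size=(size, size)).astype(np.float64)
    return first, second


def unbinned_log_ratio(a, b):
    """Pixel-wise log2(b / a); pixels where either value is zero become NaN."""
    ratio = np.full(a.shape, np.nan)
    valid = (a > 0) & (b > 0)
    ratio[valid] = np.log2(b[valid] / a[valid])
    return ratio


matrix_a, matrix_b = make_test_matrices()
log_ratio = unbinned_log_ratio(matrix_a, matrix_b)
# Fraction of pixels with no defined ratio: the noise that binning aims to reduce.
print(matrix_a.shape, np.isnan(log_ratio).mean())
```

With counts this low, many pixels have an undefined or very noisy ratio, which is why the binned and unbinned plots produced by the test look so different.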

Binning prepared Hi-C datasets

Run the following:

wget https://raw.githubusercontent.com/koszullab/serpentine/master/demos/A.csv
wget https://raw.githubusercontent.com/koszullab/serpentine/master/demos/B.csv
serpentine --inputs A.csv B.csv

This performs the same binning on prepared datasets from Escherichia coli. The currently supported format is that of Hi-C datasets generated with DADE: essentially triangular, dense matrix files in which the first row and the first column indicate genomic position.
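
The layout described above can be illustrated with a small parser. This is a sketch under stated assumptions, not serpentine's own loader: the function name, the tab delimiter, the upper-triangular orientation and the made-up position labels are all hypothetical.

```python
import io

import numpy as np


def read_labelled_triangular(handle, delimiter="\t"):
    """Parse a dense matrix whose first row and first column hold labels.

    Hypothetical helper: the delimiter and the upper-triangular layout
    are assumptions made for this illustration.
    """
    rows = [line.rstrip("\n").split(delimiter) for line in handle if line.strip()]
    labels = rows[0][1:]  # header row, minus the corner cell
    body = np.array(
        [[float(cell) if cell else 0.0 for cell in row[1:]] for row in rows[1:]]
    )
    upper = np.triu(body)
    # Mirror the upper triangle and count the diagonal only once.
    full = upper + upper.T - np.diag(np.diag(upper))
    return labels, full


# Made-up three-bin example; labels stand in for genomic positions.
example = io.StringIO(
    "bins\t0\t5000\t10000\n"
    "0\t4\t2\t1\n"
    "5000\t0\t5\t3\n"
    "10000\t0\t0\t6\n"
)
labels, contact_map = read_labelled_triangular(example)
print(labels)
print(contact_map)
```

The label row and column are stripped before the numeric part is symmetrized, so the result is an ordinary square numpy array.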

Using the library

You can directly use the library’s functions if you are already manipulating numpy contact maps in your analysis. Open a Python 3 console, and run the following:

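import numpy as np
from serpentine import serpentine_binning
from matplotlib import pyplot as plt

# Load both contact maps as float matrices (raw file URLs, not the GitHub HTML pages).
matrix3 = np.loadtxt("https://raw.githubusercontent.com/koszullab/serpentine/master/demos/A.csv", dtype=np.float64)
matrix4 = np.loadtxt("https://raw.githubusercontent.com/koszullab/serpentine/master/demos/B.csv", dtype=np.float64)
# Bin both maps together; keep the binned matrices and their log-ratio.
_, _, binned_matrix3, binned_matrix4, log_ratio, _ = serpentine_binning(matrix3, matrix4)

# Display the log-ratio with a diverging colormap.
plt.imshow(log_ratio, cmap='seismic')
plt.show()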

This is useful if your contact maps come from other sources and are not in DADE format.
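
As one example of such a source, contacts are often stored as sparse (row, column, count) triplets. The sketch below turns them into a dense symmetric numpy array; the triplet layout and the helper name are assumptions chosen for illustration, not a format that serpentine itself reads.

```python
import numpy as np


def dense_from_triplets(triplets, size):
    """Build a symmetric dense contact map from (row, col, count) triplets.

    Hypothetical helper: assumes each bin pair is listed once, on or
    above the diagonal, with zero-based bin indices.
    """
    matrix = np.zeros((size, size), dtype=np.float64)
    for row, col, count in triplets:
        matrix[row, col] += count
        if row != col:
            # Mirror off-diagonal contacts so the map stays symmetric.
            matrix[col, row] += count
    return matrix


triplets = [(0, 0, 12), (0, 1, 5), (1, 2, 3), (2, 2, 9)]
contact_map = dense_from_triplets(triplets, size=3)
print(contact_map)
```

Two maps built this way, with the same shape, are plain float arrays like matrix3 and matrix4 in the example above.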