Datatrack Creation

Core Functions:

GrapHiC.Datatrack_creation.evaluate_tracks_over_cooler_bins(cooler_path: str, paths: List = [], names: List = [], stats_types: List[str] = ['max'], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], value_col: int = 3, region_cols: Tuple[int] = (1, 2), chrom_col: int = 0, verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple tracks over all bins within a cooler object and return the results in a Pandas dataframe

Parameters

cooler_path (str) – path to cooler file
paths (List) – List of paths to (multiple) bigwig or BED files
names (List) – List of names to associate with each datatrack given in paths. If the length of names does not equal the length of paths, the function assigns names based on the filenames instead
stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std
allowed_chroms (List) – List of chromosomes to retrieve from the cooler file
value_col (int) – Which column to collect values from in provided BED files (default = 3) (Note: zero-indexing is assumed)
region_cols (tuple) – Tuple detailing which columns (zero-indexed) to collect the region information from in provided BED files (default = (1,2))
chrom_col (int) – Which column to collect chromosome information from in provided BED files (default = 0) (Note: zero-indexing is assumed)
verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame
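To make the stats_types options concrete, the sketch below summarises region-value pairs over a single bin in plain Python. It is an illustration only, not GrapHiC's implementation: the helper name bin_stats, the half-open interval convention, the unweighted mean, the population standard deviation, the definition of coverage as the covered fraction of the bin, and the 0.0 score for empty bins are all assumptions.

```python
import statistics


def bin_stats(intervals, bin_start, bin_end, stats_types=("max",)):
    """Summarise (start, end, value) intervals overlapping one half-open bin.

    Hypothetical helper for illustration; assumes the intervals do not
    overlap each other, so covered bases can simply be summed.
    """
    values = []
    covered = 0
    for start, end, value in intervals:
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap > 0:
            values.append(value)
            covered += overlap

    reducers = {
        "mean": lambda v: sum(v) / len(v),
        "max": max,
        "min": min,
        "sum": sum,
        "std": statistics.pstdev,
    }
    result = {}
    for stat in stats_types:
        if stat == "coverage":
            # Fraction of the bin's bases covered by at least one interval.
            result[stat] = covered / (bin_end - bin_start)
        elif stat not in reducers:
            raise ValueError(f"unknown statistic: {stat}")
        else:
            # Assumption: bins with no overlapping interval score 0.0.
            result[stat] = reducers[stat](values) if values else 0.0
    return result


# Toy track: three region-value pairs on one chromosome.
track = [(0, 50, 2.0), (50, 150, 4.0), (300, 400, 9.0)]
all_stats = ["mean", "max", "min", "sum", "coverage", "std"]
first_bin = bin_stats(track, 0, 100, all_stats)
second_bin = bin_stats(track, 100, 200, ["max", "coverage"])
empty_bin = bin_stats(track, 200, 300, ["max", "coverage"])
```

Requesting several statistics for several tracks yields one value per track and statistic for every bin, which is why the number of result columns grows with both lists.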

GrapHiC.Datatrack_creation.evaluate_tracks_over_bed_dataframe(df: pandas.core.frame.DataFrame, paths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], value_col: int = 3, region_cols: tuple = (1, 2), chrom_col: int = 0, verbose: bool = True)

Evaluate multiple BED or bigwig style tracks over an arbitrary BED style dataframe in which column 0 gives the chromosome and columns 1 and 2 give the region start and end. Unlike evaluate_tracks_over_cooler_bins, this function returns the new column names and a value array, which can then be appended to the original BED-style dataframe for ease of access later.

Parameters

df (pd.DataFrame) – BED style DataFrame
paths (List) – List of paths to (multiple) bigwig or BED files
names (List) – List of names to associate with each datatrack given in paths. If the length of names does not equal the length of paths, the function assigns names based on the filenames instead
stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std
value_col (int) – Which column to collect values from in provided BED files (default = 3) (Note: zero-indexing is assumed)
region_cols (tuple) – Tuple detailing which columns (zero-indexed) to collect the region information from in provided BED files (default = (1,2))
chrom_col (int) – Which column to collect chromosome information from in provided BED files (default = 0) (Note: zero-indexing is assumed)
verbose (bool) – Whether to print progress/names etc.

Returns

list of column names of length len(paths) and a value array of shape (df.shape[0],len(paths))

Return type

list, array
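The naming rule described for the names parameter can be sketched as follows. The helper resolve_names and the example paths are hypothetical, and the exact form of the filename-derived name (here the basename without its final extension) is an assumption; the library may format it differently.

```python
import os


def resolve_names(paths, names):
    """Return one column name per path (hypothetical helper).

    Assumption: the filename-derived name is the basename with its
    final extension removed.
    """
    if len(names) == len(paths):
        return list(names)
    # Length mismatch: ignore the supplied names and fall back to filenames.
    return [os.path.splitext(os.path.basename(p))[0] for p in paths]


# Hypothetical track files.
paths = ["tracks/H3K27ac.bw", "tracks/CTCF_peaks.bed"]
given = resolve_names(paths, ["H3K27ac", "CTCF"])
fallback = resolve_names(paths, ["only_one_name"])
```

Passing a names list of the right length is the safer choice when the column names are used later to index the results.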

Utilities and Special Usage:

GrapHiC.Datatrack_creation.evaluate_bigwigs_over_cooler_bins(cooler_path: str, bwpaths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple bigwigs over all bins within a cooler object and return the results in a Pandas dataframe

Parameters

cooler_path (str) – path to cooler file
bwpaths (List[str]) – List of paths to (multiple) bigwig files
names (List[str]) – List of names to associate with each datatrack given in bwpaths. If the length of names does not equal the length of bwpaths, the function assigns names based on the filenames instead
stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std
allowed_chroms (List) – List of chromosomes to retrieve from the cooler file
verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame

GrapHiC.Datatrack_creation.evaluate_dtrvp_over_cooler_bins(cooler_path: str, bedpaths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], value_cols: List = [], region_cols: List = [], chrom_cols: List = [], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple BED (dtrvp = datatrack region-value-pairs) style tracks over all bins within a cooler object and return the results in a Pandas dataframe

Parameters

cooler_path (str) – path to cooler file
bedpaths (List[str]) – List of paths to (multiple) BED files
names (List[str]) – List of names to associate with each datatrack given in bedpaths. If the length of names does not equal the length of bedpaths, the function assigns names based on the filenames instead
stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std
value_cols (List) – List detailing which column to collect values from in each provided BED file (Note: zero-indexing is assumed, and different value columns can be specified per BED file, unlike evaluate_tracks_over_cooler_bins)
region_cols (List) – List of tuples detailing which columns (zero-indexed) to collect the region information from in each provided BED file. Different region columns can be specified per BED file, unlike evaluate_tracks_over_cooler_bins.
chrom_cols (List) – List detailing which column to collect chromosome information from in each provided BED file (Note: zero-indexing is assumed, and different chromosome columns can be specified per BED file, unlike evaluate_tracks_over_cooler_bins)
allowed_chroms (List) – List of chromosomes to retrieve from the cooler file
verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame
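The per-file column settings can be pictured with a small plain-Python parser. This is a sketch under assumptions, not the library's reader: parse_bed_lines, the toy rows, and the tab-separated layout are all hypothetical. It shows how zero-indexed chromosome, region, and value columns select fields, and how two files with different layouts each get their own settings.

```python
def parse_bed_lines(lines, value_col=3, region_cols=(1, 2), chrom_col=0):
    """Extract (chrom, start, end, value) tuples from tab-separated rows.

    Hypothetical helper; all column indices are zero-based.
    """
    start_col, end_col = region_cols
    records = []
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        records.append(
            (
                fields[chrom_col],
                int(fields[start_col]),
                int(fields[end_col]),
                float(fields[value_col]),
            )
        )
    return records


# Two toy "files" with different column layouts.
standard = ["chr1\t0\t100\t0.5", "chr1\t100\t200\t1.5"]
shifted = ["peak1\tchr2\t5.0\t1000\t2000"]

# One entry per file, mirroring the per-file value/region/chromosome lists.
value_cols = [3, 2]
region_cols = [(1, 2), (3, 4)]
chrom_cols = [0, 1]

parsed = [
    parse_bed_lines(lines, value_col=v, region_cols=r, chrom_col=c)
    for lines, v, r, c in zip([standard, shifted], value_cols, region_cols, chrom_cols)
]
```

Keeping the three lists the same length as the list of files avoids any ambiguity about which layout applies to which file.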

GrapHiC.Datatrack_creation.evaluate_bigwigs_over_bed_dataframe(df: pandas.core.frame.DataFrame, bwpaths: List = [], names: List = [], stats_types: List[str] = ['max'], verbose: bool = True)

Evaluate multiple bigwig style tracks over an arbitrary BED style dataframe in which column 0 gives the chromosome and columns 1 and 2 give the region start and end. Unlike evaluate_tracks_over_cooler_bins, this function returns the new column names and a value array, which can then be appended to the original BED-style dataframe for ease of access later.

Parameters

df (pd.DataFrame) – BED style DataFrame
bwpaths (List) – List of paths to (multiple) bigwig files
names (List) – List of names to associate with each datatrack given in bwpaths. If the length of names does not equal the length of bwpaths, the function assigns names based on the filenames instead
stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std
verbose (bool) – Whether to print progress/names etc.

Returns

list of column names of length len(bwpaths) and a value array of shape (df.shape[0],len(bwpaths))

Return type

list, array

GrapHiC.Datatrack_creation.cooler_bin_info(cooler_path: str, allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX']) → Tuple

Retrieve bin-level information from a cooler object

Parameters

cooler_path (str) – path to cooler file
allowed_chroms (List) – List of chromosomes to retrieve from the cooler file

Returns

Tuple containing a dictionary, chrom_binregs, of the regions associated with each bin; a dictionary, chrom_stats, of the cooler indices associated with each bin; and the bin size, binsize

Return type

tuple
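As a rough picture of what "regions associated with each bin" can look like, the sketch below builds fixed-width bins per chromosome and filters them by an allowed list. It does not read a cooler file and is not how cooler_bin_info is implemented: the helper fixed_width_bins, the toy chromosome sizes, and the chromosome-keyed dictionary layout are all assumptions, since the exact structure of chrom_binregs is not specified here.

```python
def fixed_width_bins(chrom_sizes, binsize, allowed_chroms):
    """Map each allowed chromosome to its half-open (start, end) bins.

    Hypothetical helper; chromosomes missing from chrom_sizes are skipped.
    """
    regions = {}
    for chrom in allowed_chroms:
        if chrom not in chrom_sizes:
            continue
        size = chrom_sizes[chrom]
        # The last bin is truncated at the chromosome end.
        regions[chrom] = [
            (start, min(start + binsize, size))
            for start in range(0, size, binsize)
        ]
    return regions


# Toy sizes; chrY is present in the data but not in the allowed list.
chrom_sizes = {"chr1": 250, "chrY": 120}
bins = fixed_width_bins(chrom_sizes, binsize=100, allowed_chroms=["chr1", "chr2"])
```

Note that the default allowed_chroms list in the signatures covers chr1 to chr19 and chrX only, so any other chromosome has to be listed explicitly to be included.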