Datatrack Creation

Core Functions:

GrapHiC.Datatrack_creation.evaluate_tracks_over_cooler_bins(cooler_path: str, paths: List = [], names: List = [], stats_types: List[str] = ['max'], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], value_col: int = 3, region_cols: Tuple[int] = (1, 2), chrom_col: int = 0, verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple tracks over all bins within a cooler object and return the results in a pandas DataFrame.

Parameters
  • cooler_path (str) – path to cooler file

  • paths (List) – List of paths to (multiple) bigwig or BED files

  • names (List) – List of names to associate with each datatrack provided in paths. If the length of names does not equal the length of paths, the function instead assigns names based on filenames

  • stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std

  • allowed_chroms (List) – List of chromosomes to retrieve from the cooler file

  • value_col (int) – Which column to collect values from in provided BED files (default = 3) (Note: zero-indexing is assumed)

  • region_cols (tuple) – Tuple detailing which columns (zero-indexed) to collect the region information from in provided BED files (default = (1,2))

  • chrom_col (int) – Which column to collect chromosome information from in provided BED files (default = 0) (Note: zero-indexing is assumed)

  • verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame
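The core operation behind these functions, collecting a statistic from a track over every genomic bin, can be illustrated without GrapHiC or cooler. This is a minimal sketch under stated assumptions, not GrapHiC's implementation: a track is taken to be an in-memory list of (chrom, start, end, value) region-value pairs with half-open coordinates, a region contributes to a bin whenever the two overlap by at least one base, bins with no overlapping region are reported as 0, and coverage is interpreted as the fraction of the bin's bases covered by any region. The helper name bin_stats is hypothetical.

```python
import numpy as np
import pandas as pd


def bin_stats(bins: pd.DataFrame, regions, stats_types=("max",)) -> pd.DataFrame:
    """Evaluate one region-value track over half-open bins (hypothetical helper).

    bins    -- DataFrame with 'chrom', 'start', 'end' columns (like a cooler bin table)
    regions -- iterable of (chrom, start, end, value) tuples, half-open coordinates
    """
    funcs = {"mean": np.mean, "max": np.max, "min": np.min,
             "sum": np.sum, "std": np.std}
    out = {stat: [] for stat in stats_types}
    regions = list(regions)
    for chrom, start, end in bins[["chrom", "start", "end"]].itertuples(index=False):
        # Regions on the same chromosome that overlap [start, end)
        hits = [(s, e, v) for c, s, e, v in regions
                if c == chrom and s < end and e > start]
        for stat in stats_types:
            if stat == "coverage":
                # Assumption: fraction of bin bases covered by at least one region
                covered = np.zeros(end - start, dtype=bool)
                for s, e, _ in hits:
                    covered[max(s, start) - start:min(e, end) - start] = True
                out[stat].append(float(covered.mean()))
            elif hits:
                out[stat].append(float(funcs[stat]([v for _, _, v in hits])))
            else:
                out[stat].append(0.0)  # assumption: empty bins are reported as 0
    return pd.DataFrame(out, index=bins.index)


# Toy input: three 100 bp bins on chr1 and two overlapping regions
bins = pd.DataFrame({"chrom": ["chr1"] * 3,
                     "start": [0, 100, 200],
                     "end": [100, 200, 300]})
regions = [("chr1", 50, 150, 2.0), ("chr1", 120, 130, 5.0)]
result = bin_stats(bins, regions, ["max", "coverage"])
print(result)
```

The quadratic overlap search keeps the sketch short; for genome-scale data an interval index or sorted sweep would be the usual choice.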

GrapHiC.Datatrack_creation.evaluate_tracks_over_bed_dataframe(df: pandas.core.frame.DataFrame, paths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], value_col: int = 3, region_cols: tuple = (1, 2), chrom_col: int = 0, verbose: bool = True)

Evaluate multiple BED or bigwig style tracks over an arbitrary BED-style dataframe in which the 0th column details the chromosome and the 1st and 2nd columns detail the regions. Unlike evaluate_tracks_over_cooler_bins, this function returns both new column names and a value array, which can then be appended to the original BED-style dataframe for ease of access later.

Parameters
  • df (pd.DataFrame) – BED style DataFrame

  • paths (List) – List of paths to (multiple) bigwig or BED files

  • names (List) – List of names to associate with each datatrack provided in paths. If the length of names does not equal the length of paths, the function instead assigns names based on filenames

  • stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std

  • value_col (int) – Which column to collect values from in provided BED files (default = 3) (Note: zero-indexing is assumed)

  • region_cols (tuple) – Tuple detailing which columns (zero-indexed) to collect the region information from in provided BED files (default = (1,2))

  • chrom_col (int) – Which column to collect chromosome information from in provided BED files (default = 0) (Note: zero-indexing is assumed)

  • verbose (bool) – Whether to print progress/names etc.

Returns

list of column names of length len(paths) and a value array of shape (df.shape[0],len(paths))

Return type

list, array
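Because this function returns column names and a value array rather than a new DataFrame, the caller attaches the results to df. A minimal pandas sketch of that step follows; names and values are toy placeholders standing in for the function's return value (not real GrapHiC output), and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# BED-style frame: column 0 = chromosome, columns 1 and 2 = region start/end
df = pd.DataFrame({0: ["chr1", "chr1", "chr2"],
                   1: [0, 500, 100],
                   2: [500, 1000, 600]})

# Toy stand-ins for the (names, values) pair returned by the function
names = ["trackA_max", "trackB_max"]
values = np.array([[1.0, 0.2],
                   [3.5, 0.0],
                   [2.0, 0.7]])

# One column per name, one row per row of df
assert values.shape == (df.shape[0], len(names))

# Build a frame aligned on df's index and join it column-wise
annotated = pd.concat(
    [df, pd.DataFrame(values, columns=names, index=df.index)], axis=1
)
print(annotated)
```

pd.concat leaves df untouched and returns a new frame; reusing df.index guarantees each value row lines up with the BED region it was computed for.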

Utilities and Special Usage:

GrapHiC.Datatrack_creation.evaluate_bigwigs_over_cooler_bins(cooler_path: str, bwpaths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple bigwigs over all bins within a cooler object and return the results in a pandas DataFrame.

Parameters
  • cooler_path (str) – path to cooler file

  • bwpaths (List[str]) – List of paths to (multiple) bigwig files

  • names (List[str]) – List of names to associate with each datatrack provided in bwpaths. If the length of names does not equal the length of bwpaths, the function instead assigns names based on filenames

  • stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std

  • allowed_chroms (List) – List of chromosomes to retrieve from the cooler file

  • verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame
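The names rule repeated in each parameter list (fall back to filename-based names when names and the path list differ in length) can be sketched as follows. The helper name resolve_track_names is hypothetical, and whether GrapHiC strips the file extension (as done here) or keeps the full basename is an assumption.

```python
import os
from typing import List


def resolve_track_names(paths: List[str], names: List[str]) -> List[str]:
    """Return names if they match paths one-to-one, else filename-based names."""
    if len(names) == len(paths):
        return list(names)
    # Assumption: use the basename with its final extension removed
    return [os.path.splitext(os.path.basename(p))[0] for p in paths]


# Hypothetical paths; one name for two files triggers the filename fallback
print(resolve_track_names(["data/H3K27ac.bw", "data/CTCF.bw"], ["ac"]))
```

Passing an empty names list, as the defaults do, always triggers the fallback.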

GrapHiC.Datatrack_creation.evaluate_dtrvp_over_cooler_bins(cooler_path: str, bedpaths: List[str] = [], names: List[str] = [], stats_types: List[str] = ['max'], value_cols: List = [], region_cols: List = [], chrom_cols: List = [], allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX'], verbose: bool = True) → pandas.core.frame.DataFrame

Evaluate multiple BED-style tracks (dtrvp = datatrack region-value pairs) over all bins within a cooler object and return the results in a pandas DataFrame.

Parameters
  • cooler_path (str) – path to cooler file

  • bedpaths (List) – List of paths to (multiple) BED files

  • names (List) – List of names to associate with each datatrack provided in bedpaths. If the length of names does not equal the length of bedpaths, the function instead assigns names based on filenames

  • stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std

  • value_cols (List) – List detailing which column to collect values from in each provided BED file (Note: zero-indexing is assumed, and unlike evaluate_tracks_over_cooler_bins a different value column can be specified per BED file)

  • region_cols (List) – List of tuples detailing which columns (zero-indexed) to collect the region information from in each provided BED file. Unlike evaluate_tracks_over_cooler_bins, different region columns can be specified per BED file.

  • chrom_cols (List) – List detailing which column to collect chromosome information from in each provided BED file (Note: zero-indexing is assumed, and unlike evaluate_tracks_over_cooler_bins a different chromosome column can be specified per BED file)

  • allowed_chroms (List) – List of chromosomes to retrieve from the cooler file

  • verbose (bool) – Whether to print progress/names etc.

Returns

Dataframe detailing evaluated tracks/statistics over cooler bins

Return type

pd.DataFrame
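Unlike evaluate_tracks_over_cooler_bins, this function takes one column specification per BED file. The sketch below shows how such per-file specs could be used to pull region-value pairs out of BED-like text with differing layouts. read_region_value_pairs is a hypothetical helper, the file contents are toy examples, and tab-separated input is assumed.

```python
from typing import List, Tuple


def read_region_value_pairs(lines: List[str], chrom_col: int,
                            region_cols: Tuple[int, int], value_col: int):
    """Extract (chrom, start, end, value) tuples using zero-indexed columns."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        start, end = (int(fields[i]) for i in region_cols)
        pairs.append((fields[chrom_col], start, end, float(fields[value_col])))
    return pairs


# Two toy "files" with different layouts, one column spec per file
files = [
    ["chr1\t0\t100\t2.5"],             # chrom, start, end, value
    ["peak1\tchr1\t50\t150\t0.9\t+"],  # name, chrom, start, end, score, strand
]
value_cols = [3, 4]
region_cols = [(1, 2), (2, 3)]
chrom_cols = [0, 1]

tracks = [read_region_value_pairs(f, c, r, v)
          for f, c, r, v in zip(files, chrom_cols, region_cols, value_cols)]
print(tracks)
```

The three lists are read position by position, so entry i of value_cols, region_cols and chrom_cols all describe bedpaths[i].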

GrapHiC.Datatrack_creation.evaluate_bigwigs_over_bed_dataframe(df: pandas.core.frame.DataFrame, bwpaths: List = [], names: List = [], stats_types: List[str] = ['max'], verbose: bool = True)

Evaluate multiple bigwig style tracks over an arbitrary BED-style dataframe in which the 0th column details the chromosome and the 1st and 2nd columns detail the regions. Unlike evaluate_tracks_over_cooler_bins, this function returns both new column names and a value array, which can then be appended to the original BED-style dataframe for ease of access later.

Parameters
  • df (pd.DataFrame) – BED style DataFrame

  • bwpaths (List) – List of paths to (multiple) bigwig files

  • names (List) – List of names to associate with each datatrack provided in bwpaths. If the length of names does not equal the length of bwpaths, the function instead assigns names based on filenames

  • stats_types (List[str]) – List of statistics to collect over each bin. Allowed statistics are: mean, max, min, sum, coverage, std

  • verbose (bool) – Whether to print progress/names etc.

Returns

list of column names of length len(paths) and a value array of shape (df.shape[0],len(paths))

Return type

list, array
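Conceptually, a bigwig holds a per-base signal, and evaluating it over a BED-style DataFrame means summarising that signal inside each row's region. The sketch below stands in for bigwig files with an in-memory dict of per-chromosome NumPy arrays (an assumption; a real implementation would read the files with a bigwig library) and returns names and values in the documented shape. summarise_signal_over_df is a hypothetical helper, it handles a single statistic per call, and returning NaN for empty regions is also an assumption.

```python
import numpy as np
import pandas as pd


def summarise_signal_over_df(df: pd.DataFrame, signals: dict, stat: str = "max"):
    """signals: {track_name: {chrom: per-base np.ndarray}}, a stand-in for bigwigs."""
    funcs = {"mean": np.mean, "max": np.max, "min": np.min,
             "sum": np.sum, "std": np.std}
    names = list(signals)
    values = np.zeros((df.shape[0], len(names)))
    # Column 0 = chromosome, columns 1 and 2 = half-open region [start, end)
    rows = df.iloc[:, :3].itertuples(index=False)
    for row, (chrom, start, end) in enumerate(rows):
        for j, name in enumerate(names):
            segment = signals[name][chrom][start:end]
            # Assumption: regions with no signal are reported as NaN
            values[row, j] = funcs[stat](segment) if segment.size else np.nan
    return names, values


# Toy signal: values 0..9 along a 10 bp chr1
signals = {"toy": {"chr1": np.arange(10, dtype=float)}}
df = pd.DataFrame([["chr1", 0, 5], ["chr1", 5, 10]])
names, values = summarise_signal_over_df(df, signals, "max")
print(names, values)
```

The returned values array has shape (df.shape[0], len(names)), so it can be joined back onto df column-wise.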

GrapHiC.Datatrack_creation.cooler_bin_info(cooler_path: str, allowed_chroms: List = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX']) → Tuple

Retrieve bin-level information from a cooler object.

Parameters
  • cooler_path (str) – path to cooler file

  • allowed_chroms (List) – List of chromosomes to retrieve from the cooler file

Returns

Tuple containing a dictionary, chrom_binregs, of regions associated with each bin; a dictionary, chrom_stats, of cooler indices associated with each bin; and binsize, the bin size

Return type

tuple
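The returned tuple can be pictured with a toy bin table shaped like a cooler bin table (chrom, start and end columns, one row per bin, row number equal to the cooler bin index). The sketch below builds dictionaries analogous to chrom_binregs and chrom_stats plus a bin size. The exact array layouts GrapHiC uses are not stated in this section, so these structures and the helper name toy_bin_info are assumptions. The documented allowed_chroms default (chr1 to chr19 plus chrX) corresponds to the mouse autosomes plus chrX, so other genomes need an explicit list.

```python
import pandas as pd


def toy_bin_info(bins: pd.DataFrame, allowed_chroms):
    """Group a cooler-style bin table by chromosome (hypothetical helper)."""
    chrom_binregs, chrom_stats = {}, {}
    for chrom in allowed_chroms:
        sub = bins[bins["chrom"] == chrom]
        if sub.empty:
            continue  # chromosome absent from this cooler
        chrom_binregs[chrom] = sub[["start", "end"]].to_numpy()  # (start, end) per bin
        chrom_stats[chrom] = sub.index.to_numpy()  # global cooler bin indices
    # Widest bin; the last bin of a chromosome can be shorter than binsize
    binsize = int((bins["end"] - bins["start"]).max())
    return chrom_binregs, chrom_stats, binsize


# Same list as the documented allowed_chroms default
allowed = [f"chr{i}" for i in range(1, 20)] + ["chrX"]

bins = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chrY"],
    "start": [0, 1000, 0, 0],
    "end": [1000, 1500, 1000, 1000],
})
chrom_binregs, chrom_stats, binsize = toy_bin_info(bins, allowed)
print(chrom_stats, binsize)
```

chrY is dropped because it is not in allowed_chroms, mirroring how that parameter restricts which chromosomes are retrieved.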