Graph Creation

Graph Constructors:

GrapHiC.Graph_creation.from_regions(contacts: List[str], regions: Dict[str, numpy.ndarray], names: Optional[dict] = {}, balance: Optional[bool] = False, join: Optional[bool] = False, force_disjoint: Optional[bool] = False, backbone: Optional[bool] = True, record_cistrans_interactions: Optional[bool] = False, record_backbone_interactions: Optional[bool] = False, record_node_chromosome_as_onehot: Optional[bool] = False, add_self_loops: Optional[bool] = True, record_names: Optional[bool] = True, same_index: Optional[bool] = True, chromosomes: Optional[list] = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX']) → Dict

Computes a Hi-C graph from a list of cooler files.

Parameters
  • contacts – list of cooler files generated from Hi-C experiments. Cooler files don’t have to be indexed the same but they do have to all contain the chromosomes and regions being probed.

  • regions – Dictionary specifying chromosomes and regions to collect data over. Dictionary should contain chromosomes as keys and 2D integer numpy arrays as values.

  • names – Dictionary specifying the name associated with each region

  • balance – Optional boolean to determine whether returned weights should be balanced or not.

  • join – Optional boolean to determine whether trans (inter-region) interactions are included. If True, the per-region subgraphs are automatically composed into one large graph.

  • force_disjoint – Optional boolean to determine whether to force the input regions to be disjoint regions.

  • backbone – Optional boolean to identify edges which make up the chromatin backbone and include this as an edge feature.

  • record_cistrans_interactions – Optional boolean to explicitly record cis (within chromosome) and trans (between chromosome) interactions within the edge attributes.

  • record_backbone_interactions – Optional boolean to explicitly record backbone interactions within the edge attributes.

  • record_node_chromosome_as_onehot – Optional boolean to explicitly record the node chromosome as a feature vector with chromosomes one-hot encoded.

  • chromosomes – Optional list of chromosomes with which to perform the one-hot encoding

Returns

A Python dictionary in the style of a PyTorch Geometric data object, with additional entries detailing the cooler bins assigned to each node. If join is False, a Python dictionary is returned detailing the graph structure for each submitted region.
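The regions argument described above can be sketched as follows. This is a minimal illustration, not part of GrapHiC: the coordinates are made-up placeholders, reading each row as a [start, end] interval is an assumption (the text only requires 2D integer arrays), and check_regions and build_region_graphs are hypothetical helper names. The GrapHiC import is kept inside the wrapper so the rest of the sketch runs without the package installed.

```python
import numpy as np

# Chromosome name -> 2D integer array.
# Assumption: each row is one [start, end] interval; the values are placeholders.
regions = {
    "chr1": np.array([[3_000_000, 4_000_000],
                      [10_000_000, 11_000_000]], dtype=np.int64),
    "chr2": np.array([[5_000_000, 6_000_000]], dtype=np.int64),
}


def check_regions(regions):
    """Check the documented layout: string keys, 2D integer arrays as values."""
    for chrom, intervals in regions.items():
        if not isinstance(chrom, str):
            raise TypeError(f"chromosome key {chrom!r} is not a string")
        intervals = np.asarray(intervals)
        if intervals.ndim != 2:
            raise ValueError(f"{chrom}: expected a 2D array, got {intervals.ndim}D")
        if not np.issubdtype(intervals.dtype, np.integer):
            raise TypeError(f"{chrom}: expected an integer dtype, got {intervals.dtype}")
    return True


def build_region_graphs(cooler_paths, regions):
    """Hypothetical wrapper around from_regions using the documented arguments."""
    # Imported lazily so the validation above can be used on its own.
    from GrapHiC.Graph_creation import from_regions

    check_regions(regions)
    # join=False requests one graph per submitted region (see Returns above).
    return from_regions(cooler_paths, regions, balance=False, join=False)
```

A call would then look like build_region_graphs(["sample.cool"], regions), where "sample.cool" stands in for a real cooler file that contains every chromosome and region listed.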

GrapHiC.Graph_creation.from_sites(contacts: List[str], sites: Dict[str, numpy.ndarray], names: Optional[dict] = {}, balance: Optional[bool] = False, join: Optional[bool] = True, record_cistrans_interactions: Optional[bool] = False, record_node_chromosome_as_onehot: Optional[bool] = False, record_names: Optional[bool] = True, same_index: Optional[bool] = True, chromosomes: Optional[list] = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX']) → Tuple

Computes a Hi-C graph from a list of cooler files and a set of disjoint sites.

Parameters
  • contacts – list of cooler files generated from Hi-C experiments. Cooler files don’t have to be indexed the same but they do have to all contain the chromosomes and regions being probed.

  • sites – Dictionary specifying chromosomes and sites to collect data over. Dictionary should contain chromosomes as keys and 2D integer numpy arrays as values.

  • names – Dictionary specifying the name associated with each region

  • balance – Optional boolean to determine whether returned weights should be balanced or not.

  • join – Optional boolean to determine whether trans (inter-region) interactions are included. If True, the per-region subgraphs are automatically composed into one large graph.

  • force_disjoint – Optional boolean to determine whether to force the input regions to be disjoint regions.

  • backbone – Optional boolean to identify edges which make up the chromatin backbone and include this as an edge feature.

  • record_cistrans_interactions – Optional boolean to explicitly record cis (within chromosome) and trans (between chromosome) interactions within the edge attributes.

  • record_backbone_interactions – Optional boolean to explicitly record backbone interactions within the edge attributes.

  • record_node_chromosome_as_onehot – Optional boolean to explicitly record the node chromosome as a feature vector with chromosomes one-hot encoded.

  • chromosomes – Optional list of chromosomes with which to perform the one-hot encoding

Returns

A Python dictionary in the style of a PyTorch Geometric data object, with additional entries detailing the cooler bins assigned to each node. If join is False, a Python dictionary is returned detailing the graph structure for each submitted region.
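To make the chromosomes and record_node_chromosome_as_onehot parameters of both constructors concrete, the sketch below shows what a one-hot vector over the default chromosome list looks like. It only illustrates the encoding idea and is not GrapHiC's implementation; one_hot_chromosome is a hypothetical name, and how the library stores the vector on each node is not specified here.

```python
import numpy as np

# Default value of the `chromosomes` argument, as listed in the signatures:
# chr1 through chr19 followed by chrX (20 entries).
DEFAULT_CHROMOSOMES = [f"chr{i}" for i in range(1, 20)] + ["chrX"]


def one_hot_chromosome(chrom, chromosomes=DEFAULT_CHROMOSOMES):
    """Return a vector with a single 1 at the position of `chrom` in `chromosomes`."""
    vector = np.zeros(len(chromosomes), dtype=np.int64)
    # list.index raises ValueError when `chrom` is not in the list.
    vector[chromosomes.index(chrom)] = 1
    return vector
```

Because the position of the 1 depends on the order of the list, the same chromosomes list should be used for every graph whose node features are meant to be comparable.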

Adding Data:

GrapHiC.Graph_creation.add_binned_data_to_graphlist(graph_list: dict, binned_data: str, sep: str = '\t', index_col=0) → None

Given a graph encoded as a dictionary (such as that made from the output of from_regions or from_sites) and binned data within a pandas dataframe (such as that made using the Datatrack_creation submodule), populates the graph node features with the appropriate binned data.

Parameters
  • graph_list – List of dictionaries, each detailing a graph made by either from_regions or from_sites

  • binned_data (str) – Path to binned data such as that created using the Datatrack_creation module

  • sep (str) – Separation character within the binned data file. Defaults to “\t” (tab)

  • index_col – Index column within the binned data file. Defaults to 0
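The sep and index_col arguments can be illustrated with a small tab-separated file. The sketch below is hedged: the column names (bin_id, track_a, track_b), the values, and the assumption that the first column holds the row index are all placeholders, since the exact layout produced by Datatrack_creation is not described here. write_binned_table and annotate_graphs are hypothetical helpers, and the GrapHiC import is deferred so the file-writing part runs on its own.

```python
import csv
import os
import tempfile


def write_binned_table(path, rows, columns, sep="\t"):
    """Write a delimited table whose first column is the row index."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=sep)
        writer.writerow(["bin_id"] + list(columns))
        for bin_id, values in rows.items():
            writer.writerow([bin_id] + list(values))


# Placeholder track values for three bins (not real data).
rows = {0: [0.5, 1.0], 1: [0.0, 2.5], 2: [1.5, 0.0]}
columns = ["track_a", "track_b"]

tmp_dir = tempfile.mkdtemp()
binned_path = os.path.join(tmp_dir, "binned_data.tsv")
write_binned_table(binned_path, rows, columns, sep="\t")


def annotate_graphs(graph_list, binned_path):
    """Hypothetical wrapper: add the binned data to every graph in graph_list."""
    from GrapHiC.Graph_creation import add_binned_data_to_graphlist

    # sep and index_col must match how the file was written:
    # tab-delimited, with the index in column 0.
    add_binned_data_to_graphlist(graph_list, binned_path, sep="\t", index_col=0)
    # The documented return value is None, so the graphs are presumably
    # updated in place; they are returned here only for convenience.
    return graph_list
```

If the file were comma-separated instead, the same call would take sep="," with index_col unchanged.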

GrapHiC.Graph_creation.add_binned_data_to_graph(graph: Dict, binned_data: str, sep: str = '\t', index_col=0) → None

Given a graph encoded as a dictionary (such as that made from the output of from_regions or from_sites) and binned data within a pandas dataframe (such as that made using the Datatrack_creation submodule), populates the graph node features with the appropriate binned data.

Parameters
  • graph (Dict) – Dictionary detailing a graph made by either from_regions or from_sites

  • binned_data (str) – Path to binned data such as that created using the Datatrack_creation module

  • sep (str) – Separation character within the binned data file. Defaults to “\t” (tab)

  • index_col – Index column within the binned data file. Defaults to 0