meNet_singleCGI.Rd
Builds a network of CpGs for a single CpG island (CGI). For a given CGI, all CpGs associated with the island are nodes in the network. Edges are based on correlation, which is provided either directly as a correlation matrix of CpGs or as a data frame with CpGs in columns and variables in rows. One of three methods decides which edges are kept in the network: "full", "clust" or "twoLyr_clust"; see details for an explanation. The resulting network can be weighted, in which case the weights are base pair distances between CpGs.
meNet_singleCGI( cg_island, link_method = "twoLyr_clust", weighted = TRUE, cor_matrix = NULL, data = NULL, cor_normalization_fun = max_normalization, dist_normalization_fun = neg_max_normalization, cor_threshold = 0.2, neg_cor_threshold = NULL, cor_stDev = NULL, cor_alpha = NULL, n_repetitions = 1000, alternative = "two_sided", infomap_call = "infomap", folder = "./meNet/", file_basename = "meNet_CGI_infomap", relaxation_rate = 0.15, cg_meta = data("CpG_anno450K", package = "meNet"), cg_meta_cols = list(cg_id = "IlmnID", cg_coord = "MAPINFO", island_name = "UCSC_CpG_Islands_Name", island_region = "Relation_to_UCSC_CpG_Island"), include_regions = c(), check_matrices = TRUE, delete_files = FALSE )
cg_island | Name of a CpG island. |
---|---|
link_method | Method used to determine the edges of the network. See details. Default value is "twoLyr_clust". |
weighted | Whether the resulting network will be weighted. If TRUE, the weights are base pair distances between CpGs. Defaults to TRUE. |
cor_matrix | Correlation matrix of CpG sites. |
data | Data frame with CpGs in columns. Variables in rows are used to calculate `cor_matrix`. |
cor_normalization_fun | Normalization function for the correlation layer. Default method is `max_normalization` function. |
dist_normalization_fun | Normalization function for the distance layer. Default method is `neg_max_normalization` function. |
cor_threshold | Correlation threshold. Defaults to `0.2`. |
neg_cor_threshold | Negative correlation threshold. Defaults to `NULL`. |
cor_stDev | Threshold for the standard deviation of correlation. Defaults to `NULL`. |
cor_alpha | Significance level of the correlation permutation test. Defaults to `NULL`. |
n_repetitions | Number of repetitions for resampling and/or for the correlation permutation test. Defaults to `1000`. |
alternative | Alternative hypothesis for the correlation permutation test. Default value is `"two_sided"`. |
infomap_call | Path to the `Infomap` executable on the user's system. Default value is `"infomap"`. |
folder | Folder in which the files will be saved. It is automatically created if not already present. Default value is `"./meNet/"`. |
file_basename | Base name of the created files. Default value is `"meNet_CGI_infomap"`. |
relaxation_rate | `multilayer-relax-rate` parameter in `Infomap` call. Defaults to `0.15`. |
cg_meta | Data frame which defines the relationship between CpG sites and CpG islands. It has CpGs in the rows and their description in the columns. If a CpG is part of a CGI, `cg_meta` gives the name of the island and the region of the island in which the CpG is found. Additionally, if base pair distances are being calculated, `cg_meta` also holds the chromosomal coordinates of the CpGs. By default, the function uses the `CpG_anno450K` data frame, which contains the Illumina Infinium HumanMethylation450 manifest file. |
cg_meta_cols | Named list with `cg_meta` column names. The list must include: `cg_id` naming the column with unique CpG names, `island_name` naming the column with CGI names and `island_region` naming the column with the CGI region in which the CpG is located. If base pair distances are being calculated, it should also include `cg_coord` naming the column with chromosomal coordinates. The default value shouldn't be changed if `cg_meta` keeps its default value. |
include_regions | CGI regions which should be searched for CpGs besides the island itself. Can take any subset of values `"S_Shelf"`, `"S_Shore"`, `"N_Shore"`, `"N_Shelf"`. By default, only the CpGs inside the island are reported. |
check_matrices | Whether to check the validity of `cg_meta`, `cg_meta_cols`, `cor_matrix` and `data`. `cor_matrix` and `data` are checked only if provided. Defaults to `TRUE`. Change this parameter only if the function is called within another function which already performs the check. |
delete_files | Whether created files should be automatically deleted from the user's system. Defaults to `FALSE`. If `save_all_files` is `TRUE` then its value is automatically changed to `FALSE`. Change the parameter to `TRUE` with caution, since it allows the function to delete files from the user's system. |
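
The way `cg_meta`, `cg_meta_cols` and `include_regions` work together can be illustrated with a small base R sketch. The annotation rows, the island names and the helper `select_island_cpgs` are all made up for illustration and are not part of meNet; the sketch also assumes that CpGs inside the island itself carry the region label `"Island"`, as in the Illumina 450K manifest.

```r
# Toy annotation table. Column names mirror the defaults of cg_meta_cols;
# the rows themselves are invented for this example.
cg_meta <- data.frame(
  IlmnID = c("cg01", "cg02", "cg03", "cg04", "cg05"),
  MAPINFO = c(1000, 1250, 2500, 4500, 300),
  UCSC_CpG_Islands_Name = c("chr1:900-2000", "chr1:900-2000", "chr1:900-2000",
                            "chr1:900-2000", "chr2:50-800"),
  Relation_to_UCSC_CpG_Island = c("Island", "Island", "S_Shore", "S_Shelf",
                                  "Island"),
  stringsAsFactors = FALSE
)

cg_meta_cols <- list(cg_id = "IlmnID", cg_coord = "MAPINFO",
                     island_name = "UCSC_CpG_Islands_Name",
                     island_region = "Relation_to_UCSC_CpG_Island")

# Hypothetical helper (not a meNet function): returns the CpG IDs linked to
# one island, optionally extended with shore/shelf regions.
select_island_cpgs <- function(cg_island, cg_meta, cg_meta_cols,
                               include_regions = c()) {
  in_island <- cg_meta[[cg_meta_cols$island_name]] == cg_island
  # Assumption: the island body is labelled "Island" in the region column.
  in_region <- cg_meta[[cg_meta_cols$island_region]] %in%
    c("Island", include_regions)
  cg_meta[in_island & in_region, cg_meta_cols$cg_id]
}

island_only <- select_island_cpgs("chr1:900-2000", cg_meta, cg_meta_cols)
with_shores <- select_island_cpgs("chr1:900-2000", cg_meta, cg_meta_cols,
                                  include_regions = c("S_Shore", "N_Shore"))
```

With the default empty `include_regions`, only the CpGs inside the island are selected; adding shore labels extends the node set with CpGs from those regions.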
For the “full” method, the full network is kept. For the other methods, clustering of CpGs is performed and only the CpGs in the same community are connected with edges. For the “clust” method, Infomap clustering is used on the correlation layer, while for the “twoLyr_clust” method, Infomap clustering is performed on a 2-layer correlation-distance multiplex.
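
The difference between keeping the full network and keeping only within-community edges can be sketched in base R. The community assignment below is invented; in the actual function it would come from Infomap, which this sketch does not call.

```r
# Hypothetical community assignment for five CpGs (names are made up).
membership <- c(cg01 = 1, cg02 = 1, cg03 = 2, cg04 = 2, cg05 = 2)

# All unordered CpG pairs, i.e. the edges of the "full" network.
pairs <- t(combn(names(membership), 2))
full_edges <- data.frame(from = pairs[, 1], to = pairs[, 2],
                         stringsAsFactors = FALSE)

# Keep only the pairs whose two endpoints share a community.
same_community <- membership[full_edges$from] == membership[full_edges$to]
clust_edges <- full_edges[same_community, ]
```

Here the full network has 10 edges, while the filtered network keeps 4: one inside the first community and three inside the second.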
For speed, a submatrix containing only the target CpGs can be given instead of the whole correlation matrix. This may be useful if the function is called multiple times in a loop.
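
A minimal base R sketch of preparing such a submatrix follows. The data are random and the CpG names are made up; the result `sub_cor` is the kind of object that could then be passed as `cor_matrix`.

```r
set.seed(1)

# Toy data: 20 rows x 6 CpGs in columns; the CpG names are invented.
cpg_names <- paste0("cg", sprintf("%08d", 1:6))
beta_values <- as.data.frame(matrix(runif(20 * 6), nrow = 20,
                                    dimnames = list(NULL, cpg_names)))

# Full correlation matrix, computed once outside any loop.
full_cor <- cor(beta_values)

# Hypothetical set of CpGs belonging to the island of interest.
target_cpgs <- cpg_names[c(2, 3, 5)]

# Submatrix restricted to the target CpGs; drop = FALSE keeps the result a
# matrix even when only one CpG is selected.
sub_cor <- full_cor[target_cpgs, target_cpgs, drop = FALSE]
```

Computing `full_cor` once and subsetting it per island avoids recomputing correlations on every call.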
Infomap (De Domenico et al. 2015)
De Domenico M, Lancichinetti A, Arenas A, Rosvall M (2015). “Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems.” Physical Review X, 5(1), 011027.