plotCI_meCpG.Rd
Plots methylation beta values as error bars. For every group of samples, the mean methylation value is plotted together with its confidence interval. Depending on the function parameters, the CpG sites in the plot are either spaced equidistantly or positioned on the x-axis according to their true genomic coordinates. In the latter case, additional information, such as the positions of genes and exons, can be included in the plot.
plotCI_meCpG( beta_values, sample_groups, cg_list = NULL, chr = NULL, first_coord = NULL, last_coord = NULL, add_lines = TRUE, plot_cg = FALSE, plot_cgi = FALSE, plot_gene = FALSE, plot_exon = FALSE, transcript_types = c("NM", "NR"), col_groups = NULL, col_cg = "grey77", col_gene = "cyan4", col_cgi = "grey77", title = "", x_label = "", y_label = "", text_cex = 1, beta_min = 0, beta_max = 1, error_size = 1, cg_names = FALSE, cg_names_expand = NA, plot_legend = FALSE, legend_cex = 1, plot_outside = TRUE )
Argument | Description |
---|---|
beta_values | Data frame with methylation beta values. CpG sites should be listed in the rows with the row names being the Illumina CpG identifiers. Samples should be listed in the columns. |
sample_groups | Data frame which defines the sample groups. It has to have two columns, the first column with sample IDs and the second column with the corresponding group. |
cg_list | List of CpG sites whose methylation values should be presented in the plot. If provided, CpGs will be plotted equidistantly on the x-axis in the given order. |
chr | Chromosome on which the plotted CpG sites are found. Given as a number from `1` to `22`, or as `"X"` or `"Y"`. |
first_coord | The first coordinate to be represented in the plot. |
last_coord | The last coordinate to be represented in the plot. |
add_lines | Whether to connect the error bars. If `TRUE`, the mean value points are connected with lines. Defaults to `TRUE`. |
plot_cg | Whether to plot the positions of all CpG sites on the x-axis. Defaults to `FALSE`. |
plot_cgi | Whether to plot the positions of CpG islands on the x-axis. Defaults to `FALSE`. |
plot_gene | Whether to plot the positions of genes/transcripts on the x-axis. Defaults to `FALSE`. |
plot_exon | Whether to plot the positions of exons on the x-axis. Defaults to `FALSE`. |
transcript_types | Genes/transcripts to be plotted. `"NM"` stands for protein-coding and `"NR"` stands for non-protein coding transcripts. By default, both types of transcripts are included. |
col_groups | Colors used for the error bars. They can be given as a list of colors of the same length as the number of groups. Alternatively, they can be given as a data frame with sample groups in the first column and the corresponding color in the second column. |
col_cg | Color of the other CpG sites. Used only if `plot_cg` is `TRUE`. Default color is `"grey77"`. |
col_gene | Color of the genes/exons. Used only if `plot_gene` or `plot_exon` is `TRUE`. Default color is `"cyan4"`. |
col_cgi | Color of the CpG islands. Used only if `plot_cgi` is `TRUE`. Default color is `"grey77"`. |
title | Plot title. By default, omitted. |
x_label | X-axis label. By default, omitted. |
y_label | Y-axis label. By default, omitted. |
text_cex | Numeric character expansion factor, used for all displayed text. Defaults to `1`. |
beta_min | Minimum beta value to be displayed. Has to be in the `[0, 1)` range (0 included, 1 excluded). Defaults to `0`. |
beta_max | Maximum beta value to be displayed. Has to be in the `(0, 1]` range (0 excluded, 1 included). Defaults to `1`. |
error_size | Number of standard deviations which are to be plotted below and above the mean. Defaults to `1`. |
cg_names | Whether to add names of the plotted CpG sites below the x-axis. Defaults to `FALSE`. |
cg_names_expand | Expand factor for the `cg_names`. If the labels are cluttered, the expand factor could be set to separate the labels. Expand factor defines the minimum distance between two consecutive labels and is expressed as a percentage of the distance between the first and the last label. Optimal separation is usually achieved for values close to `1`. For the default `NA` value, labels are not expanded. |
plot_legend | Whether to plot the legend. Defaults to `FALSE`. |
legend_cex | Numeric expansion factor, used for all elements of the legend. Defaults to `1`. |
plot_outside | Whether parts of the error bars that fall outside of the plotting area should be plotted. The plotting area is determined by `beta_min` and `beta_max`. Defaults to `TRUE`. |
Vector with names of plotted CpG sites, in the same order as in the plot.
Plots error bars and confidence intervals for the beta values of the given CpGs. If a list of CpGs is given, they are plotted equidistantly. If a chromosome and a range of coordinates are given, all CpGs inside the range are plotted and positioned on the x-axis according to their chromosomal coordinates (from the 5' end to the 3' end). In the latter case, additional information can be displayed on the x-axis, such as the positions of unmeasured CpGs, CpG islands, genes and exons. To show only protein-coding or only non-protein-coding transcripts, set the `transcript_types` parameter accordingly. The color of each genomic element can be changed with the `col_cg`, `col_cgi` and `col_gene` parameters.
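The two plotting modes described above could be invoked roughly as follows. This is only a sketch: the beta values are simulated, and the CpG identifiers, sample IDs and coordinates are placeholders invented for illustration. The `run_plot` switch is likewise an assumption of this example; it keeps the calls from running until the placeholders are replaced with real data and the package providing `plotCI_meCpG` is loaded.

```r
# Simulated inputs; all identifiers and values are illustrative placeholders.
set.seed(1)
cpgs    <- c("cg00000001", "cg00000002", "cg00000003")  # hypothetical CpG IDs
samples <- paste0("S", 1:6)                              # hypothetical sample IDs

# CpG sites in rows (row names = CpG identifiers), samples in columns.
beta_values <- as.data.frame(
  matrix(runif(length(cpgs) * length(samples)),
         nrow = length(cpgs),
         dimnames = list(cpgs, samples))
)

# First column: sample IDs; second column: group of each sample.
sample_groups <- data.frame(
  sample = samples,
  group  = rep(c("control", "case"), each = 3)
)

# Set to TRUE only with real data and plotCI_meCpG available in the session.
run_plot <- FALSE

if (run_plot) {
  # Mode 1: explicit CpG list, plotted equidistantly in the given order.
  plotted <- plotCI_meCpG(beta_values, sample_groups, cg_list = cpgs)

  # Mode 2: genomic range; CpGs are placed at their chromosomal coordinates
  # and annotation tracks can be switched on. Coordinates are placeholders.
  plotted <- plotCI_meCpG(beta_values, sample_groups,
                          chr = 1, first_coord = 1000000, last_coord = 1010000,
                          plot_cgi = TRUE, plot_gene = TRUE, plot_exon = TRUE,
                          transcript_types = "NM")
}
```

In both modes the return value is the vector of plotted CpG names, in plot order.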
For error bars, the plotted confidence interval by default extends one standard deviation on each side of the mean. This can be changed with the `error_size` parameter. If only a certain range of beta values is of interest, the y-axis can be restricted with the `beta_min` and `beta_max` parameters. Error bars are plotted separately for each group of samples, and the color of each group is set with the `col_groups` parameter. Groups of samples are defined in `sample_groups`. To make methylation patterns easier to see, lines connecting the mean methylation values can be added with `add_lines`.
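The quantities behind each error bar can be reproduced in base R. The sketch below is not the function's internal code. It assumes the sample standard deviation (as returned by `sd()`) is used, and the data and identifiers are made up.

```r
# Toy beta values: 2 CpG sites (rows) x 4 samples (columns); values are made up.
beta_values <- data.frame(
  S1 = c(0.10, 0.80), S2 = c(0.20, 0.90),
  S3 = c(0.50, 0.40), S4 = c(0.70, 0.60),
  row.names = c("cg_a", "cg_b")  # hypothetical identifiers
)
sample_groups <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  group  = c("A", "A", "B", "B")
)
error_size <- 1  # number of standard deviations on each side of the mean

# For each group, compute the per-CpG mean and the bounds
# mean - error_size * sd and mean + error_size * sd.
group_intervals <- lapply(
  split(sample_groups[[1]], sample_groups[[2]]),
  function(ids) {
    x <- as.matrix(beta_values[, ids, drop = FALSE])
    m <- rowMeans(x)
    s <- apply(x, 1, sd)  # sample standard deviation (assumption)
    data.frame(mean = m, lower = m - error_size * s, upper = m + error_size * s)
  }
)

group_intervals$A  # one row per CpG site for group "A"
```

Setting `error_size` to `2` in this sketch doubles the width of every interval, which mirrors how the parameter is described above.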
The plotting area can be customized with `title`, `x_label` and `y_label`. Names of CpG sites can be displayed below the x-axis by setting `cg_names` to `TRUE`. If the CpG names are cluttered, they can be spread apart by setting `cg_names_expand` to an appropriate numeric value, usually a number close to `1`. Increasing the expand factor separates the labels further. The best strategy is to try several values until the labels are well separated. The size of all displayed text is regulated simultaneously with the `text_cex` parameter.
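To give a feel for what an expand factor does, here is one possible reading of the description of `cg_names_expand`: a minimum gap, given as a percentage of the distance between the first and last label, is enforced between consecutive labels. The helper `spread_labels` is hypothetical and is not claimed to match what `plotCI_meCpG` does internally.

```r
# Hypothetical helper, written only for illustration.
# pos:    label positions on the x-axis
# expand: minimum gap between consecutive labels, as a percentage of the
#         distance between the first and the last label
spread_labels <- function(pos, expand) {
  pos <- sort(pos)
  min_gap <- expand / 100 * (pos[length(pos)] - pos[1])
  out <- pos
  for (i in seq_along(out)[-1]) {
    # Push a label to the right if it sits closer than min_gap to its predecessor.
    out[i] <- max(out[i], out[i - 1] + min_gap)
  }
  out
}

# Three crowded labels followed by a distant one; a 5% gap equals 5 units here.
spread_labels(c(100, 101, 102, 200), expand = 5)
# returns 100 105 110 200
```

A larger `expand` widens the enforced gap, which is consistent with the advice to try several values until the labels no longer overlap.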
A legend can be added to the plot with the `plot_legend` parameter. The legend has two parts: a color legend for the sample groups and a legend that explains the meaning of the plotted genomic elements. The size of the whole legend is regulated with the `legend_cex` parameter.
In some cases, error bars extend outside the `beta_min`-`beta_max` range. If `plot_outside` is `FALSE`, these parts are trimmed; otherwise, they are plotted. Even if `beta_min` is `0` and `beta_max` is `1`, error bars may still extend outside the range. This happens for mean methylation values very close to `0` or `1`, because the interval of `error_size` standard deviations around the mean can reach beyond the values that beta values can actually take. In this case, it is recommended to set `plot_outside` to `TRUE`.
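A small numeric sketch shows why an error bar can leave the `[0, 1]` range even though every beta value lies inside it. The values are made up, and clamping with `max()` and `min()` is only an assumed stand-in for what trimming with `plot_outside = FALSE` might amount to.

```r
beta_min <- 0
beta_max <- 1

# Made-up beta values of one group at one CpG site, all close to 0.
group_betas <- c(0.00, 0.01, 0.12)

m <- mean(group_betas)
s <- sd(group_betas)

# Interval of one standard deviation around the mean (error_size = 1).
lower <- m - s
upper <- m + s

lower < beta_min  # TRUE: the lower end of the bar falls below 0

# Assumed trimming: clamp the bar to the plotting area.
lower_trimmed <- max(lower, beta_min)
upper_trimmed <- min(upper, beta_max)
```

Trimming here hides part of the interval, which is why keeping `plot_outside` at `TRUE` is suggested for values near the boundaries.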