A data frame describing both protein-coding and non-protein-coding transcripts, including the positions of their exons. Transcript names start with `"NM"` for protein-coding transcripts and with `"NR"` for non-protein-coding transcripts. The data were downloaded as the `"UCSC RefSeq (refGene)"` table from the UCSC Table Browser for assembly `"Feb. 2009 (GRCh37/hg19)"` and group `"Genes and Gene Predictions"`.
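The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function `transcript_kind` and the two transcript names are hypothetical placeholders, not part of the dataset.

```python
def transcript_kind(name: str) -> str:
    """Classify a transcript by the prefix of its name."""
    if name.startswith("NM"):
        return "protein-coding"
    if name.startswith("NR"):
        return "non-protein-coding"
    # Any other prefix is not covered by the rule described above.
    return "unknown"


# Hypothetical placeholder names, used only to illustrate the rule.
names = ["NM_example1", "NR_example2"]
kinds = [transcript_kind(n) for n in names]
```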

UCSC_genes

Format

A data frame with 78288 rows and 10 variables:

name

Name of the gene/transcript (starts with `"NM"` or `"NR"`).

chrom

Chromosome containing the gene.

strand

Genome strand on which the gene is located.

txStart

Transcription start position (or end position for minus strand).

txEnd

Transcription end position (or start position for minus strand).

cdsStart

Coding region start position (or end position for minus strand).

cdsEnd

Coding region end position (or start position for minus strand).

exonCount

Number of exons.

exonStarts

Exon start positions (or end positions for minus strand).

exonEnds

Exon end positions (or start positions for minus strand).
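As a rough sketch of how the exon columns might be used, the Python below pairs `exonStarts` with `exonEnds` and checks the result against `exonCount`. It assumes that both columns hold comma-separated strings with a trailing comma, as in UCSC's text export of refGene, and that coordinates follow the UCSC convention of 0-based starts and 1-based ends, so that an exon's length is `end - start`. Neither assumption is stated in this documentation. The row values and the helper names `parse_positions` and `exon_intervals` are hypothetical.

```python
from typing import List, Tuple


def parse_positions(field: str) -> List[int]:
    """Split a comma-separated position string, ignoring the trailing comma."""
    return [int(p) for p in field.split(",") if p]


def exon_intervals(exon_starts: str, exon_ends: str,
                   exon_count: int) -> List[Tuple[int, int]]:
    """Pair exon starts with exon ends and check them against exonCount."""
    starts = parse_positions(exon_starts)
    ends = parse_positions(exon_ends)
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonStarts/exonEnds do not match exonCount")
    return list(zip(starts, ends))


# Hypothetical row with made-up coordinates (assumed string format).
row = {"exonCount": 3, "exonStarts": "100,300,500,", "exonEnds": "200,400,650,"}

intervals = exon_intervals(row["exonStarts"], row["exonEnds"], row["exonCount"])
# Under the assumed 0-based start / 1-based end convention, length = end - start.
lengths = [end - start for start, end in intervals]
```

If the data frame stores the exon positions in another form (for example as list columns), only `parse_positions` would need to change.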

Source

http://genome.ucsc.edu/cgi-bin/hgTables