| Title: | The tRNA Adaptation Index |
|---|---|
| Description: | Functions and example files to calculate the tRNA adaptation index, a measure of the level of co-adaptation between the set of tRNA genes and the codon usage bias of protein-coding genes in a given genome. The methodology is described in dos Reis, Wernisch and Savva (2003) <doi:10.1093/nar/gkg897>, and dos Reis, Savva and Wernisch (2004) <doi:10.1093/nar/gkh834>. |
| Authors: | Mario dos Reis [aut, cre] |
| Maintainer: | Mario dos Reis |
| License: | GPL (>= 2) |
| Version: | 0.2.2 |
| Built: | 2026-05-14 05:12:41 UTC |
| Source: | https://github.com/mariodosreis/tai |
A list with three elements: trna, a vector of length 64 of tRNA gene copy
numbers in the Escherichia coli K-12 genome; w, a data frame with codon bias
statistics for 49 E. coli K-12 coding genes; and m, a 49 by 61 matrix of
codon frequencies (one column per sense codon) for the same 49 genes.
ecolik12
An object of class list of length 3.
Mario dos Reis
```r
# 87 tRNA genes in the E. coli K-12 genome:
sum(ecolik12$trna)
# Two copies are isoacceptors for Phe, with anticodon GAA (codon TTC)
ecolik12$trna[2]
# ecolik12$w, a data frame with codon bias statistics
names(ecolik12$w)
# Effective number of codons vs. gene length (in codons)
plot(ecolik12$w$Nc, ecolik12$w$L_aa, xlab="Nc", ylab="Gene length")
```
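The examples in this manual refer to codons by position: ecolik12$trna[2] is codon TTC, and column 33 of ecolik12$m is dropped before calling get.tai. A minimal sketch that reconstructs the codon labels, assuming the codons are ordered with the bases T, C, A, G at each of the three positions (an ordering inferred from those indices, not stated in this manual). The objects bases, codons64 and codons61 are hypothetical names.

```r
# Hypothetical helper objects; the T, C, A, G ordering is an assumption
# inferred from the indices used in the package examples.
bases <- c("T", "C", "A", "G")
codons64 <- paste0(rep(bases, each = 16),        # first position
                   rep(rep(bases, each = 4), 4), # second position
                   rep(bases, times = 16))       # third position
stops <- c("TAA", "TAG", "TGA")
codons61 <- codons64[!codons64 %in% stops]       # 61 sense codons

codons64[2]               # "TTC", the codon read by anticodon GAA
which(codons61 == "ATG")  # 33: under this ordering, the column dropped by ecolik12$m[,-33]
```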
Calculates the S value of dos Reis et al. (2004): the correlation between tAI and Nc adjusted for GC content at third codon positions.
get.s(tAI, nc, gc3)
| tAI | a vector of length n with tAI values for genes |
|---|---|
| nc | a vector of length n with Nc values for genes |
| gc3 | a vector of length n with GC content at third codon positions for genes |
A numeric vector of length one with the correlation between tAI and adjusted Nc.
Mario dos Reis
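As a rough illustration of what get.s reports, the sketch below computes the correlation between tAI and adjusted Nc directly. It assumes Pearson correlation (the default of R's cor) and the adjusted Nc definition f(gc3s) - Nc given for nc.adj; nc_f and s_sketch are hypothetical names, not the package's code.

```r
# Expected Nc without selection for a given GC3s (Wright 1990)
nc_f <- function(x) 2 + x + 29 / (x^2 + (1 - x)^2)

# Hypothetical sketch of the S value: Pearson correlation between tAI
# and adjusted Nc, f(GC3s) - Nc (assumed, not taken from the package source)
s_sketch <- function(tai, nc, gc3) {
  stopifnot(length(tai) == length(nc), length(nc) == length(gc3))
  cor(tai, nc_f(gc3) - nc)
}

# Toy data: genes with higher tAI have lower Nc (stronger codon bias)
tai <- c(0.20, 0.30, 0.40, 0.50)
nc  <- c(50, 45, 40, 35)
gc3 <- rep(0.5, 4)
s_sketch(tai, nc, gc3)  # 1: adjusted Nc rises linearly with tAI in this toy case
```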
Calculates the tRNA adaptation index (tAI) of dos Reis et al. (2003, 2004).
get.tai(x, w)
| x | an n by 60 matrix of codon frequencies for n open reading frames |
|---|---|
| w | a vector of length 60 of relative adaptiveness values for codons |
The tRNA adaptation index (tAI) is a measure of the level of co-adaptation between the set of tRNA genes and the codon usage bias of protein-coding genes in a given genome. For each gene, tAI is the geometric mean of the relative adaptiveness values (w) of its codons. Stop and methionine codons are ignored, and the standard genetic code is assumed.
A vector of length n of tAI values.
Mario dos Reis
dos Reis M., Wernisch L., and Savva R. (2003) Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res., 31: 6976–85.
dos Reis M., Savva R., and Wernisch L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res., 32: 5036–44.
```r
# Calculate relative adaptiveness values (ws) for E. coli K-12
eco.ws <- get.ws(tRNA=ecolik12$trna, sking=1)
# Calculate tAI for a set of 49 E. coli K-12 coding genes
# (ecolik12$m has 61 columns; get.tai expects 60, hence [,-33])
eco.tai <- get.tai(ecolik12$m[,-33], eco.ws)
# Plot tAI vs. effective number of codons (Nc)
plot(eco.tai, ecolik12$w$Nc, xlab="tAI", ylab="Nc")
```
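Because tAI is a geometric mean of relative adaptiveness values taken over all codons of a gene, it can be computed from log-weights with a single matrix product. The sketch below assumes x holds codon counts with one column per codon, in the same order as w, with stop and methionine columns already removed and no zero weights; tai_sketch is a hypothetical name, not the package's implementation.

```r
# Hypothetical sketch: tAI as a frequency-weighted geometric mean of ws.
# x: n-by-c matrix of codon counts (one row per gene); w: length-c vector.
tai_sketch <- function(x, w) {
  stopifnot(ncol(x) == length(w), all(w > 0))
  L <- rowSums(x)                   # number of counted codons per gene
  exp(as.vector(x %*% log(w)) / L)  # exp of the mean log w over all codons
}

# Toy example with three codon types
x <- rbind(gene1 = c(1, 1, 0),
           gene2 = c(0, 2, 2))
w <- c(1, 0.25, 0.5)
tai_sketch(x, w)  # gene1: sqrt(1 * 0.25) = 0.5; gene2: (0.25^2 * 0.5^2)^(1/4), about 0.354
```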
Calculates the relative adaptiveness values of codons based on the number of tRNA genes.
get.ws(tRNA, s = NULL, sking)
| tRNA | a vector of length 64 with tRNA gene copy numbers |
|---|---|
| s | a vector of length 9 with selection penalties for codons |
| sking | a vector of length 1 indicating the superkingdom |
The relative adaptiveness values are calculated as described in
dos Reis et al. (2003, 2004). If s = NULL, the s values are set to
the optimised values of dos Reis et al. (2004). sking indicates the
superkingdom: 0 for Eukaryota and 1 for Prokaryota.
A vector of length 60 of relative adaptiveness values.
Mario dos Reis
dos Reis M., Wernisch L., and Savva R. (2003) Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res., 31: 6976–85.
dos Reis M., Savva R., and Wernisch L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res., 32: 5036–44.
```r
eco.ws <- get.ws(tRNA=ecolik12$trna, sking=1)
```
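In dos Reis et al. (2004), each codon i gets an absolute adaptiveness W_i = sum_j (1 - s_ij) * tGCN_ij, summing over the tRNA species j that can read it, where tGCN_ij is the tRNA gene copy number and s_ij the selective constraint on that codon-anticodon pairing; W is then divided by its maximum. The sketch below shows only these two steps for pre-assembled inputs. It does not reproduce the codon-anticodon pairing table that get.ws builds internally, and its replacement of zero values by the geometric mean of the non-zero ones is an assumption. abs_adaptiveness and rel_adaptiveness are hypothetical names, and the penalty of 0.5 is purely illustrative.

```r
# Hypothetical sketch of the final steps behind relative adaptiveness.

# Absolute adaptiveness of one codon: tgcn are copy numbers of the tRNAs that
# can read it, s the matching selective constraints (0 = no penalty)
abs_adaptiveness <- function(tgcn, s) sum((1 - s) * tgcn)

# Relative adaptiveness: scale by the maximum, then replace zeros by the
# geometric mean of the non-zero values (assumed rule)
rel_adaptiveness <- function(W) {
  w <- W / max(W)
  if (any(w == 0)) w[w == 0] <- exp(mean(log(w[w > 0])))
  w
}

# Codon read by 2 tRNA genes by exact pairing plus 1 gene by wobble (s = 0.5)
abs_adaptiveness(tgcn = c(2, 1), s = c(0, 0.5))  # 2.5
rel_adaptiveness(c(4, 2, 0, 1))                  # 1.00 0.50 0.50 0.25
```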
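Calculates the adjusted Nc, defined as f(gc3s) - Nc, where f(gc3s) is the Nc expected without selection on codon bias for the gene's GC content at third codon positions.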
nc.adj(nc, gc3)
| nc | a vector of length n with the effective number of codons for genes |
|---|---|
| gc3 | a vector of length n with corresponding GC composition at third codon positions |
The adjusted Nc is calculated as described in dos Reis et al. (2004).
A vector of length n with adjusted Nc values.
Mario dos Reis
dos Reis M., Savva R., and Wernisch L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res., 32: 5036–44.
See also: nc.f, the function used to calculate f(gc3s).
```r
eco.ncadj <- nc.adj(ecolik12$w$Nc, ecolik12$w$GC3s)
plot(eco.ncadj ~ ecolik12$w$Nc, xlab="Nc", ylab="Nc adjusted")
```
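Because adjusted Nc subtracts the observed Nc from the value expected without selection, the same observed Nc can mean strong or negligible extra bias depending on GC3s. A minimal sketch, with hypothetical names nc_f and nc_adj_sketch standing in for nc.f and nc.adj, and assuming the Wright (1990) expected-Nc curve:

```r
# Expected Nc without selection for a given GC3s (Wright 1990)
nc_f <- function(x) 2 + x + 29 / (x^2 + (1 - x)^2)

# Hypothetical stand-in for nc.adj: adjusted Nc = f(GC3s) - Nc
nc_adj_sketch <- function(nc, gc3) nc_f(gc3) - nc

# Two genes with the same Nc but different GC3s
nc_adj_sketch(nc = c(40, 40), gc3 = c(0.5, 0.9))
# 20.5 and about -1.73: at GC3s = 0.9, an Nc of 40 is no lower than expected
```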
Calculates the expected Nc value of a gene for a given GC content at the third codon positions.
nc.f(x)
| x | a vector of GC contents at third codon positions |
|---|---|
Without selection on codon bias, the expected value of Nc as a function of the GC content at third codon positions, x, is given by

$$f(x) = 2 + x + \frac{29}{x^2 + (1 - x)^2}$$

This equation follows dos Reis et al. (2004); see Wright (1990) for the original.
A vector of Nc values for the given GC contents.
Mario dos Reis
Wright F. (1990) The 'effective number of codons' used in a gene. Gene, 87: 23–9.
dos Reis M., Savva R., and Wernisch L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res., 32: 5036–44.
```r
curve(nc.f(x), xlab="GC3s content", ylab="Nc")
points(ecolik12$w$GC3s, ecolik12$w$Nc, pch=19)
```
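The expected-Nc curve of Wright (1990), f(x) = 2 + x + 29/(x^2 + (1 - x)^2), can be checked numerically: with no bias at third positions (x = 0.5) it gives 60.5, close to the maximum of 61 sense codons, and it drops to 31 or 32 when third positions are all AT or all GC. A sketch with a hypothetical nc_f, not the package's nc.f:

```r
# Hypothetical re-implementation of the expected-Nc curve (Wright 1990)
nc_f <- function(x) {
  stopifnot(all(x >= 0 & x <= 1))  # x is a GC fraction
  2 + x + 29 / (x^2 + (1 - x)^2)
}

nc_f(c(0, 0.5, 1))  # 31.0 60.5 32.0
```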
Calculates a p-value, using a Monte Carlo (randomisation) test, for whether the correlation (the S value) between tAI and adjusted Nc for a set of genes differs from zero.
ts.test(m, ws, nc, gc3s, ts.obs, samp.size, n = 1000)
| m | a k by 60 matrix of codon frequencies for k genes |
|---|---|
| ws | a vector of length 60 of relative adaptiveness values of codons |
| nc | a vector of length k of Nc values for genes |
| gc3s | a vector of length k of GC content at third codon positions for genes |
| ts.obs | a vector of length 1 with the observed correlation between tAI and adjusted Nc for the k genes |
| samp.size | a vector of length 1 with the number of genes to be sampled from m (see details) |
| n | the number of permutations of ws in the randomisation test |
The Monte Carlo test is described in dos Reis et al. (2004). When
working with complete genomes, matrix m can have a very large number
of rows (large k). In this case it may be advisable to choose samp.size
< k to speed up the computation.
A list with elements p.value, the p-value for the test, and
ts.simulated, a vector of length n with the simulated
correlations between tAI and adjusted Nc.
Mario dos Reis
dos Reis M., Savva R., and Wernisch L. (2004) Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res., 32: 5036–44.
```r
eco.ws <- get.ws(tRNA=ecolik12$trna, sking=1)
eco.tai <- get.tai(ecolik12$m[,-33], eco.ws)
ts.obs <- get.s(eco.tai, ecolik12$w$Nc, ecolik12$w$GC3s)
# The S-value (dos Reis et al. 2004):
ts.obs
# [1] 0.9065442
# There seems to be a high correlation between tAI and Nc adjusted for
# the 49 genes in ecolik12$m. Is the correlation statistically significant?
ts.mc <- ts.test(ecolik12$m[,-33], eco.ws, ecolik12$w$Nc, ecolik12$w$GC3s,
                 ts.obs, samp.size=dim(ecolik12$m)[1])
# The p-value is zero:
ts.mc$p.value
# [1] 0
# Histogram of simulated S-values:
hist(ts.mc$ts.simulated, breaks=50, xlab = "Simulated S values",
     xlim=c(min(ts.mc$ts.simulated), ts.obs))
# Add the observed S-value as a red vertical line:
abline(v=ts.obs, col="red")
```
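The randomisation test can be sketched in base R. The following details are assumptions and are not taken from the package source: each replicate shuffles ws across codons and draws samp.size genes without replacement; S is the Pearson correlation between tAI and f(GC3s) - Nc; and the p-value is the fraction of simulated S values at least as large as the observed one. ts_test_sketch and its helpers are hypothetical names.

```r
# Hypothetical helpers (assumed definitions)
nc_f <- function(x) 2 + x + 29 / (x^2 + (1 - x)^2)
tai_sketch <- function(m, ws) exp(as.vector(m %*% log(ws)) / rowSums(m))
s_sketch <- function(tai, nc, gc3s) cor(tai, nc_f(gc3s) - nc)

# Hypothetical sketch of the randomisation test (assumed details, see above)
ts_test_sketch <- function(m, ws, nc, gc3s, ts.obs, samp.size, n = 1000) {
  ts.sim <- numeric(n)
  for (i in seq_len(n)) {
    rows <- sample.int(nrow(m), samp.size)  # genes used in this replicate
    ws.perm <- ws[sample.int(length(ws))]   # shuffle ws across codons
    tai.sim <- tai_sketch(m[rows, , drop = FALSE], ws.perm)
    ts.sim[i] <- s_sketch(tai.sim, nc[rows], gc3s[rows])
  }
  list(p.value = mean(ts.sim >= ts.obs), ts.simulated = ts.sim)
}

# Toy usage on random data (20 genes, 6 codon types)
set.seed(1)
m    <- matrix(rpois(20 * 6, 5) + 1, nrow = 20)
ws   <- runif(6, 0.1, 1)
nc   <- runif(20, 30, 55)
gc3s <- runif(20, 0.3, 0.7)
res  <- ts_test_sketch(m, ws, nc, gc3s, ts.obs = 0.5, samp.size = 10, n = 200)
res$p.value
```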