<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="0" />
<link rel="shortcut icon" href="favicon.ico">
<!-- Bootstrap CSS -->
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.4.1.slim.min.js" integrity="sha384-J6qa4849blE2+poT4WnyKhv5vZF5SrPo0iEjwBvKU7imGFAV0wwj1yYfoRSJoZ+n" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.0/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/js/bootstrap.min.js" integrity="sha384-wfSDF2E50Y2D1uUdj0O3uMBJnjuUD4Ih7YwaYd1iqfktj0Uod8GCExl3Og8ifwB6" crossorigin="anonymous"></script>
<title>About the T-box database</title>
<link type="text/css" rel="stylesheet" href="css/style.css"/>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-121083048-5"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-121083048-5');
</script>
</head>
<body>
<!-- Nav bar start-->
<nav class="navbar navbar-expand-md navbar-dark bg-dark fixed-top pt-3 mx-auto pr-5 pl-5" >
<div class="navbar-collapse collapse w-100 order-3 dual-collapse2">
<div class="container">
<a class="navbar-brand" href="https://tbdb.io/">
<img src="../logo/TBDB_logo_1_no_fill_white_white.png" height="40" alt="tbdb_logo">
</a>
</div>
<ul class="navbar-nav ml-auto pr-5">
<a class="navbar-brand" href="https://tbdb.io/"><strong>T-box</strong> Database</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarsExampleDefault" aria-controls="navbarsExampleDefault" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarsExampleDefault">
<ul class="navbar-nav mr-auto">
<li class="nav-item dropdown active">
<a class="nav-link dropdown-toggle" href="#" id="dropdown01" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">about</a>
<div class="dropdown-menu" aria-labelledby="dropdown01">
<a class="dropdown-item" href="https://tbdb.io/about.html">about TBDB</a>
<a class="dropdown-item" href="https://tbdb.io/tbox_background.html">tbox background</a>
<a class="dropdown-item" href="https://tbdb.io/faq.html">faq</a>
<a class="dropdown-item" href="https://tbdb.io/citing.html">how to cite</a>
<a class="dropdown-item" href="https://tbdb.io/contact.html">contact</a>
</div>
</li>
<li class="nav-item dropdown active">
<a class="nav-link dropdown-toggle" href="#" id="dropdown01" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">database</a>
<div class="dropdown-menu" aria-labelledby="dropdown01">
<a class="dropdown-item" href="https://tbdb.io/tbdb.html">browse</a>
<a class="dropdown-item" href="https://tbdb.io/advanced_search.html">advanced search</a>
<a class="dropdown-item" href="https://tbdb.io/specifier_use.html">specifier use table</a>
<a class="dropdown-item" href="https://tbdb.io/database/tbdb.csv">download tbdb</a>
</div>
</li>
<li class="nav-item dropdown active">
<a class="nav-link dropdown-toggle" href="#" id="dropdown01" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">tools</a>
<div class="dropdown-menu" aria-labelledby="dropdown01">
<a class="dropdown-item" href="https://github.com/jamarchand/tbox-scan">tbox-scan</a>
</div>
</li>
<li class="nav-item dropdown active">
<a class="nav-link dropdown-toggle" href="#" id="dropdown01" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">repository</a>
<div class="dropdown-menu" aria-labelledby="dropdown01">
<a class="dropdown-item" href="https://github.com/jamarchand/tboxdb">site</a>
<a class="dropdown-item" href="https://github.com/mpiersonsmela/tbox">source code</a>
</div>
</li>
</ul>
</div>
</ul>
</div>
</nav>
<!-- Nav bar end-->
<main role="main" class="container">
<div class="pl-5 pr-5">
<br><br><br>
<div class="pt-4">
<div class="pt-4"><h2>About the T-box database</h2></div>
<h3 class="text-justify">The T-box database (TBDB) aims to annotate structural and genetic features of T-box leader sequences. Its goal is to enrich the information available about T-box sequences and so facilitate research in this area. While more than 15,000 T-box sequences have been discovered by genome mining, only a handful have been experimentally characterized. We hope that the information contained in this database will lower the barrier to entry into this field.</h3>
</div>
<div class="pt-3"><h5>Feature prediction</h5></div>
<h3 class="text-justify"> The predictions contained in the TBDB were generated using the methods described in the <a href="https://www.biorxiv.org/content/10.1101/2020.06.17.157016v1">bioRxiv preprint</a>. All code used to generate the database is available on <a href = "https://github.com/mpiersonsmela/tbox">our GitHub</a>. In summary, our pipeline performs feature prediction in two steps. First, INFERNAL is used to predict secondary structure; then the secondary structure is searched for conserved features, including Stem I, the specifier loop, and the antiterminator. The codon and discriminator are extracted based on the positions of these motifs.
Thermodynamic calculations on antiterminator and terminator folds were performed using ViennaRNA. The NCBI accession numbers of the input sequences were used to gather various annotations, including taxonomy and downstream gene ontology. tRNAscan-SE was used to generate a list of tRNAs for each organism. The most likely codon in each specifier loop was chosen based on its position within the loop, with additional refinement using the tRNA discriminator base and downstream gene ontology (where present). Alternative codon frames, where found, are also presented. </h3>
<div class="pt-3"><h5>Benchmarking feature prediction</h5></div>
<h3>The structurally-annotated dataset found in <a href = "https://dx.doi.org/10.1261/rna.819308">Vitreschak <i>et al.</i>, 2008</a> was used to validate the accuracy of our feature prediction pipeline. From the 698 initial sequences:</h3>
<ul id="bulletpoint">
<strong>T-boxes (n = 698)</strong><br>
<ul>
<li>694 sequences had a T-box detected by INFERNAL (99.4%)</li>
<li>621 of these scored high enough to make a feature prediction (89.5% of detected sequences)</li>
</ul>
</ul>
<ul id="bulletpoint">
<strong>Codons (n = 621)</strong><br>
<ul>
<li>589 codons were predicted correctly (94.8%)</li>
<li>1 codon was off by -1 (0.2%)</li>
<li>9 codons were off by +1 (1.4%)</li>
<li>22 codons were otherwise incorrect (3.5%)</li>
</ul>
</ul>
<ul id="bulletpoint">
<strong>Discriminator Base (n = 619)</strong><br>
<ul>
<li>619 discriminator bases were predicted correctly (100%)</li>
</ul>
</ul>
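<h3 class="text-justify">As a quick check on the codon figures above, the short Python sketch below recomputes each percentage from the reported counts. The variable and function names are illustrative only and are not taken from the TBDB source code.</h3>

```python
# Codon benchmark counts as reported above (n = 621 feature predictions).
codon_counts = {
    "correct": 589,
    "off by -1": 1,
    "off by +1": 9,
    "otherwise incorrect": 22,
}

# The four categories should add up to the number of scored sequences.
total = sum(codon_counts.values())


def percent(count, n):
    """Return count/n as a percentage rounded to one decimal place."""
    return round(100.0 * count / n, 1)


codon_percentages = {
    label: percent(count, total) for label, count in codon_counts.items()
}

for label, pct in codon_percentages.items():
    print(f"{label}: {codon_counts[label]}/{total} = {pct}%")
```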
<div class="pt-3"><h5>T-box classification and regulatory type</h5></div>
<h3 class="text-justify">
We used two different covariance models to build this database: the Rfam class I T-box model (RF00230) and our own translational class II model derived from ileS leader sequences. Qualitatively, class I T-boxes tend to have a larger Stem I structure, while class II T-boxes have a shorter one. In our database, T-boxes detected by the Rfam RF00230 covariance model are mostly class I transcriptional T-boxes, and T-boxes detected by our translational model are expected to be class II translational T-boxes. However, there may be instances where the RF00230 model predicts the folding of T-boxes that are actually class II, owing to similar structural features (in particular, the antiterminator/antisequestrator motif). Additionally, there are other classes of T-boxes (such as S. aureus ileS T-boxes, e.g. <a href="https://tbdb.io/tboxes/RMB1LX9O.html">RMB1LX9O</a>) for which we do not currently have a robust covariance model, but which are sometimes detected by the RF00230 covariance model.
<br><br>
We have attempted to classify the T-boxes in the database by type of regulation. T-boxes predicted using RF00230 are classified as transcriptional if we were able to identify a downstream terminator hairpin,
or as unknown if we were not. T-boxes predicted using our ileS translational model are classified as translational. In total, we have 20396 putative class I transcriptional T-boxes, 1012 putative class II translational
T-boxes, and 2128 T-boxes of unknown regulatory type.
<br><br>
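The classification rule just described can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the code used to build the TBDB: the label TBDB_ileS and the function name classify_regulation are hypothetical.

```python
RFAM_MODEL = "RF00230"    # Rfam class I covariance model
ILES_MODEL = "TBDB_ileS"  # hypothetical label for the class II ileS model


def classify_regulation(model, terminator_found):
    """Assign a regulatory type from the detecting model and terminator search.

    model: name of the covariance model that detected the T-box.
    terminator_found: True if a downstream terminator hairpin was identified.
    """
    if model == ILES_MODEL:
        # Hits from the ileS translational model are labeled translational.
        return "translational"
    if model == RFAM_MODEL:
        # RF00230 hits are transcriptional only when a terminator was found.
        return "transcriptional" if terminator_found else "unknown"
    raise ValueError(f"unrecognized covariance model: {model!r}")


print(classify_regulation(RFAM_MODEL, True))
print(classify_regulation(RFAM_MODEL, False))
print(classify_regulation(ILES_MODEL, False))
```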
<ul id="bulletpoint">
Class I model (Rfam RF00230)<br>
<ul>
<li>Will fold mostly canonical class I transcriptional T-boxes</li>
<li>Sometimes will also fold canonical class I translational T-boxes (terminator usually not found here)
</li>
<li>Sometimes will fold T-boxes that are actually class II translational (will have a poor INFERNAL output score)
</li>
</ul>
<br>
Class II model (TBDB Ile Translational)<br>
<ul>
<li>Will fold mostly class II Ile translational T-boxes
</li>
<li>Sometimes will also fold canonical class I T-boxes (will have a poor INFERNAL output score)
</li>
</ul>
</ul>
<div class="pt-3"><h5>Handling complex cases</h5></div>
<h3 class="text-justify">
The RF00230 model does not produce secondary structures with more than one antiterminator at a time. This means that complex T-box leader sequences (such as partially double T-boxes) cannot be fully represented in a single model output. Such cases would either be truncated after the first antiterminator/terminator (i.e. missing their second 'half'), or have the first antiterminator/terminator pair left unlabeled (i.e. the first half not shown as a structural loop). The same problem could occur with double T-boxes (tandemly arranged complete T-boxes), where either one of the two T-boxes would be absent, or the first T-box's antiterminator/terminator and the second T-box's Stem I would be mischaracterized. As we continue to build new covariance models for finding new T-boxes, we will also improve existing models to handle complex cases.
<br>
</h3>
<div class="pt-3"><h5>RNA folding thermodynamics</h5></div>
<h3 class="text-justify">
T-box switching depends on the relative stability of antiterminator and terminator folds. This was evaluated using thermodynamic methods provided in the <a href = "https://www.tbi.univie.ac.at/RNA/">ViennaRNA</a> package.
The antiterminator structure output by INFERNAL was optimized with appropriate constraints using RNAfold. Similarly, the terminator structure was found by RNALfold, looking for hairpins between the UGGN and poly-U regions. Both structures and their folding ∆Gs are included in the database.
</h3>
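<h3 class="text-justify">To illustrate what "between the UGGN and poly-U regions" can mean in practice, the Python sketch below extracts the candidate window in which a terminator hairpin would be searched. It is a simplified assumption, not the TBDB pipeline: it takes the first UGGN match in the sequence (the real pipeline would use the motif position from the INFERNAL alignment), it treats a run of at least four U residues as the poly-U tract, and the function name terminator_window and the parameter min_u are hypothetical. Folding the window itself would be left to RNALfold.</h3>

```python
import re


def terminator_window(seq, min_u=4):
    """Return (start, end, subsequence) between a UGGN motif and a poly-U run.

    Assumptions (not from the TBDB source): the first UGGN match is used,
    and poly-U means at least `min_u` consecutive U residues.
    Returns None if either landmark is missing.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets

    motif = re.search(r"UGG[ACGU]", rna)
    if motif is None:
        return None

    # Look for the poly-U tract only downstream of the UGGN motif.
    downstream = rna[motif.end():]
    poly_u = re.search("U{%d,}" % min_u, downstream)
    if poly_u is None:
        return None

    start = motif.end()
    end = start + poly_u.start()
    return start, end, rna[start:end]


example = "AAUGGAGCCCGAAAGGGCUUUUUUAA"  # toy sequence, not a real T-box
print(terminator_window(example))
```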
<div class="pt-3"><h5>tRNA pairing prediction</h5></div>
<h3 class="text-justify"> In nature, T-box logic is controlled by cognate tRNAs that Watson-Crick base pair with the T-box specifier loop and anti-acceptor arm sequences (T-box region). However, other tertiary interactions are thought to play an important role in deciding whether a specific tRNA can control T-box logic. In particular, structural features in certain T-box Stem I and Stem II regions are thought to interact with tRNA in a sequence-specific manner. To facilitate discovery of functional T-box leaders, we used <a href="http://lowelab.ucsc.edu/tRNAscan-SE/">tRNAscan-SE</a> to identify all tRNAs from T-box hosts that could pair with each T-box. Host tRNA identification was performed for all complete sequence records. For partial sequence records, tRNA identification was attempted, and we report matching tRNAs if any were identified.
</h3>
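<h3 class="text-justify">The codon half of this pairing can be sketched as a reverse-complement lookup. The Python example below is a minimal illustration that assumes strict Watson-Crick pairing only (no wobble) and ignores the discriminator base and any tertiary contacts. The function names and the tRNA records are hypothetical and do not reflect the tRNAscan-SE output format.</h3>

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and convert a DNA alphabet to RNA."""
    return seq.upper().replace("T", "U")


def anticodon_for(codon):
    """Return the anticodon (5' to 3') that pairs with a specifier codon."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(codon)))


def matching_trnas(codon, host_trnas):
    """Select host tRNAs whose anticodon pairs exactly with the codon."""
    target = anticodon_for(codon)
    return [t for t in host_trnas if to_rna(t["anticodon"]) == target]


# Hypothetical host tRNA records for illustration.
host_trnas = [
    {"name": "tRNA-Ile", "anticodon": "GAU"},
    {"name": "tRNA-Leu", "anticodon": "UAG"},
]

print(anticodon_for("AUC"))
print(matching_trnas("AUC", host_trnas))
```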
<div class="pt-3"><h5>Data sources</h5></div>
<h3 class="text-justify">Input sequences for building the TBDB were obtained from previously published datasets. Structurally annotated datasets from <a href = "https://dx.doi.org/10.1261/rna.819308">Vitreschak <i>et al.</i></a> were used for validation. Each T-box was assigned a unique ID based on its sequence to de-duplicate entries shared between data sources.
</h3>
<ul id="bulletpoint">
<li><a href = "https://rfam.xfam.org/family/RF00230">The Rfam database</a> (14106 T-boxes, predicted using <a href="http://eddylab.org/infernal/">INFERNAL</a>)</li>
<li><a href = "http://operons.ibt.unam.mx/gctNG/">GeCont3</a> (4491 T-boxes)</li>
<li><a href = "https://dx.doi.org/10.1261/rna.819308">Vitreschak <i>et al.</i>, 2008</a> (698 T-boxes, structurally annotated)</li>
</ul>
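<h3 class="text-justify">Sequence-based IDs make de-duplication across sources straightforward, as the Python sketch below shows. The actual TBDB ID scheme is not described on this page, so the choices here (SHA-256 of the normalized sequence, base32 encoding, truncation to eight characters) are assumptions made purely for illustration, as are the function names and the toy records.</h3>

```python
import base64
import hashlib


def normalize(seq):
    """Strip whitespace, upper-case, and convert a DNA alphabet to RNA."""
    return "".join(seq.split()).upper().replace("T", "U")


def sequence_id(seq, length=8):
    """Derive a short ID from the sequence content (hypothetical scheme)."""
    digest = hashlib.sha256(normalize(seq).encode("ascii")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]


def deduplicate(records):
    """Keep the first record seen for each sequence-derived ID.

    records: iterable of (source_name, sequence) pairs.
    Returns a dict mapping ID to (source_name, normalized_sequence).
    """
    unique = {}
    for source, seq in records:
        unique.setdefault(sequence_id(seq), (source, normalize(seq)))
    return unique


# Toy records: the first two are the same sequence written differently.
records = [
    ("Rfam", "AUGGCUAGC"),
    ("GeCont3", "atggctagc"),
    ("Vitreschak", "AUGGCUAGG"),
]

print(deduplicate(records))
```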
<div class="pb-5"></div>
<div class="pb-5"></div>
</div>
<br>
</html>