Brewery: quick help and references
The Servers: description
Brewery is a state-of-the-art predictor of protein structural annotations (contact density, secondary structure in 3 and 8 classes, solvent accessibility and structural motifs). Brewery is based on ensembles of cascaded BRNNs (bidirectional recurrent neural networks) and convolutional neural networks. Brewery's features include:
- New, large training sets.
The servers are trained on a recent redundancy reduced subset of the Protein Data Bank, containing nearly 16,000 protein structures.
- More diverse evolutionary information
We use both PSI-BLAST and HHblits on recent versions of the UniProt database. While systems based on PSI-BLAST or HHblits alone perform broadly similarly, systems using both at the same time, or ensembles of systems trained separately, give us significant improvements.
- More efficient input encoding.
We have come up with an encoding for the evolutionary information that gives us a more informative representation of the distribution of alignments and at the same time keeps track of the identity of the primary sequence of the query. This gives us a significant boost compared to older versions of the servers.
- 8 class prediction alongside the traditional 3 classes
Porter8 5.0 is a completely independent system from Porter 5.0 and is trained on the 8 DSSP classes rather than a projection thereof onto the 3 classic classes (Helix, Strand and Coil). While there are other predictors attempting this task, in our tests Porter8 performs best by far.
Porter 5.0, tested on a large independent test set, achieves approximately 84% correct classification on the "hard" CASP 3-class assignment.
Porter8 5.0, similarly tested, achieves 73% correct classification.
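The projection from the 8 DSSP classes onto the 3 classic classes can be sketched in a few lines of Python. The mapping follows the class definitions given under "Output format" in this page; the names SS8_TO_SS3 and project_ss8_to_ss3 are hypothetical and used here only for illustration. This is a sketch of the projection itself, not of how Porter 5.0 produces its predictions.

```python
# 8-class to 3-class projection, as listed under "Output format":
# H, G, I -> H (helix); E, B -> E (strand); T, S, C -> C (the rest).
# In the SS8 line, C stands for DSSP's "." class.
SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
}


def project_ss8_to_ss3(ss8):
    """Project a string of 8-class labels onto the 3 classic classes."""
    return "".join(SS8_TO_SS3[label] for label in ss8)
```

Because Porter8 5.0 and Porter 5.0 are independent systems, a projected SS8 prediction need not coincide with the SS3 prediction for the same sequence.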
A paper on Porter 5.0 is currently undergoing review. Many of the details on the architectures can be found in older references on the previous versions of the server.
Input formats
Email
If you input an email address you will receive an email containing the results.
NOTE: Check that you typed your address correctly. Many queries never receive a reply because the address was mistyped.
Whether you input an email address or not, a link to a web page will be provided to you after you click submit. The link will point you to the results page, which is updated automatically every 60 seconds until the query is complete.
Notice that if we have many jobs in our queue it may take hours to serve a query containing many sequences. Even if the queue is empty, a maximal query (64kbytes) would typically take in the region of two hours to be processed. If you don't want to keep a browser window open for half a day, bookmark the response link or input an email address.
Input sequence(s)
The sequence of amino acids:
- You can submit sequences in FASTA format. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column.
- You can send up to 64kbytes per submission, which is approximately 200 average-sized proteins.
- Larger queries can be broken down into 64kbyte chunks, or you can ask us to lift the limit for you on a one-off basis.
- Spaces, newlines and tabs will be ignored, so feel free to have them in your query.
- Characters not corresponding to any amino acid will be treated as X.
- Only the 1-letter amino acid code is understood. Please do not send nucleotide sequences: if you do, A will be treated as Alanine, C as Cysteine, and so on.
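The rules above can be mirrored locally before submitting. The following is a minimal Python sketch, not the server's own code. It makes several assumptions: that "64kbytes" means 64 * 1024 bytes, that only the 20 standard 1-letter codes count as amino acids, and that lowercase letters may be upper-cased. All function names are hypothetical.

```python
# Assumption: "64kbytes" is taken to mean 64 * 1024 bytes.
MAX_QUERY_BYTES = 64 * 1024
# Assumption: only the 20 standard residues are kept; anything else becomes X.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Drop spaces, tabs and newlines; replace non-amino-acid characters with X."""
    residues = "".join(raw.split()).upper()
    return "".join(c if c in AMINO_ACIDS else "X" for c in residues)


def to_fasta(records):
    """Build a FASTA query from (name, sequence) pairs."""
    lines = []
    for name, sequence in records:
        lines.append(">" + name)  # description line starts with ">" in column 1
        lines.append(clean_sequence(sequence))
    return "\n".join(lines) + "\n"


def chunk_queries(records, limit=MAX_QUERY_BYTES):
    """Group records into FASTA queries that each stay within the size limit."""
    chunks, current = [], []
    for record in records:
        candidate = current + [record]
        if current and len(to_fasta(candidate).encode("utf-8")) > limit:
            chunks.append(to_fasta(current))
            current = [record]
        else:
            current = candidate
    if current:
        chunks.append(to_fasta(current))
    return chunks
```

A single record larger than the limit is still returned as its own chunk; in that case the one-off limit lift mentioned above would be needed.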
Output format
Replies are sent by email (if you provide an address) and shown as a web page when you follow the link we give you after you submit. The email response and the main web page contain the same information in the same format.
Brewery's replies come as text.
You might have to "view attachments inline" in your email client to see these replies.
If you submit multiple sequences you will receive one single email/web page with all the predictions.
The web page version of the results will be updated incrementally (every 60 seconds) until your query is complete.
Here is an example of a prediction:
Subject: Porter 5.0 PaleAle 5.0 Porter+ 5.0 BrownAle 5.0 response to 1 queries
Query_name: foo3
Query_length: 50
SEQ MANIEIRQETPTAFYIKVHDTDNVAIIVNDNGLKAGTRFPDGLELIEHIP
SS3 CCCCCCCCCCCCCEEEEECCCCCEEEEECCCCCCCCCEECCCEEEECCCC
98765556899870688667775899971896389967057317712569
SS8 CCCCCCCCCCCCCEEEEECTTCCEEEEECTTSCCTTCECTTSEEEESCCC
98445565455351688665613899972430087772023017603579
SA EEEEEEEEEEEEbbbBeBeEEbbBBBBBbEEEbEEeeEbEEEbebeeEbE
85524124423111051216622342510160150003334021011418
TA EEEEEEEEEEEEEEEEebtTBEBEEEebtTBBeSsBebhgTBeSsBebhH
01111111111345899950211965330000899922110145548699
CD NNNNNNNNNNccCCCCCCCCcCCCCCCCCCcCCCCCCCCccCCCCCcccN
54424221100013776630115777764011524522200232310011
Your query is split into blocks of 50 residues.
For each block you have 11 lines:
- Line 1: The 1-letter code of your protein primary sequence, preceded by "SEQ ".
- Line 2: Secondary structure prediction by Porter 5.0, preceded by "SS3 ":
  - H = helix: DSSP's H (alpha helix) + G (3-10 helix) + I (pi-helix) classes.
  - E = strand: DSSP's E (extended strand) + B (beta-bridge) classes.
  - C = the rest: DSSP's T (turn) + S (bend) + . (the rest).
- Line 3: 3-class secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
- Line 4: 8-class secondary structure prediction by Porter8 5.0, preceded by "SS8 ":
  - H = DSSP's H (alpha helix)
  - G = DSSP's G (3-10 helix)
  - I = DSSP's I (pi-helix)
  - E = DSSP's E (extended strand)
  - B = DSSP's B (beta-bridge)
  - T = DSSP's T (turn)
  - S = DSSP's S (bend)
  - C = DSSP's . (the rest).
- Line 5: 8-class secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
- Line 6: Solvent Accessibility prediction by PaleAle 5.0, preceded by "SA ":
  - E = very exposed (over 50% exposed).
  - e = somewhat exposed (25%-50% exposed).
  - b = somewhat buried (4%-25% exposed).
  - B = very buried (under 4% exposed).
- Line 7: Solvent Accessibility prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
- Line 8: Structural Motif prediction in 14 classes by Porter+ 5.0, preceded by "TA ":
In the table below are the 1-letter codes for the 14 structural classes, and the ideal sequence of 4 pairs of Φ and Ψ defining them:
Class | Φ1 | Ψ1 | Φ2 | Ψ2 | Φ3 | Ψ3 | Φ4 | Ψ4 |
b | 265 | 148 | 280 | 153 | 300 | 327 | 291 | 332 |
h | 274 | 152 | 301 | 321 | 297 | 322 | 293 | 322 |
H | 297 | 319 | 297 | 319 | 296 | 319 | 294 | 321 |
I | 294 | 334 | 270 | 346 | 279 | 138 | 293 | 336 |
C | 271 | 355 | 273 | 144 | 283 | 155 | 296 | 329 |
e | 253 | 144 | 254 | 144 | 279 | 149 | 299 | 333 |
E | 251 | 143 | 244 | 143 | 245 | 144 | 253 | 142 |
S | 255 | 147 | 284 | 138 | 267 | 341 | 231 | 154 |
t | 270 | 147 | 301 | 345 | 266 | 360 | 268 | 145 |
g | 294 | 327 | 283 | 346 | 250 | 1.7 | 268 | 147 |
T | 290 | 344 | 263 | 1 | 266 | 147 | 263 | 143 |
B | 292 | 349 | 248 | 148 | 252 | 144 | 254 | 145 |
s | 288 | 139 | 319 | 347 | 231 | 150 | 245 | 146 |
i | 262 | 343 | 234 | 156 | 288 | 326 | 295 | 324 |
- Line 9: Structural Motif prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
- Line 10: Contact Density prediction by BrownAle 5.0, preceded by "CD ":
  - N = very low density.
  - n = low density.
  - c = high density.
  - C = very high density.
- Line 11: Contact Density prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
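The block layout described above is regular enough to read with a short script. The following Python sketch is one possible way to do it, under stated assumptions: the reply contains a single query, each labelled line holds its label followed by whitespace and the residues, and each confidence line holds only digits (with or without leading spaces). The function name parse_prediction and the shape of the returned dictionary are hypothetical.

```python
LABELS = ("SEQ", "SS3", "SS8", "SA", "TA", "CD")


def parse_prediction(text):
    """Parse a single-query reply into per-residue strings keyed by label."""
    result = {"name": None, "length": None}
    tracks = {label: "" for label in LABELS}
    confidence = {label: "" for label in LABELS if label != "SEQ"}
    last = None  # label of the most recent prediction line
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Query_name:" and len(tokens) > 1:
            result["name"] = tokens[1]
        elif tokens[0] == "Query_length:" and len(tokens) > 1:
            result["length"] = int(tokens[1])
        elif tokens[0] in LABELS and len(tokens) == 2:
            # Blocks of 50 residues are concatenated in order of appearance.
            last = tokens[0]
            tracks[last] += tokens[1]
        elif len(tokens) == 1 and tokens[0].isdigit() and last in confidence:
            # A digits-only line is the confidence for the preceding label.
            confidence[last] += tokens[0]
            last = None
    result["tracks"] = tracks
    result["confidence"] = confidence
    return result
```

With the parsed result, positions with low 3-class confidence could be listed with, for example, [i for i, c in enumerate(result["confidence"]["SS3"]) if int(c) < 5]; the threshold of 5 is arbitrary and chosen only for illustration.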
References
Brewery
M.Torrisi, G.Pollastri,
"Brewery: Deep Learning and deeper profiles for the prediction of 1D protein structure annotations"
Bioinformatics, 1367-4803, 2020, doi: 10.1093/bioinformatics/btaa204
Toll free PDF (Bioinformatics web site)
M.Torrisi, G.Pollastri,
"Protein Structure Annotations", in Essentials of Bioinformatics, Volume I. Understanding Bioinformatics: Genes to Proteins,
Springer Nature, 2019; doi: 10.1007/978-3-030-02634-9_10
Repository UCD (Abstract)
PaleAle 5.0
M.Kaleel, M.Torrisi, C.Mooney, G.Pollastri,
"PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning"
Amino Acids, 2019, doi: 10.1007/s00726-019-02767-6
AMAC web site (Toll-free Link)
Porter 5.0
M.Torrisi, M.Kaleel, G.Pollastri,
"Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction",
Scientific Reports, 9: 12374, 2019, doi: 10.1038/s41598-019-48786-x
Open access abstract and PDF (Scientific Reports web site)
M.Torrisi, M.Kaleel, G.Pollastri,
"Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes",
bioRxiv, 289033; doi: 10.1101/289033
Abstract and PDF (bioRxiv web site)
Porter 4.0, PaleAle 4.0
C.Mirabello, G.Pollastri,
"Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility",
Bioinformatics, 29(16):2056-2058, 2013, doi: 10.1093/bioinformatics/btt344
Toll free PDF (Bioinformatics web site)
Older references on Porter and PaleAle
G.Pollastri, A.McLysaght.
"Porter: a new, accurate server for protein secondary structure prediction".
Bioinformatics, 21(8),1719-20, 2005.
Toll-free link to the article
C. Mooney, G.Pollastri.
"Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information"
Proteins, 77(1), 181-90, 2009.
Abstract and PDF (Proteins web site)
G.Pollastri*, A. J. M. Martin, C. Mooney, A. Vullo.
"Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information"
BMC Bioinformatics, 8:201, 2007.
Open access abstract and PDF (BMC Bioinformatics web site).
Distill as a whole
D. Baú, A. J. M. Martin, C. Mooney, A. Vullo, I. Walsh, G. Pollastri.
"Distill: A suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins"
BMC Bioinformatics, 7:402, 2006.
Open access abstract and PDF (BMC Bioinformatics web site).
A more comprehensive list of publications from our group is available here.