Brewery:
quick help and references





The Servers: description

Brewery is a state-of-the-art predictor of protein structural annotations (contact density, secondary structure in 3 and 8 classes, solvent accessibility and structural motifs). It is based on ensembles of cascaded bidirectional recurrent neural networks (BRNNs) and convolutional neural networks (CNNs). Brewery's features include:

  • New, large training sets. The servers are trained on a recent redundancy-reduced subset of the Protein Data Bank, containing nearly 16,000 protein structures.
  • More diverse evolutionary information. We use both PSI-BLAST and HHblits on recent versions of the UniProt database. While systems based on PSI-BLAST or HHblits alone perform broadly similarly, systems using both at the same time, or ensembles of systems trained separately, give us significant improvements.
  • More efficient input encoding. We have come up with an encoding for the evolutionary information that gives us a more informative representation of the distribution of alignments and at the same time keeps track of the identity of the primary sequence of the query. This gives us a significant boost compared to older versions of the servers.
  • 8 class prediction alongside the traditional 3 classes. Porter8 5.0 is a completely independent system from Porter 5.0 and is trained on the 8 DSSP classes rather than a projection thereof onto the 3 classic classes (Helix, Strand and Coil). While there are other predictors attempting this task, in our tests Porter8 performs best by far.
Porter 5.0, tested on a large independent test set, achieves approximately 84% correct classification on the "hard" CASP 3-class assignment.
Porter8 5.0, similarly tested, achieves 73% correct classification.
Porter 5.0 is described in the Scientific Reports paper listed in the references below. Many details of the architectures can be found in the older references on previous versions of the server.
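
The projection of the 8 DSSP classes onto 3 classes, and the "percentage of correct classification" quoted above, can be illustrated with a minimal Python sketch. The mapping follows the class definitions given under "Output format" on this page. The function names are hypothetical, treating unknown symbols as coil is an assumption, and this is not the servers' own evaluation code.

```python
# Projection of the 8 DSSP classes onto the 3 classic classes, following
# the definitions listed under "Output format" (H+G+I, E+B, the rest).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",             # helices
    "E": "E", "B": "E",                       # strands
    "T": "C", "S": "C", "C": "C", ".": "C",   # everything else
}


def project_to_3_classes(ss8: str) -> str:
    """Map an 8-class string to 3 classes.

    Assumption: any symbol outside the 8 classes is treated as coil.
    """
    return "".join(DSSP8_TO_3.get(symbol, "C") for symbol in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted class equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)
```

For example, `per_residue_accuracy(project_to_3_classes(predicted_ss8), observed_ss3)` scores an 8-class prediction against a 3-class reference assignment.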

Input formats

Email

If you input an email address you will receive an email containing the results.
NOTE: Check that you typed your address correctly. Many queries never receive an answer because the address was mistyped.

Whether you input an email address or not, a link to a web page will be provided to you after you click submit. The link will point you to the results page, which is updated automatically every 60 seconds until the query is complete.

Note that if we have many jobs in our queue it may take hours to serve a query containing many sequences. Even if the queue is empty, a maximal query (64kbytes) would typically take around two hours to process. If you don't want to keep a browser window open for hours, bookmark the response link or input an email address.

Input sequence(s)

The sequence of amino acids:

  • You can submit sequences in FASTA format. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column.
  • You can send up to 64kbytes per submission, which is approximately 200 average-sized proteins.
  • Larger queries can be broken down into 64kbyte chunks, or you can ask us to lift the limit for you on a one-off basis.
  • Spaces, newlines and tabs will be ignored, so feel free to have them in your query.
  • Characters not corresponding to any amino acid will be treated as X.
  • Only the 1-letter amino acid code is understood. Please do not send nucleotide sequences: if you do, A will be treated as Alanine, C as Cysteine, etc.
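
The rules above can be sketched in Python as a client-side check before submitting. This is only an illustration: the function names are hypothetical, upper-casing lowercase letters is an assumption about how the server treats them, and "64kbytes" is assumed to mean 64 * 1024 bytes.

```python
from typing import List, Tuple

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard 1-letter codes
MAX_QUERY_BYTES = 64 * 1024  # assumption: "64kbytes" = 64 * 1024 bytes


def normalise_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA text.

    Whitespace inside sequences is dropped and any character that is not
    one of the 20 standard amino acids becomes X, as described above.
    """
    records: List[Tuple[str, str]] = []
    name, residues = None, []
    for raw in text.splitlines():
        if raw.startswith(">"):  # ">" must be in the first column
            if name is not None:
                records.append((name, "".join(residues)))
            name, residues = raw[1:].strip(), []
            continue
        if name is None and raw.strip():
            name = "unnamed"  # hypothetical label for header-less input
        for ch in raw:
            if ch.isspace():
                continue
            ch = ch.upper()  # assumption: lowercase letters are accepted
            residues.append(ch if ch in AMINO_ACIDS else "X")
    if name is not None:
        records.append((name, "".join(residues)))
    return records


def chunk_records(records: List[Tuple[str, str]],
                  limit: int = MAX_QUERY_BYTES) -> List[str]:
    """Group records into FASTA strings that each stay within the limit."""
    chunks, current, size = [], [], 0
    for name, seq in records:
        entry = f">{name}\n{seq}\n"
        n = len(entry.encode("utf-8"))
        if n > limit:
            raise ValueError(f"record {name!r} alone exceeds the size limit")
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(entry)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Each string returned by `chunk_records` can then be pasted into the form as a separate submission.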

Output format

Replies are sent by email (if you give us an address) and shown as a web page if you follow the link we give you after you submit. The email response and the main web page contain the same information in the same format. Brewery's replies come as text. You might have to "view attachments inline" in your email client to see these replies. If you submit multiple sequences you will receive one single email/web page with all the predictions. The web page version of the results is updated incrementally (every 60 seconds) until your query is complete. Here is an example prediction:

Subject: Porter 5.0 PaleAle 5.0 Porter+ 5.0 BrownAle 5.0  response to 1 queries

Query_name: foo3
Query_length: 50

SEQ  MANIEIRQETPTAFYIKVHDTDNVAIIVNDNGLKAGTRFPDGLELIEHIP
SS3  CCCCCCCCCCCCCEEEEECCCCCEEEEECCCCCCCCCEECCCEEEECCCC
     98765556899870688667775899971896389967057317712569
SS8  CCCCCCCCCCCCCEEEEECTTCCEEEEECTTSCCTTCECTTSEEEESCCC
     98445565455351688665613899972430087772023017603579
SA   EEEEEEEEEEEEbbbBeBeEEbbBBBBBbEEEbEEeeEbEEEbebeeEbE
     85524124423111051216622342510160150003334021011418
TA   EEEEEEEEEEEEEEEEebtTBEBEEEebtTBBeSsBebhgTBeSsBebhH
     01111111111345899950211965330000899922110145548699
CD   NNNNNNNNNNccCCCCCCCCcCCCCCCCCCcCCCCCCCCccCCCCCcccN
     54424221100013776630115777764011524522200232310011

Your query is split into blocks of 50 residues. For each block you have 11 lines:

  • Line 1: The 1-letter code of your protein primary sequence, preceded by "SEQ ".
  • Line 2: 3 class secondary structure prediction by Porter 5.0, preceded by "SS3 ":
    • H = helix : DSSP's H (alpha helix) + G (3-10 helix) + I (pi-helix) classes.
    • E = strand : DSSP's E (extended strand) + B (beta-bridge) classes.
    • C = the rest : DSSP's T (turn) + S (bend) + . (the rest).
  • Line 3: 3 class Secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
  • Line 4: 8 class secondary structure prediction by Porter8 5.0, preceded by "SS8 ":
    • H = DSSP's H (alpha helix)
    • G = DSSP's G (3-10 helix)
    • I = DSSP's I (pi-helix)
    • E = DSSP's E (extended strand)
    • B = DSSP's B (beta-bridge)
    • T = DSSP's T (turn)
    • S = DSSP's S (bend)
    • C = DSSP's . (the rest).
  • Line 5: 8 class Secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
  • Line 6: Solvent Accessibility prediction by PaleAle 5.0, preceded by "SA ":
    • E = very exposed (over 50% exposed).
    • e = somewhat exposed (25%-50% exposed).
    • b = somewhat buried (4%-25% exposed).
    • B = very buried (under 4% exposed).
  • Line 7: Solvent Accessibility prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
  • Line 8: Structural Motif prediction in 14 classes by Porter+ 5.0, preceded by "TA ":
    In the table below are the 1-letter codes for the 14 structural classes, and the ideal sequence of 4 pairs of Φ and Ψ defining them:

    Class   Φ1   Ψ1   Φ2   Ψ2   Φ3   Ψ3   Φ4   Ψ4
    b      265  148  280  153  300  327  291  332
    h      274  152  301  321  297  322  293  322
    H      297  319  297  319  296  319  294  321
    I      294  334  270  346  279  138  293  336
    C      271  355  273  144  283  155  296  329
    e      253  144  254  144  279  149  299  333
    E      251  143  244  143  245  144  253  142
    S      255  147  284  138  267  341  231  154
    t      270  147  301  345  266  360  268  145
    g      294  327  283  346  250  1.7  268  147
    T      290  344  263    1  266  147  263  143
    B      292  349  248  148  252  144  254  145
    s      288  139  319  347  231  150  245  146
    i      262  343  234  156  288  326  295  324
  • Line 9: Structural Motif prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
  • Line 10: Contact Density prediction by BrownAle 5.0, preceded by "CD ":
    • N = very low density.
    • n = low density.
    • c = high density.
    • C = very high density.
  • Line 11: Contact Density prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
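
The layout described above can be read programmatically. The following Python sketch assumes a reply containing a single query, formatted exactly like the example on this page (a label, spaces, then the prediction string, with each confidence line directly below its prediction line). The function name and the returned dictionary layout are hypothetical.

```python
LABELS = ("SEQ", "SS3", "SS8", "SA", "TA", "CD")


def parse_prediction(text: str) -> dict:
    """Collect the per-residue tracks and confidences of one query.

    Blocks of 50 residues are concatenated, so each returned string
    covers the whole sequence.
    """
    tracks = {label: "" for label in LABELS}
    confidence = {label: "" for label in LABELS if label != "SEQ"}
    last = None  # label of the prediction line just read, if any
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in LABELS:
            # A labelled line such as "SS3  CCCC..."
            last = parts[0]
            tracks[last] += parts[1]
        elif (len(parts) == 1 and parts[0].isdigit()
              and last is not None and last != "SEQ"):
            # An unlabelled line of digits: confidence for the line above
            confidence[last] += parts[0]
            last = None
        else:
            # Subject line, query name/length, blank lines, etc.
            last = None
    return {"tracks": tracks, "confidence": confidence}
```

With the result in hand, `zip(result["tracks"]["SEQ"], result["tracks"]["SS3"], result["confidence"]["SS3"])` walks the sequence residue by residue together with its 3 class prediction and confidence digit.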

References

Brewery

M.Torrisi, G.Pollastri, "Protein Structure Annotations", in Essentials of Bioinformatics, Volume I. Understanding Bioinformatics: Genes to Proteins, Springer Nature, 2019; doi: 10.1007/978-3-030-02634-9_10
Repository UCD (Abstract)

PaleAle 5.0

M.Kaleel, M.Torrisi, C.Mooney, G.Pollastri, "PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning" Amino Acids, 2019, doi: 10.1007/s00726-019-02767-6
AMAC web site (Toll-free Link)

Porter 5.0

M.Torrisi, M.Kaleel, G.Pollastri, "Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction", Scientific Reports, 9: 12374, 2019, doi: 10.1038/s41598-019-48786-x
Open access abstract and PDF (Scientific Reports web site)

M.Torrisi, M.Kaleel, G.Pollastri, "Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes", bioRxiv, 289033; doi: 10.1101/289033
Abstract and PDF (bioRxiv web site)


Porter 4.0, PaleAle 4.0

C.Mirabello, G.Pollastri, "Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility", Bioinformatics, 29(16):2056-2058, 2013, doi: 10.1093/bioinformatics/btt344
Toll free PDF (Bioinformatics web site)

Older references on Porter and PaleAle

G.Pollastri, A.McLysaght. "Porter: a new, accurate server for protein secondary structure prediction". Bioinformatics, 21(8),1719-20, 2005.
Toll-free link to the article

C. Mooney, G.Pollastri. "Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information" Proteins, 77(1), 181-90, 2009.
Abstract and PDF (Proteins web site)

G.Pollastri*, A. J. M. Martin, C. Mooney, A. Vullo. "Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information" BMC Bioinformatics, 8:201, 2007.
Open access abstract and PDF (BMC Bioinformatics web site).

Distill as a whole

D. Baú, A. J. M. Martin, C. Mooney, A. Vullo, I. Walsh, G. Pollastri. "Distill: A suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins" BMC Bioinformatics, 7:402, 2006.
Open access abstract and PDF (BMC Bioinformatics web site).

A more comprehensive list of publications from our group is available here.




Gianluca Pollastri, gianluca.pollastri at ucd.ie,
Gianluca Pollastri's group
School of Computer Science and Informatics
University College Dublin