Porter 5.0: quick help and references
The Servers: description
Porter and Porter8 are servers for protein secondary structure prediction in 3 and 8 classes, based on ensembles of cascaded BRNNs (bidirectional recurrent neural networks) and Convolutional Neural Networks. Porter's features include:
- New, large training sets.
The servers are trained on a recent redundancy reduced subset of the Protein Data Bank, containing nearly 16,000 protein structures.
- More diverse evolutionary information
We use both psiblast and HHblits on recent versions of the UniProt database. While systems based on psiblast or HHblits separately have broadly similar performances, systems using both at the same time, or ensembles of systems trained separately, give us significant improvements.
- More efficient input encoding.
We have come up with an encoding for the evolutionary information that gives us a more informative representation of the distribution of alignments and at the same time keeps track of the identity of the primary sequence of the query. This gives us a significant boost compared to older versions of the servers.
- 8 class prediction alongside the traditional 3 classes
Porter8 5.0 is a completely independent system from Porter 5.0 and is trained on the 8 DSSP classes rather than a projection thereof onto the 3 classic classes (Helix, Strand and Coil). While there are other predictors attempting this task, in our tests Porter8 performs best by far.
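To make the relation between the two label sets concrete, here is a minimal Python sketch of the projection from the 8 DSSP classes onto the 3 classic classes, using the grouping given in the Output format section of this page. The names `DSSP8_TO_3` and `project_to_3_classes` are illustrative and are not part of the servers. Because Porter 5.0 is trained independently of Porter8 5.0, its 3-class output is not obtained by projecting the 8-class output, so the two can disagree.

```python
# DSSP 8-class to 3-class projection, following the H/E/C grouping
# described in the Output format section of this page.
# (Names are illustrative, not part of the Porter servers.)
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helices -> Helix
    "E": "E", "B": "E",             # extended strand, beta-bridge -> Strand
    "T": "C", "S": "C", "C": "C",   # turn, bend and the rest -> Coil
}


def project_to_3_classes(ss8: str) -> str:
    """Map a string of 8-class labels onto the 3 classic classes."""
    return "".join(DSSP8_TO_3[label] for label in ss8)


print(project_to_3_classes("HGIEBTSC"))  # prints HHHEECCC
```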
Porter 5.0, tested on a large independent test set, achieves approximately 84% correct classification on the "hard" CASP 3-class assignment.
Porter8 5.0, similarly tested, achieves 73% correct classification.
Porter 5.0 is described in the papers listed under References below. Many of the details on the architectures can also be found in the older references on previous versions of the server.
Input formats
Email
If you input an email address you will receive an email containing the results.
NOTE: Check that you typed your address correctly. Many queries never receive an answer because the address was mistyped.
Whether you input an email address or not, a link to a web page will be provided after you click submit. The link points to the results page, which is updated automatically every 60 seconds until the query is complete.
Note that if we have many jobs in our queue, it may take hours to serve a query containing many sequences.
Even if the queue is empty, a maximal query (64 kbytes) would typically take around two hours to process.
If you don't want to keep a browser window open for hours, bookmark the response link or input an email address.
Input sequence(s)
The sequence of amino acids:
- You can submit sequences in FASTA format. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line must begin with a greater-than (">") symbol in the first column.
- You can send up to 64 kbytes per submission, which is approximately 200 average-sized proteins.
- Larger queries can be broken down into 64kbyte chunks, or you can ask us to lift the limit for you on a one-off basis.
- Spaces, newlines and tabs will be ignored, so feel free to have them in your query.
- Characters not corresponding to any amino acid will be treated as X.
- Only the 1-letter amino acid code is understood. Please do not send nucleotide sequences: if you do, A will be treated as Alanine, C as Cysteine, etc.
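If you prepare queries with a script, the rules above can be mirrored locally before you submit. The following Python sketch reads FASTA text, normalises each sequence the way the list describes (whitespace dropped, unknown characters turned into X) and groups records into chunks that fit the size limit. It rests on several assumptions that are not stated on this page: "64kbytes" is read as 65,536 bytes, only the 20 standard one-letter codes are counted as amino acids, and lower-case letters are upper-cased. All function and constant names are illustrative, and the servers' own handling may differ.

```python
MAX_QUERY_BYTES = 64 * 1024  # assumption: "64kbytes" read as 65,536 bytes
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: the 20 standard codes


def read_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A description line opens a new record.
            records.append((line[1:].strip(), []))
        elif records:
            # Sequence data belongs to the most recent description.
            records[-1][1].append(line)
    return [(desc, "".join(parts)) for desc, parts in records]


def clean_sequence(seq):
    """Drop whitespace, upper-case, and turn unknown characters into X."""
    residues = "".join(seq.split()).upper()  # upper-casing is an assumption
    return "".join(r if r in AMINO_ACIDS else "X" for r in residues)


def chunk_records(records, limit=MAX_QUERY_BYTES):
    """Group records into FASTA-formatted chunks of at most `limit` bytes."""
    chunks, current, size = [], [], 0
    for desc, seq in records:
        entry = f">{desc}\n{clean_sequence(seq)}\n"
        entry_size = len(entry.encode("utf-8"))
        if entry_size > limit:
            raise ValueError(f"record {desc!r} alone exceeds {limit} bytes")
        if current and size + entry_size > limit:
            # The current chunk is full: close it and start a new one.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(entry)
        size += entry_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Each string returned by `chunk_records` can then be pasted into the form as a separate submission. A nucleotide sequence passes through this check unchanged apart from unknown letters, which is why such input is silently read as protein.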
Output format
Replies are sent by email (if you give us an address) and shown as a web page if you click the link we give you after you submit. The email response and main web page contain the same information in the same format.
Replies from Porter 5.0 and Porter8 5.0 come as plain text.
You might have to "view attachments inline" in your email client to see these replies.
If you submit multiple sequences you will receive a single email/web page with all the predictions.
The web page version of the results will be updated incrementally (every 60 seconds) until your query is complete.
Here is an example of a prediction:
Subject: Porter 5.0 response to 1 queries
Query_name: 1A1W_
Query_length: 83
SEQ MDPFLVLLHSVSSSLSSSELTELKYLCLGRVGKRKLERVQSGLDLFSMLL
SS3 CCHHHHHHHHHHHCCCHHHHHHHHHHHCCCCCHHHHHHCCCHHHHHHHHH
98357887766521885679999998504566233320677788999988
SS8 CCHHHHHHHHHHHTSCHHHHHHHHHHHTTTSCHHHHHHCCSHHHHHHHHH
76567777776630066677887787413016344310642677777787
SEQ EQNDLEPGHTELLRELLASLRRHDLLRRVDDFE
SS3 HCCCCCCCCHHHHHHHHHHCCCHHHHHHHHHCC
557889774477888887458412566665229
SS8 HTTCCCTTCHHHHHHHHHHTTCHHHHHHHHHTC
656215454567777776535433566654107
Each prediction is split into blocks of 50 residues.
Each block has 5 lines:
- Line 1: The 1-letter code of your protein primary sequence, preceded by "SEQ ".
- Line 2: 3 class secondary structure prediction by Porter 5.0, preceded by "SS3 ":
  - H = helix : DSSP's H (alpha helix) + G (3-10 helix) + I (pi-helix) classes.
  - E = strand : DSSP's E (extended strand) + B (beta-bridge) classes.
  - C = the rest : DSSP's T (turn) + S (bend) + . (the rest).
- Line 3: 3 class secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
- Line 4: 8 class secondary structure prediction by Porter8 5.0, preceded by "SS8 ":
  - H = DSSP's H (alpha helix)
  - G = DSSP's G (3-10 helix)
  - I = DSSP's I (pi-helix)
  - E = DSSP's E (extended strand)
  - B = DSSP's B (beta-bridge)
  - T = DSSP's T (turn)
  - S = DSSP's S (bend)
  - C = DSSP's . (the rest).
- Line 5: 8 class secondary structure prediction confidence: a number between 0 and 9, with 9 signifying maximal confidence.
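If you want to post-process a reply, the 5-line block layout described above can be read back into full-length strings. The Python sketch below assumes the layout of the example on this page: a "Query_name:" line opens each query, "SEQ ", "SS3 " and "SS8 " prefix their lines, and a line made only of digits is the confidence for the prediction line just before it. Leading whitespace is ignored. The function names, the dictionary keys and the confidence threshold are illustrative choices, not part of the servers.

```python
def parse_porter_reply(text):
    """Parse a Porter 5.0 / Porter8 5.0 text reply into one dict per query."""
    queries, current, last_tag = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Query_name:"):
            # A new query starts: open a fresh record.
            current = {"name": line.split(":", 1)[1].strip(),
                       "SEQ": "", "SS3": "", "SS8": "",
                       "conf3": "", "conf8": ""}
            queries.append(current)
            last_tag = None
        elif current is None or not line:
            continue  # skip the subject line and blank lines
        elif line.startswith("Query_length:"):
            current["length"] = int(line.split(":", 1)[1])
        elif line[:4] in ("SEQ ", "SS3 ", "SS8 "):
            # Append this 50-residue block to the matching full-length string.
            last_tag = line[:3]
            current[last_tag] += line[4:].strip()
        elif line.isdigit() and last_tag in ("SS3", "SS8"):
            # Digits-only line: confidence for the preceding SS3/SS8 line.
            current["conf" + last_tag[2]] += line
    return queries


def low_confidence_positions(conf, threshold=3):
    """1-based residue positions whose confidence digit is below threshold.

    The default threshold of 3 is an arbitrary illustrative value.
    """
    return [i + 1 for i, c in enumerate(conf) if int(c) < threshold]
```

Applied to the example reply above, `parse_porter_reply` would return a single record whose "SEQ", "SS3", "SS8", "conf3" and "conf8" strings each hold all 83 residues, with the two blocks joined in order.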
References
Porter 5.0
M.Torrisi, M.Kaleel, G.Pollastri,
"Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction",
Scientific Reports, 9: 12374, 2019, doi: 10.1038/s41598-019-48786-x
Open access abstract and PDF (Scientific Reports web site)
M.Torrisi, M.Kaleel, G.Pollastri,
"Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes",
bioRxiv, 289033; doi: 10.1101/289033
Abstract and PDF (bioRxiv web site)
Brewery
M.Torrisi, G.Pollastri,
"Protein Structure Annotations", in Essentials of Bioinformatics, Volume I. Understanding Bioinformatics: Genes to Proteins,
Springer Nature, 2019; doi: 10.1007/978-3-030-02634-9_10
Repository UCD
Porter 4.0, PaleAle 4.0
C.Mirabello, G.Pollastri,
"Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility",
Bioinformatics, 29(16):2056-2058, 2013, doi: 10.1093/bioinformatics/btt344
Toll-free PDF (Bioinformatics web site)
Older references on Porter and PaleAle
G.Pollastri, A.McLysaght.
"Porter: a new, accurate server for protein secondary structure prediction".
Bioinformatics, 21(8),1719-20, 2005.
Toll-free link to the article
C. Mooney, G.Pollastri.
"Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information"
Proteins, 77(1), 181-90, 2009.
Abstract and PDF (Proteins web site)
G.Pollastri*, A. J. M. Martin, C. Mooney, A. Vullo.
"Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information"
BMC Bioinformatics, 8:201, 2007.
Open access abstract and PDF (BMC Bioinformatics web site).
Distill as a whole
D. Baú, A. J. M. Martin, C. Mooney, A. Vullo, I. Walsh, G. Pollastri.
"Distill: A suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins"
BMC Bioinformatics, 7:402, 2006.
Open access abstract and PDF (BMC Bioinformatics web site).
A more comprehensive list of publications from our group is available here.