#####################################################
#               PaleAle 5.0 datasets                #
#####################################################

> set.dataset was used in 5-fold cross-validation to train and tune the models.

> independent_test_set.dataset was used only to assess the performance of the final system and of other recent predictors.

> The format of the datasets is as follows:

Two-line header:
number_of_proteins
attributes classes      # 45 4

Following the header are the protein records, five lines per protein:
1) pdb_id
2) length
3) encoded input        # (length * 44) floats + 1 (length/1000)
4) relative solvent accessibility
5) blank

> set.fasta and independent_test_set.fasta are the same datasets, containing only the sequences.

> This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/).

> Please email us at gianluca.pollastri@ucd.ie if you wish to use it for commercial purposes or for any other purpose not permitted by CC BY-NC-SA 4.0.
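
The record layout described above (a two-line header followed by five lines per protein) can be read with a short parser. The following is a minimal sketch in Python, not the authors' own loader. It assumes that values on a line are separated by whitespace and that the second header line holds the two integers "attributes classes". The names ProteinRecord and parse_dataset are hypothetical. Because the encoding of the relative solvent accessibility line is not specified here, that line is kept as a raw string, and the encoded input is returned as a flat list of floats without any reshaping.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProteinRecord:
    """One five-line protein record (hypothetical container)."""
    pdb_id: str
    length: int
    encoded_input: List[float]  # flat list, exactly as stored on line 3
    rsa: str                    # raw line 4; its encoding is not assumed


def parse_dataset(text: str) -> Tuple[int, int, int, List[ProteinRecord]]:
    """Parse the contents of a .dataset file.

    Assumes whitespace-separated values and a header of the form:
        number_of_proteins
        attributes classes
    Returns (number_of_proteins, attributes, classes, records).
    """
    lines = text.splitlines()

    # Header line 1: number of proteins; header line 2: attributes and classes.
    n_proteins = int(lines[0].split()[0])
    attributes, classes = (int(tok) for tok in lines[1].split()[:2])

    records: List[ProteinRecord] = []
    pos = 2  # first line after the two-line header
    for _ in range(n_proteins):
        pdb_id = lines[pos].strip()
        length = int(lines[pos + 1].strip())
        encoded = [float(tok) for tok in lines[pos + 2].split()]
        rsa = lines[pos + 3].strip()
        records.append(ProteinRecord(pdb_id, length, encoded, rsa))
        pos += 5  # four content lines plus the blank separator line

    return n_proteins, attributes, classes, records
```

To use it, pass in the file contents, for example parse_dataset(open("set.dataset").read()). A useful sanity check is to compare len(record.encoded_input) with the float count given in the format description for each record.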