Extracting training and test datasets from the IEDB database

From the IEDB database (March 28, 2018), we downloaded 377839 linear B-cell peptides. After data cleaning, in which we kept only peptides with a length in the interval [10, 50], with a full antigen sequence that contains the peptide, and with a unique IEDB IRI code, we obtained 240563 peptides, comprising 25884 positive samples and 214679 negative samples. These peptides involve 6086 antigen sequences, which were downloaded from the NCBI protein database. Finally, we extracted training datasets and independent test datasets from these antigen sequences and IEDB peptides; they can be downloaded from the following table. Each row of the table describes one model. For example, the first row describes the model DLBEptope11. The columns "Epitope", "Test datasets", "Training datasets", "Test prediction", "Test performance" and "AUC" give the epitope length, the number of samples in the test dataset, the number of samples in the training dataset, the prediction scores for all samples in the test dataset, the prediction performance on the test dataset and the ROC plot based on the test dataset, respectively.
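
The three cleaning criteria above (peptide length in [10, 50], peptide contained in its full antigen sequence, unique IEDB IRI code) can be illustrated with a minimal Python sketch. This is not the original pipeline: the Peptide record, its field names and the filter_peptides function are hypothetical, and "unique IRI" is assumed here to mean that only the first record seen for each IRI is kept.

```python
from dataclasses import dataclass

# Length interval stated in the text.
MIN_LEN, MAX_LEN = 10, 50


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record for one IEDB linear B-cell peptide."""
    iri: str          # IEDB IRI code of the record
    sequence: str     # peptide amino-acid sequence
    antigen_id: str   # identifier of the source antigen
    positive: bool    # True for a positive sample, False for a negative one


def filter_peptides(peptides, antigens):
    """Return the peptides that pass the three cleaning criteria.

    `antigens` maps an antigen identifier to its full sequence.
    Assumption: for duplicated IRIs, the first passing record is kept.
    """
    seen_iris = set()
    kept = []
    for p in peptides:
        # 1. Peptide length must lie in [MIN_LEN, MAX_LEN].
        if not (MIN_LEN <= len(p.sequence) <= MAX_LEN):
            continue
        # 2. The full antigen sequence must be available and contain the peptide.
        antigen_seq = antigens.get(p.antigen_id)
        if antigen_seq is None or p.sequence not in antigen_seq:
            continue
        # 3. Each IRI code may appear only once.
        if p.iri in seen_iris:
            continue
        seen_iris.add(p.iri)
        kept.append(p)
    return kept
```

Counting the positive and negative samples in the returned list is then a matter of summing the positive flag; in the source data these counts are 25884 and 214679, which add up to the reported 240563 peptides.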