Dataset: 321,552 records in total. In 10-fold cross-validation, 90% of the records are used as the training set and the remaining 10% for validation.
The steps for model construction are shown below (Fig. 1).
We use 5% of the total samples for in-training validation. If the validation curve (orange line) deviates substantially from the training curve (blue line), the model may be over-fitted (Fig. 3).
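The visual comparison of the two curves can also be expressed as a simple numeric check. The following is a minimal sketch, not the procedure used for this model: the function name `overfit_gap`, the example loss values, and the `TOLERANCE` threshold are all assumptions made for illustration.

```python
def overfit_gap(train_loss, val_loss, last_n=3):
    """Mean (validation - training) loss over the last `last_n` epochs."""
    if len(train_loss) != len(val_loss):
        raise ValueError("curves must have the same number of epochs")
    if not train_loss:
        raise ValueError("curves must not be empty")
    n = min(last_n, len(train_loss))
    gaps = [v - t for t, v in zip(train_loss[-n:], val_loss[-n:])]
    return sum(gaps) / n


# Hypothetical per-epoch losses, for illustration only (not values from this model).
train_curve = [0.90, 0.60, 0.40, 0.30, 0.25]  # "blue line"
val_curve = [0.95, 0.70, 0.55, 0.56, 0.60]    # "orange line"

TOLERANCE = 0.10  # assumed threshold; it has to be tuned per task
gap = overfit_gap(train_curve, val_curve)
verdict = "possible over-fitting" if gap > TOLERANCE else "ok"
print(f"gap={gap:.3f} -> {verdict}")
```

A gap that keeps growing while the training loss still falls is the pattern the figure is meant to reveal; the threshold only makes that judgment repeatable.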
Finally, we run a 10-fold cross-validation test to check whether the model is robust enough (Table 1).
| No. | Loss | Accuracy |
|---|---|---|
| 1 | 0.1619 | 0.9002 |
| 2 | 0.2766 | 0.8935 |
| 3 | 0.2109 | 0.9115 |
| 4 | 0.1880 | 0.8978 |
| 5 | 0.2261 | 0.8837 |
| 6 | 0.1748 | 0.8991 |
| 7 | 0.2177 | 0.8794 |
| 8 | 0.2225 | 0.9046 |
| 9 | 0.2314 | 0.9028 |
| 10 | 0.1807 | 0.8881 |
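One way to read Table 1 is to summarize the spread across folds: a small spread suggests the model does not depend strongly on which records were held out. The sketch below only re-uses the ten (loss, accuracy) pairs from the table; the variable names are illustrative.

```python
from statistics import mean, stdev

# (loss, accuracy) for folds 1..10, copied from Table 1.
folds = [
    (0.1619, 0.9002), (0.2766, 0.8935), (0.2109, 0.9115), (0.1880, 0.8978),
    (0.2261, 0.8837), (0.1748, 0.8991), (0.2177, 0.8794), (0.2225, 0.9046),
    (0.2314, 0.9028), (0.1807, 0.8881),
]

losses = [loss for loss, _ in folds]
accs = [acc for _, acc in folds]

mean_loss, mean_acc = mean(losses), mean(accs)
sd_acc = stdev(accs)  # sample standard deviation across the 10 folds

print(f"loss:     mean={mean_loss:.4f}")
print(f"accuracy: mean={mean_acc:.4f} sd={sd_acc:.4f} "
      f"min={min(accs):.4f} max={max(accs):.4f}")
```

For these values the mean accuracy is about 0.896 and the mean loss about 0.209, with fold accuracies ranging from 0.8794 to 0.9115.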
There are 46 label types, but the data are highly unbalanced. To estimate the accuracy of the model, we count how often the model gives the right answer among the samples of one type; this gives the "reliability" of that type. The average of these "reliabilities" is the macro-average accuracy of the model (Table 2). The performance for each of the 46 genotypes is listed in Table 3.
| Evaluation method | Precision | Recall | F1-score |
|---|---|---|---|
| Macro_Average | 0.8419 | 0.7689 | 0.8037 |
| Micro_Average | 0.8629 | 0.8629 | 0.8629 |
| Category | TP | TN | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| SARS-CoV-2 | 816 | 31336 | 5 | 1 | 0.9939 | 0.9988 | 0.9963 |
| Influenza.A.virus_HA | 245 | 31784 | 63 | 66 | 0.7955 | 0.7878 | 0.7916 |
| Influenza.A.virus_NA | 203 | 31740 | 118 | 97 | 0.6324 | 0.6767 | 0.6538 |
| Influenza.A.virus_MP | 35 | 32079 | 15 | 29 | 0.7 | 0.5469 | 0.614 |
| Influenza.A.virus_NS | 91 | 32021 | 25 | 21 | 0.7845 | 0.8125 | 0.7982 |
| Influenza.A.virus_NP | 103 | 32025 | 11 | 19 | 0.9035 | 0.8443 | 0.8729 |
| Influenza.A.virus_PA | 105 | 31993 | 28 | 32 | 0.7895 | 0.7664 | 0.7778 |
| Influenza.A.virus_PB2 | 134 | 31972 | 35 | 17 | 0.7929 | 0.8874 | 0.8375 |
| Influenza.A.virus_PB1 | 109 | 31979 | 40 | 30 | 0.7315 | 0.7842 | 0.7569 |
| Influenza.B.virus_HA | 2205 | 29901 | 20 | 32 | 0.991 | 0.9857 | 0.9883 |
| Influenza.B.virus_NA | 1651 | 30376 | 26 | 105 | 0.9845 | 0.9402 | 0.9618 |
| Influenza.B.virus_NS | 1313 | 30818 | 10 | 17 | 0.9924 | 0.9872 | 0.9898 |
| Influenza.B.virus_MP | 1204 | 30947 | 3 | 4 | 0.9975 | 0.9967 | 0.9971 |
| Influenza.B.virus_NP | 1177 | 30957 | 12 | 12 | 0.9899 | 0.9899 | 0.9899 |
| Influenza.B.virus_PA | 1169 | 30973 | 10 | 6 | 0.9915 | 0.9949 | 0.9932 |
| Influenza.B.virus_PB2 | 1130 | 29561 | 730 | 737 | 0.6075 | 0.6052 | 0.6064 |
| Influenza.B.virus_PB1 | 1140 | 29555 | 737 | 726 | 0.6074 | 0.6109 | 0.6091 |
| Human.orthopneumoviru | 2668 | 29397 | 44 | 49 | 0.9838 | 0.982 | 0.9829 |
| Enterovirus.A | 2553 | 29173 | 161 | 271 | 0.9407 | 0.904 | 0.922 |
| Enterovirus.C | 720 | 31112 | 140 | 186 | 0.8372 | 0.7947 | 0.8154 |
| Enterovirus.B | 2648 | 28256 | 1187 | 67 | 0.6905 | 0.9753 | 0.8085 |
| Enterovirus.D | 301 | 31670 | 122 | 65 | 0.7116 | 0.8224 | 0.763 |
| MERS-CoV | 97 | 32055 | 0 | 6 | 1.0 | 0.9417 | 0.97 |
| Human.mastadenovirus.B | 253 | 31696 | 82 | 127 | 0.7552 | 0.6658 | 0.7077 |
| Mumps.orthorubulaviru | 1184 | 30846 | 40 | 88 | 0.9673 | 0.9308 | 0.9487 |
| Rhinovirus.A | 512 | 31239 | 85 | 322 | 0.8576 | 0.6139 | 0.7156 |
| Human.respirovirus.3 | 170 | 31953 | 17 | 18 | 0.9091 | 0.9043 | 0.9067 |
| Measles.morbilliviru | 1950 | 30015 | 42 | 151 | 0.9789 | 0.9281 | 0.9528 |
| Rhinovirus.C | 391 | 31185 | 373 | 209 | 0.5118 | 0.6517 | 0.5733 |
| Influenza.C.virus_HE | 29 | 32123 | 0 | 6 | 1.0 | 0.8286 | 0.9062 |
| Human.mastadenovirus.C | 153 | 31839 | 24 | 142 | 0.8644 | 0.5186 | 0.6483 |
| Human.mastadenovirus.D | 105 | 31875 | 57 | 121 | 0.6481 | 0.4646 | 0.5412 |
| Human.metapneumoviru | 672 | 31219 | 87 | 180 | 0.8854 | 0.7887 | 0.8343 |
| Influenza.C.virus_NS | 20 | 32134 | 1 | 3 | 0.9524 | 0.8696 | 0.9091 |
| Betacoronavirus.1 | 109 | 32040 | 3 | 6 | 0.9732 | 0.9478 | 0.9604 |
| Rhinovirus.B | 23 | 31976 | 19 | 140 | 0.5476 | 0.1411 | 0.2244 |
| Influenza.C.virus_MP | 23 | 32130 | 1 | 4 | 0.9583 | 0.8519 | 0.902 |
| Human.mastadenovirus.E | 6 | 32080 | 6 | 66 | 0.5 | 0.0833 | 0.1429 |
| Influenza.C.virus_PB1 | 15 | 32143 | 0 | 0 | 1.0 | 1.0 | 1.0 |
| Influenza.C.virus_PB2 | 13 | 32140 | 3 | 2 | 0.8125 | 0.8667 | 0.8387 |
| Influenza.C.virus_P3 | 12 | 32143 | 0 | 3 | 1.0 | 0.8 | 0.8889 |
| Human.mastadenovirus.F | 48 | 31943 | 9 | 158 | 0.8421 | 0.233 | 0.365 |
| Influenza.C.virus_NP | 11 | 32143 | 0 | 4 | 1.0 | 0.7333 | 0.8462 |
| Enterovirus.G | 8 | 32092 | 9 | 49 | 0.4706 | 0.1404 | 0.2162 |
| Human.respirovirus.1 | 50 | 32084 | 9 | 15 | 0.8475 | 0.7692 | 0.8065 |
| Monkeypox.viru | 175 | 31983 | 0 | 0 | 1.0 | 1.0 | 1.0 |
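The Precision, Recall and F1 columns of Table 3 follow from the TP, FP and FN counts with the standard definitions, and macro and micro averaging differ only in when the averaging happens. The sketch below reproduces three rows of the table and then averages over just those three rows for illustration; the helper name `prf` is an assumption, and the averages it prints are not the values of Table 2, which cover all 46 categories.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (TP, FP, FN) copied from three rows of Table 3.
rows = {
    "SARS-CoV-2": (816, 5, 1),
    "Enterovirus.B": (2648, 1187, 67),
    "Rhinovirus.B": (23, 19, 140),
}

per_class = {name: prf(*counts) for name, counts in rows.items()}
for name, (p, r, f1) in per_class.items():
    print(f"{name:15s} P={p:.4f} R={r:.4f} F1={f1:.4f}")

# Macro average: mean of the per-class scores, so every class weighs the same.
n = len(per_class)
macro_p = sum(p for p, _, _ in per_class.values()) / n
macro_r = sum(r for _, r, _ in per_class.values()) / n

# Micro average: pool the counts first, so large classes dominate.
tp_sum = sum(c[0] for c in rows.values())
fp_sum = sum(c[1] for c in rows.values())
fn_sum = sum(c[2] for c in rows.values())
micro_p, micro_r, micro_f1 = prf(tp_sum, fp_sum, fn_sum)

print(f"macro P={macro_p:.4f} R={macro_r:.4f}")
print(f"micro P={micro_p:.4f} R={micro_r:.4f} F1={micro_f1:.4f}")
```

When every sample has exactly one true label and one predicted label, each misclassification adds one FP (for the predicted class) and one FN (for the true class). Pooled over all classes the two totals are equal, which is why the micro-averaged precision, recall and F1 in Table 2 share a single value. Rare classes with low recall, such as Rhinovirus.B, pull the macro average down much more than the micro average.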
There are two ways to input the data you want to predict: paste FASTA-format text into the "Input FASTA" area, or upload a FASTA file from your disk via the button marked with a red line.
Fig. 1 shows the e-mail text box, the checkbox for the "Terms of Use" agreement, the submit button, and so on.
We accept ONLY FASTA-format text, which you can paste into the text area (Fig. 2).
Every sample starts with a right angle bracket (>) followed by its ID or other description, and the original sequence follows on the next line(s). You can input many samples; just remember to start each new one with a right angle bracket.
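To check an input locally before submitting, the format described above can be read with a few lines of code. This is a minimal sketch of a generic FASTA reader, not the parser used by this service; the function name `parse_fasta` and the sample IDs and sequences are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-format text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-sample input; IDs and sequences are invented.
example = """>sample_1 hypothetical description
ATGCGTACGTTAGC
GGCTAAGT
>sample_2
TTGACCGTAA
"""

records = parse_fasta(example)
for header, seq in records:
    print(header, len(seq))
```

The reader raises an error when sequence text appears before any header line, which is the most common mistake when pasting several samples.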
Alternatively, you can upload a FASTA file by pressing the "Fileupload" button. An "Open file" window will pop up; choose a FASTA file from your disk and click "Open" to upload it (Fig. 3).
If you entered an email address before pressing "Submit", you will find a system notification in your inbox. Clicking the hyperlink in it leads you to the result page (Fig. 5).
If you did not provide an email address, your web browser will take you to the result page after a short wait.
The result table has 8 columns in total: ID, Length, Species(1st-hit), eval-score of 1st-hit, Species(2nd-hit), eval-score of 2nd-hit, Strain and BlastBestHit (Fig. 6).
You can also download 3 files from the result page: the result, submission and log files.
Copyright © 2020 Institute of Information Science, Academia Sinica, TAIWAN.
All Rights reserved.