Prediction of Coreceptor Usage for HIV-1
Build your own classifiers with this training data or your own.
The classifiers available below all predict coreceptor usage with close to 90% accuracy. In our experiments, the SVM (support vector machine) was the most accurate; however, the rules generated by the other algorithms are much easier to interpret.
Instructions for obtaining predictions of coreceptor usage for your HIV-1 sequences:
- Extract the V3 region from your HIV sequences.
- Align the V3 sequences to the following consensus sequence, built from the training data obtained from Los Alamos:
- CTRPNNNT-RK*I*I--GPG*AFY*-TG*I-IGDIRQAHC
- * indicates that any residue may appear at that position; - indicates a gap
- Make sure each aligned sequence comes out the same length as the consensus (40 positions, counting gaps)
- The better your alignment, the more reliable our prediction
- Pay particular attention to position 12 in the alignment
- Upload a FASTA file containing your sequences in the following format:
> class ID1
CTRPNNNT-RKRISL--GPGRVFYT-TGEI-IGDIRKAHC
- To obtain this format, starting with a normal fasta file, replace all ">" with "> class "
- Here is a sample file to emulate. (Right-click or option-click to download. If you just click on the file and then copy the text displayed in your web browser into a new file, you will have problems because the line breaks will have been removed.)
- Make sure the file ends up with a FASTA file extension. (Some browsers rename extensions when downloading.) The sample file should end up as "sample.fas". When uploading your own file, please give it a similar extension (e.g., test.fas).
- Make sure that your sequences do not contain any * characters. At the moment, these will crash the program.
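The formatting steps above can be checked locally before uploading. The following is a minimal Python sketch, not part of the prediction server: the function names (parse_fasta, check_record, to_upload_format) are hypothetical, and it assumes that your sequences are already aligned to the consensus and stored as ordinary FASTA text. It checks each sequence for the 40-position length and for * characters, then rewrites the headers as "> class ID".

```python
ALIGNED_LENGTH = 40  # length of the consensus alignment, gaps included


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_record(header, seq):
    """Return a list of problems found in one aligned V3 sequence."""
    problems = []
    if len(seq) != ALIGNED_LENGTH:
        problems.append(f"{header}: length {len(seq)}, expected {ALIGNED_LENGTH}")
    if "*" in seq:
        problems.append(f"{header}: contains '*'")
    return problems


def to_upload_format(records):
    """Render records with '> class ' headers, one sequence per line."""
    lines = []
    for header, seq in records:
        lines.append(f"> class {header}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example input: the sample sequence from the instructions above.
    raw = ">ID1\nCTRPNNNT-RKRISL--GPGRVFYT-TGEI-IGDIRKAHC\n"
    records = parse_fasta(raw)
    for header, seq in records:
        for problem in check_record(header, seq):
            print("WARNING:", problem)
    print(to_upload_format(records), end="")
```

If no warnings are printed, save the output of to_upload_format to a file with a ".fas" extension. Note that the sketch only checks format; it does not align sequences or judge alignment quality.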
For more information on these coreceptor classifiers: C4.5, C4.5 with p8-p12, PART, SVM, Charge Rule
Contact Benjamin Good at bgood@vapop.ucsd.edu