In stratified 10-fold cross-validation, the dataset is randomly divided into 10 folds of approximately equal size and class distribution. For each fold, the classifier is trained on the other nine folds and then tested on the held-out fold. This procedure is repeated so that each fold serves once as the test set. The cross-validation score is the average performance across the ten training runs.
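For concreteness, here is a minimal sketch of this procedure in Python, assuming scikit-learn and one of its bundled toy datasets; the classifier and dataset are illustrative choices only (the text itself refers to WEKA, not this library):

```python
# A minimal sketch of stratified 10-fold cross-validation using scikit-learn.
# The dataset and classifier below are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# StratifiedKFold splits the data into 10 folds that preserve the class distribution.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=5000)

# Each of the 10 runs trains on 9 folds and tests on the held-out fold;
# the cross-validation score is the mean of the 10 test scores.
scores = cross_val_score(clf, X, y, cv=cv)
print(f"fold accuracies: {scores.round(3)}")
print(f"cross-validation score: {scores.mean():.3f}")
```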
Cross-validation is a "hold out" technique for evaluating learning-based classifiers. It is necessary to evaluate ML classifiers on samples that were not used to construct the classifier, because estimates of performance on unseen data are likely to be "hopelessly optimistic" when the same data is used for both training and testing. By separating training sets from test sets, it is possible to distinguish models that memorize or "overfit" the training set from models that discover consistent patterns in the data. Ideally, with enough samples, one set of data could be used for training and a completely separate set for testing; however, there is often not enough data for this simple approach to produce useful results. To estimate a classifier's likely performance on future data when training data is scarce, "hold out" methods are often employed: they repeatedly train the classifier on subsets of the available data and test it on subsets "held out" from training.
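The simplest "hold out" arrangement is a single training/test split. The sketch below, again assuming scikit-learn and a deliberately unpruned decision tree (an illustrative choice, not the text's method), shows how comparing accuracy on the training set with accuracy on the held-out set exposes memorization:

```python
# A minimal sketch of a single "hold out" evaluation: one training set, one test set.
# scikit-learn, the dataset, and the unpruned tree are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the class distribution similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# An unpruned tree can memorize (overfit) the training set.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(f"training accuracy: {clf.score(X_train, y_train):.3f}")  # typically near 1.0
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")    # noticeably lower if overfit
```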
A high score in cross-validation means that the classifier has discovered consistent patterns in the data rather than memorizing the training set, and that it is likely to perform comparably well on future, unseen data.
Confidence in both of these claims increases with the score itself and with the size of the dataset used. For more information about cross-validation and other issues in machine learning, you might try the book by the WEKA designers, Data Mining: Practical Machine Learning Tools and Techniques (Witten and Frank).