6.1 Train/Validation/Test (Holdout validation)

6.2 Accuracy, Precision, Recall

6.2.1 Confusion Matrix

6.3 Perplexity (ppl)

6.4 K-fold cross-validation

6.4.1 LOOCV (Leave-one-out cross-validation)

6.5 Benchmarks, LMSYS Chatbot Arena (Elo)