6.1 Train/Validation/Test (Holdout validation)
6.2 Accuracy, Precision, Recall
6.2.1 Confusion Matrix
6.3 Perplexity (ppl)
6.4 K-fold cross-validation
6.4.1 LOOCV (Leave-one-out cross-validation)
6.5 Benchmark, LMSYS LLM Arena (Elo)