The "Beating Neural Scaling Law" something-or-other paper
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data (2019.11)
Microsoft - Phi-3
SemDeDup: Data-efficient learning at web-scale through semantic deduplication (2023.04)
Deep Learning on a Data Diet: Finding Important Examples Early in Training (2021.07)