DeepMind - Training Compute-Optimal Large Language Models
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Papers emphasizing the importance of pretrained model size: LIMA: Less Is More for Alignment; The False Promise of Imitating Proprietary LLMs