LLM Intro

🔸LLM?

LLM = Foundation Model
Emergent ability: the capacity of a single model to handle many different tasks, which appears as models are scaled up
Core of LLMs = Human Alignment (= Human Feedback)

🔸LLM Scaling Law

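DeepMind - Training Compute-Optimal Large Language Models
- DeepMind's Chinchilla model (70B parameters) reduced model size and increased the amount of training data instead. It outperformed models of 175B parameters or more across a variety of tasks.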
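The trade-off between model size and data size can be made concrete with a little arithmetic. The sketch below is not taken from these notes. It assumes two commonly cited rules of thumb: training compute is roughly C ≈ 6·N·D FLOPs (N parameters, D training tokens), and a compute-optimal model uses roughly 20 tokens per parameter. The function names are hypothetical.

```python
import math

# Assumption (not from these notes): training compute is approximated as
# C ≈ 6 * N * D FLOPs, where N = parameters and D = training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a model of n_params trained on n_tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a compute budget into (parameters, tokens).

    Assumption: the ~20 tokens-per-parameter ratio is a rule of thumb,
    not an exact law. With D = r * N and C = 6 * N * D = 6 * r * N^2,
    solving for N gives N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Budget of a 70B-parameter model trained on 1.4T tokens.
    budget = training_flops(70e9, 1.4e12)
    n, d = compute_optimal_allocation(budget)
    print(f"budget = {budget:.2e} FLOPs")
    print(f"params = {n / 1e9:.1f}B, tokens = {d / 1e12:.2f}T")
```

Under these assumptions, a fixed budget is best spent by growing parameters and tokens together, rather than pouring everything into parameter count.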
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
- A paper that scaled the vanilla Transformer and other Transformer variants up to 175B and compared their performance, concluding that data matters more than model architecture.
Papers emphasizing the importance of pretrained model size: LIMA: Less Is More for Alignment; The False Promise of Imitating Proprietary LLMs

🔸Multimodal

PaLM-E (Google), Kosmos-1 & Kosmos-2 (Microsoft), GPT-4 (OpenAI), Gemini (Google), ImageBind (Meta), …

🔸Synthetic Data

ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks: a paper reporting that labeling with an LLM outperforms human crowd-workers on text-annotation tasks.

🔸ETC

Domain specialized
Evaluation
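Prompt Engineering: CoT (Chain-of-Thought), PoT (Program-of-Thought: the problem in the prompt is turned into Python code, and running that code produces the answer), Zero-shot Reasoner
Prompt Manager (Cross Function Modality): the prompt manager decides which model (language LLM, vision LLM, etc.) to call based on the user prompt.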
Function call
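
Two of the items above, Program-of-Thought prompting and the prompt manager, can be illustrated with a toy sketch. No real LLM is called here. `GENERATED_PROGRAM` is a hard-coded stand-in for code a model might emit, and `route_prompt` uses a deliberately simple rule where a real prompt manager would likely rely on a model or classifier. All names are hypothetical.

```python
# Stand-in for model output in Program-of-Thought prompting: the model writes
# its reasoning as Python and stores the final result in `ans`.
# Hypothetical question: "A shirt costs 12, you buy 7 at a 25% discount. Total?"
GENERATED_PROGRAM = """
price = 12
quantity = 7
discount = 0.25
ans = price * quantity * (1 - discount)
"""


def run_program_of_thought(program: str, answer_var: str = "ans"):
    """Execute model-written code and read the answer from a named variable.

    Builtins are removed only to keep this toy example small. This is NOT a
    security sandbox; untrusted code needs real isolation.
    """
    namespace = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace[answer_var]


def route_prompt(prompt: str, has_image: bool) -> str:
    """Toy prompt manager: choose which kind of model should handle a request.

    Assumption: routing depends only on whether an image is attached.
    The prompt text is accepted but unused by this simple rule.
    """
    return "vision" if has_image else "language"


if __name__ == "__main__":
    print("PoT answer:", run_program_of_thought(GENERATED_PROGRAM))
    print("route:", route_prompt("What is in this photo?", has_image=True))
    print("route:", route_prompt("Summarize this paragraph.", has_image=False))
```

In PoT the arithmetic is done by the interpreter rather than by the language model, so the model only has to get the program right, not the calculation.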