WandB

ref : https://greenjade.tistory.com/80 ba2.3.6.5.3_1. title: wandb groups are not flexible. For datasets, store the final path name rather than the full path, keep run names in a fixed format, and do not use tags.

🔸Wandb login & init

Start a new project + set the config
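A minimal sketch of this step, assuming a made-up project name and made-up hyperparameter values. `wandb.login()` and `wandb.init(project=..., name=..., config=..., mode=...)` are the real wandb calls; `make_run_name` and `start_run` are helpers invented here to keep run names in one format.

```python
def make_run_name(model_name, config):
    """Build a run name from the hyperparameters (hypothetical naming format)."""
    parts = [f"{key}={config[key]}" for key in sorted(config)]
    return "-".join([model_name, *parts])


def start_run(project, model_name, config, mode="online"):
    """Log in and start a new run whose config holds the hyperparameters."""
    import wandb  # imported here so make_run_name works without wandb installed

    if mode == "online":
        wandb.login()  # asks for an API key unless one is already stored
    return wandb.init(
        project=project,
        name=make_run_name(model_name, config),
        config=config,  # later readable as run.config.lr, run.config.dropout_ratio
        mode=mode,  # "offline" or "disabled" skip syncing to the server
    )


config = {"lr": 0.001, "dropout_ratio": 0.2}  # illustrative values only
print(make_run_name("exp1", config))  # exp1-dropout_ratio=0.2-lr=0.001
```

Calling `start_run("test-mnist", "exp1", config)` would then create the run; the project name here is only an example.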

🔸Logging

ref : https://greeksharifa.github.io/references/2020/06/10/wandb-usage/#google_vignette

run.log()
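As a rough sketch, `run.log()` takes a dictionary of metric names and values and is usually called once per step or epoch. The metric names, the numbers, and the `log_history` helper below are assumptions for illustration; `PrintRun` is a stand-in with the same `log()` call shape so the example runs without a wandb account.

```python
def log_history(run, train_losses, valid_accuracies):
    """Log one dictionary of metrics per epoch to a wandb-style run."""
    for epoch, (loss, acc) in enumerate(zip(train_losses, valid_accuracies), start=1):
        # Each call adds one point to the chart of every key in the dictionary.
        run.log({"epoch": epoch, "train_loss": loss, "valid_accuracy": acc})


class PrintRun:
    """Stand-in for a wandb run: records and prints whatever is logged."""

    def __init__(self):
        self.rows = []

    def log(self, data):
        self.rows.append(data)
        print(data)


dry_run = PrintRun()
log_history(dry_run, train_losses=[0.9, 0.5, 0.3], valid_accuracies=[0.61, 0.78, 0.84])
```

With wandb installed, the object returned by `wandb.init(...)` can be passed in place of `PrintRun()`.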

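🔸Watch - logging model weights and gradients

Automatically tracks and logs the model. Gradients, weights, and so on are tracked and visualized.
- parameters : visualizes the distribution of each layer's weights and biases
  
  Shows how the distribution of that layer's weight and bias values changes over training.
  - Interpretation: for weights, a narrow range of values within a layer that stays small (e.g. -0.1 to 0.1) suggests stable training. Weight distributions that look similar across all layers, with no extreme ones, also suggest stable training. For biases, it is fine as long as they are not heavily skewed in one direction; a spread of values or small shifts are acceptable.
- gradients : visualizes the distribution of the gradients of each layer's weights and biases
  - Interpretation: gradient distributions that converge toward 0 in each layer suggest stable training. If they are excessively close to 0 and the weight distributions under parameters stop changing from layer to layer, suspect vanishing gradients. If the gradient distributions differ too much between layers, some parameters have large values, which may point to overfitting. Gradients of biases usually show varied distributions.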
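The vanishing-gradient check described above can also be approximated numerically. The sketch below is not something wandb does; it is a plain-Python illustration with made-up gradient values, a hypothetical helper name (`flag_small_gradients`), and an arbitrary threshold of `1e-6`.

```python
from statistics import fmean


def flag_small_gradients(grads_by_layer, threshold=1e-6):
    """Return the layers whose mean absolute gradient is below `threshold`."""
    flagged = []
    for layer, grads in grads_by_layer.items():
        mean_abs = fmean(abs(g) for g in grads)
        if mean_abs < threshold:
            flagged.append(layer)
    return flagged


# Made-up gradient values for three layers of a hypothetical network.
grads = {
    "fc1.weight": [2e-9, -1e-9, 3e-9],
    "fc2.weight": [4e-4, -2e-4, 1e-4],
    "fc3.weight": [0.02, -0.01, 0.03],
}
print(flag_small_gradients(grads))  # ['fc1.weight']
```

A layer flagged here corresponds to a gradient histogram that has collapsed onto 0 in the wandb panel; the right threshold depends on the model and is left as an assumption.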
run.watch
- model = the model object
- criterion = the loss function, e.g. one defined with torch.nn.functional (torch.F)
- log = 'all', 'gradients', or 'parameters'
- log_freq: int(=1000) = how often, in batches, gradients and parameters are logged
- log_graph (=False) = whether to log the model's computation graph
```
# Visualize the model's weights, biases, and gradients in wandb.
run.watch(model, criterion, log='all', log_graph=True)
# Write the model training code after run.watch().
for e in range(epochs):
	...
```

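🔸Sweep - hyperparameter tuning

program : the Python script that runs the deep learning training

sweep configuration : defined in a yaml file or a dictionary

method : the hyperparameter search strategy, one of grid, random, or bayes
parameters : the range to search for each hyperparameter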

# example
sweep_configuration = {
    'method': 'bayes',
    'metric': {'goal': 'maximize', 'name': 'valid_accuracy'},
    'parameters': {
        'lr': {'min': 0.0001, 'max': 0.01},
        'dropout_ratio': {'values': [0.1, 0.2, 0.3]},
        'weight_decay': {'min': 0.00001, 'max': 0.01}
        }
}
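To make the difference between `values` and `min`/`max` concrete, here is a small sketch that draws random combinations from a `parameters` dictionary shaped like the one above. It is only an illustration of a plain uniform random draw, not wandb's implementation (bayes in particular also uses the results of earlier runs), and `sample_params` is a name invented here.

```python
import random


def sample_params(parameters, rng):
    """Draw one hyperparameter combination from a sweep-style `parameters` dict."""
    sampled = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # Discrete list: pick one of the listed values.
            sampled[name] = rng.choice(spec["values"])
        else:
            # Continuous range: draw uniformly between min and max.
            sampled[name] = rng.uniform(spec["min"], spec["max"])
    return sampled


parameters = {
    "lr": {"min": 0.0001, "max": 0.01},
    "dropout_ratio": {"values": [0.1, 0.2, 0.3]},
    "weight_decay": {"min": 0.00001, "max": 0.01},
}
rng = random.Random(0)  # fixed seed so the draws are repeatable
for _ in range(3):
    print(sample_params(parameters, rng))
```

Each printed dictionary corresponds to what one run of the sweep would see in its config.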

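wandb.sweep() : creates the sweep and returns its sweep_id.

wandb.agent() : runs the sweep.

sweep_id : the unique ID of the sweep, created with wandb.sweep() and passed in.
function : the program, i.e. the Python function to run for hyperparameter tuning
entity : wandb username or team (organization) name
project : wandb project name
count : number of runs to try. → With 10, the agent tries only 10 hyperparameter combinations from the sweep configuration's parameters and then stops.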

https://docs.wandb.ai/ref/python/agent/#docusaurus_skipToContent_fallback

import torch
import torch.optim as optim
import wandb

project_name = 'test-mnist'
sweep_id = wandb.sweep(
    sweep=sweep_configuration,
    project=project_name,
)


def run_sweep():  # the function the sweep agent will run
    num_epochs = 100
    patience = 3
    model_name = 'exp1'

    run = wandb.init(
        entity='',
        project='',
        name='',  # put the hyperparameters being tested in name so runs are easy to compare later
        # ...
    )

    model = ...  # define the model
    model.weight_initialization()
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    criterion = ...  # define the loss function
    # read the hyperparameters being tuned from wandb.config
    optimizer = optim.Adam(model.parameters(), lr=wandb.config.lr, weight_decay=wandb.config.weight_decay)

    # Visualize the model's weights, biases, and gradients in wandb.
    run.watch(model, criterion, log='all')
    model, valid_max_accuracy = training_loop(model, train_loader, val_loader, ...)  # training function
    # The sweep metric ('valid_accuracy') must be logged for the bayes search to use it;
    # drop this line if training_loop already logs it.
    run.log({'valid_accuracy': valid_max_accuracy})
    return model, valid_max_accuracy  # (= best validation score)


wandb.agent(sweep_id, function=run_sweep, count=10)

🔸Finish

run.finish()
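One way to make sure `run.finish()` is always called, even when training raises an error, is a `try`/`finally` wrapper. The sketch below assumes a hypothetical helper `run_and_finish` and uses a stand-in `DryRun` class so it runs without wandb; with a real run, `with wandb.init(...) as run:` should give a similar guarantee.

```python
def run_and_finish(run, train_fn):
    """Call train_fn(run) and always close the run afterwards."""
    try:
        return train_fn(run)
    finally:
        # finish() marks the run as ended and uploads any remaining data.
        run.finish()


class DryRun:
    """Stand-in exposing only finish(), so the sketch runs without wandb."""

    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True


dry_run = DryRun()
result = run_and_finish(dry_run, lambda run: 0.84)  # 0.84 is a made-up score
print(result, dry_run.finished)  # 0.84 True
```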