How do epochs and learning rates work together? How should you pick the best combination?

While training your model, there are only a few things you can adjust (running a bit ahead of the tutor here, sorry):

  • source data (you can invent several new columns that are computed/extracted from other columns)
  • loss function (you can choose the best-suited function)
  • structure of your model (combination of layers)
  • batch size
  • training scheme (combination of learning rates and a number of epochs)

As I see it, you need to find a combination of learning rate and number of epochs at which your model actually starts to learn from the dataset. The error it returns should generally get lower from epoch to epoch, although local peaks are possible.
Here is the code that plots the learning/training process (it was in the last lecture):

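```python
import matplotlib.pyplot as plt

# model, val_loader, evaluate() and history1..history5 come from earlier notebook cells (assumed)
result = evaluate(model, val_loader)  # validation loss before training; must be named `result`
val_loss_history = [result] + history1 + history2 + history3 + history4 + history5
val_loss_list = [vl['val_loss'] for vl in val_loss_history]
plt.plot(val_loss_list, '-g')
plt.xlabel('epoch')
plt.ylabel('val_loss (log10 scale)')
plt.yscale('log')  # a linear scale is not very representative
plt.title('Val_loss vs. training epochs')
plt.show()
```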
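To check the "lower and lower in general, local peaks are possible" criterion numerically instead of by eye, one simple option is to compare the average loss over the first few epochs with the average over the last few. This is a minimal sketch: the function `is_generally_decreasing` and its `window` parameter are hypothetical (not part of the course code), and the loss values below are made up. A list such as `val_loss_list` from the plotting code could be passed in the same way.

```python
def is_generally_decreasing(losses, window=3):
    """Compare the mean of the first `window` losses with the mean of the last `window`.

    Individual local peaks do not matter; only the overall trend does.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    return end < start


# Made-up loss values with two local peaks (1.5 -> 1.6 and 1.0 -> 1.1).
losses = [2.0, 1.5, 1.6, 1.2, 1.0, 1.1, 0.8, 0.7]
print(is_generally_decreasing(losses))  # prints True
```

Averaging over a window is deliberately crude: it ignores single bumps, but a larger `window` is needed if the loss curve is very noisy.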

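The learning rate is like the zoom factor on your camera.
When taking a photo, you usually start zoomed out, which makes it easy to locate the subject.
Then you zoom in more and more until the subject fills almost the whole frame. Zooming in corresponds to lowering the learning rate: big steps first to get close to a good solution, then smaller steps to fine-tune it.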
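To make the step-size intuition concrete, here is a minimal sketch of plain gradient descent on the toy function f(x) = x², whose gradient is 2x. The function, the starting point and the learning rates tried are assumptions chosen for illustration. Each update multiplies x by (1 - 2*lr), so this iteration converges only for 0 < lr < 1: a tiny learning rate barely moves in 20 steps, a moderate one gets close to the minimum at x = 0, and one that is too large overshoots further on every step and diverges.

```python
def gradient_descent(lr, steps, x0=5.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x        # derivative of x**2
        x = x - lr * grad   # one update step, scaled by the learning rate
    return x


for lr in (0.001, 0.1, 1.1):
    x = gradient_descent(lr, steps=20)
    print(f"lr={lr:<6} final x={x:.4g}  loss={x * x:.4g}")
```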

My guess: it’s better to start with a learning rate that shows signs of convergence, i.e. the error keeps getting smaller. Train for some epochs while the error decreases. Then decrease the learning rate and train for some more epochs (as long as it still shows signs of convergence).
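The manual scheme above (train while the error falls, then lower the learning rate) can be automated. The sketch below swaps in a "reduce on plateau" rule: whenever the validation loss has not improved for `patience` epochs, the learning rate is multiplied by `factor`. Everything here is an assumption for illustration: a one-parameter linear model trained with plain SGD in pure Python on hypothetical toy data (y = 3x plus noise), and the function names and default values are made up. In PyTorch the same idea is available as `torch.optim.lr_scheduler.ReduceLROnPlateau`, which is stepped with `scheduler.step(val_loss)`.

```python
import random


def make_data(n, rng):
    # Hypothetical toy data: y = 3x plus a little Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3 * x + rng.gauss(0, 0.1)))
    return data


def mse(w, data):
    # Mean squared error of the one-parameter model y_hat = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train_with_plateau_schedule(train_data, val_data, lr=0.5, epochs=30,
                                patience=2, factor=0.1, min_lr=1e-4, seed=0):
    rng = random.Random(seed)
    train_data = list(train_data)  # copy so shuffling does not touch the caller's list
    w = 0.0
    best_val, bad_epochs = float("inf"), 0
    history = []  # (epoch, lr used in that epoch, validation loss)
    for epoch in range(epochs):
        rng.shuffle(train_data)
        for x, y in train_data:          # plain SGD with batch size 1
            grad = 2 * (w * x - y) * x   # d/dw of (w * x - y) ** 2
            w -= lr * grad
        val_loss = mse(w, val_data)
        history.append((epoch, lr, val_loss))
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0  # still converging: keep this lr
        else:
            bad_epochs += 1
            if bad_epochs >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # plateau: "zoom in" with smaller steps
                bad_epochs = 0
    return w, history


rng = random.Random(42)
train_set, val_set = make_data(200, rng), make_data(100, rng)
w, history = train_with_plateau_schedule(train_set, val_set)
for epoch, lr, val_loss in history:
    print(f"epoch {epoch:2d}  lr={lr:.4g}  val_loss={val_loss:.5f}")
print(f"learned w = {w:.4f} (true slope of the toy data: 3)")
```

The printed history shows the learning rate stepping down each time the validation loss stalls, which mirrors the "train, then lower the learning rate, then train some more" recipe.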
The flip side of the coin: your model can become overtrained (overfit). It memorizes the training set but fails to make good predictions on the validation/test subset.
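A common guard against overtraining is early stopping: keep track of the epoch with the lowest validation loss and stop once it has not improved for a few epochs. Below is a minimal sketch that works on a plain list of validation losses; the function name `early_stopping_epoch` and the `patience` parameter are hypothetical, and the loss values are made up. In a real PyTorch loop you would also save a copy of the model weights at the best epoch (for example `copy.deepcopy(model.state_dict())`) and reload them afterwards.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights you would keep.

    Walks through the validation losses epoch by epoch and stops once the
    loss has not improved for `patience` consecutive epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss  # new best: remember it
        elif epoch - best_epoch >= patience:
            break                                # no improvement for `patience` epochs
    return best_epoch


# Made-up validation losses: they fall, then rise again as the model overfits.
val_losses = [1.0, 0.7, 0.5, 0.45, 0.47, 0.52, 0.6, 0.7]
print(early_stopping_epoch(val_losses))  # prints 3
```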
Courtesy: @worminhole
