Parameter Logging in DL/ML Models: Part 1
Overview
Analyzing model metrics is one of the most crucial tasks while training any Machine Learning or Deep Learning model: it gives us the ability to diagnose a model's behaviour when its predictions are not up to the mark. In this article I will discuss how to log model metrics, model architecture, hardware metrics and per-epoch data in Wandb.ai.
Scope
In this article I will discuss Wandb.ai, a third-party experiment-logging platform that enables us to log Machine Learning (ML) or Deep Learning (DL) model data, training information, datasets, model weights and much other useful information related to the model we have trained. The following points are discussed in this article:
- Wandb.ai account setup and logging in
- Connecting our notebook/IDE with Wandb.ai
- Saving the model configuration in the Wandb.ai dashboard
- Logging epoch information while training the model
- Visualizing model statistics in the Wandb.ai dashboard
Note: This article is for readers who are comfortable with Colab, Jupyter notebooks, Python scripts, and fundamental Keras APIs such as loading datasets, creating models, training models and callbacks.
Experiment Logging
Logging a model's parameters during training is a crucial part of model optimisation. After a training session, we should analyze the model's parameters as well as the metrics relevant to the use case (i.e. the sector in which the model will be deployed and which metrics deserve more attention than others), both visually and statistically, in order to judge the model's performance and find room for improvement. ML/DL frameworks have built-in options such as callbacks and CSV loggers, or we can manually extract model metrics into a CSV/Excel file and visualize them ourselves. But this gives us limited power: we have to continuously monitor every epoch of the training process, and if training stops due to an error or some unfortunate circumstance we lose all the progress and have to start again. To counter this problem, this article walks through the Python library wandb.ai, an experiment-tracking, dataset-management and versioning tool. Wandb.ai gives us the capability to log model parameters as well as to visualize metrics, giving us model insight for further optimization and inference.
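For contrast, here is a minimal sketch (my own, not part of the original setup) of the built-in approach mentioned above, using Keras' CSVLogger callback; the filename training_log.csv is just an illustrative choice. The metrics land in a local file that you must plot yourself, and they are lost if the runtime is recycled, which is exactly the limitation wandb.ai addresses.

from tensorflow import keras

# Built-in alternative: append per-epoch metrics to a local CSV file.
csv_logger = keras.callbacks.CSVLogger("training_log.csv", append=True)

# Passed to model.fit() like any other callback, e.g.:
# model.fit(x_train, y_train, epochs=15, callbacks=[csv_logger])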
Prerequisites:
- Wandb.ai Account (Free/Paid)
- IDE/Colab/Jupyter Notebook
This article is structured as a step-by-step walkthrough of how to integrate wandb.ai with a Jupyter notebook. The same procedure can be followed for a Python script. The steps are discussed below:
Step 1 : Creating an account on wandb.ai.
Wandb.ai is a third-party app, available in both free and paid versions. First we need to create an account, for example with a Gmail account. The steps are mentioned below:
- Open the browser and navigate to https://wandb.ai
- Click on the Sign up button if you are registering for the first time; you can log in directly if you already have an account with wandb.ai.
- If you are a new user, create an account by submitting details such as email, name and other relevant information. In this case, I have already created my account, so I will log in directly.
- Choose whichever option you are comfortable with; for me it is Sign in with Google, so I will choose this option and proceed further.
Step 2 : Setting up Wandb.ai Project and getting API key
- After successfully registering on wandb.ai you will be redirected to the homepage, where Overview, Projects and Likes options are available. Select the Projects tab and click the Create new project button.
- Since the scope of this article is to explain how to use wandb.ai, we will create a new project. Clicking the Create new project button takes you to the next page, shown below. Give your project a name and click the Create project button; I have named mine Mnist because I will be demonstrating on the MNIST dataset.
- After creating the project you will be redirected to a page with basic information on how to integrate and run wandb.ai with the API key in a Jupyter notebook. The image below shows the page you will be redirected to after creating the project, with the API key; copy the API key, as we will use it in our Colab notebook.
Step 3 : Connecting Wandb.ai with Colab
In this section, we will discuss integrating wandb.ai's experiment logging with Colab, along with a little bit of the wandb.ai workflow.
- Integration of Wandb.ai and TensorFlow Installation
From Step 2 you will have the API key for your specific project. Open the Colab notebook, start the session and add the following code snippets. These snippets install the required wandb.ai libraries along with TensorFlow; after running them, the corresponding output will appear in Colab.
!pip install wandb -q
!pip install tensorflow -q
The next step is to log in to wandb.ai from Colab with your API key and connect Colab with your account. The script below logs in to wandb.ai and asks for authentication, i.e. the API key.
!wandb login
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
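As a side note, if you would rather not paste the key interactively in every session, wandb also exposes a login call from Python. The sketch below assumes you have stored the key in a WANDB_API_KEY environment variable (or a Colab secret), which keeps it out of the notebook itself.

import os
import wandb

# Non-interactive alternative to `!wandb login`: read the key from an
# environment variable so it is never hard-coded in the notebook.
wandb.login(key=os.environ.get("WANDB_API_KEY"))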
After running the above snippet you will see an input box for submitting the API key; paste your API key and hit Enter to complete the login, as shown in the output above.
Now that we have successfully logged in, we can start writing our Keras model for logging the metrics.
import numpy as np
import wandb
from wandb.keras import WandbCallback
from tensorflow import keras
from tensorflow.keras import layers
The above snippet imports the required libraries for Keras and wandb.ai so that their classes and functions can be used in our code. The next step is to create our configuration variable, i.e. a dictionary. The config variable holds the model parameters, which will be used both to build the model and to be logged in wandb.ai. The snippet below shows the config variable and model parameters.
config = dict(
    num_classes=10,
    input_shape=(28, 28, 1),
    normalize_factor=255,
    Layer1_Conv2d_filter=32,
    Conv2d_kernel_size=(3, 3),
    Layer1_Conv2d_activation="relu",
    max_pool_size=(2, 2),
    Layer2_Conv2d_filter=64,
    Dropout=0.5,
    Layer2_Conv2d_activation="relu",
    classification_activation="softmax",
    batch_size=128,
    epochs=15,
    loss="categorical_crossentropy",
    optimizer="adam",
    validation_split=0.1,
)
Now we have to initialize our session. For this we run the following command with the config variable, along with our project name and username (entity), as shown in the snippet below.
wandb.init(project="Jovian_Article", entity="happyman", config=config)
After executing the above snippet, you will see the session name, which is initialized randomly, and the path of the wandb.ai log that is sent to the wandb.ai platform for tracking. Basically, wandb.ai creates its own log file structure by collecting the environment variables of the Colab notebook. These environment variables are then sent via the API to the wandb.ai server to populate the dashboard of the associated project.
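If you would rather not rely on the randomly generated session name, wandb.init also accepts a name argument; "mnist-baseline" below is an arbitrary example of mine, not part of the tutorial.

# Optional: give the run a fixed, human-readable name instead of the
# randomly generated one ("mnist-baseline" is an illustrative choice).
run = wandb.init(project="Jovian_Article", entity="happyman",
                 config=config, name="mnist-baseline")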
If you open the wandb.ai dashboard you will see the CPU utilization metric and the configuration variables, as shown below.
Now that the session has successfully started and the config variables, i.e. the model parameters, are successfully uploaded, it is time to start preprocessing the dataset.
Step 4 : Dataset Preprocessing
For explanation purposes, I have used the MNIST digit dataset, which consists of 70,000 grayscale images of handwritten digits from 0-9. The dataset is split into two sections: a train set of 60,000 images and a test set of 10,000 images. I have also preprocessed the dataset by normalizing the images and converting the labels into categorical values. The code snippets below depict the preprocessing steps for the MNIST digit dataset.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 1s 0us/step
x_train = x_train.astype("float32") / config["normalize_factor"]
x_test = x_test.astype("float32") / config["normalize_factor"]
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, config["num_classes"])
y_test = keras.utils.to_categorical(y_test, config["num_classes"])
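As an optional sanity check (my own addition, not required by the pipeline), you can log a few preprocessed samples to the dashboard with wandb.Image and verify that the normalization and label encoding look right:

# Log a handful of preprocessed samples to the active run.
# y_train is one-hot encoded at this point, so argmax recovers the digit.
sample_images = [
    wandb.Image(x_train[i], caption=f"label={int(y_train[i].argmax())}")
    for i in range(5)
]
wandb.log({"examples": sample_images})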
Step 5 : Keras Model
Now that the dataset is preprocessed, it is time to create the model we will train. For the sake of simplicity I have constructed a simple 2D-CNN model with one dense layer and initialized the model parameters from the config variable, which is a dictionary. The details of model creation and data preprocessing are out of the scope of this article, which is why I have not explained them in depth. The code snippet below depicts the model creation.
model = keras.Sequential(
    [
        keras.Input(shape=config["input_shape"]),
        layers.Conv2D(config["Layer1_Conv2d_filter"], kernel_size=config["Conv2d_kernel_size"], activation=config["Layer1_Conv2d_activation"]),
        layers.MaxPooling2D(pool_size=config["max_pool_size"]),
        layers.Conv2D(config["Layer2_Conv2d_filter"], kernel_size=config["Conv2d_kernel_size"], activation=config["Layer2_Conv2d_activation"]),
        layers.MaxPooling2D(pool_size=config["max_pool_size"]),
        layers.Flatten(),
        layers.Dropout(config["Dropout"]),
        layers.Dense(config["num_classes"], activation=config["classification_activation"]),
    ]
)
After model creation we save the model summary in the config variable and display it in the output cell. Note that model.summary() prints to the console and returns None, so the text has to be captured through its print_fn argument; the snippet is shown below.
summary_lines = []
model.summary(print_fn=summary_lines.append)  # capture instead of printing
config["model_summary"] = "\n".join(summary_lines)
print(config["model_summary"])  # display the summary, as shown below
Model: "sequential"
_________________________________________________________________
Layer (type)                    Output Shape              Param #
=================================================================
conv2d (Conv2D)                 (None, 26, 26, 32)        320
max_pooling2d (MaxPooling2D)    (None, 13, 13, 32)        0
conv2d_1 (Conv2D)               (None, 11, 11, 64)        18496
max_pooling2d_1 (MaxPooling2D)  (None, 5, 5, 64)          0
flatten (Flatten)               (None, 1600)              0
dropout (Dropout)               (None, 1600)              0
dense (Dense)                   (None, 10)                16010
=================================================================
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
_________________________________________________________________
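One caveat: the plain config dictionary was already copied when wandb.init ran, so a later assignment to it does not automatically reach the dashboard. If you want the summary to show up there (as in Step 8), you can push it explicitly; wandb.config.update is the documented way to add config entries to a live run.

# Push the late addition to the run's config on the wandb.ai server.
wandb.config.update({"model_summary": config["model_summary"]})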
Step 6 : Compiling and Training the Model
In this section, we are going to compile our model. The hyperparameters/model parameters come from the config variable discussed in Step 3. The snippet below depicts model compilation with the config variable parameters.
model.compile(loss=config["loss"], optimizer=config["optimizer"], metrics=["accuracy"])
After successful compilation of the model, our final step is to train it with a callback. The callback sends the data of each epoch during model training to wandb.ai, where it is displayed in our project dashboard. The snippet below depicts the code for model training with wandb.ai.
history = model.fit(
    x_train, y_train,
    batch_size=config["batch_size"],
    epochs=config["epochs"],
    validation_split=config["validation_split"],
    callbacks=[WandbCallback()],
)
wandb: WARNING The save_model argument by default saves the model in the HDF5 format that cannot save custom objects like subclassed models and custom layers. This behavior will be deprecated in a future release in favor of the SavedModel format. Meanwhile, the HDF5 model is saved as W&B files and the SavedModel as W&B Artifacts.
Epoch 1/15
422/422 [==============================] - ETA: 0s - loss: 0.3768 - accuracy: 0.8868
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 49s 114ms/step - loss: 0.3768 - accuracy: 0.8868 - val_loss: 0.0852 - val_accuracy: 0.9785
Epoch 2/15
422/422 [==============================] - ETA: 0s - loss: 0.1160 - accuracy: 0.9650
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 48s 114ms/step - loss: 0.1160 - accuracy: 0.9650 - val_loss: 0.0600 - val_accuracy: 0.9817
Epoch 3/15
422/422 [==============================] - ETA: 0s - loss: 0.0867 - accuracy: 0.9739
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 49s 117ms/step - loss: 0.0867 - accuracy: 0.9739 - val_loss: 0.0474 - val_accuracy: 0.9878
Epoch 4/15
422/422 [==============================] - ETA: 0s - loss: 0.0733 - accuracy: 0.9776
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 51s 121ms/step - loss: 0.0733 - accuracy: 0.9776 - val_loss: 0.0439 - val_accuracy: 0.9882
Epoch 5/15
422/422 [==============================] - ETA: 0s - loss: 0.0636 - accuracy: 0.9804
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 52s 123ms/step - loss: 0.0636 - accuracy: 0.9804 - val_loss: 0.0403 - val_accuracy: 0.9883
Epoch 6/15
422/422 [==============================] - ETA: 0s - loss: 0.0572 - accuracy: 0.9821
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 50s 117ms/step - loss: 0.0572 - accuracy: 0.9821 - val_loss: 0.0371 - val_accuracy: 0.9908
Epoch 7/15
422/422 [==============================] - ETA: 0s - loss: 0.0534 - accuracy: 0.9835
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 48s 113ms/step - loss: 0.0534 - accuracy: 0.9835 - val_loss: 0.0336 - val_accuracy: 0.9917
Epoch 8/15
422/422 [==============================] - ETA: 0s - loss: 0.0497 - accuracy: 0.9846
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 49s 117ms/step - loss: 0.0497 - accuracy: 0.9846 - val_loss: 0.0334 - val_accuracy: 0.9915
Epoch 9/15
422/422 [==============================] - ETA: 0s - loss: 0.0450 - accuracy: 0.9859
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 51s 121ms/step - loss: 0.0450 - accuracy: 0.9859 - val_loss: 0.0306 - val_accuracy: 0.9915
Epoch 10/15
422/422 [==============================] - ETA: 0s - loss: 0.0432 - accuracy: 0.9865
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 50s 118ms/step - loss: 0.0432 - accuracy: 0.9865 - val_loss: 0.0301 - val_accuracy: 0.9927
Epoch 11/15
422/422 [==============================] - 48s 113ms/step - loss: 0.0413 - accuracy: 0.9870 - val_loss: 0.0317 - val_accuracy: 0.9917
Epoch 12/15
422/422 [==============================] - 50s 118ms/step - loss: 0.0390 - accuracy: 0.9876 - val_loss: 0.0301 - val_accuracy: 0.9917
Epoch 13/15
422/422 [==============================] - ETA: 0s - loss: 0.0369 - accuracy: 0.9880
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 50s 119ms/step - loss: 0.0369 - accuracy: 0.9880 - val_loss: 0.0287 - val_accuracy: 0.9928
Epoch 14/15
422/422 [==============================] - ETA: 0s - loss: 0.0348 - accuracy: 0.9893
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 2 of 2). These functions will not be directly callable after loading.
wandb: Adding directory to artifact (/content/wandb/run-20221121_063632-259vi3mi/files/model-best)... Done. 0.1s
422/422 [==============================] - 51s 122ms/step - loss: 0.0348 - accuracy: 0.9893 - val_loss: 0.0259 - val_accuracy: 0.9927
Epoch 15/15
422/422 [==============================] - 50s 119ms/step - loss: 0.0341 - accuracy: 0.9891 - val_loss: 0.0292 - val_accuracy: 0.9927
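Once training finishes, it can also be useful to log a final held-out evaluation as custom metrics. The short sketch below is my own addition (the metric names test_loss and test_accuracy are arbitrary); custom metrics are covered in more depth in the next article.

# Evaluate on the test set and log the results as custom run metrics.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
wandb.log({"test_loss": test_loss, "test_accuracy": test_acc})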
Step 7 : Finishing Run Session
After the training process has run successfully, it is time to close the wandb.ai session by running the snippet below. Its output shows the metric statistics and graphs.
wandb.finish()
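Recall from the Experiment Logging section that a crashed run used to mean lost progress. In a standalone script you can make this shutdown crash-proof by wrapping training in try/finally, so the run is closed and buffered data is flushed even if fit raises an error; this is a defensive sketch of mine, not part of the original notebook.

# Ensure the run is finished even if training fails part-way through.
run = wandb.init(project="Jovian_Article", entity="happyman", config=config)
try:
    model.fit(x_train, y_train, batch_size=config["batch_size"],
              epochs=config["epochs"], callbacks=[WandbCallback()])
finally:
    wandb.finish()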
Step 8 : Wandb.ai Dashboard
Now it is time to log in to the wandb.ai project session dashboard and see the results.
First, we will start with the config variable that we saved while starting the session. Navigate to Project > Session > Information panel, and you will be able to see the config variable, as shown below.
We have also saved the model summary in the config variable; to view it, go to Project > Session > Model panel, where we can analyse the model summary.
And finally, if you navigate to Project > Session > Graphs, you can easily see the plotted graphs of the metrics used by the model.
In this article we have seen the basics of wandb.ai experiment metric logging. In the next article we will cover in detail how to upload a dataset and download it for model training, how to log custom metrics in wandb.ai, and how to modify graphs.
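As a small taste of that dataset workflow, wandb versions uploads as Artifacts. The sketch below is illustrative only: it assumes a local file mnist.npz exists, and it must run inside an active session (i.e. before wandb.finish()).

# Version a local dataset file as a W&B Artifact (names are illustrative).
artifact = wandb.Artifact("mnist-dataset", type="dataset")
artifact.add_file("mnist.npz")
wandb.log_artifact(artifact)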
Special Thanks
As we say, a car is useless if it doesn't have a good engine; similarly, a student is lost without proper guidance and motivation. I would like to thank, from the bottom of my heart, my Guru and my Idol "Dr. P. Supraja" and "A. Helen Victoria", who guided me throughout the journey. As a Guru, she has lighted the best available path for me and motivated me whenever I encountered failure or a roadblock; without her support and motivation this would have been an impossible task for me.
Conclusion
In this article, I have discussed how to set up our IDE or Jupyter notebook with Wandb.ai for logging parameters. Some important points from the article are mentioned below:
- The experiment logging works by adding a callback while training the model.
- We can visualize metrics, losses and the architecture of the model in Wandb.ai.
- For every run, a new session is created in Wandb.ai.
- Wandb.ai is compatible with Scikit-learn, PyTorch and Keras as well (see the PyTorch sketch below).
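To back up that last point, here is a minimal sketch of the PyTorch side; wandb.watch is the rough analogue of WandbCallback there, and the tiny model below is purely illustrative.

import torch.nn as nn
import wandb

# wandb.watch hooks the model so gradients and parameters are logged.
run = wandb.init(project="Jovian_Article", entity="happyman")
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
wandb.watch(model, log="all", log_freq=100)
# ... standard training loop, calling wandb.log({"loss": loss.item()}) per step ...
wandb.finish()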