Lecture 1 - PyTorch Basics & Linear Regression

When the gradient is negative, we increase the value of the weights by adding a portion of the gradient to the weights, and when the gradient is positive, we decrease the value of the weights by subtracting a portion of the gradient from the weights. Is my statement correct?

Were you able to resolve this issue? I still have the same problem.

For anyone struggling with the online notebooks it may be worth a shot running them locally on your machine. I’ve thrown together a quick guide on how I installed on Windows using VSCode, if anyone is interested.


Conda on Windows is not very stable with PyTorch, especially when we need to use GPUs, which is going to be the case for the next lectures. It has numerous compatibility issues with the CUDA libraries needed to make it work.

May I suggest that you not run things locally? Otherwise you can spend a lot of time trying to configure things and make them work instead of focusing on the code and the concepts from the lecture. I just forked the notebooks and am running them right here on Jovian, which uses Binder to provide a kernel where things are already configured.

Just a suggestion from having spent numerous hours trying to run things on Windows.


Hello, I just finished going through the first notebook, 01-pytorch-basics, and I was trying some things out when I encountered this warning. I am just a beginner in PyTorch.

What I did up to this point:

  • The next thing I did was to follow the github link and found this solution.

Use .retain_grad() if you want the gradient for a non-leaf Tensor. Or make sure you have the leaf Tensor if you have a non-leaf Tensor by mistake.

  • Then I searched for information about leaf and non-leaf tensors but could not find enough. It would be great if someone could help me with this as well.

Steps to reproduce:

Warning message:

/srv/conda/envs/notebook/lib/python3.7/site-packages/torch/tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won’t be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "


About the questions under “Further Reading”:

I don’t know if the answer could be:
“you can’t call backward on many non-leaf tensors”, because “grad can be implicitly created only for scalar outputs” (I don’t know what that means either).

So a solution could be to execute y.sum().backward()

But is this what you expect? I’m trying to work through these questions at the end of the notebook.

How do you know whether I watched 100% of the lecture?

What should I do when Binder takes a long time to start an environment?

Yes! To minimise the loss as evident from the weight-loss graph.

I got it to work by running this command in the environment:
conda install -c defaults intel-openmp -f

Also make sure Conda is updated to the latest version.


It’s all about being genuine. If you haven’t watched the lecture, you won’t be able to work on the assignments if you’re a beginner. One thing leads to another, @anis-bensaci8.

It is for us to learn. That’s why it’s free!
Getting only a certificate doesn’t mean anything if you can’t apply the skills.


That should not cause an issue, @edsenmichaelcy. Depending on your operating system, some underlying dependencies of PyTorch/Jupyter may not get installed on your system, but the notebook should still work.

Hi @alvertosk84 and @danny, thanks for reporting your errors. Try the solution shared by @Luay: conda install -c defaults intel-openmp -f

@jazz215 There’s no confirmation. Also, we’ve made marking attendance optional for lecture 1. cc @viratsatheesh29

We have an option to vote to indicate that you have watched the video, and if you already have the knowledge, you can simply complete the assignment.

@kumarsuraj9450 Binder takes 2-10 minutes to install your dependencies. There may also be some queueing time since it is a free service. Generally, I click “Run on Binder” as soon as I open a notebook, and then I read through the notebook on Jovian, while it loads up on Binder. Hope that helps!

Yes, as far as I know, you can only call .backward() without arguments on scalars (i.e. single numbers).
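To make the scalar restriction concrete, here is a minimal sketch. The tensors x and w are made-up values for illustration, not the ones from the lecture notebook. It shows the error raised for a non-scalar output and the y.sum().backward() workaround suggested earlier in the thread.

```python
import torch

# Hypothetical tensors, chosen only for illustration.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

y = w * x  # y has three elements, so it is not a scalar

try:
    # No argument is given, so autograd has no seed gradient to start from.
    y.backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Reducing y to a single number first makes the implicit seed gradient well defined.
y = w * x  # rebuild the graph for a clean second attempt
y.sum().backward()
print(w.grad)  # d(sum(w * x))/dw is x itself
```

Passing an explicit seed, as in y.backward(gradient=torch.ones_like(y)), should give the same w.grad as summing first, since the sum weights every element of y equally.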

@PrajwalPrashanth I’ve fixed it already, thanks :smile: :smile: :smile: :smile:

May I ask, in the linear regression chapter, how do we get w11 and b1, such as in

yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1

I don’t really understand how to get w11 and b1.

@edsenmichaelcy This is an assumption we are making: the yield of apples (the output variable) is assumed to be a weighted sum of the temperature, rainfall and humidity (the input variables). The numbers w11, w12 etc. are the weights we give to each input variable, and b1 is a constant offset (the bias). The goal of linear regression is to figure out a good set of weights and biases.
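As a rough illustration of the weighted sum, here is a small sketch in plain Python. The function name predict_yield and all the numbers are hypothetical, picked only to show the arithmetic. In practice the weights start out as guesses (often random) and are adjusted during training.

```python
def predict_yield(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: w11*temp + w12*rainfall + w13*humidity + b1."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical starting guesses for the weights and bias (not learned values).
w11, w12, w13 = 0.3, 0.2, 0.5
b1 = 1.0

# Hypothetical measurements for one region.
temp, rainfall, humidity = 73.0, 67.0, 43.0

yield_apple = predict_yield([temp, rainfall, humidity], [w11, w12, w13], b1)
print(yield_apple)  # same value as w11 * temp + w12 * rainfall + w13 * humidity + b1
```

Comparing this prediction with the actual yield gives the loss, and gradient descent then nudges w11, w12, w13 and b1 to reduce it.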

Another reason for squaring the loss is that the absolute value function is not differentiable at its minimum, where the error is zero (the graph has a sharp corner there), whereas a quadratic function is differentiable at all points.
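A quick numerical check of this point, as a sketch only: the helper one_sided_slopes below is a hypothetical name, and it simply estimates the slope just to the left and just to the right of a point. At zero the two slopes disagree for the absolute value but agree (both close to zero) for the square.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Estimate the slope of f just left and just right of x with finite differences."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right

# Absolute error: the left and right slopes at 0 disagree, so there is no single derivative.
abs_left, abs_right = one_sided_slopes(abs, 0.0)
print(abs_left, abs_right)

# Squared error: both slopes at 0 are close to zero, so the derivative is well defined.
sq_left, sq_right = one_sided_slopes(lambda v: v * v, 0.0)
print(sq_left, sq_right)
```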

Thanks for giving a detailed description @akashdeep-ghosh!

Basically it goes something like this:

  • If you track the dependencies, you’ll see that y is computed from (x, w, b), and z is computed from y.
  • Tensors that you create directly, like x, w and b, are called leaf tensors (they are not the result of any operation), while y and z, which are computed from other tensors, are non-leaf tensors.
  • When you call .backward() on the final result, PyTorch propagates gradients backwards through z and y to calculate w.grad and b.grad (this is the chain rule of differentiation, if you remember it from calculus).
  • The warning simply indicates that .grad is only populated for leaf tensors. If you access .grad on a non-leaf tensor like y or z, you get this warning; call .retain_grad() on that tensor before .backward() if you really want its gradient. Separately, calling .backward() a second time through the same graph raises an error unless you pass retain_graph=True.

@jonathanloscalzo hope this also answers your question from the “Further” reading section.
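Here is a small sketch of the leaf vs. non-leaf distinction, going by the wording of the warning itself. The values of x, w and b follow the style of the first notebook, but the extra tensor z is a hypothetical addition for illustration.

```python
import torch

# Tensors created directly by the user are leaf tensors.
x = torch.tensor(3.0)
w = torch.tensor(4.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)

# Tensors produced by operations on tensors that require grad are non-leaf tensors.
y = w * x + b   # y = 17
z = y * y       # hypothetical extra step, z = 289

print(w.is_leaf, b.is_leaf, y.is_leaf, z.is_leaf)

# By default only leaf tensors get .grad populated; ask autograd to keep y's gradient too.
y.retain_grad()
z.backward()

print(w.grad)  # dz/dw = 2 * y * x
print(b.grad)  # dz/db = 2 * y
print(y.grad)  # dz/dy = 2 * y, available only because of retain_grad()
```

Without the y.retain_grad() call, reading y.grad after z.backward() is what should trigger the UserWarning quoted earlier in the thread, and the value would be None.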

Indeed, Windows has caused us a lot of issues in the past. That’s the main reason we run everything in the cloud.

Technically, we always subtract the gradient (in practice, a small fraction of it set by the learning rate; the examples here use the whole gradient for simplicity).

  • Suppose the weight is 1.5 and the gradient is 0.5 (positive). Then we need to decrease the weight to decrease the loss, right? By subtracting the gradient, new weight = 1.5 - 0.5 = 1
  • Suppose the weight is 1.5 and the gradient is -0.5 (negative). Then we need to increase the weight to decrease the loss, right? By subtracting the gradient, new weight = 1.5 - (-0.5) = 1.5 + 0.5 = 2
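The two cases above can be sketched as a single update rule. The function name gradient_step and the learning_rate parameter are assumptions for illustration. With learning_rate=1.0 it reproduces the arithmetic in the bullets, and a smaller value corresponds to subtracting only "a portion" of the gradient.

```python
def gradient_step(weight, gradient, learning_rate=1.0):
    """One gradient descent update: always subtract the (scaled) gradient."""
    return weight - learning_rate * gradient

# Positive gradient: the weight goes down.
w_after_positive = gradient_step(1.5, 0.5)
print(w_after_positive)  # 1.0

# Negative gradient: subtracting a negative number pushes the weight up.
w_after_negative = gradient_step(1.5, -0.5)
print(w_after_negative)  # 2.0

# With a smaller learning rate, only a fraction of the gradient is subtracted.
w_small_step = gradient_step(1.5, 0.5, learning_rate=0.1)
print(w_small_step)
```

The sign of the gradient takes care of the direction, so there is no need for separate "add" and "subtract" cases.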

Thanks, I got it :grin: