Starting from import torch, PyTorch builds a computation graph as the code runs: every tensor operation adds nodes to the graph, and every tensor is a node in it. When backward() is called on a tensor, the graph is traversed in reverse and the grad attribute of the tensors it depends on is updated. grad is the gradient, stored as an attribute of the tensor.
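A minimal sketch of that flow (the value 3.0 is just for illustration):

    import torch

    x = torch.tensor(3.0, requires_grad=True)
    y = x * x            # building y adds a node to the computation graph
    y.backward()         # traverse the graph backwards starting from y
    print(x.grad)        # tensor(6.), i.e. dy/dx = 2x at x = 3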
Problem code:

    x = torch.exp(x - x.max())
    for i in range(batch_size):
        x[i] = x[i] / x[i].sum()

The problem is x[i] = x[i] / x[i].sum(): in-place modification is not allowed while gradients are being tracked, and assigning to x[i] modifies x's underlying data. The fix is to use detach() or clone(); replacing the whole x is also fine, as is writing the result into a new variable. The version below uses a new variable, but it emits a warning saying that torch.tensor([]) will be treated as a constant:

    # Shift everything to be non-positive first, so exp() cannot produce extremely large values
    x = torch.exp(x - x.max())
    out = torch.tensor([])
    for i in range(batch_size):
        row = (x[i] / x[i].sum()).unsqueeze(dim=0)
        out = torch.cat((out, row), dim=0)

Final code:

    x = torch.exp(x - x.max())
    out = (x[0] / x[0].sum()).unsqueeze(dim=0)
    for i in range(1, batch_size):
        row = (x[i] / x[i].sum()).unsqueeze(dim=0)
        out = torch.cat((out, row), dim=0)
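As a sketch of an alternative, the per-row loop can usually be replaced by a broadcasted, out-of-place division, which avoids the in-place issue entirely. The shape (4, 3) below stands in for (batch_size, num_classes) and is only for illustration; the global x.max() shift follows the notes above:

    import torch

    def normalize_rows(x):
        # Subtract the max for numerical stability, then normalize each row out-of-place;
        # nothing is modified in place, so autograd can track every step.
        x = torch.exp(x - x.max())
        return x / x.sum(dim=1, keepdim=True)

    x = torch.randn(4, 3, requires_grad=True)
    out = normalize_rows(x)
    out.sum().backward()   # works: the graph was never modified in place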
|
|
|
A scalar, floating-point tensor can be differentiated:
    import torch
    a = torch.tensor(1.0, requires_grad=True)
    a.ndim           # 0
    a.backward()
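A quick check of the floating-point requirement (the exact error text may vary by PyTorch version): an integer tensor cannot require gradients.

    import torch

    try:
        b = torch.tensor(1, requires_grad=True)   # integer dtype
    except RuntimeError as e:
        # only floating point (and complex) tensors may require gradients
        print(e)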
A tensor only gets a usable grad attribute when it is created with requires_grad=True; only then does backward() come into play. Without it, there is nothing for backward() to do for that tensor.
    import torch
    a = torch.tensor([1., 2., 3.], requires_grad=True)
    print(a.grad)          # None
    out = a.cos()
    out.sum().backward()   # backward() updates the grad attribute of every tensor in the graph that out depends on
    print(a.grad)          # tensor([-0.8415, -0.9093, -0.1411])
backward() does not update the grad of tensors that are unrelated to the output.
    import torch
    a = torch.tensor([1., 2., 3.], requires_grad=True)
    print(a.grad)          # None
    b = torch.tensor([1., 2., 3.], requires_grad=True)
    out = a.cos()
    out.sum().backward()
    print(a.grad)          # tensor([-0.8415, -0.9093, -0.1411])
    print(b.grad)          # None
Indexing into a leaf tensor that has requires_grad=True produces a view of that leaf, and autograd forbids modifying such a view in place, so the original data cannot be changed this way.
    import torch
    a = torch.tensor([1., 2., 3.], requires_grad=True)
    a[2] = 1

    RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
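If the values really do need to be changed (for example, manual initialization), a common pattern is to do the assignment under torch.no_grad(), in the same spirit as the detach()/clone() advice above; a minimal sketch:

    import torch

    a = torch.tensor([1., 2., 3.], requires_grad=True)
    with torch.no_grad():
        a[2] = 1.0       # allowed: autograd is not tracking inside no_grad()
    print(a)             # tensor([1., 2., 1.], requires_grad=True)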
In a full model computation, only the model parameters carry gradients; the input data (which does not set requires_grad) has none.
    import torch
    from torch import nn

    class Model(torch.nn.Module):
        def __init__(self, in_feature=1, out_feature=3):
            super().__init__()
            self.line_layer = nn.Linear(in_features=in_feature, out_features=out_feature)

        def forward(self, x):
            x = self.line_layer(x)
            return x

    model = Model(in_feature=1, out_feature=3)
    for param in model.parameters():
        print(param)
    Parameter containing:
    tensor([[-0.3010],
            [-0.1164],
            [-0.1319]], requires_grad=True)
    Parameter containing:
    tensor([-0.0333, -0.6954, -0.8287], requires_grad=True)
    x = torch.randn(2, 1)
    y_pred = model(x)
    print(y_pred.shape)    # torch.Size([2, 3])
    label = torch.tensor([0, 0]).unsqueeze(dim=1).long()
    print(label.shape)     # torch.Size([2, 1])
    def loss_fn(model_out, label):
        _mean = (model_out - label).mean()
        return _mean
backward() only computes the gradients (grad); updating the model parameters according to those gradients is the optimizer's job.
    loss = loss_fn(y_pred, label)
    loss.backward()
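To make that division of labor concrete, a minimal sketch continuing from the code above (torch.optim.SGD with an arbitrarily chosen learning rate): backward() fills param.grad, the optimizer's step() applies the update, and the input data x still has no gradient.

    from torch import optim

    optimizer = optim.SGD(model.parameters(), lr=0.01)   # lr chosen only for illustration

    for param in model.parameters():
        print(param.grad)    # populated by loss.backward()
    print(x.grad)            # None: the input data never required gradients

    optimizer.step()         # apply the parameter update based on the stored gradients
    optimizer.zero_grad()    # clear the gradients before the next backward pass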
tensor.detach()
detach() returns a new tensor that is detached from the current computation graph but still points to the same storage as the original tensor. The only difference is that requires_grad is False: the returned tensor never has its gradient computed and has no grad.
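A small sketch of those properties (comparing data_ptr() is just one way to check that the storage is shared):

    import torch

    a = torch.tensor([1., 2., 3.], requires_grad=True)
    b = a.detach()

    print(b.requires_grad)                 # False
    print(b.data_ptr() == a.data_ptr())    # True: b shares a's underlying storage

    b[0] = 10.0                            # an in-place change through b ...
    print(a)                               # ... is visible in a: tensor([10., 2., 3.], requires_grad=True)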
Constructing a tensor from an existing tensor with torch.tensor() triggers: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach()
Use torch.as_tensor(data=x).float() instead of torch.tensor(data=x).float() to avoid the warning.
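A sketch of the different constructions (the warning text may differ slightly across versions; as_tensor shares memory with the source where it can, while clone().detach() makes a copy):

    import torch

    x = torch.tensor([1., 2., 3.])

    y1 = torch.tensor(data=x).float()      # emits the UserWarning above
    y2 = torch.as_tensor(data=x).float()   # no warning
    y3 = x.clone().detach().float()        # the alternative the warning itself recommends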
Further reading: pytorch — what .detach() and .detach_() do and how they differ