I'm training a three-layer model, and I get an error when calling loss.backward() to compute the gradient during training:

**RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).**

However, I am not calling backward() twice in the same training iteration. My training loop is as follows:
```python
for _ in tqdm(range(inner_iters)):
    idx_batch_eval = torch.randint(low=0, high=idx_range, size=(batch_size,), dtype=torch.int32)
    train_states = target_set[idx_batch_eval]
    train_level = c
    train_roa_labels = target_labels[idx_batch_eval]  # determine their labels
    class_weights, class_counts = balanced_class_weights(train_roa_labels.astype(bool), scale_by_total=True)

    def loss(value, nn_train_value):
        class_labels = 2 * train_roa_labels - 1
        decision_distance = train_level - value
        class_labels_torch = torch.tensor(class_labels, dtype=torch.float64)
        class_weights_torch = torch.tensor(class_weights, dtype=torch.float64)
        # create a 300-row, 1-column all-zero matrix with dtype float64
        zero_matrix = torch.zeros((300, 1), dtype=torch.float64)
        classifier_loss = class_weights_torch * torch.maximum(-class_labels_torch * decision_distance, zero_matrix)
        tf_dv_nn_train = nn_train_value - value
        stop_gra = value.detach()
        train_roa_labels_torch = torch.tensor(train_roa_labels, dtype=torch.float64)
        decrease_loss = train_roa_labels_torch * torch.maximum(tf_dv_nn_train, torch.zeros([300, 1])) / (stop_gra + OPTIONS.eps)
        # decrease_loss = train_roa_labels * tf.maximum(tf_dv_nn_train, 0)
        res = (classifier_loss + lagrange_multiplier * decrease_loss).mean()
        return res

    values = lyapunov_nn.lyapunov_function.forward(train_states)
    nn_train_value = lyapunov_nn.lyapunov_function.forward(dynamics.build_evaluation(train_states))
    objective = loss(values, nn_train_value)
    optimizer.zero_grad()
    objective.backward()
    optimizer.step()
```
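To make sure I understand the message, here is a minimal, self-contained sketch of mine (hypothetical, not from my project) that raises the same error. It happens when part of the graph is built once outside the loop and then reused in every iteration:

```python
import torch

# Minimal sketch (not my model): h is a non-leaf tensor created once,
# outside the loop, so every iteration's loss shares the subgraph
# h = w * w, whose saved tensors are freed by the first backward().
w = torch.randn(3, requires_grad=True)
h = w * w

for i in range(2):
    loss = (h * 3.0).sum()
    loss.backward()  # second iteration raises "Trying to backward
                     # through the graph a second time ..."
```

As far as I can tell, though, everything in my loop above (both forward passes and the loss) is recomputed in every iteration, so I don't see where such sharing could come from.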
The structure of the neural network is simple:
```python
kernel_0 = nn.Linear(4, 256)
kernel_1 = nn.Linear(256, 256)
kernel_2 = nn.Linear(256, 256)
```
I tried retaining the computation graph, even though this training loop should not normally need that. After calling .backward(retain_graph=True), a different error is reported:

**RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [256, 256]], which is output 0 of AsStridedBackward0, is at version 3; expected version 2 instead.**
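If I read that message correctly, some tensor that the graph saved for the backward pass was modified in place afterwards, and I suspect the [torch.DoubleTensor [256, 256]] is one of the 256×256 nn.Linear weights that optimizer.step() updates in place. A minimal, self-contained sketch of mine (hypothetical, not from my project) that reproduces the same kind of message:

```python
import torch

# Minimal sketch: w is saved by the multiplication for the backward pass;
# updating it in place bumps its version counter, so the retained graph
# sees version 1 where it expected version 0.
w = torch.randn(2, 2, requires_grad=True)
y = (w * w).sum()
y.backward(retain_graph=True)

with torch.no_grad():
    w += 1.0     # in-place update, as optimizer.step() does to weights

y.backward()     # RuntimeError: ... modified by an inplace operation
```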
Please teach me how to find the tensor that causes these errors.
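From the documentation I found torch.autograd.set_detect_anomaly, which is supposed to make the backward error also print the traceback of the forward operation that created the failing tensor. Is something like this sketch (reusing the names from my loop above) the right way to locate it?

```python
import torch
from tqdm import tqdm

# Sketch of my plan: turn anomaly detection on once, before training.
# The RuntimeError from backward() should then additionally print the
# traceback of the forward call that created the problematic tensor.
torch.autograd.set_detect_anomaly(True)

for _ in tqdm(range(inner_iters)):
    # ... same batch sampling, forward passes, and loss as above ...
    optimizer.zero_grad()
    objective.backward()  # traceback now points at the forward call site
    optimizer.step()
```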