In the documentation of torch.nn.functional.linear (https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html), the weight input has dimensions (out_features, in_features), and the weight matrix is transposed when computing the output: y = xA^T + b. Why do they do this instead of taking a matrix W of dimensions (in_features, out_features) and computing y = xW + b?
With y = xW + b the dimensions would already match without any transpose, so I cannot see a clear reason for storing the weight the other way around.
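To make the question concrete, here is a minimal sketch (shapes chosen arbitrarily) showing that the two conventions are numerically equivalent; the only difference is how the weight is stored:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3)   # batch of 4 samples, in_features = 3
A = torch.randn(5, 3)   # PyTorch's layout: (out_features, in_features)
b = torch.randn(5)

# What F.linear computes: y = x @ A.T + b
y1 = F.linear(x, A, b)

# The alternative I am asking about: store W with shape
# (in_features, out_features) and compute y = x @ W + b
W = A.t()               # shape (3, 5)
y2 = x @ W + b

print(torch.allclose(y1, y2))  # the results are identical
```

Both produce an output of shape (4, 5), so functionally nothing is gained or lost either way.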