**TAGS :**

**Viewed:**8 -

**Published at:**a few seconds ago

#### [ Matlab Regularized Logistic Regression - how to compute gradient ]

I am currently taking Machine Learning on the Coursera platform and I am trying to implement Logistic Regression. To implement Logistic Regression, I am using gradient descent to minimize the cost function and I am to write a function called `costFunctionReg.m`

that returns both the cost and the gradient of each parameter evaluated at the current set of parameters.

The problem is better described below:

**My cost function is working, but the gradient function is not.** Please note that I would prefer to implement this using looping, rather than element-by-element operations.

I am computing `theta[0]`

(in MATLAB, `theta(1)`

) separately as it is not being regularized, i.e. we do not use the first term (with `lambda`

).

```
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta); %number of parameters (features)
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
% ----------------------1. Compute the cost-------------------
%hypothesis
h = sigmoid(X * theta);
for i = 1 : m
% The cost for the ith term before regularization
J = J - ( y(i) * log(h(i)) ) - ( (1 - y(i)) * log(1 - h(i)) );
% Adding regularization term
for j = 2 : n
J = J + (lambda / (2*m) ) * ( theta(j) )^2;
end
end
J = J/m;
% ----------------------2. Compute the gradients-------------------
%not regularizing theta[0] i.e. theta(1) in matlab
j = 1;
for i = 1 : m
grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j);
end
for j = 2 : n
for i = 1 : m
grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j) + lambda * theta(j);
end
end
grad = (1/m) * grad;
% =============================================================
end
```

What am I doing wrong?

# Answer 1

The way you are applying regularization is incorrect. You add regularization **after** you sum over all training examples but instead you are adding regularization **after each example**. If you left your code as it was before the correction, you are inadvertently making the gradient step larger and will eventually overshoot the solution. This overshooting will accumulate and will inevitably give you a gradient vector of `Inf`

or `-Inf`

for all components (except for the bias term).

Simply put, place your `lambda*theta(j)`

statement after the second `for`

loop terminates:

```
for j = 2 : n
for i = 1 : m
grad(j) = grad(j) + ( h(i) - y(i) ) * X(i,j); % Change
end
grad(j) = grad(j) + lambda * theta(j); % Change
end
```