- In most learning networks, the error is calculated as the difference between the actual output and the predicted output.
- The error function tells us how far we are from the solution.
- The function used to compute this error is known as the loss function.
- Different loss functions give different errors for the same prediction, and thus have a considerable effect on the performance of the model, as the sketch below illustrates.
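To make this concrete, here is a minimal sketch (the prediction and the choice of losses are my own, not from the post): squared error and log loss score the very same prediction quite differently.

```python
import math

y_true = 1.0   # actual label
y_pred = 0.7   # model's predicted probability

squared_error = (y_true - y_pred) ** 2
log_loss = -(y_true * math.log(y_pred)
             + (1 - y_true) * math.log(1 - y_pred))

print(f"squared error: {squared_error:.3f}")   # 0.090
print(f"log loss:      {log_loss:.3f}")        # 0.357
```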
Imagine we are standing on top of a mountain (say, Mount Everest) and we want to descend. It is not easy: it is cloudy, the mountain is big, and we cannot see the big picture. So we would look at all the possible directions in which we could walk.
Note:
If we keep taking steps that decrease the error by decreasing the height, we will eventually reach the bottom of the mountain.
KEY METRIC
- In this case, the key metric we use to solve the problem is the height.
- We can call the height the error. It tells us how badly we are doing on the mountain and how far we are from an ideal solution.
- We might also get stuck in a valley, that is, a local minimum. This often happens when solving real-world problems and we have to deal with it, though many times a local minimum gives a pretty good solution. This method is called gradient descent; a sketch of the idea follows.
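As a rough sketch of the idea (a toy one-dimensional "mountain", assumed purely for illustration), gradient descent just repeats "measure the slope, step downhill":

```python
def height(x):
    return x ** 2 + 1        # a toy "mountain": its lowest point is at x = 0

def slope(x):
    return 2 * x             # derivative of height(x)

x = 5.0                      # where we start on the mountain
learning_rate = 0.1
for _ in range(50):
    x -= learning_rate * slope(x)   # take a small step in the downhill direction

print(x, height(x))          # x is now very close to 0, the bottom
```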
GOAL: SPLIT THE DATA
How do we tell the computer how far it is from a perfect solution?
We can count the number of mistakes (in this example, that is our height).
Now let us try to decrease the number of errors, as in the sketch below.
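A small sketch of counting mistakes (the points and the candidate line are made up for illustration): the "height" is simply how many points the line classifies wrongly.

```python
# Toy points (x1, x2, label) and a candidate line w1*x1 + w2*x2 + b = 0.
points = [(1, 4, 1), (2, 5, 1), (3, 2, 1), (4, 1, 0), (5, 4, 0), (2, 2, 0)]
w1, w2, b = 0.0, 1.0, -3.0

def predict(x1, x2):
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

mistakes = sum(1 for x1, x2, label in points if predict(x1, x2) != label)
print("number of mistakes (our height):", mistakes)   # 2
```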
SOLUTION
1) After moving one step, we have decreased the number of errors.
2) After moving another step, we have corrected all the errors.
ISSUE
- The problem with this approach is that the algorithm has to take very small steps, and the reason for this is calculus: tiny steps are what derivatives measure.
- The problem with small steps is:
- We start with 2 errors
- We move a small amount
- We are still at 2 errors
- Even after moving a tiny amount, we are still at 2 errors
This is equivalent to using gradient descent on an Aztec pyramid with flat steps, as the sketch below shows.
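A quick sketch of the flat-step problem (one-dimensional toy data, assumed): nudging the boundary a tiny amount leaves the discrete error count unchanged, so there is no slope to follow.

```python
points = [(1.0, 1), (2.0, 1), (-1.0, 0), (-0.5, 0), (0.2, 0), (-0.1, 1)]

def count_errors(b):
    # discrete error: how many points does the threshold x + b >= 0 get wrong?
    return sum(1 for x, label in points
               if (1 if x + b >= 0 else 0) != label)

print(count_errors(0.0))      # 2 errors
print(count_errors(0.001))    # still 2 errors: the landscape is flat here
```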
DISCRETE
- If we are standing on the pyramid and looking around for errors, we see 2 errors in every direction and get confused about what to do.
CONTINUOUS
- In this case, we can figure out in which direction we can decrease the error the most.
- In math terms, this means that in order to do gradient descent, our error function cannot be discrete. It should be continuous, as the sketch below demonstrates.
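Using the same toy data as the previous sketch, a continuous error (here a sigmoid with log loss, an assumed choice) does respond to the same tiny nudge, so it gives us a downhill direction to follow.

```python
import math

points = [(1.0, 1), (2.0, 1), (-1.0, 0), (-0.5, 0), (0.2, 0), (-0.1, 1)]

def log_loss(b):
    # continuous error: every point contributes a smoothly varying penalty
    total = 0.0
    for x, label in points:
        p = 1 / (1 + math.exp(-(x + b)))
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total

print(log_loss(0.0))      # ~2.770
print(log_loss(0.001))    # slightly different, so a usable slope exists
```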
LOG LOSS ERROR FUNCTION
As shown in the figure above, we have 6 points, of which 4 are correctly classified and the other 2 are incorrectly classified.
- The error function gives a large penalty to the 2 incorrectly classified points and a small penalty to the 4 correctly classified points.
- Here we represent the penalty as the size of the points.
- The penalty is roughly the distance from the boundary when a point is misclassified, and nearly 0 when it is correctly classified.
- Let's add up all the errors from the corresponding points.
- The idea now is to move the line around to decrease this total error. In the figure below, we have decreased the error; a sketch of the computation follows.
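Here is a sketch of adding up the penalties and then moving the line (the points and both lines are hypothetical): the total log loss drops when the line moves toward a better fit.

```python
import math

points = [(1, 4, 1), (2, 5, 1), (3, 4, 1), (4, 1, 0), (5, 2, 0), (2, 2, 0)]

def total_error(w1, w2, b):
    # sum of every point's log-loss penalty for the line w1*x1 + w2*x2 + b = 0
    total = 0.0
    for x1, x2, y in points:
        p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

print(total_error(0.0, 1.0, -3.0))   # initial line: ~1.506
print(total_error(-0.5, 1.0, -1.5))  # line moved: ~1.199, a smaller error
```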
SOLUTION
GRADIENT DESCENT
ERROR: In this example, the error is the sum of the blue and red areas.
- We look around to see which direction brings us down the most, or equivalently, in which direction we can move to reduce the error the most.
- We take a step in that direction.
- On the mountain we go one step down; on the graph we reduce the error a bit by correctly classifying one of the points.
- Now we look around again and repeat the steps described above.
On the left, we have reduced the height and successfully descended from the mountain; on the right, we have reduced the error to the minimum possible value and successfully classified our points. The sketch below puts the whole loop together.
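Putting the whole procedure together, here is a minimal sketch (assuming a single sigmoid unit trained with log loss; the data and learning rate are made up) that repeatedly steps downhill until the points are classified:

```python
import math

points = [(1, 4, 1), (2, 5, 1), (3, 4, 1), (4, 1, 0), (5, 2, 0), (2, 2, 0)]
w1, w2, b = 0.0, 0.0, 0.0
lr = 0.1

for epoch in range(1000):
    for x1, x2, y in points:
        p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        # the gradient of this point's log loss is (p - y) times its input
        w1 -= lr * (p - y) * x1
        w2 -= lr * (p - y) * x2
        b  -= lr * (p - y)

# after training, every point should sit on the correct side of the line
for x1, x2, y in points:
    p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
    print(y, round(p, 3))
```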