Web Snippets: Error functions

In most learning networks, the error is calculated as the difference between the actual output and the predicted output.
The error function tells us how far we are from the solution.
The function used to compute this error is known as the loss function.
Different loss functions give different errors for the same prediction, and so they can have a considerable effect on the performance of the model.
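Here is a minimal sketch of that last point. The numbers are made up, and the two loss functions (mean squared error and mean absolute error) are standard choices picked for illustration, not ones this post prescribes. The same predictions are scored both ways:

```python
# Hypothetical example: one set of predictions scored by two common loss functions.
actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.0, 4.0]

# Per-point error: the difference between the actual and the predicted output.
errors = [a - p for a, p in zip(actual, predicted)]

def mean_squared_error(errs):
    """Average of squared differences; large errors dominate."""
    return sum(e * e for e in errs) / len(errs)

def mean_absolute_error(errs):
    """Average of absolute differences; every unit of error counts equally."""
    return sum(abs(e) for e in errs) / len(errs)

print(errors)
print(mean_squared_error(errors))
print(mean_absolute_error(errors))
```

The single large miss (2.0) weighs much more heavily under the squared loss than under the absolute loss. A model trained on one or the other would therefore be pushed in different directions.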

EXAMPLE:
Imagine we are standing on top of a mountain (Mount Everest) and we want to descend. It is not easy: the mountain is big, it is cloudy, and we can't see the big picture. So we look at all the possible directions in which we could walk.

Note:
If we keep taking steps that decrease the error, that is, decrease our height, then we would eventually reach the bottom of the mountain.

KEY METRIC

In this case, the key metric that we use to solve the problem is the height.
We can call the height the error. It tells us how badly we are doing on the mountain and how far we are from an ideal solution.

GRADIENT DESCENT

We might also get stuck in a valley (a local minimum). This often happens when solving real-world problems, and we have to deal with it. Many times, though, the local minimum gives a pretty good solution to the problem. This method of repeatedly stepping in the direction that decreases the error the most is called gradient descent.
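The valley problem can be sketched in one dimension. This example uses a hypothetical mountain profile, height(x) = x^4 - 3x^2 + x, which has a deep valley near x = -1.30 and a shallower one near x = 1.13. Plain gradient descent with an assumed learning rate of 0.01 ends up in whichever valley is downhill from its starting point:

```python
def height(x):
    """Hypothetical 'mountain' profile with two valleys."""
    return x**4 - 3 * x**2 + x

def slope(x):
    """Derivative of height(x): its sign tells us which way is uphill."""
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=1000):
    """Repeatedly take a small step in the downhill direction."""
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x

right = descend(2.0)   # settles in the shallower valley near x = 1.13 (a local minimum)
left = descend(-2.0)   # settles in the deeper valley near x = -1.30 (the global minimum)
print(round(right, 2), round(height(right), 2))
print(round(left, 2), round(height(left), 2))
```

Both runs follow the same rule. Only the starting point decides whether the walker gets stuck in the local minimum or reaches the lowest point.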

GOAL: SPLIT THE DATA

How do we tell the computer how far it is from a perfect solution?

We can count the number of mistakes (this is our height in the mountain example).

Now let us try to decrease the number of errors.

SOLUTION

1) After moving one step, we have decreased the errors.

2) After moving another step, we have removed all the errors.
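Counting mistakes is straightforward to express in code. This minimal sketch assumes the classifier is a straight line w1*x + w2*y + b = 0 and uses six made-up points. The names points, labels and count_mistakes are hypothetical. The three candidate lines stand in for the start, one step and another step:

```python
# Hypothetical toy data: label 1 (e.g. blue) or 0 (e.g. red) for each point.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 4.0), (2.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

def count_mistakes(w1, w2, b):
    """Discrete error: how many points land on the wrong side of w1*x + w2*y + b = 0."""
    mistakes = 0
    for (x, y), label in zip(points, labels):
        predicted = 1 if w1 * x + w2 * y + b >= 0 else 0
        if predicted != label:
            mistakes += 1
    return mistakes

# Three hypothetical lines: the start (2 mistakes), one step (1), another step (0).
for line in [(1.0, 0.0, -3.5), (1.0, 0.0, -2.5), (1.0, 1.0, -4.5)]:
    print(line, count_mistakes(*line))
```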

ISSUE

The problem with this approach is that the algorithm takes very small steps. The reason for this is calculus: the tiny steps are calculated using derivatives.
The problem with small steps is:

We start with 2 errors
We move a small amount
We are still at 2 errors
Moving another tiny amount, we are still at 2 errors
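The flat steps can be checked numerically. This sketch counts misclassified points for a straight-line classifier w1*x + w2*y + b = 0, using hypothetical points and a hypothetical starting line x - 3.5 = 0 that makes 2 mistakes. Nudging any of the three parameters by a tiny amount leaves the count at 2. A derivative-based method therefore sees a slope of zero and gets no hint about which way to move:

```python
# Hypothetical toy data: label 1 (e.g. blue) or 0 (e.g. red) for each point.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 4.0), (2.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

def count_mistakes(w1, w2, b):
    """Discrete error: number of points on the wrong side of w1*x + w2*y + b = 0."""
    return sum((1 if w1 * x + w2 * y + b >= 0 else 0) != label
               for (x, y), label in zip(points, labels))

start = (1.0, 0.0, -3.5)
step = 1e-3  # a "tiny amount", chosen arbitrarily

errors_after_nudge = []
for i in range(3):              # nudge w1, w2 and b in turn
    for sign in (1.0, -1.0):    # in both directions
        nudged = list(start)
        nudged[i] += sign * step
        errors_after_nudge.append(count_mistakes(*nudged))

print(count_mistakes(*start), errors_after_nudge)
```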

This is equivalent to using gradient descent to descend an Aztec pyramid with flat steps.

DISCRETE

If we are standing on one of these flat steps and look around for fewer errors, every small move still gives 2 errors, so we get confused about which way to go.

CONTINUOUS

In this case, we can figure out which direction decreases the error the most.
In mathematical terms, this means that in order to do gradient descent, our error function cannot be discrete; it should be continuous.
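The difference shows up as soon as the error is continuous. The sketch below replaces the mistake count with one smooth error, the log loss named in the next heading. As an assumption about its exact form, it is computed here as the sum over points of -ln of the probability that a sigmoid of w1*x + w2*y + b assigns to each point's true label. With hypothetical data and the same starting line x - 3.5 = 0, a tiny nudge to b now changes the error, and the sign of the finite-difference slope shows which way is downhill:

```python
import math

# Hypothetical toy data: label 1 (e.g. blue) or 0 (e.g. red) for each point.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 4.0), (2.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w1, w2, b):
    """Continuous error: sum of -ln(probability given to each point's true label)."""
    total = 0.0
    for (x, y), label in zip(points, labels):
        p = sigmoid(w1 * x + w2 * y + b)  # predicted probability that the label is 1
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total

step = 1e-3
before = log_loss(1.0, 0.0, -3.5)
after = log_loss(1.0, 0.0, -3.5 + step)
slope_b = (after - before) / step  # finite-difference estimate of dE/db

print(before, after, slope_b)
```

The slope comes out negative (about -1.48). Increasing b, which shifts this line to the left, therefore lowers the error, even though the mistake count would not change.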

LOG LOSS ERROR FUNCTION

As shown in the figure above, we have 6 points, of which 4 are correctly classified and the other 2 are incorrectly classified.

Assume the error function gives a large penalty to the incorrectly classified points and a small penalty to the 4 correctly classified points.
Here we represent the penalty by the size of each point.

Note:

The penalty is roughly the distance from the boundary when a point is misclassified, and close to 0 when it is correctly classified.
Let's add up the errors from all the points.
The idea now is to move the line around to decrease this total error. In the figure below, we have decreased the error.
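The penalty rule in the note above can be sketched directly. This version takes the simplified form literally: the exact distance for a misclassified point and exactly 0 for a correctly classified one. The points and candidate lines are hypothetical:

```python
import math

# Hypothetical toy data: label 1 (e.g. blue) or 0 (e.g. red) for each point.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 4.0), (2.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

def penalty(w1, w2, b, x, y, label):
    """Distance from the line w1*x + w2*y + b = 0 if misclassified, else 0."""
    score = w1 * x + w2 * y + b
    predicted = 1 if score >= 0 else 0
    if predicted == label:
        return 0.0
    return abs(score) / math.hypot(w1, w2)  # point-to-line distance

def total_error(w1, w2, b):
    """Add up the penalties from all the points."""
    return sum(penalty(w1, w2, b, x, y, label)
               for (x, y), label in zip(points, labels))

for line in [(1.0, 0.0, -3.5), (1.0, 0.0, -2.5), (1.0, 1.0, -4.5)]:
    print(line, total_error(*line))
```

Moving the line from x - 3.5 = 0 to x - 2.5 = 0 drops the total from 2.0 to 0.5, and the line x + y - 4.5 = 0 brings it to 0.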

We can now use gradient descent to solve our problem

SOLUTION

GRADIENT DESCENT
ERROR: In this example, the error is the sum of the blue and red areas.

We look around to see which direction takes us down the most.
Equivalently, we find the direction in which we can move to reduce the error the most.
We take a step in that direction

On the mountain we go one step down, and in the graph we reduce the error a bit by correctly classifying one of the points.
Now we look around again and repeat the steps described above.
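The look, step and repeat loop above can be sketched as plain gradient descent on the log loss for a straight-line classifier. Everything here is an assumption chosen for illustration: the six points, the starting line x - 3.5 = 0, the learning rate of 0.2 and the 5000 steps. The gradient of the averaged log loss with respect to (w1, w2, b) is the average of (sigmoid(score) - label) times (x, y, 1), and each step moves the parameters against it:

```python
import math

# Hypothetical toy data: label 1 (e.g. blue) or 0 (e.g. red) for each point.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 3.0), (4.0, 4.0), (2.0, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w1, w2, b):
    """Sum over points of -ln(probability assigned to the true label)."""
    total = 0.0
    for (x, y), label in zip(points, labels):
        score = w1 * x + w2 * y + b
        t = -score if label == 1 else score
        total += max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # stable ln(1 + e^t)
    return total

def count_mistakes(w1, w2, b):
    return sum((1 if w1 * x + w2 * y + b >= 0 else 0) != label
               for (x, y), label in zip(points, labels))

def gradient_descent(w1, w2, b, learning_rate=0.2, steps=5000):
    n = len(points)
    for _ in range(steps):
        # Look around: the gradient of the averaged log loss points uphill.
        g1 = g2 = gb = 0.0
        for (x, y), label in zip(points, labels):
            diff = sigmoid(w1 * x + w2 * y + b) - label
            g1 += diff * x / n
            g2 += diff * y / n
            gb += diff / n
        # Take a step in the opposite (steepest-descent) direction.
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
        b -= learning_rate * gb
    return w1, w2, b

start = (1.0, 0.0, -3.5)
end = gradient_descent(*start)
print(count_mistakes(*start), round(log_loss(*start), 3))
print(count_mistakes(*end), round(log_loss(*end), 3))
```

Each step lowers the error a little, like each step down the mountain. On this separable toy data, the starting line makes 2 mistakes and the final line makes none.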

On the left, we have reduced the height and successfully descended from the mountain; on the right, we have reduced the error to its minimum possible value and successfully classified our points.

Wednesday, January 29, 2020
