Input and output for a classification problem
- The input to a classification problem is a set of features, and the output is called a label.
- Defining the problem statement and preparing the training data is where we spend a significant amount of time.
Let's talk about the two problem statements.
Problem statement 1
Given an email, a tweet, or a trading day, decide which class it belongs to:
- Email: spam or ham
- Tweet: positive or negative
- Trading day: up-day or down-day
Problem statement 2
- Build the black-box classifier. What happens inside this black box is expressed as mathematical rules or equations, and this is called a model.
Features
- Every data point needs to be represented as a set of numerical attributes.
- The algorithms only accept numerical inputs.
- Even text and images are represented as numbers.
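As a rough illustration of the points above, the sketch below turns two kinds of non-numeric data into numbers. The image values, the weekday categories, and the helper names `flatten` and `one_hot` are all made up for this example; real libraries offer their own encoders.

```python
# A made-up 2x2 grayscale image: each pixel is an intensity from 0 to 255.
image = [
    [0, 255],
    [128, 64],
]

def flatten(image):
    """Turn a 2D grid of pixel intensities into one flat list scaled to 0..1."""
    return [pixel / 255 for row in image for pixel in row]

def one_hot(value, categories):
    """Encode a categorical value as a list with a single 1 at its position."""
    return [1 if value == category else 0 for category in categories]

# Hypothetical attributes: an image and the weekday of a trading day.
image_features = flatten(image)
day_features = one_hot("Tue", ["Mon", "Tue", "Wed", "Thu", "Fri"])

print(image_features)
print(day_features)
```

Either list can then be used as the feature part of a data point.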
Training
- We take a large amount of historical data. This is a set of problem instances that are correctly labelled, for example emails that are marked as spam or ham.
- Each training example is a tuple of features and a label.
- The patterns the classifier learns in this phase are called the model.
- If the required output (the label) is known for the training data, this is called supervised learning.
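The training phase described above can be sketched in plain Python. This is only an illustration: it uses a nearest-centroid classifier, which is not one of the algorithms named in this post but is short enough to read in one go. The feature meanings and the training values are invented for the example.

```python
from collections import defaultdict

def train(training_data):
    """Learn one centroid (mean feature vector) per label.

    training_data is a list of (features, label) tuples.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in training_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: distance(model[label]))

# Hypothetical features: [count of "free", count of "meeting"] in an email.
training_data = [
    ([3, 0], "spam"),
    ([2, 0], "spam"),
    ([0, 2], "ham"),
    ([0, 1], "ham"),
]

# The "model" here is simply the dictionary of learned centroids.
model = train(training_data)
print(model)
print(predict(model, [4, 0]))
```

Because every training tuple carries its label, this is supervised learning in the sense used above.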
Test Phase
- Here we classify data that we have not seen before.
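One common way to check a classifier in this phase is to run it on labelled examples that were held out of training and count how often it is right. The sketch below assumes a stand-in `classify` function with a hard-coded rule in place of a real trained model, and the test values are made up.

```python
def accuracy(classify, test_data):
    """Fraction of held-out (features, label) tuples that classify gets right."""
    correct = sum(
        1 for features, label in test_data if classify(features) == label
    )
    return correct / len(test_data)

# Hypothetical stand-in for a trained model: flags an email as spam when
# the first feature (say, the count of the word "free") is 2 or more.
def classify(features):
    return "spam" if features[0] >= 2 else "ham"

# Held-out examples that were not used for training (made-up values).
test_data = [
    ([3, 0], "spam"),
    ([0, 2], "ham"),
    ([1, 0], "spam"),
    ([0, 1], "ham"),
]

score = accuracy(classify, test_data)
print(score)
```

Any function that maps features to a label can be passed in as `classify`, so the same check works for whichever algorithm is used.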
Note: Most of these algorithms are available as pre-built libraries for platforms like Python, R, or Spark.
Algorithms for classification problems
- Naive Bayes
- Support Vector Machines
- Decision Trees
- K-Nearest Neighbors
- Random Forest
- Logistic Regression
Term Frequency Representation
This is how we take text input and represent it as a set of numeric data: each word in the text is counted, and the counts become the numeric features.
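A minimal sketch of a term frequency representation, using only the Python standard library. The example documents and the helper name `term_frequencies` are invented for illustration, and splitting on whitespace is a simplifying assumption; real tokenizers also handle punctuation.

```python
from collections import Counter

def term_frequencies(text, vocabulary):
    """Return one count per vocabulary word for the given text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical example documents.
documents = [
    "free money free offer",
    "meeting at noon",
    "free meeting offer",
]

# The vocabulary is every distinct word seen, in a fixed (sorted) order.
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})

# Each document becomes a list of counts aligned with the vocabulary.
vectors = [term_frequencies(doc, vocabulary) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Every document ends up as a list of the same length, so the lists can be used directly as the feature part of a training example.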