Web Snippets: Classifying data into predefined categories

Thursday, August 31, 2017

Classifying data into predefined categories

Input and output for a classification problem

The input to a classification problem is a set of features, and the output is called a label
Defining the problem statement and preparing the training data is where we spend a significant amount of time

Let's talk about two types of problems

Problem statement 1

Decide what is being classified (an email, a tweet, or a trading day) and what the possible labels are

Email: spam or ham
Tweet: positive or negative
Trading day: up-day or down-day

Problem statement 2

Build the black-box classifier. What happens inside this black box is represented using mathematical rules or equations, and it is called a model

Features

Every data point that we see needs to be represented as numerical attributes
The algorithms only accept numerical input
Even text and images are represented using numbers
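
As a rough illustration of turning a data point into numerical attributes, here is a minimal sketch in Python. The function name email_to_features and the three attributes it computes are made up for this example; a real spam classifier would use many more, and better chosen, features.

```python
def email_to_features(text):
    """Turn a raw email string into a list of numbers (hypothetical features)."""
    words = text.lower().split()
    return [
        len(words),                    # number of whitespace-separated words
        text.count("!"),               # number of exclamation marks
        1 if "free" in words else 0,   # 1 if the word "free" appears, else 0
    ]


features = email_to_features("Win a FREE prize now!!!")
print(features)
```

The classifier never sees the original text, only the list of numbers.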

Training

We take a large amount of historical data. This is a set of problem instances that are correctly labelled, for example emails that are already marked as spam or ham
Each training example is a tuple of features and a label
The patterns the classifier learns in this phase are called the model
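
To make the idea of learning a model from (features, label) tuples concrete, here is a small sketch in plain Python. It uses a nearest-centroid classifier, chosen only because it is short; it is not one of the algorithms listed in this post. The training data and the two features (exclamation marks, links) are invented for illustration.

```python
from collections import defaultdict


def train(training_data):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in training_data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    # The "model" is just a dictionary: label -> centroid
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Made-up training data: ([exclamation marks, links], label)
training_data = [
    ([5, 3], "spam"),
    ([4, 2], "spam"),
    ([0, 0], "ham"),
    ([1, 0], "ham"),
]
model = train(training_data)
print(model)
```

Here the model is nothing more than one averaged feature vector per label, which is enough to classify a new data point with predict(model, [6, 4]).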

Note: Not all algorithms follow this

If the required output (the label) is known for the training data, it is called supervised learning

Test Phase

Here we actually classify data that the classifier has not seen before
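
One common way to check a classifier in the test phase is to keep some labelled examples aside and measure how many of them it gets right. The sketch below assumes a 1-nearest-neighbor classifier (K-Nearest Neighbors with k = 1) and invented data; the function names are hypothetical.

```python
def nearest_neighbor_label(training_data, features):
    """1-nearest-neighbor: copy the label of the closest training example."""
    def squared_distance(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(training_data, key=squared_distance)[1]


def accuracy(training_data, test_data):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(
        1
        for features, label in test_data
        if nearest_neighbor_label(training_data, features) == label
    )
    return correct / len(test_data)


# Made-up data: ([exclamation marks, links], label)
training_data = [([5, 3], "spam"), ([4, 2], "spam"), ([0, 0], "ham"), ([1, 0], "ham")]
# Held-out examples that were not used for training
test_data = [([6, 4], "spam"), ([0, 1], "ham"), ([2, 1], "spam")]

print(accuracy(training_data, test_data))
```

With this data the classifier labels two of the three unseen examples correctly. The third, [2, 1], sits closer to a ham example and is misclassified.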

Note: Most classification algorithms are available as pre-built libraries for platforms like Python, R, or Spark

Algorithms for classification problems

Naive Bayes
Support Vector Machines
Decision Trees
K-Nearest Neighbors
Random Forest
Logistic Regression

Term Frequency Representation

This is how we take text input and represent it as a set of numeric data
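
A minimal sketch of a term frequency representation in plain Python follows. It assumes the simplest variant, raw word counts over a shared vocabulary, with words split on whitespace; the function names and the sample documents are made up for illustration.

```python
from collections import Counter


def build_vocabulary(documents):
    """Sorted list of every distinct word across all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def term_frequency_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


documents = ["free prize free entry", "meeting at noon"]
vocabulary = build_vocabulary(documents)
vectors = [term_frequency_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Every document becomes a list of numbers of the same length, one position per vocabulary word, so it can be fed to a classifier as features.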
