Monday, April 29, 2019

One Hot Encoding

One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction.

CATEGORICAL DATA


Lets take a dataset of food names. In this dataset, if there was another food name it would have categorical value as 4.As the no of unique value increases, the categorical values increases.


What is Categorical Data?
  • Categorical data are variables that contain label values rather than numeric values.
  • The number of possible values is often limited to a fixed set.
  • Categorical variables are often called nominal.


Some categories may have a natural relationship to each other, such as a natural ordering.

CONVERT CATEGORICAL DATA INTO NUMERICAL DATA

This involves two steps:
  1. Integer Encoding
  2. One-Hot Encoding

1. Integer Encoding
As a first step, each unique category value is assigned an integer value.

For example, “Apple” is 1, “Chicken” is 2, and “Broccoli” is 3.

This is called a label encoding or an integer encoding and is easily reversible.

For some variables, this may be enough.

The integer values have a natural ordered relationship between each other and machine learning algorithms may be able to understand and harness this relationship.



2. One-Hot Encoding
For categorical variables where no such ordinal relationship exists, the integer encoding is not enough.

In fact, using this encoding and allowing the model to assume a natural ordering between categories may result in poor performance or unexpected results (predictions halfway between categories).

In this case, a one-hot encoding can be applied to the integer representation. This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value.


6 comments:

  1. It's very useful blog post with inforamtive and insightful content and i had good experience with this information. We, at the CRS info solutions ,help candidates in acquiring certificates, master interview questions, and prepare brilliant resumes.Find top Salesforce admin interview questions in 2020.
    These Salesforce developer interview questions are highly helpful in 2020. You can read these Salesforce lightning interview questions and Salesforce integration interview questions which are prepared by industry experts.

    ReplyDelete
  2. The development of artificial intelligence (AI) has propelled more programming architects, information scientists, and different experts to investigate the plausibility of a vocation in machine learning. Notwithstanding, a few newcomers will in general spotlight a lot on hypothesis and insufficient on commonsense application. machine learning projects for final year In case you will succeed, you have to begin building machine learning projects in the near future.

    Projects assist you with improving your applied ML skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include projects into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Final Year Project Centers in Chennai even arrange a more significant compensation.


    Data analytics is the study of dissecting crude data so as to make decisions about that data. Data analytics advances and procedures are generally utilized in business ventures to empower associations to settle on progressively Python Training in Chennai educated business choices. In the present worldwide commercial center, it isn't sufficient to assemble data and do the math; you should realize how to apply that data to genuine situations such that will affect conduct. In the program you will initially gain proficiency with the specialized skills, including R and Python dialects most usually utilized in data analytics programming and usage; Python Training in Chennai at that point center around the commonsense application, in view of genuine business issues in a scope of industry segments, for example, wellbeing, promoting and account.

    ReplyDelete