Showing posts with label Machine learning. Show all posts
Wednesday, February 5, 2020
Saturday, April 27, 2019
Friday, April 26, 2019
Bag of words
- A way of representing text data when modeling text with machine learning algorithms.
- It is simple to understand and implement, and it has seen great success in problems such as language modeling and document classification.
The Problem with Text
A problem with modeling text is that it is messy, whereas machine learning algorithms prefer well-defined, fixed-length inputs and outputs.
Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers. Specifically, vectors of numbers.
In language processing, the vectors x are derived from textual data, in order to reflect various linguistic properties of the text.
This is called feature extraction or feature encoding.
A popular and simple method of feature extraction with text data is called the bag-of-words model of text.
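As a rough illustration of the idea, the sketch below turns two short sentences into fixed-length count vectors using only the Python standard library. The helper names (`tokenize`, `build_vocabulary`, `vectorize`) and the sample sentences are assumptions made for this example, not part of any particular library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Collect every distinct token across the documents, in sorted order."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def vectorize(text, vocabulary):
    """Return one count per vocabulary word, so every vector has the same length."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical sample documents, chosen only for illustration.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors[0])  # [1, 0, 0, 1, 1, 1, 2]
print(vectors[1])  # [0, 1, 1, 0, 1, 1, 2]
```

Word order is discarded and only the counts remain, which is why the representation is called a "bag" of words.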
Tuesday, April 16, 2019
Machine Learning Workflow
The machine learning workflow consists of three components:
- Explore and process data
- Modeling
- Deployment
EXPLORE AND PROCESS DATA
This component consists of exploring and processing the data.
Retrieve
The first step is to retrieve the data, which includes the training and test datasets. Let's take the example of a housing dataset that is made up of CSV files. We need to download the data from the source.
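A minimal sketch of the retrieve step, using only the Python standard library, might look like the following. The function names, the URL, and the file paths are placeholders assumed for illustration; the actual source of the housing data is not specified here.

```python
import csv
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Copy the file at `url` to `destination` and return the local path."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


def load_csv(path):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Example usage (the URL and file names are placeholders, not a real source):
# train_path = download("https://example.com/housing/train.csv", "data/train.csv")
# train_rows = load_csv(train_path)
```

The same two calls would be repeated for the test file, so that the training and test datasets end up side by side on local disk.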
Monday, February 4, 2019
Twitter location clustering based on tweets (Spark MLlib)
1) Create a directory for twitter streams
cd /usr/lib/spark
sudo mkdir tweets
cd tweets
sudo mkdir data
sudo mkdir training
sudo chmod 777 /usr/lib/spark/tweets/
These are the two folders that we will use in this project:
data: contains the master copies of the CSV files, which we pretend are coming from a training source.
training: the source used to train our machine learning algorithm.
Friday, December 28, 2018
Tuesday, April 24, 2018
Thursday, January 11, 2018
Quick review of machine learning algorithms
These are some of the important machine learning algorithms.
Decision tree
- Belongs to the family of supervised learning algorithms.
- Can be used to solve both regression and classification problems.
- The general goal of using a decision tree is to create a model that can predict the class or value of a target variable by learning decision rules inferred from prior data (training data).
Example: a banker deciding whether to grant a loan.
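To make "learning decision rules from training data" concrete, here is a small sketch of a classification tree written from scratch in Python. It splits on Gini impurity, which is one common criterion and is assumed here rather than taken from the post. The loan data (income and existing debt, both in thousands) and the labels are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature index, threshold) with the lowest weighted impurity."""
    best, best_score = None, gini(labels)  # a split must beat the unsplit impurity
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels):
    """Recursively grow the tree; a leaf is simply the majority label."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    goes_left = [row[feature] <= threshold for row in rows]
    left_rows = [r for r, keep in zip(rows, goes_left) if keep]
    left_labels = [l for l, keep in zip(labels, goes_left) if keep]
    right_rows = [r for r, keep in zip(rows, goes_left) if not keep]
    right_labels = [l for l, keep in zip(labels, goes_left) if not keep]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left_rows, left_labels),
        "right": build_tree(right_rows, right_labels),
    }


def predict(tree, row):
    """Follow the learned rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical applicants: [income, existing debt], both in thousands.
rows = [[25, 10], [30, 2], [60, 5], [80, 40], [90, 10], [40, 30]]
labels = ["deny", "deny", "grant", "deny", "grant", "deny"]

tree = build_tree(rows, labels)
print(predict(tree, [70, 8]))   # grant
print(predict(tree, [85, 35]))  # deny
```

On this toy data the tree first splits on income and then on debt, which mirrors the kind of rules a banker might apply by hand.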
Thursday, August 31, 2017
Classifying data into predefined categories
Input and output for classification problem
- The input to a classification problem is a set of features, and the output is called a label.
- Defining the problem statement and preparing the training data are where we spend a significant amount of time.
Let's talk about two types of problems.
Problem statement 1
Email, tweet or trading day
- Email: spam or ham
- Tweet: positive or negative
- Trading day: up-day or down-day
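The sketch below shows the shape of such a problem in Python: each training example pairs an input with a predefined label, features are derived from the input, and the classifier maps new features to a label. The word-voting rule is a deliberately simple toy baseline assumed for this example, and the messages are invented; it is not a recommended spam filter.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each example pairs an input text with its label.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def extract_features(text):
    """Turn raw text into features: here, simply the set of lowercase words."""
    return set(text.lower().split())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        for word in extract_features(text):
            word_counts[word][label] += 1
    return word_counts


def classify(word_counts, text, default="ham"):
    """Let every known word vote for the labels it was seen with."""
    votes = Counter()
    for word in extract_features(text):
        votes.update(word_counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default


model = train(training_data)
print(classify(model, "claim your free prize"))   # spam
print(classify(model, "team meeting on monday"))  # ham
```

The same structure carries over to the tweet and trading-day problems: only the feature extraction and the set of labels change.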
Wednesday, November 2, 2016