Web Snippets: Machine learning


Wednesday, February 5, 2020

Machine Learning workflow

The machine learning workflow consists of three components:

Explore and process data
Modeling
Deployment

Data dimensions

SCALARS

Scalars have zero dimensions.
For example, a person's height is a scalar.

Examples: 1, 2.4, -0.3
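To make "zero dimensions" concrete, here is a minimal sketch that counts dimensions by counting levels of list nesting. The helper name `ndim` is hypothetical, chosen to echo NumPy, whose `numpy.ndim` reports 0 for a plain number in the same way. It assumes regular (rectangular) nested lists.

```python
def ndim(value):
    """Return how many dimensions a number or a rectangular nested list has.

    A plain number is a scalar, so it has 0 dimensions; each level of
    list nesting adds one dimension.
    """
    dims = 0
    while isinstance(value, list):
        dims += 1
        value = value[0] if value else None  # descend into the first element
    return dims


for scalar in (1, 2.4, -0.3):
    print(scalar, ndim(scalar))  # the dimension count is 0 for every scalar
```

By contrast, grouping the same three numbers into one list, `[1, 2.4, -0.3]`, gives a one-dimensional value.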

Bag of words

The bag-of-words model is a way of representing text data when modeling text with machine learning algorithms.
It is simple to understand and implement, and it has seen great success in problems such as language modeling and document classification.

The Problem with Text

A problem with modeling text is that it is messy, and techniques like machine learning algorithms prefer well-defined, fixed-length inputs and outputs.

Machine learning algorithms cannot work with raw text directly; the text must be converted into numbers, specifically vectors of numbers.

In language processing, the feature vectors x are derived from textual data to reflect various linguistic properties of the text.

This is called feature extraction or feature encoding.

A popular and simple method of feature extraction with text data is called the bag-of-words model of text.
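A minimal sketch of the bag-of-words model in plain Python follows. It assumes simple lower-casing, punctuation stripping, and whitespace tokenization (real pipelines often use richer tokenizers), and all function names are illustrative. The vocabulary built from the corpus fixes the vector length, which produces the fixed-length numeric input that the text says algorithms prefer; word order is discarded and only counts remain.

```python
from collections import Counter


def tokenize(text):
    """Lower-case the text, turn punctuation into spaces, and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def build_vocabulary(documents):
    """Map every distinct word in the corpus to a fixed column index (sorted for determinism)."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(document, vocabulary):
    """Return a fixed-length vector of word counts; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(document)).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


docs = ["It was the best of times", "it was the worst of times"]  # toy corpus
vocab = build_vocabulary(docs)
for doc in docs:
    print(bag_of_words(doc, vocab))
```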

SageMaker Services

SERVICES PROVIDED BY SAGEMAKER

1) Jupyter notebook instances

Used to explore and process data

2) API

The API simplifies computationally difficult tasks such as training and deploying machine learning models.

The machine learning workflow consists of three components:

Explore and process data
Modeling
Deployment

EXPLORE AND PROCESS DATA

This component consists of exploring and processing the data.

Retrieve

The first step is to retrieve the data, which includes the training and test datasets. Let's take the example of a housing dataset stored as CSV files. We first need to download the data from its source.
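Once the CSV files are on disk, they can be loaded and, if the source ships only one file, split into training and test sets. The sketch below uses only the standard library; the helper names `load_csv` and `train_test_split`, the 20% test fraction, and the file name `housing.csv` are hypothetical, not taken from the post or from any particular housing dataset.

```python
import csv
import random


def load_csv(path):
    """Read a CSV file that has a header row into a list of dicts, one per data row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and hold out `test_fraction` of them as the test set.

    A fixed seed makes the split reproducible between runs.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Usage, assuming a hypothetical local file "housing.csv" has already been downloaded:
# rows = load_csv("housing.csv")
# train_rows, test_rows = train_test_split(rows, test_fraction=0.2)
```

Shuffling before splitting avoids putting, say, all rows from one neighbourhood into the test set just because the file happens to be sorted.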

Twitter location clustering based on tweets (Spark MLlib)

1) Create a directory for the Twitter streams

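 # Work inside the Spark installation directory
 cd /usr/lib/spark

 # Project directory for the Twitter stream files
 sudo mkdir tweets

 cd tweets
 # data: master copy of the CSV files; training: files used to train the model
 sudo mkdir data
 sudo mkdir training

 # Make the project tree writable (-R also covers data/ and training/)
 sudo chmod -R 777 /usr/lib/spark/tweets/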

These are the two folders we will use in this project:
data: holds the master copy of the CSV files, which we will pretend are coming from a training source.
training: the source used to train our machine learning algorithm.
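The excerpt stops after the directory setup, so the clustering step itself is not shown. As a hedged illustration only, assuming the clustering is k-means (Spark MLlib does provide a KMeans implementation), here is a plain-Python sketch that groups (latitude, longitude) pairs. It is not the Spark MLlib API, it treats raw coordinates as flat Euclidean points as a simplification, and the coordinates in the usage lines are made-up examples.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on (lat, lon) tuples; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Illustrative tweet locations only: three near one city, three near another
tweet_locations = [
    (40.7, -74.0), (40.8, -73.9), (40.6, -74.1),
    (34.0, -118.2), (34.1, -118.3), (33.9, -118.1),
]
centers, labels = kmeans(tweet_locations, k=2)
print(labels)
```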

TensorFlow

Interface for expressing machine learning algorithms
Implementation for executing such algorithms
Framework for creating ensemble algorithms for today's most challenging problems

Mean, Median and Mode

Mean
The "average" number, found by adding all data points and dividing by the number of data points.
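The definition translates directly into code. This sketch computes the mean by hand and cross-checks it against Python's standard-library `statistics.mean`; the height values are illustrative, not real measurements.

```python
import statistics


def mean(data):
    """The "average": add all data points and divide by the number of data points."""
    if len(data) == 0:
        raise ValueError("mean of an empty data set is undefined")
    return sum(data) / len(data)


heights_cm = [160, 172, 168, 181, 175]  # illustrative values only
print(mean(heights_cm))             # 856 / 5 -> 171.2
print(statistics.mean(heights_cm))  # standard-library cross-check -> 171.2
```

The same module also provides `statistics.median` and `statistics.mode` for the other two measures.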

Quick review of machine learning algorithms

These are some of the important machine learning algorithms:

Decision tree

A decision tree belongs to the family of supervised learning algorithms.
It can be used to solve both regression and classification problems. The general goal of using a decision tree is to create a training model that can predict the class or value of the target variable by learning decision rules inferred from prior data (training data).

Example: a banker deciding whether to grant a loan.
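To make "learning decision rules from training data" concrete, here is a minimal sketch of a single split (a one-level tree, often called a decision stump) chosen by Gini impurity, which is one common split criterion. The loan framing, the credit-score feature, and the six training rows are hypothetical. A full decision tree would apply the same split search recursively to each side and consider several features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send examples to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


def fit_stump(xs, ys):
    """Learn one rule 'x <= threshold' with a majority-class leaf on each side."""
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        # No useful split (e.g. all feature values equal): predict the overall majority
        majority = Counter(ys).most_common(1)[0][0]
        return lambda x: majority
    left_label = Counter(y for x, y in zip(xs, ys) if x <= threshold).most_common(1)[0][0]
    right_label = Counter(y for x, y in zip(xs, ys) if x > threshold).most_common(1)[0][0]
    return lambda x: left_label if x <= threshold else right_label


# Hypothetical training data: applicant credit score -> past loan decision
scores = [580, 610, 650, 700, 720, 760]
decisions = ["deny", "deny", "deny", "grant", "grant", "grant"]
predict_loan = fit_stump(scores, decisions)
print(best_split(scores, decisions))  # (650, 0.0): this threshold separates the classes perfectly
print(predict_loan(600), predict_loan(740))  # deny grant
```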

Classifying data into predefined categories

Input and output of a classification problem

The input to a classification problem is a set of features, and the output is called a label.
The problem statement and the training data are where we spend a large amount of our time.

Let's talk about two types of problems

Problem statement 1

The inputs are an email, a tweet, or a trading day, and the corresponding problems are:

Email: spam or ham
Tweet: positive or negative
Trading day: up-day or down-day
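Each of these is a two-class classification problem: features derived from the input, and a label as the output. As one hedged illustration for the spam-or-ham case, the sketch below swaps in a multinomial naive Bayes classifier with add-one smoothing over word counts. The choice of technique, the function names, and the four training messages are assumptions for illustration, not the post's method or data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior score for `text`."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [  # toy messages, made up for illustration
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("see you at lunch", "ham"),
]
model = train_naive_bayes(training)
print(predict(model, "win cheap money"))  # spam
print(predict(model, "lunch at noon"))    # ham
```

The same pattern fits the other two problems once their inputs are turned into features, for example tweet words for positive or negative sentiment.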

Machine learning

-The science of getting computers to learn without being explicitly programmed

-Grew out of work in AI

-A new capability for computers


Labels

Wednesday, February 5, 2020

Machine Learning workflow

Saturday, April 27, 2019

Data dimensions

Friday, April 26, 2019

Bag of words

Tuesday, April 16, 2019

SageMaker Services

Machine Learning Workflow

Monday, February 4, 2019

Twitter location clustering based on tweets (Spark MLlib)

Friday, December 28, 2018

TensorFlow

Tuesday, April 24, 2018

Mean, Median and Mode

Thursday, January 11, 2018

Quick review of machine learning algorithms

Decision tree

Thursday, August 31, 2017

Classifying data into predefined categories

Let's talk about two types of problems

Wednesday, November 2, 2016

Machine learning
