Machine Learning API and Scaling

Sep 14, 2022

What is machine learning?

Machine learning finds patterns in existing data and uses them to make decisions. In other words, it learns the relationships within the data and applies them to new cases.

Why do we use machine learning?

ML makes it easier to make data-driven decisions without human intervention. A model can also evolve over time: retraining it continuously on new data can improve the accuracy of its decisions.

ML Use-cases

Identifying tags and categories from textual data
Weather prediction
Recommendations
Fraud detection

ML Pipeline

Step 1: Prepare Training Data

If we have textual data and want to predict an outcome from it, we first need to convert the text into a machine-readable, numerical format.

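As a first step, we can cleanse the data (remove nulls, duplicates, and punctuation) and apply stemming, which reduces inflected forms to a common root (riding, rides ==> ride). Mapping an irregular form such as rode to ride generally requires lemmatization rather than stemming.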
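The cleansing step can be sketched in plain Python. This is a minimal illustration, not a production pipeline: `clean_documents` and `toy_stem` are hypothetical helper names, and the suffix-stripping rule is a deliberate simplification of what a real stemmer (for example a Porter stemmer) does.

```python
import string


def clean_documents(docs):
    """Drop nulls and empty strings, strip punctuation, lowercase, remove duplicates."""
    strip_punct = str.maketrans("", "", string.punctuation)
    seen = set()
    cleaned = []
    for doc in docs:
        if doc is None:
            continue  # remove nulls
        text = doc.translate(strip_punct).lower().strip()
        if not text or text in seen:
            continue  # remove empties and duplicates
        seen.add(text)
        cleaned.append(text)
    return cleaned


def toy_stem(word):
    """Very rough suffix stripping; real stemmers apply many more rules."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


docs = [
    "This is the first document.",
    None,
    "This is the first document.",
    "Is this the first document?",
]
cleaned = clean_documents(docs)
print(cleaned)
print([toy_stem(w) for w in ["documents", "predicting"]])
```

The null entry and the repeated sentence are dropped, and the remaining text is lowercased with punctuation removed before any feature extraction happens.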

Next, let’s convert the textual data to a numerical format. One way to do this is scikit-learn's CountVectorizer, which identifies the unique feature names (words) in the input and counts how often each one occurs in each document.

Step 2: Extract Features

Input

‘This is the first document.’

‘This document is the second document.’

‘And this is the third one.’

‘Is this the first document?’

Unique feature names

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

CountVectorizer (document-term matrix)

[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
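To make the matrix less magical, here is a small standard-library sketch of the counting itself. It is not scikit-learn's implementation: `tokenize` and `count_vectorize` are illustrative names, and the token rule (lowercase, words of two or more characters) is an assumption chosen to mirror CountVectorizer's default behaviour.

```python
import re
from collections import Counter

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]


def tokenize(text):
    # Lowercase, then keep tokens of two or more word characters
    return re.findall(r"\b\w\w+\b", text.lower())


def count_vectorize(docs):
    tokenized = [tokenize(d) for d in docs]
    # The vocabulary is the sorted set of unique tokens across all documents
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # One row per document, one column per vocabulary term
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


feature_names, dtm = count_vectorize(corpus)
print(feature_names)
for row in dtm:
    print(row)
```

Each row is one input sentence and each column is one feature name, so the 2 in the second row records that "document" appears twice in the second sentence.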

Step 3: Build ML Model

In supervised machine learning, we select an algorithm and then specify the samples (input data) and the target (expected result) used to build the model.

Step 4: Make Predictions

Predictions on live traffic, e.g. whether a credit card transaction is fraudulent
Recommendations
Suggestions
Autocomplete

Machine Learning Models

Linear Regression is a method for studying the relationship between a dependent variable (y) and a given set of independent variables (x). The relationship is established by fitting a best-fit line.

y = mx + b

where b is the intercept and m is the slope of the line. The linear regression algorithm gives us the optimal values for the intercept and the slope (in two dimensions). The x and y values stay fixed, since they are the observed data and cannot be changed; the values we can control are the intercept and the slope. Different values of intercept and slope give different straight lines, and the algorithm returns the one with the least error, typically measured as the sum of squared differences between the predicted and actual y values.
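For a single input variable, the least-squares slope and intercept have a closed form, so no trial-and-error over candidate lines is needed. The sketch below assumes ordinary least squares as the error measure; `fit_line` is an illustrative name and the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up points that lie on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
print(m, b)
```

With noisy real data the points will not sit exactly on one line, and the same formulas return the line with the smallest total squared error.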

A Support Vector Machine (SVM) is a supervised learning method for classification. It chooses the separating hyperplane that maximizes the margin, i.e. the distance to the nearest data points of each class.
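For readers who want the objective written out, the standard hard-margin formulation for linearly separable data is shown below. The symbols are assumptions introduced here: w is the normal vector of the hyperplane, b its offset, x_i a training sample, and y_i its label in {-1, +1}.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl(w^{\top} x_i + b\bigr) \ge 1, \qquad i = 1, \dots, n
```

The margin between the two classes is 2 / ||w||, so minimizing ||w|| is the same as maximizing the margin.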

The Multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). Each naive Bayes classifier can be viewed as fitting a probability model that assumes the features are independent of one another given the class.
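As a rough illustration of how samples and targets turn into a model, here is a small multinomial naive Bayes written with the standard library only. It is a sketch, not a library implementation: the function names, the add-one (Laplace) smoothing parameter `alpha`, and the tiny spam/ham data set are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(samples, targets, alpha=1.0):
    """samples: list of token lists; targets: one label per sample."""
    class_docs = Counter(targets)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(samples, targets):
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    model = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocabulary)
        model[label] = {
            # log P(class)
            "log_prior": math.log(n_docs / len(samples)),
            # log P(word | class) with additive smoothing
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocabulary
            },
        }
    return model


def predict(model, tokens):
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            # Words never seen during training are ignored
            if tok in params["log_likelihood"]:
                score += params["log_likelihood"][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data: token lists (samples) and labels (targets)
samples = [
    ["cheap", "pills", "buy", "now"],
    ["buy", "cheap", "watches"],
    ["meeting", "agenda", "for", "monday"],
    ["project", "meeting", "notes"],
]
targets = ["spam", "spam", "ham", "ham"]

model = train_multinomial_nb(samples, targets)
print(predict(model, ["buy", "cheap", "now"]))
print(predict(model, ["meeting", "notes"]))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a word that is missing from one class from zeroing out that class entirely.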

Scaling

ML Prediction API (Python, in memory)

ML Prediction API (batch processing)

ML Prediction API (streaming pipeline)
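The first two options can be sketched as follows. Everything here is an assumption made for illustration: `InMemoryPredictor`, `handle_request`, `predict_batch`, the JSON request shape, and the threshold-based `toy_model` are hypothetical and stand in for whatever model and web framework are actually used.

```python
import json


class InMemoryPredictor:
    """Keeps a model in process memory so requests do not reload it."""

    def __init__(self, model):
        # Any callable that maps one input to one prediction
        self.model = model

    def handle_request(self, body):
        """Single prediction: JSON in, JSON out (what an HTTP handler would wrap)."""
        payload = json.loads(body)
        return json.dumps({"prediction": self.model(payload["input"])})

    def predict_batch(self, inputs, batch_size=2):
        """Batch processing: yield predictions one chunk at a time."""
        for start in range(0, len(inputs), batch_size):
            chunk = inputs[start:start + batch_size]
            yield [self.model(x) for x in chunk]


def toy_model(amount):
    # Hypothetical stand-in: flags large transaction amounts as fraud
    return "fraud" if amount > 1000 else "ok"


predictor = InMemoryPredictor(toy_model)
single = predictor.handle_request('{"input": 2500}')
batches = list(predictor.predict_batch([10, 5000, 300]))
print(single)
print(batches)
```

Loading the model once and reusing it is what makes the in-memory option fast; the batch variant trades latency for throughput by scoring many inputs per call.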

Deployment Options

AWS SageMaker or GCP cloud API
Python Flask or Django app
TensorFlow deployed on a Kubernetes (k8s) cluster on GCP or AWS, or on your own k8s cluster if you have one
RedisAI module

Resources:

ML Notebook

Python Flask service with prediction API (coming soon)