Banz Aisle Technology
Machine Learning
1. NLP - Natural Language Processing
Natural Language Processing (NLP) is an umbrella term for a variety of machine learning methods that enable a computer to understand and operate on human (i.e. natural) language as it is spoken or written.
2. Dataset
Data is an essential part of machine learning. To build any machine learning system, you need to either obtain the data (e.g. from a public resource) or collect it yourself. All the data used to build or test an ML model is called a dataset. Data scientists typically divide a dataset into three separate groups: training data (used to fit the model), validation data (used to tune it and compare variants), and test data (held back for the final evaluation).
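The three-way split described above can be sketched in a few lines. This is a minimal illustration, not a prescribed recipe: the function name split_dataset, the 70/15/15 proportions and the fixed seed are all assumptions made for the example.

```python
import random

def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the examples and cut them into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything that is left
    return train, val, test

# Hypothetical dataset: 100 examples represented by their index.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting matters: if the data is sorted (for instance by date or by class), an unshuffled split would give the three groups different distributions.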
3. Computer Vision
Computer Vision (CV) is a field of Artificial Intelligence concerned with providing tools for analysis and a high-level understanding of image and video data. The most common problems in CV include image classification, object detection, image segmentation, and saliency detection.
4. Supervised learning
Supervised learning is a family of machine learning models that learn from examples. This means that the data for a supervised ML task needs to be labeled (assigned the right, ground-truth class). For instance, if we would like to build a machine learning model that recognizes whether a given text is about marketing, we need to provide the model with a set of labeled examples (each text plus the information whether it is about marketing or not). Given a new, unseen example, the model predicts its target; in this case, a label (e.g. 1 if the text is about marketing and 0 otherwise).
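To make the marketing example concrete, here is a deliberately tiny sketch. It uses a toy word-voting scheme rather than any production algorithm, and the training texts, the function names train_classifier and predict, and the scoring rule are all made up for illustration.

```python
from collections import Counter

def train_classifier(labeled_texts):
    """Learn one weight per word: +1 for each occurrence in a marketing
    text (label 1), -1 for each occurrence in any other text (label 0)."""
    weights = Counter()
    for text, label in labeled_texts:
        for word in text.lower().split():
            weights[word] += 1 if label == 1 else -1
    return weights

def predict(weights, text):
    """Return 1 (marketing) if the words of the text vote positive, else 0."""
    score = sum(weights[word] for word in text.lower().split())
    return 1 if score > 0 else 0

# Hypothetical labeled examples: (text, 1 if about marketing else 0).
training_data = [
    ("our new ad campaign boosted brand awareness", 1),
    ("email campaign targets new customers", 1),
    ("the team fixed a memory leak in the parser", 0),
    ("the compiler emits a warning for unused variables", 0),
]
weights = train_classifier(training_data)

print(predict(weights, "new brand campaign ideas"))      # unseen text
print(predict(weights, "a parser bug in the compiler"))  # unseen text
```

The key point is the shape of the workflow: labeled examples go in, a model comes out, and the model is then asked about texts it has never seen.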
5. Unsupervised learning
In contrast to Supervised Learning, Unsupervised Learning models teach themselves by observation. The data provided to this kind of algorithm is unlabeled (no ground-truth value is given to the algorithm). Unsupervised learning models are able to find the structure of, or relationships between, different inputs. The most important kind of unsupervised learning technique is “clustering”. In clustering, given the data, the model creates clusters of inputs (where “similar” inputs end up in the same cluster) and is able to put any new, previously unseen input in the appropriate cluster.
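As an illustration of clustering, the sketch below runs k-means, one common clustering algorithm, on a handful of one-dimensional points. The data, the starting centroids and the helper names nearest_cluster and kmeans_1d are assumptions chosen to keep the example short.

```python
def nearest_cluster(centroids, point):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))

def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D numbers, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Step 1: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_cluster(centroids, p)].append(p)
        # Step 2: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Unlabeled data: no ground truth, just numbers that happen to form two groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)

# A new, previously unseen input is placed in the appropriate cluster.
print(nearest_cluster(centroids, 8.2))
```

Note that no labels appear anywhere: the two groups emerge purely from the distances between the inputs.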
6. Reinforcement learning
Reinforcement Learning (RL) differs in its approach from the ones described above. In RL, the algorithm plays a “game” in which it aims to maximize a reward. The algorithm tries different approaches (“moves”) by trial and error and learns which ones yield the highest reward.
The best-known use cases of RL are teaching a computer to solve a Rubik’s Cube or play chess, but there is more to Reinforcement Learning than just games. Recently, there has been a growing number of RL solutions in Real-Time Bidding, where the model is responsible for bidding on an ad spot and its reward is the client’s conversion rate.
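The trial-and-error loop can be sketched with one of the simplest RL settings, a multi-armed bandit played with an epsilon-greedy strategy. This is only an illustration of the idea: the three “moves”, their hidden reward probabilities, and the function name run_bandit are hypothetical.

```python
import random

def run_bandit(win_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play: mostly pick the move that looks best so far,
    but with probability epsilon try a random move instead."""
    rng = random.Random(seed)
    counts = [0] * len(win_probabilities)    # how often each move was tried
    values = [0.0] * len(win_probabilities)  # average reward seen per move
    for _ in range(steps):
        if rng.random() < epsilon:
            move = rng.randrange(len(win_probabilities))             # explore
        else:
            move = max(range(len(values)), key=lambda i: values[i])  # exploit
        # The environment pays a reward of 1 with the move's hidden probability.
        reward = 1.0 if rng.random() < win_probabilities[move] else 0.0
        counts[move] += 1
        values[move] += (reward - values[move]) / counts[move]  # running mean
    return counts, values

# Hypothetical game: three moves with hidden chances of paying off.
counts, values = run_bandit([0.2, 0.5, 0.8])
print(counts)
print([round(v, 2) for v in values])
```

The algorithm is never told which move is best; it only sees rewards. Over time, most of its tries should concentrate on the move with the highest payoff.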
7. Neural Networks
Neural Networks are a very wide family of machine learning models. The main idea behind them is to mimic the behaviour of a human brain when processing data.
Just like the networks connecting real neurons in the human brain, artificial neural networks are composed of layers. Each layer is a set of neurons, all of which are responsible for detecting different things. A neural network processes data sequentially, which means that only the first layer is directly connected to the input. All subsequent layers detect features based on the output of the previous layer, which enables the model to learn more and more complex patterns in the data as the number of layers increases. When a network has many layers, the model is often called a Deep Learning model. It is difficult to name a specific number of layers above which a network is considered deep: 10 years ago it used to be 3, and now it is around 20.
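The layer-by-layer processing described above can be sketched as a forward pass through a tiny network. The weights below are hand-picked, hypothetical numbers (a real network would learn them during training), and the sigmoid activation is just one common choice.

```python
import math

def dense_layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of the inputs,
    adds its bias, and squashes the result with a sigmoid."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def forward(inputs, layers):
    """Pass the data through the layers in order; only the first layer
    sees the raw input, every later layer sees the previous output."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = forward([1.0, 2.0], layers)
print(output)
```

Making the network “deeper” in this sketch simply means appending more (weights, biases) pairs to the layers list.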
8. Overfitting
Overfitting is a negative effect that occurs when the model builds an assumption (a bias) from an insufficient amount of data. It is a fairly common and very important problem.
Let’s say that you’ve visited a bakery a couple of times, and not once was your favourite cupcake left! You’d likely be disappointed with the bakery, even though a thousand other customers might find the stock satisfying. If you were a machine learning model, it would be fair to say you’ve overfitted to a small number of examples: you have developed a biased model, a representation in your head, that does not match the facts.
When overfitting happens, it usually means that the model is treating random noise in the data as a significant signal and adjusting to it, which is why its performance deteriorates on new data, where the noise is different. This is a typical risk with very complex models such as Neural Networks or Gradient Boosting.
Imagine building a model to detect articles mentioning a particular sports discipline practised during the Olympics. If your training set is biased toward articles about the Olympics, the model may learn features such as the presence of the word “Olympics” and fail to detect relevant articles that do not contain that word.
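The Olympics example can be reproduced with a toy model. The sketch below assumes the discipline is fencing and uses a deliberately naive “model” that picks the single word whose presence best matches the training labels. The headlines and the function names are invented for illustration.

```python
def train_keyword_detector(labeled_texts):
    """Pick the single word whose presence best matches the training labels."""
    vocabulary = sorted({w for text, _ in labeled_texts for w in text.split()})

    def correct_count(word):
        # Number of training texts where "word is present" equals the label.
        return sum((word in text.split()) == bool(label)
                   for text, label in labeled_texts)

    return max(vocabulary, key=correct_count)

def predict(keyword, text):
    """Return 1 if the learned keyword appears in the text, else 0."""
    return int(keyword in text.split())

# Hypothetical training set: every fencing article also mentions the Olympics.
training_data = [
    ("olympics fencing final draws record crowd", 1),
    ("sabre champion wins gold at the olympics", 1),
    ("olympics foil team reaches the semifinal", 1),
    ("city council approves new budget", 0),
    ("local bakery wins award for best cupcake", 0),
    ("stock markets close higher on friday", 0),
]
keyword = train_keyword_detector(training_data)
print(keyword)

# Perfect on the training set...
print(all(predict(keyword, text) == label for text, label in training_data))

# ...but a fencing article without the word "olympics" is missed.
print(predict(keyword, "fencing world cup final decided by a single touch"))
```

The model looks flawless on the data it was trained on, yet it has latched onto a feature of the biased training set rather than onto the discipline itself.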