In recent years, machine learning has matured from an interesting research topic into a set of techniques that companies use in production systems to solve real-world problems. However, developing and operating production machine learning systems brings new problems that were not present at the research stage.
One of the biggest problems is the degradation of a model’s prediction quality caused by changes in the input data or in the relationships within the environment for which the model makes its predictions. This problem is commonly known as “model drift”. Its main cause is that a model learns from training data that represents the environment and its relationships at a certain point in time. After training, the model’s weights are fixed, but the real world is dynamic, so the environment can change as time progresses. The result is a divergence between the model’s fixed knowledge and the real environment, which lowers the model’s performance.
If the performance degradation of your production model goes unnoticed for too long, it can harm your business, causing financial damage or, depending on the model’s field of application, even accidents.
This is why monitoring the performance of your production models is an essential part of machine learning operations (MLOps), a set of best practices for the reliable use of machine learning models in production.
At evoila, we have developed a service that collects all the information required to measure the performance of your production models and to subsequently detect drift in their input data.
The types of models that the service supports are listed in the following table.
Data type | Input | Model type |
Unstructured data | Images | Classification, object detection (OD), instance segmentation (IS) |
Unstructured data | Text | Classification, named entity recognition (NER) |
Tabular data | | Classification, regression |
The system uses five components to accomplish its task:
Collector Service: This is the service we developed, and it acts as the central component of the system. The collector works as a proxy between the user and the Inference-Server. This allows it to collect the input data from the user’s request and the prediction from the model’s response. The advantage of this approach is that the user does not notice any difference compared to a direct request to the Inference-Server, and the Inference-Server is not burdened with data collection tasks.
The service collects only a configurable fraction of the requests. If a request is selected for monitoring, the service orchestrates the collection and storage of the information.
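The configurable collection rate can be pictured as a random decision made per request. The following is a minimal sketch, not the service's actual implementation: the class name `RequestSampler`, the parameter `monitoring_rate`, and the use of independent random sampling are all assumptions made for illustration.

```python
import random
from typing import Optional


class RequestSampler:
    """Decides per request whether it is monitored (hypothetical helper)."""

    def __init__(self, monitoring_rate: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= monitoring_rate <= 1.0:
            raise ValueError("monitoring_rate must be between 0.0 and 1.0")
        self.monitoring_rate = monitoring_rate
        # A dedicated generator keeps the decisions reproducible when seeded.
        self._rng = random.Random(seed)

    def should_monitor(self) -> bool:
        # random() returns a float in [0.0, 1.0), so a rate of 0.0 never
        # selects a request and a rate of 1.0 selects every request.
        return self._rng.random() < self.monitoring_rate


# Usage: monitor roughly 10 % of the incoming requests.
sampler = RequestSampler(monitoring_rate=0.1, seed=42)
monitored = sum(sampler.should_monitor() for _ in range(10_000))
print(f"monitored {monitored} of 10000 requests")
```

With this kind of sampling, every request is still forwarded to the Inference-Server; only the selected ones would trigger the additional collection and storage steps.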
To be independent of a specific model or Inference-Server, the service can use exchangeable scalers (for OD and IS) to rescale the returned annotations back to the original image size, and converters (for OD, IS and NER) to convert the annotations from any of the various existing annotation formats to the required format. The modules for communication with the labeling tool, object storage and database are also implemented as plugins, so these components can easily be exchanged if necessary.
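To illustrate how exchangeable scalers and converters might fit together for object detection, here is a small sketch. Everything in it is hypothetical: the `Box` target format, the `Converter` and `Scaler` interfaces, the `{"bbox": [x, y, width, height]}` input format, and the order of the two steps are assumptions, and the scaler only undoes a plain resize without padding or letterboxing.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (assumed target format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


class Converter(Protocol):
    """Turns a server-specific annotation into the target format."""
    def convert(self, raw: dict) -> Box: ...


class Scaler(Protocol):
    """Maps a box from the model's input size back to the original image."""
    def rescale(self, box: Box, model_size: Size, original_size: Size) -> Box: ...


class XywhConverter:
    """Handles a hypothetical {"bbox": [x, y, width, height]} annotation."""
    def convert(self, raw: dict) -> Box:
        x, y, width, height = raw["bbox"]
        return Box(x, y, x + width, y + height)


class PlainResizeScaler:
    """Undoes a plain resize; padding or letterboxing is not handled."""
    def rescale(self, box: Box, model_size: Size, original_size: Size) -> Box:
        scale_x = original_size[0] / model_size[0]
        scale_y = original_size[1] / model_size[1]
        return Box(box.x_min * scale_x, box.y_min * scale_y,
                   box.x_max * scale_x, box.y_max * scale_y)


def to_original_image(raw: dict, converter: Converter, scaler: Scaler,
                      model_size: Size, original_size: Size) -> Box:
    # Convert first, then rescale (the order is an assumption of this sketch).
    return scaler.rescale(converter.convert(raw), model_size, original_size)


# Usage: the model saw a 640x480 image, the user sent a 1920x1080 image.
box = to_original_image({"bbox": [32, 64, 128, 96]}, XywhConverter(),
                        PlainResizeScaler(), (640, 480), (1920, 1080))
print(box)  # Box(x_min=96.0, y_min=144.0, x_max=480.0, y_max=360.0)
```

Because the collector in this sketch depends only on the two interfaces, supporting another Inference-Server would mean writing a new converter or scaler class rather than changing the collector itself.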
Inference-Server: This component contains the machine learning model to be monitored. It takes the input data, preprocesses it (e.g. scaling images to a smaller size), applies the model and returns the prediction.
Object Storage: Object storage is well suited to unstructured data, which is why it is used to store the unstructured input data that the user sends in for prediction. It can also store segmentation masks from the model’s predictions in image form.
Labeling Tool: Because it acts as a proxy, the collector service can only obtain the model input and the corresponding prediction. To get the ground truth labels for the unstructured data, a labeling tool is used in which labelers manually annotate the data. To speed up this process, the model’s response is imported along with the images and texts that need to be labeled, so the labelers only need to correct the prediction. After labeling, the ground truth labels are sent back to the collector service via webhook.
Database: The database serves as persistent storage for the model’s predictions and the corresponding ground truth labels. For tabular data, the database also stores the input features directly; for unstructured data, it stores only a reference to the corresponding input data in the object storage.
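To make the interplay of the stored information more concrete, the sketch below models one stored entry and shows how a performance metric could be computed once ground truth labels have arrived. It is an illustration under assumptions: the `MonitoringRecord` fields, the in-memory dictionary standing in for the database, the `attach_ground_truth` helper, and accuracy as the metric are all hypothetical and not taken from the actual service.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class MonitoringRecord:
    """One monitored request (hypothetical schema)."""
    request_id: str
    prediction: Any
    ground_truth: Optional[Any] = None          # filled in after labeling
    features: Optional[Dict[str, Any]] = None   # tabular data: stored directly
    object_key: Optional[str] = None            # unstructured data: reference only


def attach_ground_truth(store: Dict[str, MonitoringRecord],
                        request_id: str, label: Any) -> None:
    # Stands in for the step that runs when labels are sent back.
    store[request_id].ground_truth = label


def accuracy(records: List[MonitoringRecord]) -> Optional[float]:
    # Only records that already have a ground truth label can be scored.
    labeled = [r for r in records if r.ground_truth is not None]
    if not labeled:
        return None
    correct = sum(r.prediction == r.ground_truth for r in labeled)
    return correct / len(labeled)


# Usage: a dictionary stands in for the database.
store = {
    "r1": MonitoringRecord("r1", "approve", features={"age": 41, "income": 5200}),
    "r2": MonitoringRecord("r2", "cat", object_key="images/r2.png"),
    "r3": MonitoringRecord("r3", "dog", object_key="images/r3.png"),
}
attach_ground_truth(store, "r1", "approve")
attach_ground_truth(store, "r2", "dog")
print(accuracy(list(store.values())))  # 0.5 (r3 is still unlabeled)
```

Keeping the prediction and the ground truth in the same record is what would make such a comparison a simple query; for object detection or segmentation, a task-specific metric would take the place of plain accuracy.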
We have developed a service that collects and stores all the information necessary to monitor your machine learning model in production. The service acts as a proxy between the Inference-Server and the user and relies on a number of existing components and services to accomplish this task. Thanks to a plugin system, these components can easily be exchanged for alternatives. The service is also independent of a concrete Inference-Server: exchangeable scalers and converters transform the returned annotations for object detection, instance segmentation and named entity recognition into the required format. The rate of requests to be monitored is configurable.