17 November. 19.45 - 20.30 | Garage

A growing number of business problems rely on the analysis of real-time metrics, with typical use cases ranging from credit fraud detection to predictive maintenance. We are also moving towards an era in which all sensors and devices are connected to the internet, i.e. the Internet of Things (IoT), and monitor the performance of different KPIs. It is therefore crucial to extend and refine real-time analytics for streaming data sources in order to reach fast-developing sectors such as Smart Cities, Industry 4.0 and Smart Healthcare. In this talk we will focus on unsupervised real-time anomaly detection. In this kind of setup, it is standard practice to define thresholds for detecting anomalies, for example by naively choosing fixed upper and lower bounds, or by estimating a threshold from the mean absolute error (MAE) with respect to previous data points. Such thresholds may be suitable during training but not accurate enough once the model is in production. They typically degrade with time, flagging too many or too few anomalies, which forces us to change them later. We present an unsupervised real-time anomaly detector built on a forecaster based on LSTM neural networks. Anomalies in this case are data points that lie outside the confidence interval of the predictions. Confidence intervals are not straightforward to obtain from this type of neural network, in contrast to well-known models such as ARIMA. For that reason, we propose a way to obtain them using stochastic dropout. We place a Dropout layer after each of the LSTM layers in the model. Once the model is trained, we bootstrap over enough iterations to obtain the desired confidence intervals. In every iteration each layer receives a random dropout value, which randomly disconnects weights between neurons. This stochastic bootstrap makes the confidence intervals more reliable, since the width of the interval does not depend on a dropout value set beforehand.
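
The stochastic-dropout procedure can be illustrated with a minimal sketch. It is not the implementation from the talk: a tiny one-hidden-layer network with made-up weights stands in for the trained LSTM forecaster, and the names `forward`, `mc_dropout_interval`, `is_anomaly`, the dropout range `rate_range` and the number of passes are all assumptions chosen for illustration. What it shows is the idea described above: keep dropout active at prediction time, draw a fresh dropout rate on every pass, and read the confidence interval off the empirical distribution of the predictions.

```python
import math
import random


def forward(window, w_in, w_out, rate, rng):
    """One stochastic pass of a toy forecaster with dropout left switched on."""
    hidden = []
    for weights in w_in:
        act = math.tanh(sum(w * x for w, x in zip(weights, window)))
        # Inverted dropout: drop the unit with probability `rate`,
        # rescale the survivors so the expected activation is unchanged.
        keep = rng.random() >= rate
        hidden.append(act / (1.0 - rate) if keep else 0.0)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_interval(window, w_in, w_out, n_passes=500,
                        rate_range=(0.05, 0.3), alpha=0.05, seed=0):
    """Empirical (1 - alpha) interval from repeated stochastic passes."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_passes):
        # Assumption: the dropout rate is redrawn uniformly on every pass,
        # so the interval width is not tied to one preset value.
        rate = rng.uniform(*rate_range)
        samples.append(forward(window, w_in, w_out, rate, rng))
    samples.sort()
    lower = samples[int((alpha / 2) * (n_passes - 1))]
    upper = samples[int((1 - alpha / 2) * (n_passes - 1))]
    return lower, upper


def is_anomaly(observed, interval):
    """A point is anomalous when it falls outside the predicted interval."""
    lower, upper = interval
    return observed < lower or observed > upper


# Hypothetical "trained" weights: 8 hidden units over a window of 3 points.
_init = random.Random(42)
W_IN = [[_init.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
W_OUT = [_init.uniform(-1, 1) for _ in range(8)]

interval = mc_dropout_interval([0.2, 0.4, 0.3], W_IN, W_OUT)
```

In a real setting the toy `forward` would be replaced by a call to the trained network with its dropout layers still active, and `observed` would be the next value arriving from the stream.
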
Also, with this technique the model can adapt the width of the confidence intervals, in metric space, in such a way that the anomalies detected are truly outliers in the time series. In addition, we present an automated way to detect the root cause of such anomalies when several different metrics are available. To that end, we focus on anomalous points that occur simultaneously in a subset of all the monitored metrics. To estimate the root cause, we analyze relationships between the metrics in this subset, such as cosine similarity, correlation of the tokenized metric names, etc. We supplement this with events that may affect the behavior of the monitored metrics, for example new releases of an app whose metrics we are monitoring, or the weather in the case of transportation services. Altogether, this gives us common features for all the anomalous points, which may point towards the origin of the detected anomalous behavior and can be communicated in real time.
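
The root-cause step can be sketched in the same spirit. The code below is only an assumption about how the pieces might fit together: the metric names and values are invented, the name comparison uses a Jaccard overlap of name tokens as a stand-in for the "correlation of the tokenized metric names", and the combined ranking score is an arbitrary choice. It selects the metrics that are anomalous at the same timestamp and ranks pairs within that subset by how closely they are related.

```python
import math
import re
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long metric series."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_overlap(name_a, name_b):
    """Jaccard overlap of name tokens (assumed stand-in for name correlation)."""
    tokens_a = {t for t in re.split(r"[._\-/]+", name_a.lower()) if t}
    tokens_b = {t for t in re.split(r"[._\-/]+", name_b.lower()) if t}
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def simultaneous_anomalies(anomaly_times, timestamp):
    """Names of the metrics flagged as anomalous at the given timestamp."""
    return sorted(n for n, times in anomaly_times.items() if timestamp in times)


def rank_related_pairs(series, names):
    """Rank metric pairs by series similarity plus name overlap (ad hoc score)."""
    scored = []
    for a, b in combinations(names, 2):
        cos = cosine_similarity(series[a], series[b])
        scored.append((a, b, cos, name_overlap(a, b)))
    return sorted(scored, key=lambda p: p[2] + p[3], reverse=True)


# Invented example data: three metrics, two of them anomalous at t = 3.
series = {
    "app.checkout.latency": [1.0, 1.1, 0.9, 5.0],
    "app.checkout.errors": [0.2, 0.2, 0.2, 1.1],
    "db.disk.usage": [3.0, 2.0, 3.0, 0.5],
}
anomaly_times = {
    "app.checkout.latency": {3},
    "app.checkout.errors": {3},
    "db.disk.usage": set(),
}

suspects = simultaneous_anomalies(anomaly_times, 3)
ranking = rank_related_pairs(series, suspects)
```

External events such as app releases could be joined in at this point by timestamp, so that each group of simultaneous anomalies carries the events that occurred shortly before it.
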