I want to dig deeper into the importance of using graph-native algorithms and graph database technologies in today's analytical environments. They bring a highly differentiating value that allows our organizations to make a big qualitative leap in our analytics pipeline. Data analytics is not a new discipline. We began by analyzing the past with systems such as data warehouses (DWH) and BI tools, and we have ended up trying to foresee the future with predictive technologies and algorithms from disciplines such as machine learning and deep learning. But these systems make little use of the relationships between data.
They focus on analyzing data as tables that contain discrete information about reality. For instance, in finance, when analyzing clients, we will surely have a table with age, salary, number of children, debt level, loyalty level, … and more attributes that together describe our client. But … is that enough? Over the last few years we have learned the possibilities of current data analytics technology, but also its limitations. At the same time, scientists such as James H. Fowler have taught us that even personal attributes as simple as degree of obesity, whether someone smokes, or level of happiness cannot be understood without taking into account the connections between that person and the people around them, that is, their CONTEXT.
James H. Fowler, and others such as David Burkus in his book “Friend of a Friend”, have taught us that knowledge of reality and the understanding of human behavior come more from the analysis of relationships than from the analysis of discrete data about each individual. This premise opened up a new field of analytics: GRAPHS. Social networks came first, but many other use cases followed in which graphs are a differentiating element when generating KPIs to model reality. Current table-based systems cannot efficiently manage RELATIONSHIPS between entities. This is where graph technology comes in, with platforms like Neo4j, which manage the relationships between data very efficiently. The math behind graphs began with Leonhard Euler in the 18th century, when he set out to model the problem of the Seven Bridges of Königsberg.
A great deal has changed between then and today's technology, which lets us efficiently run algorithms such as PageRank, Dijkstra, Louvain, or centrality measures. How should we use it? One very important point: graphs do not replace what has already been built with current technologies. Graphs add valuable information to the existing machine learning pipeline, for example by contributing additional variables.
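To make the idea of "adding variables" concrete, here is a minimal sketch in plain Python. It computes two graph features per client, degree and a simplified PageRank, and appends them to an ordinary client table. The client names, the relationship list, and the helper `graph_features` are hypothetical and exist only for illustration. A real project would more likely run these algorithms inside a graph platform than hand-roll them.

```python
from collections import defaultdict


def graph_features(edges, damping=0.85, iterations=50):
    """Return {node: {"degree", "pagerank"}} for an undirected graph.

    PageRank is computed by simple power iteration. Every node comes from
    an edge, so each node has at least one neighbor.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its current rank equally among its links.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank

    return {
        node: {"degree": len(neighbors[node]), "pagerank": rank[node]}
        for node in nodes
    }


# Hypothetical tabular data: the classic "one row per client" view.
clients = [
    {"client": "ana", "age": 34, "salary": 42000},
    {"client": "bob", "age": 51, "salary": 38000},
    {"client": "carl", "age": 29, "salary": 51000},
    {"client": "dana", "age": 45, "salary": 27000},
    {"client": "eve", "age": 38, "salary": 33000},
]

# Hypothetical relationships between clients (shared account, transfer, ...).
edges = [("ana", "bob"), ("ana", "carl"), ("ana", "dana"), ("bob", "carl")]

features = graph_features(edges)

# Clients without any relationship get neutral defaults (an assumption
# of this sketch, not a recommendation).
no_links = {"degree": 0, "pagerank": 0.0}
enriched = [{**row, **features.get(row["client"], no_links)} for row in clients]

for row in enriched:
    print(row)
```

The enriched rows keep every original column and gain two more, so an existing model can consume them without any change to the rest of the pipeline.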
We see different levels of maturity. From lowest to highest:
1. Queries: answering questions like “A customer is asking for a loan. Is there someone fewer than 3 hops away from my client who is marked as a potential fraudster?”
2. Ingesting features from native graph algorithms (degree, communities, PageRank, …).
3. Graph embeddings: creating features that reflect the structure of the graph within the ML pipeline.
4. Neural networks: using the calculated embeddings, we can identify similarities between the behaviors of entities (customers, …) thanks to the correct interpretation of the relationship structure inherent in the graph.
This technology brings value to any industry, but especially to those whose business is based on data: from digital platforms to banking, insurance, pharma, …
Some use cases:
• Advanced recommendation systems (of products in eCommerce, people in social networks, …)
• Fraud detection, AML, risk, …
• Cybersecurity
• Customer 360, customer journey, MDM, …
Graphs are a technology to consider in any data analytics strategy. They have already demonstrated, in multiple industries, their ability to add value and improve the results of prediction models, classification models, etc.
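As an illustration of the first maturity level, the loan question can be sketched as a breadth-first search with a hop limit. This is plain Python over a small adjacency dictionary. The node names, the `flagged` set, and the function `flagged_within_hops` are hypothetical. A graph database would typically express the same question as a single query over stored relationships.

```python
from collections import deque


def flagged_within_hops(neighbors, start, flagged, max_hops):
    """Return {flagged_node: hops} for flagged nodes reachable from
    `start` in at most `max_hops` hops (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = {}
    while queue:
        node, hops = queue.popleft()
        if node != start and node in flagged:
            hits[node] = hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


# Hypothetical relationship graph around a loan applicant.
neighbors = {
    "client": ["employer", "sibling"],
    "employer": ["client", "partner"],
    "sibling": ["client"],
    "partner": ["employer", "shell_co"],
    "shell_co": ["partner"],
}
flagged = {"partner", "shell_co"}  # marked as potential fraudsters

# "Fewer than 3 hops" is encoded here as max_hops=2.
print(flagged_within_hops(neighbors, "client", flagged, max_hops=2))
# {'partner': 2}
```

Because breadth-first search visits nodes in order of distance, the hop count recorded for each hit is the shortest one, which is useful when the answer has to be explained to a risk analyst.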