Uncovering connections in vast amounts of data is made easier through the use of graph data platforms that efficiently model complex networks.
Government agencies often struggle to uncover connections in a vast pool of data, for example when fighting fraud or tracing an individual’s links to identify a conflict of interest.
Graph data platforms efficiently model complex networks of entities and their interrelationships, uncovering patterns that are difficult to detect with traditional representations such as tables. Tables may be well suited to collecting and processing data, but they do not capture the relationships between data points.
The biggest leak in history
To understand the power of a graph data platform and its benefits to government and taxpayers, look no further than the Pandora Papers, the biggest leak in history.
With more than 2.94 terabytes of unstructured data in various languages and formats (documents, images, emails, spreadsheets and more) from different sources, the investigation — which was spearheaded by the International Consortium of Investigative Journalists (ICIJ) — presented a massive data management challenge.
ICIJ used the Neo4j graph data platform to generate visualisations and make the nearly 12 million records searchable, enabling them to explore connections between all involved parties.
Ultimately, the consortium was able to expose offshore entities linked to more than 330 current and former politicians and 130 Forbes billionaires, as well as celebrities, drug dealers, royal family members and leaders of religious groups globally.
The use of a graph data platform made the Pandora Papers exposé possible in 12 months; the alternative would have been trawling through emails, images (low and high quality), PDFs and documents in various formats including Word and Excel — a mammoth effort that would have taken many years to conclude.
Deep dive
Before graph data platforms, data management centred mainly on their collection-oriented predecessor, the traditional relational database. Relational databases are good for well-understood, often aggregated, data structures that don’t change frequently: known problems involving minimally connected or discrete data.
Increasingly, however, government agencies are faced with problems where the data topology is dynamic and difficult to predict, and relationships among the data contribute meaning, context and value. These connection-oriented scenarios necessitate a graph data platform.
A graph data model is easy to understand because it reflects how data naturally exists: as objects and the relationships between those objects. It’s the model that users naturally sketch on a whiteboard when talking about data, with data elements (nodes, or vertices) and the relationships (edges) between them. Each node represents an entity, and each relationship represents how two nodes are associated. Properties (and indexes) can be attached to both nodes and relationships.
By assembling the simple abstractions of nodes and relationships into connected structures, graph data platforms allow the user to build sophisticated, flexible models that map closely to a problem domain.
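To make the node-and-relationship model concrete, here is a minimal Python sketch of a property graph. It is an illustration only, not the data structure of any particular graph data platform; the class names, identifiers and sample entities (`PropertyGraph`, "Alex Example", "Example Holdings") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity, with a label and free-form properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed association between two nodes, with its own properties."""
    start: str
    end: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}      # node_id -> Node
        self.outgoing = {}   # node_id -> list of Relationship

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.outgoing.setdefault(node_id, [])

    def add_relationship(self, start, end, rel_type, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before they are linked")
        self.outgoing[start].append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Return the ids of nodes reachable in one step, optionally filtered by type."""
        return [
            rel.end
            for rel in self.outgoing[node_id]
            if rel_type is None or rel.rel_type == rel_type
        ]


# Made-up sample data: a person who is an officer of a company.
graph = PropertyGraph()
graph.add_node("p1", "Person", name="Alex Example")
graph.add_node("c1", "Company", name="Example Holdings")
graph.add_relationship("p1", "c1", "OFFICER_OF", since=2015)
```

Note that the relationship carries its own property (`since`), so the association is stored as data in its own right rather than being implied by matching keys.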
Government agencies stand to benefit from graph data platforms in the following ways:
1. Conduct complex queries
Governments today are challenged with solving complex problems. With the vast amount of data pouring in, the answers exist somewhere — but only if they can make sense of the growing volume, variety and interrelationships of data in disparate sources.
Data becomes more useful once its connectedness is established. Connected data is the representation, usage and persistence of relationships between data elements, and using graphs makes it possible to query relationships across disparate data sources, regardless of the type of data or originating database. Graph technology connects multiple layers of data across processes, people, networks and things.
Once the layers are connected, users gain intelligence downstream and can deliver a connected view of the data to analytic and operational applications. This also provides context, which allows government departments and agencies to better refine the information being collected.
The better the understanding of data connections, the more accurate downstream insights will be. Graphs empower government agencies to iterate and expand on current datasets, gaining momentum to execute on bigger and better ideas, and to find deeper contextual meaning in the data.
Through graph data platforms, users can increase the number of hops (the level of connections) between data without a corresponding increase in compute cost. An enterprise-grade, native graph data platform enables these deep, complex queries and is built from the ground up to traverse data connections at depth, in real time and at scale.
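The idea of "hops" can be sketched with a breadth-first traversal that reports every entity within a given number of connections of a starting point. This is a simplified illustration in plain Python, not how a native graph engine is implemented; the function name `within_hops` and the sample entities are assumptions made for the example.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {entity: hop_count} for every entity within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # the starting entity is not its own connection
    return distances


# Made-up chain of connections between an official and a supplier.
adjacency = {
    "official": ["company_a"],
    "company_a": ["trust_b"],
    "trust_b": ["supplier_c"],
    "supplier_c": [],
}

two_hops = within_hops(adjacency, "official", 2)
three_hops = within_hops(adjacency, "official", 3)
```

Increasing `max_hops` from 2 to 3 is the only change needed to surface the supplier at the end of the chain; the query itself keeps the same shape.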
2. Reduce infrastructure costs
Government agencies run on a lean budget. Any opportunity to reduce infrastructure spending frees up resources to focus on the core mission. A graph data platform does just that by delivering deep, complex queries with less hardware, which means reduced infrastructure cost.
A standard, highly available graph installation usually runs on three to five servers, whereas a relational database with a graph layer requires around 50 servers for the same scale. With this efficiency, graph data platforms also require fewer licences, further reducing database costs.
3. Maximise value from existing resources
A rip-and-replace approach is a non-starter for most government technology projects. By connecting data across diverse existing data stores, graph data platforms leverage the value of all existing systems. And when it’s time to replace ageing applications, government departments and agencies find that graph data platforms are a cost-effective agile foundation for new initiatives.
4. Deliver immediate answers at scale
Government departments and agencies must store massive amounts of data and generally need answers quickly. Graph data platforms deliver huge performance advantages over relational databases and over NoSQL databases hosting graph engines, reducing response times from minutes to milliseconds for queries of graphs containing billions of connections.
Relational and NoSQL databases typically see significant performance degradation when traversing data beyond three levels of depth, whereas a native graph architecture allows graph platforms to traverse data at any depth in real time.
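One way to see why depth is awkward in a table-based model is to look at the shape of the query. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `linked_to` table; each additional level of depth adds another self-join to the SQL. It illustrates query structure only and does not measure or prove any performance figure.

```python
import sqlite3

# Hypothetical table of direct links between entities (made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE linked_to (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO linked_to VALUES (?, ?)",
    [
        ("official", "company_a"),
        ("company_a", "trust_b"),
        ("trust_b", "supplier_c"),
    ],
)


def build_depth_query(depth):
    """Build SQL that finds entities exactly `depth` links away from a start entity."""
    joins = "".join(
        f" JOIN linked_to AS l{i} ON l{i}.src = l{i - 1}.dst"
        for i in range(1, depth)
    )
    return f"SELECT DISTINCT l{depth - 1}.dst FROM linked_to AS l0{joins} WHERE l0.src = ?"


def entities_at_depth(connection, start, depth):
    """Run the generated query and return the matching entity names, sorted."""
    rows = connection.execute(build_depth_query(depth), (start,))
    return sorted(row[0] for row in rows)


one_level = entities_at_depth(conn, "official", 1)
three_levels = entities_at_depth(conn, "official", 3)
```

Here a depth of three already needs two self-joins, and each further level adds one more.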
5. Meet security demands
There are graph data platforms that fulfil the stringent security demands of government customers. A good platform will meet federal and state requirements, with an advanced security architecture that supports attribute-based access control (ABAC) as well as role-based access control (RBAC). Some platforms are approved to run in a classified environment by many Department of Defence and intelligence community agencies.
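The difference between the two access-control styles can be sketched in a few lines of Python. This is a conceptual illustration, not the security API of any graph data platform; the roles, attribute names (`clearance`, `classification`, `agency`) and rules are assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# RBAC: permissions are granted to roles, and users hold roles (hypothetical roles).
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "administrator": {"read", "write"},
}


@dataclass
class User:
    name: str
    roles: set
    attributes: dict = field(default_factory=dict)


@dataclass
class Resource:
    name: str
    attributes: dict = field(default_factory=dict)


def rbac_allows(user, action):
    """Role-based check: does any of the user's roles grant this action?"""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


def abac_allows(user, resource):
    """Attribute-based check: compare user attributes with resource attributes."""
    cleared = user.attributes.get("clearance", 0) >= resource.attributes.get("classification", 0)
    same_agency = user.attributes.get("agency") == resource.attributes.get("agency")
    return cleared and same_agency


def can_access(user, resource, action):
    """In this sketch, both checks must pass."""
    return rbac_allows(user, action) and abac_allows(user, resource)


# Made-up users and a made-up resource.
case_file = Resource("case_file", {"classification": 2, "agency": "revenue"})
senior = User("senior", {"analyst"}, {"clearance": 2, "agency": "revenue"})
junior = User("junior", {"analyst"}, {"clearance": 1, "agency": "revenue"})
```

In this sketch the two analysts share a role, so RBAC treats them identically, while the ABAC rule distinguishes them by clearance level.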
Objectives for using graph data platforms will vary widely from agency to agency, and even within agencies. Some missions are internal facing; they help the agency run more efficiently. Others are external facing and directly impact constituents and taxpayers. But the main starting point is for departments and agencies to identify their pain points and what data is required to accomplish their goals — the beauty of a graph data platform is that it supports diverse applications and can resolve decades-old problems.
By Peter Philipp, Neo4j ANZ General Manager
This article was first published by GovTech Review