In today’s “Big Data” world, it’s not only about managing the large volumes of data but also about generating useful insights from them. To generate insights, we also need to concentrate on the relationships between data. What better way to leverage the relationships between data than using a database where relationships take first priority. We have such a database in a graph database that can represent our data as nodes and the relationships between them as edges.
Among many scenarios that can be naturally represented as graphs, maps are the most obvious. But when we think of maps, we can also think of logistics and geospatial applications, which are actually the original graph use cases. Geospatial applications of graph databases range from calculating routes between locations in networks such as a road or rail, airspace, or logistical network to spatial operations such as finding all points of interest in a bounded area, finding the center of a region, and calculating the intersection between two or more regions.
One concrete example of graph databases being used in logistics is eBay, who (owing to the acquisition of Shutl) provides a service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, scaling their business to include the supply chain. Incidentally, eBay observed that before turning to graphs the latency of their longest query was higher than their shortest physical delivery, both around 15 minutes — something that can’t now be replicated when an average query is powered by a graph database and takes 1/50th of a second!
The eBay example is not isolated. Organizations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens more areas.
This article gives an example of how a use case for delivery can be modeled as a graph and solved through simple queries in the graph database.
Let me present a highly simplified scenario. A parcel needs to be delivered and the delivery man needs to deliver it in the earliest time possible. So he has to choose the fastest route to the customer’s doorstep from the delivery center.
To solve his problem, we need to represent the streets between the delivery center and the customer as a graph and hence the intersections/road junctions are considered as nodes of the graph and the edges of the graph as the lanes connecting the junctions. In graph database terms, edges (lanes) are the relationship between the nodes (intersections) and we name the relationship as Road which has the following attributes:
The delivery center and the customer address are also represented as nodes in the graph. So essentially we have three types of nodes:
So now finding the fastest route boils down to a weighted shortest path calculation. We calculate it keeping in mind the two attributes of the relationship: distance (shorter is better) and coverability (higher is better).
Here we use the world’s leading graph database- Neo4j which is a highly scalable and ACID compliant database and its declarative query language Cypher.
A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1. (Create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.
Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time-consuming. From Neo4j 3.x, the concept of user-defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.
create (n0:sNode {name:’Start’}),
(n1:Intersection {name:’1′}),
(n2:Intersection {name:’2′}),
(n3:eNode {name:’Finish’}),
(n4:Intersection {name:’4′}),
(n5:Intersection {name:’5′}),
(n6:Intersection {name:’6′}),
(n7:Intersection {name:’7′}),
(n8:Intersection {name:’8′}),
(n9:Intersection {name:’9′}),
(n10:Intersection {name:’10’}),
(n11:Intersection {name:’11’}),
(n12:Intersection {name:’12’}),
(n13:Intersection {name:’13’}),
(n14:Intersection {name:’14’}),
(n15:Intersection {name:’15’}),
(n16:Intersection {name:’16’}),
(n17:Intersection {name:’17’}),
(n18:Intersection {name:’18’}),
(n19:Intersection {name:’19’}),(n0)-[:Road {distance:5,coverability:1}]->(n4),
Query 1: Query to add data into the database
(n4)-[:Road {distance:15,coverability:1}]->(n5),
(n5)-[:Road {distance:5,coverability:1}]->(n6),
(n6)-[:Road {distance:100,coverability:1}]->(n7),
(n7)-[:Road {distance:80,coverability:0.9}]->(n8),
(n8)-[:Road {distance:5,coverability:0.8}]->(n1),
(n0)-[:Road {distance:100,coverability:0.9}]->(n9),
(n9)-[:Road {distance:10,coverability:0.8}]->(n10),
(n10)-[:Road {distance:10,coverability:0.8}]->(n1),
(n1)-[:Road {distance:5,coverability:0.8}]->(n11),
(n11)-[:Road {distance:10,coverability:0.8}]->(n12),
(n12)-[:Road {distance:30,coverability:0.8}]->(n13),
(n13)-[:Road {distance:60,coverability:0.8}]->(n2),
(n1)-[:Road {distance:5,coverability:0.8}]->(n8),
(n8)-[:Road {distance:105,coverability:0.9}]->(n14),
(n14)-[:Road {distance:5,coverability:0.8}]->(n2),
(n2)-[:Road {distance:15,coverability:0.8}]->(n15),
(n15)-[:Road {distance:75,coverability:0.9}]->(n3),
(n2)-[:Road {distance:20,coverability:0.7}]->(n16),
(n16)-[:Road {distance:15,coverability:0.9}]->(n17),
(n17)-[:Road {distance:75,coverability:1}]->(n3),
(n2)-[:Road {distance:5,coverability:0.9}]->(n14),
(n14)-[:Road {distance:40,coverability:0.9}]->(n18),
(n18)-[:Road {distance:40,coverability:0.9}]->(n19),
(n19)-[:Road {distance:70,coverability:1}]->(n3);
We need to download the latest APOC library, drop the .jar file into the. /plugins directory of our Neo4j installation, and then just run Query 2. We use the Dijkstra’s algorithm for this task.
MATCH (startNode:sNode {name:”Start”}),(endNode:eNode {name:”Finish”})
call apoc.algo.dijkstra(startNode, endNode, ‘Road’, ‘distance’) YIELD path, weight
return path;
Query 2: Query to apply Dijkstra’s algorithm using only distance as the edge weights
In query 2 we just consider the distance property. To also consider the coverability property, we need to explicitly set the attribute, factor (= distance/coverability) into the relationships and then use Dijkstra’s algorithm. Run query 3 to achieve the desired results.
//set distance/coverability to each relationship
MATCH (a)-[r:Road]->(b)
SET r.factor=r.distance/r.coverability;
//apply Dijkstra’s algorithm with factor as the weight
MATCH (startNode:sNode {name:”Start”}),
(endNode:eNode {name:”Finish”})
call apoc.algo.dijkstra(startNode, endNode, ‘Road’, ‘factor’) YIELD path, weight
return path;
Query 3: Adding attribute distance/coverability and applying Dijkstra’s algorithm
So figure 4 gives the route that the delivery man needs to follow to deliver the parcel on time.
Conclusion
In logistics, planning, impact analysis and many other applications, weighted shortest path algorithms have a great potential. The graph data model and graph databases can be used to generate competitive insights and significant business value in various use cases like: Social and Professional Networking (e.g.: LinkedIn, Facebook), Real-Time Recommendations, Fraud Detection, Master Data Management, Authorization and Access Control etc.
Dr. Roy Marsten in a WIRED article says, “The data that we have today, and in often the ways we look at data, are already steeped in the theory of graphs. Creating and analyzing graphs will bring us to answers automatically. When we let data connect itself, meaning will begin to emerge automatically.” So, with all the chatter around Graph Databases, it won’t be completely unwise to assume that the future is graph shaped.
Further reads:
References:
News By: Team Zaloni
Blogs By: Matthew Caspento
Blogs By: Haley Teeples
Webinars By: Ben Sharma
Blogs By: Pranjal Goswami
Blogs By: Pranjal Goswami