Data Modeling in Graph Databases: Building Intuitive and Scalable Structures

It's not just about creating a structure; it's about creating one that is intuitive and scales with your needs.

Posted by Tahir Waseer on August 30, 2023

Missed the last piece on transitioning from relational to graph databases? You can get up to speed here. Now, let’s get down to business. We’re diving into the art and science of Data Modeling in Graph Databases. I’ll share practical insights, real-world examples, and key lessons I’ve picked up.

If you’re new to Cypher or Neo4j, don’t worry: you’ll get the hang of it as you go, and Neo4j’s docs are a solid starting point. Plus, you don’t need to understand all the syntax to grasp the concepts.

So, why should you care about data modeling in graphs? Well, it’s not just about creating a structure; it’s about creating one that is intuitive and scales with your needs. Unlike in relational databases, where your focus might be on tables and normalization, graph databases like Neo4j invite you to think about entities and their relationships in a more interconnected way.

The atomic units of a graph data model are fairly simple:

  1. Nodes: The entities or “nouns” in your model, such as Products or Suppliers.
  2. Relationships: The “verbs” that connect nodes together. In our previous example, a SUPPLIES relationship might connect a Supplier node to a Product node.
  3. Properties: Additional information that can be attached to both nodes and relationships, like a price property for a Product node.
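
To make the three building blocks concrete, here is a minimal sketch in plain Python. It is not how Neo4j stores anything internally; the `Node` and `Relationship` classes, the supplier name, the price, and the `since` property are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """An entity ("noun") with a label and free-form properties."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection ("verb") that can carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data; names and values are invented for this sketch.
supplier = Node("Supplier", {"name": "Acme Textiles"})
product = Node("Product", {"name": "T-Shirt", "price": 19.99})
supplies = Relationship(supplier, "SUPPLIES", product, {"since": 2021})

# Prints: (Supplier)-[:SUPPLIES]->(Product)
print(f"({supplies.start.label})-[:{supplies.type}]->({supplies.end.label})")
```

Notice that properties live on both the nodes and the relationship, which is the part that has no direct equivalent in a plain foreign key.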

Now, how do you use these elements to build a data model that’s not just coherent, but also optimized for your needs?

You start with the questions you absolutely need to answer and then structure your data to make answering those questions a breeze. In other words, you start with the queries and work backward. It’s what I’ll call “problem-driven modeling”. Let’s explore this concept in more detail.

Building upon our basic model of Products and Suppliers connected by SUPPLIES relationships, let’s add Reviews. In a relational model, Reviews would be represented as another table, with foreign keys linking it to the Products and Users tables. But in our graph, Reviews become nodes connected to Products through a HAS_REVIEW relationship and to Users through a WRITTEN_BY relationship.

Here’s what this structure might look like in Neo4j’s browser:

[Figure: reviews]

I’ve added a BOUGHT relationship between User and Product nodes to illustrate how we can connect the dots between different entities. This architecture allows seamless pivoting from a Review to a Product, a User, or even other reviews written by the same User. It also helps simplify queries like “Find all products rated above 4 stars by users who have also bought T-shirts,” making them more intuitive.

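// Users who bought a T-Shirt
MATCH (u:User)-[:BOUGHT]->(:Product {name: 'T-Shirt'})
// Their reviews, assuming (Product)-[:HAS_REVIEW]->(Review)-[:WRITTEN_BY]->(User)
MATCH (u)<-[:WRITTEN_BY]-(r:Review)<-[:HAS_REVIEW]-(p:Product)
WHERE r.rating > 4
RETURN DISTINCT p.name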
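
If Cypher is still new to you, it may help to see the same traversal spelled out step by step. The following is a rough sketch in plain Python over a tiny in-memory dataset. The users, products, and ratings are hypothetical, and the function name `products_liked_by_buyers_of` is just a label for this example, not part of any Neo4j API.

```python
# Hypothetical sample data, invented for illustration only.
products = {"p1": "T-Shirt", "p2": "Hoodie", "p3": "Mug"}

# BOUGHT relationships as (user, product id) pairs.
bought = [("alice", "p1"), ("bob", "p3")]

# Each review knows its author (WRITTEN_BY) and its product (HAS_REVIEW).
reviews = [
    {"author": "alice", "product": "p2", "rating": 5},
    {"author": "alice", "product": "p3", "rating": 3},
    {"author": "bob", "product": "p1", "rating": 5},
]


def products_liked_by_buyers_of(product_name, min_rating=4):
    """Names of products rated above min_rating by users who bought product_name."""
    # First MATCH: users with a BOUGHT relationship to the named product.
    target_ids = {pid for pid, name in products.items() if name == product_name}
    buyers = {user for user, pid in bought if pid in target_ids}

    # Second MATCH plus WHERE: follow those users to their reviews,
    # keep the highly rated ones, and hop on to the reviewed product.
    liked = {
        products[review["product"]]
        for review in reviews
        if review["author"] in buyers and review["rating"] > min_rating
    }

    # RETURN DISTINCT p.name: the set removes duplicates; sort for stable output.
    return sorted(liked)


print(products_liked_by_buyers_of("T-Shirt"))
```

The point of the comparison is that the Cypher pattern reads like the path you would draw on a whiteboard, while the imperative version has to thread each hop through intermediate sets by hand.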

And what if you’re interested in adding customizations to multiple products? Imagine you run an online marketplace where a customer asks, “What print sizes can I get on a T-shirt?” This is an excellent example of where problem-driven modeling comes into play.

Let’s consider a scenario where a single supplier can offer different customization options for multiple products. For instance, a Customization node named “Print” could be connected to a “T-Shirt” Product node via an APPLIES_TO relationship. This relationship might carry a property like print_size: 'large'. Similarly, the very same “Print” node could be linked to a “Water Bottle” node, but with a print_size: 'small' property.

[Figure: customization]

This setup allows us to easily and efficiently find answers to customer queries. For example, a Cypher query to find all products with a ‘large’ print size could look like this:

MATCH (c:Customization {name: 'Print'})-[r:APPLIES_TO]->(p:Product)
WHERE r.print_size = 'large'
RETURN p.name;
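
Here is a small sketch of why putting print_size on the relationship pays off, again in plain Python rather than Neo4j. It uses the T-Shirt and Water Bottle example from above; the `AppliesTo` tuple and both helper functions are hypothetical names chosen for this illustration.

```python
from typing import List, NamedTuple


class AppliesTo(NamedTuple):
    """One APPLIES_TO relationship, with print_size stored on the relationship."""
    customization: str
    product: str
    print_size: str


# The same "Print" customization links to two products with different sizes.
applies_to = [
    AppliesTo("Print", "T-Shirt", "large"),
    AppliesTo("Print", "Water Bottle", "small"),
]


def products_with_print_size(size: str) -> List[str]:
    """Mirror of the Cypher query: products whose Print relationship has this size."""
    return [
        rel.product
        for rel in applies_to
        if rel.customization == "Print" and rel.print_size == size
    ]


def print_sizes_for(product: str) -> List[str]:
    """The customer's question: which print sizes are available on this product?"""
    return sorted(
        {
            rel.print_size
            for rel in applies_to
            if rel.customization == "Print" and rel.product == product
        }
    )


print(products_with_print_size("large"))
print(print_sizes_for("T-Shirt"))
```

Because the size belongs to the connection rather than to the Print node or the product, one customization can mean something different for each product without duplicating nodes.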



Be query-specific. Pay attention to relationships. Don’t shy away from denormalization if it makes your queries faster. And don’t be a perfectionist; your first model won’t be your last. As your data grows, so will your understanding. That’s the beauty of graphs; they can evolve with you.
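
As one possible illustration of the denormalization point, the sketch below keeps a copy of the average rating directly on the product so a read does not have to walk every review. This is a generic pattern shown in plain Python, not something taken from the model above; the `avg_rating` and `review_count` properties and the sample ratings are assumptions made for the example.

```python
from statistics import mean

# Hypothetical ratings per product, standing in for Review nodes.
review_ratings = {"T-Shirt": [5, 4, 3], "Hoodie": [5, 5]}


def avg_rating_normalized(name):
    """Normalized read: recompute the average from the reviews on every call."""
    return mean(review_ratings[name])


# Denormalized copy: summary properties stored on the product itself.
product_props = {
    name: {"avg_rating": mean(ratings), "review_count": len(ratings)}
    for name, ratings in review_ratings.items()
}


def add_review(name, rating):
    """Write path: store the review and keep the product's summary in sync."""
    review_ratings[name].append(rating)
    props = product_props[name]
    total = props["avg_rating"] * props["review_count"] + rating
    props["review_count"] += 1
    props["avg_rating"] = total / props["review_count"]


add_review("Hoodie", 2)
print(product_props["Hoodie"], avg_rating_normalized("Hoodie"))
```

The trade-off is the usual one: reads get cheaper, but every write now has to update two places, so it is only worth it for questions you ask often.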

So, if you model your data based on the questions that really matter to your business, you’re on the right track. You’ll craft a graph data model that’s not just intuitive; it’s also scalable and efficient. This isn’t the only way to go about it, but it’s a damn good starting point.