Learn why graph databases excel at exploring highly connected data, and how to leverage them in your organization
Key-value, document-oriented, column family, graph, relational... Today we seem to have as many kinds of databases as there are kinds of data. While this may make choosing a database harder, it makes choosing the right database easier. Of course, that does require doing your homework. You've got to know your databases.
One of the least-understood types of databases out there is the graph database. Designed for working with highly interconnected data, a graph database might be described as more "relational" than a relational database. Graph databases shine when the goal is to capture complex relationships in vast webs of information.
Here is a closer look at what graph databases are, why they're unlike other databases, and what kinds of data problems they're built to solve.
Graph database vs. relational database
In a traditional relational or SQL database, the data is organized into tables. Each table records data in a specific format with a fixed number of columns, each column with its own data type (integer, time/date, freeform text, etc.).
This model works best when you're dealing mainly with data from any one table. It also doesn't work too badly when you're aggregating data stored across multiple tables. But that behavior has some notable limits.
Consider a music database, with albums, bands, labels, and performers. If you want to report all the performers that were featured on this album by that band released on these labels (four different tables), you have to explicitly describe those relationships. With a relational database, you accomplish this by way of new data columns (for one-to-one or one-to-many relationships), or new tables (for many-to-many relationships).
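As a rough illustration of the music example, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all invented for this sketch; the point is only to show foreign-key columns handling the one-to-many links and a separate linking table handling the many-to-many link between albums and performers.

```python
import sqlite3

# Hypothetical schema and data for the music example (all names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE labels     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bands      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE performers (id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-many: each album refers to one band and one label via new columns.
    CREATE TABLE albums (
        id INTEGER PRIMARY KEY,
        title TEXT,
        band_id INTEGER REFERENCES bands(id),
        label_id INTEGER REFERENCES labels(id)
    );
    -- Many-to-many: performers and albums are linked through a new table.
    CREATE TABLE album_performers (
        album_id INTEGER REFERENCES albums(id),
        performer_id INTEGER REFERENCES performers(id),
        PRIMARY KEY (album_id, performer_id)
    );
    INSERT INTO labels VALUES (1, 'Example Records');
    INSERT INTO bands VALUES (1, 'The Placeholders');
    INSERT INTO performers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO albums VALUES (1, 'First Album', 1, 1), (2, 'Second Album', 1, 1);
    INSERT INTO album_performers VALUES (1, 1), (1, 2), (2, 3);
""")


def performers_on(album_title, band_name, label_name):
    """List performers on one album by one band on one label, sorted by name."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM performers p
        JOIN album_performers ap ON ap.performer_id = p.id
        JOIN albums a ON a.id = ap.album_id
        JOIN bands b ON b.id = a.band_id
        JOIN labels l ON l.id = a.label_id
        WHERE a.title = ? AND b.name = ? AND l.name = ?
        ORDER BY p.name
        """,
        (album_title, band_name, label_name),
    )
    return [name for (name,) in rows]
```

Note that a single question about performers already touches five tables and four joins, and every relationship had to be declared in the schema ahead of time.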
This is practical as long as you're managing a modest number of relationships. If you're dealing with millions or even billions of relationships (friends of friends of friends, for instance), those queries don't scale well.
In short, if the relationships between data, not the data itself, are your main concern, then a different kind of database, a graph database, is in order.
Graph database features
The term "graph" comes from the use of the word in mathematics. There it's used to describe a collection of nodes (or vertices), each containing information (properties), and with labeled relationships (or edges) between the nodes.
A social network is a good example of a graph. The people in the network would be the nodes, the attributes of each person (such as name, age, and so on) would be properties, and the lines connecting the people (with labels such as "friend" or "mother" or "supervisor") would indicate their relationship.
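To make the vocabulary concrete, the social network described above might be modeled in a few lines of Python. This is only an in-memory sketch; the class names (Node, Edge, PropertyGraph) and the sample people are invented for illustration and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex: an identifier plus free-form properties such as name or age."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A labeled, directed relationship from one node to another."""
    source: str
    label: str
    target: str


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before they are linked")
        self.edges.append(Edge(source, label, target))

    def related(self, node_id, label):
        """Return the nodes reached from node_id over edges with this label."""
        return [
            self.nodes[e.target]
            for e in self.edges
            if e.source == node_id and e.label == label
        ]


# Read each edge as "source's <label> is target".
network = PropertyGraph()
network.add_node("alice", name="Alice", age=34)
network.add_node("bob", name="Bob", age=29)
network.add_node("carol", name="Carol", age=58)
network.add_edge("alice", "friend", "bob")
network.add_edge("alice", "supervisor", "carol")
network.add_edge("bob", "mother", "carol")
```

Calling network.related("alice", "friend") returns the node for Bob, properties included.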
In a conventional database, queries about relationships can take a long time to process. This is because relationships are implemented with foreign keys and queried by joining tables. As any SQL DBA can tell you, performing joins is expensive, especially when you must sort through large numbers of objects or, worse, when you must join multiple tables to perform the sorts of indirect (e.g. "friend of a friend") queries that graph databases excel at.
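For a sense of what such an indirect query looks like in SQL, here is a small sketch using Python's built-in sqlite3 module. The people and friendships tables and their contents are assumptions made up for this example.

```python
import sqlite3

# Hypothetical tables: people, plus a friendships table of foreign-key pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendships (
        person_id INTEGER REFERENCES people(id),
        friend_id INTEGER REFERENCES people(id)
    );
    INSERT INTO people VALUES (1, 'Scott'), (2, 'Ana'), (3, 'Ben'), (4, 'Chloe');
    INSERT INTO friendships VALUES (1, 2), (1, 3), (2, 3), (2, 4);
""")


def friends_of_friends(name):
    """Two-hop query: the friendships table is joined to itself once per hop."""
    rows = conn.execute(
        """
        SELECT DISTINCT fof.name
        FROM people p
        JOIN friendships f1 ON f1.person_id = p.id
        JOIN friendships f2 ON f2.person_id = f1.friend_id
        JOIN people fof ON fof.id = f2.friend_id
        WHERE p.name = ? AND fof.id <> p.id
        ORDER BY fof.name
        """,
        (name,),
    )
    return [n for (n,) in rows]
```

Going one hop further means adding yet another self-join on friendships, which is where the cost of this approach tends to climb.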
Graph databases work by storing the relationships along with the data. Because related nodes are physically linked in the database, accessing those relationships is as immediate as accessing the data itself. In other words, instead of calculating the relationship as relational databases must do, graph databases simply read the relationship from storage. Satisfying queries is a simple matter of walking, or "traversing," the graph.
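A loose in-memory analogy for traversal is an adjacency list, where each person holds direct references to their neighbors and a query simply follows them. The sketch below uses made-up data and a plain breadth-first walk; real graph databases rely on their own storage-level techniques, so treat this as an illustration of the idea rather than of any engine's internals.

```python
from collections import deque

# Toy adjacency list: each person maps directly to the people they link to.
friends = {
    "Scott": ["Ana", "Ben"],
    "Ana": ["Ben", "Chloe"],
    "Ben": [],
    "Chloe": ["Dev"],
    "Dev": [],
}


def within_hops(start, max_hops):
    """Return {person: hop count} for everyone reachable in at most max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not walk past the requested depth
        for neighbor in friends.get(person, []):
            if neighbor not in distance:
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    del distance[start]  # the starting person is not part of the answer
    return distance
```

Here within_hops("Scott", 2) follows links outward from Scott and never inspects anyone who is not connected to him, whereas a join has to match rows across whole tables or indexes.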
A graph database not only stores the relationships between objects in a native way, making queries about relationships fast and easy, but allows you to include different kinds of objects and different kinds of relationships in the graph. Like other NoSQL databases, a graph database is typically schema-less. Thus, in terms of performance and flexibility, graph databases hew closer to document databases or key-value stores than they do to relational or table-oriented databases.
Graph database use cases
Graph databases work best when the data you're working with is highly connected and should be represented by how it links or refers to other data, typically by way of many-to-many relationships.
Again, a social network is a useful example. Graph databases reduce the amount of work needed to construct and display the data views found in social networks, such as activity feeds, or determining whether or not you might know a given person due to their proximity to other friends you have in the network.
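The "people you may know" idea can be sketched as a two-hop walk that counts mutual friends. The friendship data and the suggest_friends function below are invented for illustration; a production system would likely weigh many more signals.

```python
from collections import Counter

# Hypothetical undirected friendships, stored as a set of neighbors per person.
friendships = {
    "you": {"ana", "ben"},
    "ana": {"you", "chloe", "dev"},
    "ben": {"you", "chloe"},
    "chloe": {"ana", "ben"},
    "dev": {"ana"},
}


def suggest_friends(person):
    """Rank non-friends by how many friends they share with person."""
    direct = friendships[person]
    mutual = Counter()
    for friend in direct:
        for candidate in friendships[friend]:
            if candidate != person and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
```

In this toy network, chloe shares two friends with "you" and dev shares one, so chloe is ranked first.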
Another application for graph databases is finding patterns of connection in graph data that would be difficult to tease out via other data representations. Fraud detection systems use graph databases to bring to light relationships between entities that might otherwise have been hard to notice.
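One simplified way to picture this: treat accounts and the identifiers they use (phone numbers, devices) as nodes, link each account to its identifiers, and look for groups of accounts that end up connected. The records and the linked_account_groups function below are hypothetical, and real fraud detection involves far more than connectivity, but the sketch shows how indirect links surface.

```python
from collections import defaultdict

# Hypothetical records: (account, kind of identifier, value).
records = [
    ("acct-1", "phone", "555-0100"),
    ("acct-2", "phone", "555-0100"),
    ("acct-2", "device", "dev-A"),
    ("acct-3", "device", "dev-A"),
    ("acct-4", "phone", "555-0199"),
]


def linked_account_groups(records):
    """Group accounts that are connected through any chain of shared identifiers."""
    # Accounts are strings and identifiers are (kind, value) tuples,
    # so the two node types cannot collide in the same mapping.
    neighbors = defaultdict(set)
    for account, kind, value in records:
        identifier = (kind, value)
        neighbors[account].add(identifier)
        neighbors[identifier].add(account)

    seen = set()
    groups = []
    for account in sorted({record[0] for record in records}):
        if account in seen:
            continue
        seen.add(account)
        stack, group = [account], set()
        while stack:
            current = stack.pop()
            if isinstance(current, str):
                group.add(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(group) > 1:  # a lone account is not a linked group
            groups.append(sorted(group))
    return groups
```

In the sample data, acct-1 and acct-3 share no identifier directly, yet they land in the same group because both are linked to acct-2.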
Similarly, graph databases are a natural fit for applications that manage the relationships or interdependencies between entities. You will often find graph databases behind recommendation engines, content and asset management systems, identity and access management systems, and regulatory compliance and risk management solutions.
Graph database queries
Graph databases, like other NoSQL databases, typically use their own custom query methodology instead of SQL.
One commonly used graph query language is Cypher, originally developed for the Neo4j graph database. Since late 2015 Cypher has been developed as a separate open source project, and a number of other vendors have adopted it as a query system for their products (e.g., SAP HANA).
Here is an example of a Cypher query that returns a search result for everyone who is a friend of Scott:
MATCH (a:Person {name:"Scott"})-[:FRIENDOF]->(b)
RETURN b
The arrow symbol (->) is used in Cypher queries to represent a directed relationship in the graph.
Another common graph query language, Gremlin, was devised for the Apache TinkerPop graph computing framework. Gremlin syntax is similar to that used by some languages' ORM database access libraries.
Here is an example of a "friends of Scott" query in Gremlin:
g.V().has("name","Scott").out("friendof")
Many graph databases have support for Gremlin by way of a library, either built-in or third-party.
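To show why that chained style feels familiar to anyone who has used an ORM, here is a toy Python class that imitates the shape of the "friends of Scott" query. It is emphatically not TinkerPop or any real Gremlin library: ToyTraversal, its methods, and the sample data are all invented for this sketch, and each step simply filters or expands an in-memory list.

```python
class ToyTraversal:
    """A tiny in-memory stand-in for a chained graph traversal (not TinkerPop)."""

    def __init__(self, vertices, edges, current=None):
        self._vertices = vertices  # vertex id -> dict of properties
        self._edges = edges        # list of (source id, label, target id)
        # With no explicit position, start at every vertex in the graph.
        self._current = list(vertices) if current is None else current

    def has(self, key, value):
        """Keep only the current vertices whose property matches."""
        kept = [v for v in self._current if self._vertices[v].get(key) == value]
        return ToyTraversal(self._vertices, self._edges, kept)

    def out(self, label):
        """Step along outgoing edges with the given label."""
        targets = [
            target
            for v in self._current
            for (source, edge_label, target) in self._edges
            if source == v and edge_label == label
        ]
        return ToyTraversal(self._vertices, self._edges, targets)

    def values(self, key):
        """Read one property from each vertex at the current position."""
        return [self._vertices[v][key] for v in self._current]


# Hypothetical sample graph.
vertices = {1: {"name": "Scott"}, 2: {"name": "Ana"}, 3: {"name": "Ben"}}
edges = [(1, "friendof", 2), (1, "friendof", 3), (2, "friendof", 3)]

friends_of_scott = (
    ToyTraversal(vertices, edges).has("name", "Scott").out("friendof").values("name")
)
```

Each call returns a new traversal positioned somewhere else in the graph, which is what lets the steps be strung together left to right.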
Yet another query language is SPARQL. It was originally developed by the W3C to query data stored in the Resource Description Framework (RDF) format for metadata. In other words, SPARQL wasn't devised for graph database searches, but can be used for them. On the whole, Cypher and Gremlin have been more broadly adopted.
SPARQL queries have some elements reminiscent of SQL, namely SELECT and WHERE clauses, but the rest of the syntax is radically dissimilar. Don't think of SPARQL as being related to SQL at all, or for that matter to other graph query languages.
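RDF data is expressed as subject-predicate-object triples, and a SPARQL WHERE clause is built from triple patterns containing variables. The following Python sketch mimics that matching process on a handful of made-up triples. It is not SPARQL and uses no RDF library; the select function and the data are assumptions meant only to show how SELECT and WHERE play a different role here than in SQL.

```python
# Hypothetical data as (subject, predicate, object) triples.
triples = [
    ("scott", "friendOf", "ana"),
    ("scott", "friendOf", "ben"),
    ("ana", "friendOf", "ben"),
    ("ana", "name", "Ana"),
    ("ben", "name", "Ben"),
]


def select(variables, where, triples):
    """Bind "?variables" so that every pattern in where matches some triple."""
    solutions = [{}]
    for pattern in where:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # A variable must keep the same value everywhere it appears.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return [tuple(solution[v] for v in variables) for solution in solutions]


# Roughly: SELECT ?name WHERE { scott friendOf ?f . ?f name ?name }
names = select(
    ["?name"],
    [("scott", "friendOf", "?f"), ("?f", "name", "?name")],
    triples,
)
```

The shared variable ?f is what ties the two patterns together, playing the part that a join condition would play in SQL.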
Popular graph databases
Because graph databases serve a relatively niche use case, there aren't nearly as many of them as there are relational databases. On the plus side, that makes the standout products easier to identify and discuss.
Neo4j
Neo4j is easily the most mature (11 years and counting) and best-known of the graph databases for general use. Unlike previous graph database products, it doesn't use a SQL back-end. Neo4j is a native graph database that was engineered from the inside out to support large graph structures, as in queries that return hundreds of thousands of relations and more.
Neo4j comes in both free open-source and for-pay enterprise editions, with the latter having no restrictions on the size of a dataset (among other features). You can also experiment with Neo4j online by way of its Sandbox, which includes some sample datasets to practice with.
See InfoWorld's review of Neo4j for more details.
Microsoft Azure Cosmos DB
The Azure Cosmos DB cloud database is an ambitious project. It's intended to emulate multiple kinds of databases (conventional tables, document-oriented, column family, and graph), all through a single, unified service with a consistent set of APIs.
To that end, a graph database is just one of the various modes Cosmos DB can operate in. It uses the Gremlin query language and API for graph-type queries, and supports the Gremlin console created for Apache TinkerPop as another interface.
Another big selling point of Cosmos DB is that indexing, scaling, and geo-replication are handled automatically in the Azure cloud, without any knob-twiddling on your end. It isn't clear yet how Microsoft's all-in-one architecture measures up to native graph databases in terms of performance, but Cosmos DB certainly offers a useful combination of flexibility and scale.
See InfoWorld's review of Azure Cosmos DB for more details.
JanusGraph
JanusGraph was forked from the TitanDB project, and is now under the governance of the Linux Foundation. It uses any of a number of supported back ends (Apache Cassandra, Apache HBase, Google Cloud Bigtable, Oracle BerkeleyDB) to store graph data, supports the Gremlin query language (as well as other elements from the Apache TinkerPop stack), and can also incorporate full-text search by way of the Apache Solr, Apache Lucene, or Elasticsearch projects.
IBM, one of the JanusGraph project's supporters, offers a hosted version of JanusGraph on IBM Cloud, called Compose for JanusGraph. Like Azure Cosmos DB, Compose for JanusGraph provides autoscaling and high availability, with pricing based on resource usage.


