Graph Databases (Neo4j, Amazon Neptune) Flashcards
Model and query complex relationships using graph theory. "What is a graph database and where would you use it?" "How is Cypher query language different from SQL?" Topics: nodes, relationships, and properties; use cases (social networks, fraud detection, recommendation systems); Neo4j and the Cypher query language; querying patterns and traversals; indexes and constraints in Neo4j; Amazon Neptune overview; graph algorithms (shortest path, PageRank, centrality). (30 cards)
What is a graph database and where would you use it?
A graph database stores data as nodes and relationships, making it ideal for use cases involving highly connected data like social networks, fraud detection, and recommendations.
What are nodes in a graph database?
Nodes represent entities such as people, products, or places in a graph database.
What are relationships in a graph database?
Relationships are directed connections between nodes, representing how entities are related.
What are properties in a graph database?
Properties are key-value pairs stored on nodes and relationships to hold additional metadata.
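To make the three building blocks concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j or Neptune store data; the class names, fields, and sample values are all hypothetical and chosen only to mirror the terms on the cards above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # key-value metadata on the node


@dataclass
class Relationship:
    start: int      # node_id of the node the relationship leaves
    end: int        # node_id of the node the relationship points to
    rel_type: str   # what the connection means, e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)  # key-value metadata on the relationship


# Hypothetical sample data: Alice follows Bob since 2021.
alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
follows = Relationship(alice.node_id, bob.node_id, "FOLLOWS", {"since": 2021})
```

Note that the relationship has a direction (from `start` to `end`) and carries its own properties, independent of the nodes it connects.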
How is Cypher query language different from SQL?
Cypher describes the shape of the data to find as patterns of nodes and relationships, such as (a)-[:KNOWS]->(b), whereas SQL relates rows across tables with JOINs on key columns.
What is Cypher used for in Neo4j?
Cypher is the query language for Neo4j, used to match patterns, create or delete nodes and relationships, and return query results.
What are some common use cases for graph databases?
Social networks, fraud detection, recommendation engines, network and IT operations, and knowledge graphs.
What is pattern matching in graph databases?
Pattern matching refers to specifying a sequence of nodes and relationships in a query to extract relevant subgraphs.
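As a rough illustration of what a pattern match does, the sketch below finds every two-hop chain of one relationship type in a small edge list. In Cypher the same pattern would read roughly `MATCH (a)-[:FOLLOWS]->(b)-[:FOLLOWS]->(c) RETURN a, b, c`. The function name and sample data are hypothetical, and a real database plans and executes such a query very differently.

```python
# Directed edges as (start, relationship_type, end) triples; hypothetical sample data.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("bob", "LIKES", "post1"),
    ("carol", "FOLLOWS", "alice"),
]


def match_two_hops(edges, rel_type):
    """Return every (a, b, c) where a -[rel_type]-> b -[rel_type]-> c."""
    # Group targets by start node, keeping only the requested relationship type.
    outgoing = {}
    for start, kind, end in edges:
        if kind == rel_type:
            outgoing.setdefault(start, []).append(end)

    matches = []
    for a, middles in outgoing.items():
        for b in middles:
            for c in outgoing.get(b, []):
                matches.append((a, b, c))
    return matches


paths = match_two_hops(edges, "FOLLOWS")
```

The `LIKES` edge is ignored because it does not fit the pattern, which is the sense in which the query extracts only the relevant subgraph.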
What is a graph traversal?
Graph traversal is the process of visiting connected nodes in a graph, often used in queries and algorithms like shortest path.
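A traversal can be sketched as a breadth-first walk over an adjacency list. The version below also caps the number of hops. The function name, the `max_depth` parameter, and the sample graph are assumptions made for illustration.

```python
from collections import deque


def traverse(adjacency, start, max_depth):
    """Breadth-first traversal; returns {node: hops from start} within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in depths:  # visit each node only once
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


# Hypothetical friendship graph, stored as node -> list of outgoing neighbors.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
reachable = traverse(friends, "alice", max_depth=2)
```

With a limit of two hops, `erin` is never reached because she is three hops from `alice`.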
What is an index in Neo4j?
An index in Neo4j speeds up lookups of nodes or relationships by property values.
What are constraints in Neo4j?
Constraints enforce rules like uniqueness of a property (e.g., a user ID) on nodes or relationships to maintain data integrity.
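The ideas behind the two cards above (an index for fast property lookups and a uniqueness constraint) can be shown together in a toy Python store. This is only a conceptual sketch: `UserStore`, `user_id`, and the sample records are hypothetical, and Neo4j's actual index and constraint machinery is not modeled here.

```python
class UserStore:
    """Toy node store whose property index also enforces uniqueness of user_id."""

    def __init__(self):
        self.nodes = []        # every node's property dict, in insertion order
        self.by_user_id = {}   # index: user_id value -> node property dict

    def add(self, properties):
        user_id = properties["user_id"]
        # Uniqueness constraint: reject a second node with the same user_id.
        if user_id in self.by_user_id:
            raise ValueError(f"user_id {user_id!r} already exists")
        self.nodes.append(properties)
        self.by_user_id[user_id] = properties

    def find(self, user_id):
        # Index lookup: a dictionary access instead of scanning self.nodes.
        return self.by_user_id.get(user_id)


store = UserStore()
store.add({"user_id": 1, "name": "Alice"})
store.add({"user_id": 2, "name": "Bob"})
```

Calling `store.add({"user_id": 1, "name": "Imposter"})` at this point raises `ValueError`, which is the role a uniqueness constraint plays in keeping data consistent.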
What is Amazon Neptune?
Amazon Neptune is a fully managed graph database service from AWS that supports both property graphs (queried with Gremlin or openCypher) and RDF data (queried with SPARQL).
What are the main query languages supported by Amazon Neptune?
Neptune supports Gremlin and openCypher for the property graph model and SPARQL for the RDF graph model.
What is the shortest path algorithm in graph databases?
It finds a path between two nodes with the fewest hops or, in a weighted graph, the lowest total cost; it is commonly used in routing and recommendation systems.
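For an unweighted graph, a shortest path can be found with breadth-first search, as in the sketch below. This is one textbook approach, not necessarily what any particular graph database runs internally, and weighted graphs would need something like Dijkstra's algorithm instead. The function name and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return the nodes on a fewest-hops path from source to target, or None."""
    previous = {source: None}  # node -> node we arrived from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is not reachable from source


# Hypothetical directed road network.
roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
route = shortest_path(roads, "A", "E")
```

Because breadth-first search explores nodes in order of hop count, the first time it reaches the target it has found a path with the minimum number of hops.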
What is PageRank in graph databases?
PageRank is an algorithm that ranks nodes based on the number and quality of their incoming links, useful in web search and influence analysis.
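The idea can be sketched with a simple power iteration in Python. The damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, not values taken from the cards, and the function name and sample link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank; links maps each node to the list of nodes it points to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Every node receives a small baseline share regardless of its links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a and b both link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(web)
```

Here `c` ends up ranked highest because it has two incoming links, and `a` outranks `b` because its single incoming link comes from the highly ranked `c`, which illustrates the "number and quality" wording on the card.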
What is centrality in graph analysis?
Centrality measures how important a node is within a graph based on its position and connectivity (e.g., degree, closeness, betweenness).
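Degree centrality is the simplest of the measures named above and can be sketched in a few lines. The version below treats the graph as undirected and normalizes by the maximum possible number of neighbors; the function name and sample data are hypothetical.

```python
def degree_centrality(edges):
    """Normalized degree: each node's neighbor count divided by (n - 1)."""
    neighbors = {}
    for a, b in edges:
        # Treat each edge as undirected and ignore duplicate edges.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical star-shaped graph: one hub connected to three leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(star)
```

The hub scores 1.0 because it is connected to every other node, while each leaf scores one third. Closeness and betweenness need path computations and are not shown here.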
What are advantages of graph databases?
Efficient for modeling and querying connected data, flexible schema, real-time traversals, and intuitive relationship representation.
What are disadvantages of graph databases?
Graphs are hard to partition across machines, so they may not scale as well for high write throughput; they can also be complex to set up and operate, and they have a smaller tooling ecosystem than SQL databases.
What are best practices for graph data modeling?
Model based on access patterns, avoid deeply nested relationships, use constraints and indexes for performance, and keep relationships meaningful.
What are best practices for querying in Neo4j?
Use indexes where applicable, avoid Cartesian products, use EXPLAIN/PROFILE to optimize queries, and limit traversal depth.
What is the impact of graph databases on system design?
They enable more natural modeling of connected domains, reduce JOIN overhead, and are ideal for real-time relationship querying.
What is a real-world example of using a graph database?
A social media platform using a graph to model users, their connections, posts, and reactions for friend suggestions and feed ranking.
What are architectural implications of using graph databases?
Applications must be designed to take advantage of graph traversals; data ingestion and consistency models may differ from RDBMS.
How do graph databases perform under load?
Read queries that traverse deep relationships are typically efficient because each hop follows a stored relationship instead of computing a join, but high-volume write loads may require tuning and careful design.