Graph Databases (Neo4j, Amazon Neptune) Flashcards by se7en coding

What is a graph database and where would you use it?

A graph database stores data as nodes and relationships, making it ideal for use cases involving highly connected data like social networks, fraud detection, and recommendations.

What are nodes in a graph database?

Nodes represent entities such as people, products, or places in a graph database.

What are relationships in a graph database?

Relationships are directed connections between nodes, representing how entities are related.

What are properties in a graph database?

Properties are key-value pairs stored on nodes and relationships to hold additional metadata.
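
The three building blocks above (nodes, relationships, properties) can be sketched as plain data structures. This is a minimal illustration, not how Neo4j or Neptune store data internally; the class names, the `labels` field, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph (hypothetical in-memory representation)."""
    node_id: int
    labels: set = field(default_factory=set)        # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # key-value metadata


@dataclass
class Relationship:
    """A directed, typed connection from one node to another."""
    start: int      # id of the source node
    end: int        # id of the target node
    rel_type: str   # e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})

# Properties can live on the relationship as well as on the nodes.
follows = Relationship(alice.node_id, bob.node_id, "FOLLOWS", {"since": 2021})
```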

How is Cypher query language different from SQL?

Cypher describes the shape of the path to find and matches it against the graph, whereas SQL relates rows across tables with JOINs.
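
To make the contrast concrete, here is a sketch that answers the same "who do the people Alice follows follow?" question two ways: with a SQL self-JOIN (using Python's built-in `sqlite3`) and with a hop-by-hop walk over adjacency lists, which is roughly what a Cypher pattern such as `(a)-[:FOLLOWS]->()-[:FOLLOWS]->(fof)` expresses. The table name, column names, and sample data are assumptions for illustration.

```python
import sqlite3

edges = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin")]

# Relational style: a two-hop question needs a self-JOIN on the edge table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO follows VALUES (?, ?)", edges)
sql_result = {
    row[0]
    for row in conn.execute(
        "SELECT f2.dst FROM follows f1 "
        "JOIN follows f2 ON f1.dst = f2.src "
        "WHERE f1.src = ?",
        ("alice",),
    )
}
conn.close()

# Graph style: follow each node's outgoing relationships, one hop at a time.
adjacency = {}
for src, dst in edges:
    adjacency.setdefault(src, []).append(dst)

graph_result = {
    fof
    for friend in adjacency.get("alice", [])
    for fof in adjacency.get(friend, [])
}
```

Both approaches return the same set here. The difference shows up as hops are added: each extra hop is another JOIN in SQL, but only a longer pattern in a graph query.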

What is Cypher used for in Neo4j?

Cypher is the query language for Neo4j, used to match patterns, create or delete nodes and relationships, and return query results.

What are some common use cases for graph databases?

Social networks, fraud detection, recommendation engines, network and IT operations, and knowledge graphs.

What is pattern matching in graph databases?

Pattern matching refers to specifying a sequence of nodes and relationships in a query to extract relevant subgraphs.
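
As a rough sketch of the idea, the function below matches the two-hop pattern (a)-[FRIEND]->(b)-[LIKES]->(c) against a small list of relationships. A real engine plans and optimizes this very differently; the helper name `match_two_hop` and the data are hypothetical.

```python
# Each relationship is a (start, type, end) triple.
relationships = [
    ("alice", "FRIEND", "bob"),
    ("bob", "LIKES", "jazz"),
    ("alice", "FRIEND", "carol"),
    ("carol", "LIKES", "chess"),
    ("dave", "LIKES", "jazz"),
]


def match_two_hop(rels, first_type, second_type):
    """Return (a, b, c) for every (a)-[first_type]->(b)-[second_type]->(c)."""
    matches = []
    for a, t1, b in rels:
        if t1 != first_type:
            continue
        # The second hop must start where the first hop ended.
        for b2, t2, c in rels:
            if t2 == second_type and b2 == b:
                matches.append((a, b, c))
    return matches


result = match_two_hop(relationships, "FRIEND", "LIKES")
```

Note that `dave` likes jazz but does not appear in the result, because no FRIEND relationship leads to him: only subgraphs that fit the whole pattern are returned.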

What is a graph traversal?

Graph traversal is the process of visiting connected nodes in a graph, often used in queries and algorithms like shortest path.
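
One common traversal order is breadth-first search, which visits all nodes one hop away before moving to nodes two hops away. The sketch below assumes the graph is a plain dictionary of adjacency lists; the names are illustrative.

```python
from collections import deque

graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def bfs_order(adjacency, start):
    """Return nodes reachable from start, in breadth-first visiting order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:  # visit each node only once
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


order = bfs_order(graph, "a")
```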

What is an index in Neo4j?

An index in Neo4j speeds up lookups of nodes or relationships by property values.
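
The effect of an index can be sketched with a dictionary that maps a property value to the ids of the nodes holding it, so a lookup no longer has to scan every node. This is a simplified stand-in for illustration only; Neo4j's actual index structures are different, and the helper names are assumptions.

```python
people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Alice"},
]


def scan_by_name(nodes, name):
    """Without an index: check every node (cost grows with node count)."""
    return [n["id"] for n in nodes if n["name"] == name]


def build_index(nodes, prop):
    """Build a property-value -> node-ids lookup table once, up front."""
    index = {}
    for n in nodes:
        index.setdefault(n[prop], []).append(n["id"])
    return index


name_index = build_index(people, "name")

# With the index: a single dictionary lookup instead of a full scan.
alice_ids = name_index.get("Alice", [])
```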

What are constraints in Neo4j?

Constraints enforce rules like uniqueness of a property (e.g., a user ID) on nodes or relationships to maintain data integrity.
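
A uniqueness constraint can be pictured as a check that runs on every write. The sketch below is a toy in-memory store, not Neo4j's implementation; `NodeStore`, `UniqueConstraintError`, and the `user_id` property are hypothetical names chosen for the example.

```python
class UniqueConstraintError(Exception):
    """Raised when a write would duplicate a unique property value."""


class NodeStore:
    def __init__(self, unique_property):
        self.unique_property = unique_property
        self.nodes = []
        self._seen = set()  # values already taken for the unique property

    def add(self, properties):
        value = properties.get(self.unique_property)
        # Assumption: nodes that lack the property are not checked.
        if value is not None:
            if value in self._seen:
                raise UniqueConstraintError(
                    f"{self.unique_property}={value!r} already exists"
                )
            self._seen.add(value)
        self.nodes.append(properties)


store = NodeStore("user_id")
store.add({"user_id": 1, "name": "Alice"})
store.add({"user_id": 2, "name": "Bob"})
```

Adding a third node with `user_id` 1 would raise `UniqueConstraintError` and leave the store unchanged, which is the integrity guarantee the constraint provides.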

What is Amazon Neptune?

Amazon Neptune is a fully managed AWS graph database service that supports both property graphs (queried with Gremlin or openCypher) and RDF data (queried with SPARQL).

What are the main query languages supported by Amazon Neptune?

Neptune supports Gremlin and openCypher for the property graph model and SPARQL for the RDF graph model.

What is the shortest path algorithm in graph databases?

It finds the path between two nodes with the fewest hops or, in a weighted graph, the lowest total cost. It is commonly used in routing and recommendation systems.
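
For an unweighted graph, a shortest path can be found with breadth-first search while remembering each node's predecessor. The sketch below assumes a dictionary of adjacency lists and made-up node names; weighted graphs would need a different algorithm, such as Dijkstra's.

```python
from collections import deque

graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],
}


def shortest_path(adjacency, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_path(graph, "a", "e")
```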

What is PageRank in graph databases?

PageRank is an algorithm that ranks nodes based on the number and quality of their incoming links, useful in web search and influence analysis.
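
A simplified version of PageRank can be computed by repeatedly passing each node's score along its outgoing links. The sketch below uses a damping factor of 0.85 and a fixed number of iterations, both assumed values, and spreads the score of nodes with no outgoing links evenly across the graph. It assumes every link target also appears as a key in the dictionary.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = adjacency[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its score over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
top_node = max(ranks, key=ranks.get)
```

Node `c` ends up ranked highest because three nodes link to it, and `a` comes second because its single incoming link is from the highly ranked `c`. That is the "number and quality" of incoming links at work.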

What is centrality in graph analysis?

Centrality measures how important a node is within a graph based on its position and connectivity (e.g., degree, closeness, betweenness).
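
Degree centrality is the simplest of these measures: count a node's connections and divide by the maximum possible, n - 1. The sketch below treats edges as undirected pairs and uses a small star-shaped graph as sample data; both are assumptions for illustration.

```python
def degree_centrality(edges):
    """Return each node's degree divided by (n - 1) for an undirected graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    if n <= 1:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


# A star: "hub" is connected to every other node.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(star)
```

Here `hub` scores 1.0 because it touches every other node, while each leaf scores 1/3.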

What are advantages of graph databases?

Efficient for modeling and querying connected data, flexible schema, real-time traversals, and intuitive relationship representation.

What are disadvantages of graph databases?

They may not scale as well for high write throughput, can be complex to set up and operate, and have limited tooling compared to SQL databases.

What are best practices for graph data modeling?

Model based on access patterns, avoid deeply nested relationships, use constraints and indexes for performance, and keep relationships meaningful.

What are best practices for querying in Neo4j?

Use indexes where applicable, avoid Cartesian products, use EXPLAIN/PROFILE to optimize queries, and limit traversal depth.

What is the impact of graph databases on system design?

They enable more natural modeling of connected domains, reduce JOIN overhead, and are ideal for real-time relationship querying.

What is a real-world example of using a graph database?

A social media platform using a graph to model users, their connections, posts, and reactions for friend suggestions and feed ranking.
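
The friend-suggestion part of this example can be sketched as a two-hop traversal: look at friends of friends, skip people the user already knows, and rank the rest by the number of mutual friends. The data and the `suggest_friends` helper are hypothetical; a production system would weigh many more signals.

```python
from collections import Counter

friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}


def suggest_friends(friendships, user):
    """Rank friends-of-friends by how many mutual friends they share."""
    counts = Counter()
    for friend in friendships[user]:
        for candidate in friendships[friend]:
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in friendships[user]:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


suggestions = suggest_friends(friends, "alice")
```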

What are architectural implications of using graph databases?

Applications must be designed to take advantage of graph traversals; data ingestion and consistency models may differ from RDBMS.

How do graph databases perform under load?

Read queries involving deep relationships are very efficient, but high-volume write loads may require tuning and careful design.

How do graph databases ensure fault tolerance?

Neo4j Enterprise Edition (through clustering) and managed services such as Amazon Neptune offer replication, backup, and failover to provide high availability.

How can you monitor a graph database?

Use tools like Neo4j's monitoring dashboard, Amazon CloudWatch for Neptune, and query profiling tools to track query performance and system metrics.

How can you debug performance issues in graph queries?

Use `PROFILE` (or `EXPLAIN`) in Cypher, or the `profile()` step in Gremlin, to find the slow parts of a query, and review index usage and graph structure.

What are real-world tradeoffs of graph databases?

They offer high performance for connected data but require a new query paradigm and careful design to avoid performance issues with large or complex graphs.

What are common graph database interview questions?

Examples: 'What is a graph traversal?', 'How is Cypher different from SQL?', 'What is PageRank?', 'When would you use a graph DB over relational?'

What are potential gotchas in graph database usage?

Unindexed property lookups, overly dense nodes (supernodes), and unbounded traversals can lead to poor performance.
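
One way to guard against unbounded traversals is to cap the number of hops, which is what a bounded variable-length pattern such as `[*1..2]` does in Cypher. The sketch below shows the same idea over a dictionary of adjacency lists; the function name and data are assumptions for illustration.

```python
from collections import deque

chain = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def neighbors_within(adjacency, start, max_depth):
    """Return {node: hops} for nodes reachable in at most max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # depth limit reached: do not expand further
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    del depth[start]  # the start node is not its own neighbor
    return depth


nearby = neighbors_within(chain, "a", 2)
```

With the limit set to 2, the traversal stops at `c` and never touches `d` or `e`, so the work done stays bounded however large the rest of the graph is.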