Metered Billing Flashcards

1
Q

What is Hydro?

A

Hydro is a scalable, fault-tolerant event logging system built on Apache Kafka and deployed across multiple sites.

It automatically archives events to the data warehouse for use in batch processing pipelines, dashboards and ad hoc SQL analysis.

Any application that writes Hydro events is referred to as a producer, and any application that reads Hydro events is referred to as a consumer. Hydro uses protocol buffer (protobuf) schemas to define event payloads and to decouple producers and consumers.

2
Q

What is the difference between Hydro event streams and a job queue?

A
  • Messages are never removed from streams; the cursor just moves forward.
  • If an error occurs while processing a Hydro message from the log, there are only two options: stop everything or keep going. There is no retry_on as there is with ActiveJob. And because the log is append-only, if you encounter an error and choose to keep going, there’s no built-in way to keep track of which message failed.

In the case of billing, just dropping messages could mean we under- or overcharge our users, so it’s important that we can keep track of messages that fail. For our Hydro processors, we republish failed messages to a dead-letter topic so we can handle them manually if needed.
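The dead-letter approach described above can be sketched roughly as follows. This is a minimal in-memory illustration, not Hydro's actual API: the `Stream` class, `process_stream`, and the `record_usage` handler are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Hypothetical append-only log: messages are never removed."""
    messages: list = field(default_factory=list)

    def publish(self, message):
        self.messages.append(message)


def process_stream(stream, dead_letters, handler, cursor=0):
    """Process every message from `cursor` onward and return the new cursor.

    On failure we keep going, but first republish the failed message to
    the dead-letter stream so it is not silently dropped.
    """
    while cursor < len(stream.messages):
        message = stream.messages[cursor]
        try:
            handler(message)
        except Exception as error:
            dead_letters.publish({"message": message, "error": str(error)})
        cursor += 1  # the cursor moves forward either way
    return cursor


def record_usage(message):
    # Hypothetical handler: reject usage events with a negative quantity.
    if message["quantity"] < 0:
        raise ValueError("negative quantity")


usage = Stream()
dead_letters = Stream()
for quantity in (5, -1, 3):
    usage.publish({"quantity": quantity})

cursor = process_stream(usage, dead_letters, record_usage)
```

After the run, the source stream still holds all three messages, and the one that failed is also available in `dead_letters` for manual handling.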

Job queues

3
Q

Stream

A
  • Messages are not removed after being processed like they are in queues
  • Append only
  • Maintains a pointer to the current message
  • Streams are persistent, durable and fault tolerant.
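As a rough illustration of the points above, here is a small in-memory sketch of an append-only log that keeps a pointer per consumer. The `AppendOnlyLog` class and the consumer names are assumptions made for this example; a real stream such as Kafka adds persistence, partitioning and replication on top of this idea.

```python
class AppendOnlyLog:
    """Hypothetical append-only log with one read pointer per consumer."""

    def __init__(self):
        self._entries = []
        self._offsets = {}  # consumer name -> index of the next entry to read

    def append(self, entry):
        # Entries are only ever added at the end, never changed or removed.
        self._entries.append(entry)

    def read_next(self, consumer):
        """Return the consumer's next entry, or None if it is caught up."""
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._entries):
            return None
        self._offsets[consumer] = offset + 1  # only the pointer moves
        return self._entries[offset]

    def __len__(self):
        return len(self._entries)


log = AppendOnlyLog()
log.append("a")
log.append("b")

first = log.read_next("billing")
second = log.read_next("billing")
replay = log.read_next("analytics")  # a second consumer starts from the top
```

Because reading only advances a pointer, a second consumer can read the same messages independently of the first.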
4
Q

Streaming

A
  • Streaming is a constant flow of events, where each event should contain enough information to reflect a change in state.
  • It allows data to be processed in real time (data in motion), unlike batch processing, the traditional approach in which static data (data at rest) is processed at a later point in time.
  • Streaming data is unbounded, meaning it has no real beginning and no real end.
  • Each event is processed as it occurs and is managed accordingly.
5
Q

Give an example of a streaming service or event log

A
  • The Stock Market.
  • When a stock price changes, a new event is created containing the date and time, the stock identifier, and its new trade price, which in this example is the change in state.
  • Given that there are thousands of stocks and thousands of trades happening every second, this results in a constant stream of data.
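To make the stock example concrete, the sketch below models each price change as an event carrying the time, the stock identifier and the new price, then processes the events in order to track the latest price per stock. The `TradeEvent` class, the `latest_prices` function and the ticker symbols are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TradeEvent:
    """Hypothetical event: enough information to reflect the change in state."""
    occurred_at: datetime
    symbol: str
    price: float


def latest_prices(events):
    """Apply events in order and return the most recent price per symbol."""
    prices = {}
    for event in events:
        prices[event.symbol] = event.price
    return prices


events = [
    TradeEvent(datetime(2024, 1, 2, 9, 30, 0, tzinfo=timezone.utc), "AAA", 10.0),
    TradeEvent(datetime(2024, 1, 2, 9, 30, 1, tzinfo=timezone.utc), "BBB", 20.0),
    TradeEvent(datetime(2024, 1, 2, 9, 30, 2, tzinfo=timezone.utc), "AAA", 10.5),
]

prices = latest_prices(events)
```

A real stream would be unbounded, so a consumer would update `prices` as each event arrives rather than iterating over a finished list.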
6
Q

What is a Publisher (pub)?

A

An application that publishes a message to a topic

It does not know the destination of the message

7
Q

What is a Subscriber/Consumer (sub)?

A

An application that registers itself to a topic to receive the messages
It does not know the source of the message

8
Q

What is a Queue?

A

A queue is a mechanism for asynchronous system-to-system communication.
Queues store messages until they are processed and deleted.
Each message is processed only once, by a single consumer.
Queues can be used to decouple heavyweight processing, to control the flow of an influx or batch of data, and to support erratic workloads.
They are typically used in serverless and microservices architectures.
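The queue behaviour described above can be sketched with Python's standard `collections.deque`. The job names and the two workers are made up for this example, and the round-robin hand-off is just one simple way to show that each message goes to exactly one consumer.

```python
from collections import deque

# Messages wait in the queue until a worker takes them.
jobs = deque(["job-1", "job-2", "job-3"])

# Two hypothetical workers share the queue; each job goes to exactly one.
handled = {"worker-a": [], "worker-b": []}
workers = list(handled)

turn = 0
while jobs:
    job = jobs.popleft()  # taking a job removes it from the queue
    worker = workers[turn % len(workers)]
    handled[worker].append(job)
    turn += 1
```

Once the loop finishes the queue is empty, which is the key contrast with a stream, where messages stay in the log after being read.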

9
Q

What is a Topic?

A

A topic is a channel that maintains a list of subscribers and relays to them the messages it receives from publishers.
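A minimal in-memory sketch of this idea is shown below. The `Topic` class, its method names and the sample message are assumptions for illustration only and do not correspond to any particular messaging library.

```python
class Topic:
    """Hypothetical topic: keeps a list of subscribers and relays messages."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        # A subscriber registers itself; it never learns who the publisher is.
        self._subscribers.append(callback)

    def publish(self, message):
        # The publisher hands the message to the topic and does not know
        # which subscribers, if any, will receive it.
        for callback in self._subscribers:
            callback(message)


inbox_a, inbox_b = [], []
topic = Topic("usage")
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)

topic.publish({"event": "build_minutes", "quantity": 5})
```

Both inboxes receive the same message, while the publishing code only ever refers to the topic.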

10
Q

When do you use a message queue system vs. event streaming?

A

Event streaming is powerful when you have events that you want to process and analyze in real time, allowing your systems to take action immediately.

Messaging is powerful when it comes to decoupling your systems and providing a highly available and durable solution. It is more of a way to manage your system interactions and control the ingestion of your data.

11
Q

Name the 4 metered billing event streams

A
  • First: UsageRelay stream. Records individual usage events.
  • Second: A processor that filters out line items that can’t be submitted.
  • Third: A processor that builds, in real time, the CSV files of data we’re submitting to Zuora.
  • Fourth: Takes care of some bookkeeping in the database so that we can track which CSV file a line item appears in.
12
Q

Describe the first iteration of metered billing and why it failed

A
  • We started by building a batch file system that ran as a scheduled job. Every 6 hours we would take the usage we had received, build a CSV with the data and upload that CSV to Zuora.
  • It was initially relatively efficient, since it was really just iterating over the rows of the database table.
  • After Actions went GA and we added more products (packages and shared storage), this didn’t scale.
  • The job could now read from three different tables, and we had to introduce several checks on what we could actually submit to Zuora. This led to a couple of problems: performing these checks in memory required loading other related records from the database, which adds up even with all possible optimizations; and at times, we had to check so many records that we’d exhaust the memory available to the job by creating too many Ruby objects.
  • We tried moving as many of the checks into the database query as we could and made sure all queries had covering indices.
  • That helped for a bit, right up until the query began to time out.
  • We reached a point where we were just asking too much of Ruby and MySQL.