Top k events aggregation design Flashcards

(5 cards)

1
Q

Design the HLD (high-level design) for top-k events.

A

https://drive.google.com/file/d/19PdYcJeWvjhp7LBhM0O2x2y7w8GgtNad/view?usp=drive_link

2
Q

How will you aggregate the number of clicks?

A

Pipeline: all events -> map -> aggregate -> output
1. Events arrive for all ads: ad1, ad2, ad3, ad4.
2. Events are mapped to nodes based on a partitioning criterion. For example, node 1 processes even ad IDs and node 2 processes odd ad IDs:
   Node 1 will have ad2, ad4
   Node 2 will have ad1, ad3
3. Each node then aggregates its events.
4. The output is the click count per ad.
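
The map and aggregate steps above can be sketched in a few lines. This is a minimal single-process illustration, not a distributed implementation: the function names, the "adN" ID format, and the sample event list are assumptions made for the example. Index 0 plays the role of node 1 (even IDs) and index 1 the role of node 2 (odd IDs).

```python
from collections import Counter

def partition(ad_id, num_nodes=2):
    """Pick a node from the numeric part of the ad ID (assumes IDs look like 'ad7')."""
    return int(ad_id[2:]) % num_nodes

def aggregate_clicks(events, num_nodes=2):
    # Map: route every click event to the node that owns its ad ID.
    nodes = [Counter() for _ in range(num_nodes)]
    for ad_id in events:
        # Aggregate: each node counts clicks only for the ads it owns.
        nodes[partition(ad_id, num_nodes)][ad_id] += 1
    # Output: combine the per-node counts into one click count per ad.
    totals = Counter()
    for node_counts in nodes:
        totals.update(node_counts)
    return nodes, totals

# Made-up sample stream of click events.
events = ["ad1", "ad2", "ad3", "ad4", "ad2", "ad1", "ad2"]
nodes, totals = aggregate_clicks(events)
print(dict(nodes[0]))  # even ad IDs
print(dict(nodes[1]))  # odd ad IDs
print(dict(totals))
```

Because an ad always maps to the same node, no two nodes ever count the same ad, so the final merge is a plain union of the per-node counts.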

3
Q

How will you aggregate the top 100 items?

A
  1. Input events are mapped (partitioned) by ad_id.
  2. Each aggregation node keeps a heap-like data structure to track its own top N ads.
  3. In the last step, a reduce node merges the per-node top N lists into the overall top N most-clicked ads, once every minute.
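
A minimal sketch of the node-level and reduce-level steps above, assuming each node already holds click counts for the current minute. The function names and sample counts are made up for illustration; `heapq.nlargest` from the Python standard library stands in for the "heap-like data structure".

```python
import heapq

def node_top_n(counts, n):
    """Top n (ad_id, clicks) pairs held by one aggregation node."""
    return heapq.nlargest(n, counts.items(), key=lambda pair: pair[1])

def reduce_top_n(per_node_tops, n):
    """Merge the per-node top-n lists into the overall top n."""
    merged = [pair for top in per_node_tops for pair in top]
    return heapq.nlargest(n, merged, key=lambda pair: pair[1])

# Made-up per-minute counts on two nodes, partitioned by ad_id.
node_a = {"ad2": 30, "ad4": 5, "ad6": 12}
node_b = {"ad1": 20, "ad3": 7}

tops = [node_top_n(node_a, 2), node_top_n(node_b, 2)]
overall = reduce_top_n(tops, 2)
print(overall)
```

Since events are partitioned by ad_id, each ad's full count lives on exactly one node, so any ad in the overall top N must also be in its own node's top N. The reduce node therefore only needs the per-node lists, not the full counts.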
4
Q

How will you filter, let's say by region or some other criterion, to get the top clicked items?

A
  1. Define the filtering criteria in advance and aggregate based on them:
    filter_id, region, ip, user_id
    f001

Add the filter ID as an additional field in the aggregated table.
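
One way to picture this is to count clicks per (ad_id, filter_id) pair, so a top-N query for a filter only reads rows carrying that filter ID. The sketch below is an assumption-heavy illustration: the card only shows the ID f001, so the criteria attached to it, the second filter f002, and the event fields are all hypothetical.

```python
from collections import Counter

# Hypothetical predefined filters; the criteria values are invented for this example.
FILTERS = {
    "f001": {"region": "US"},
    "f002": {"region": "EU"},
}

def matching_filters(event):
    """IDs of all predefined filters whose criteria the event satisfies."""
    return [
        filter_id
        for filter_id, criteria in FILTERS.items()
        if all(event.get(field) == value for field, value in criteria.items())
    ]

def aggregate_with_filters(events):
    # The aggregation key carries the filter ID alongside the ad ID.
    counts = Counter()
    for event in events:
        for filter_id in matching_filters(event):
            counts[(event["ad_id"], filter_id)] += 1
    return counts

# Made-up click events.
events = [
    {"ad_id": "ad1", "region": "US"},
    {"ad_id": "ad1", "region": "US"},
    {"ad_id": "ad2", "region": "EU"},
    {"ad_id": "ad1", "region": "EU"},
]
counts = aggregate_with_filters(events)
print(dict(counts))
```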

5
Q

How will you perform data recalculation?

A

Data recalculation steps:
1. The recalculation service retrieves data from raw data storage.
2. The retrieved data is sent to a dedicated aggregation service so that real-time processing is not impacted.
3. The aggregated results are sent to the second message queue and then updated in the aggregation database.
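
The three steps can be sketched as a small replay job. Everything here is an in-memory stand-in chosen for illustration: a list for raw data storage, a `deque` for the second message queue, a dict for the aggregation database, and hypothetical field names (`ad_id`, `ts` in seconds). Per-minute buckets are assumed, and recalculated rows are assumed to overwrite the stored ones.

```python
from collections import Counter, deque

def recalculate(raw_events, start, end):
    """Steps 1-2: re-aggregate raw events with start <= ts < end, per (ad_id, minute)."""
    counts = Counter()
    for event in raw_events:
        if start <= event["ts"] < end:
            counts[(event["ad_id"], event["ts"] // 60)] += 1
    return counts

def publish(counts, queue):
    """Step 3a: send the recalculated rows to the second message queue."""
    for (ad_id, minute), clicks in counts.items():
        queue.append({"ad_id": ad_id, "minute": minute, "clicks": clicks})

def apply_updates(queue, aggregation_db):
    """Step 3b: drain the queue and overwrite the affected rows."""
    while queue:
        row = queue.popleft()
        aggregation_db[(row["ad_id"], row["minute"])] = row["clicks"]

# Made-up raw events and a database holding one wrong value.
raw_events = [
    {"ad_id": "ad1", "ts": 5},
    {"ad_id": "ad1", "ts": 50},
    {"ad_id": "ad2", "ts": 70},
    {"ad_id": "ad1", "ts": 130},
]
aggregation_db = {("ad1", 0): 99}
queue = deque()

publish(recalculate(raw_events, start=0, end=120), queue)
apply_updates(queue, aggregation_db)
print(aggregation_db)
```

Running the replay through its own aggregation function and queue mirrors the card's point that recalculation should not compete with the real-time path.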

https://drive.google.com/file/d/13zoglahglw5-E-213uvVLtSjXxqUQLH1/view?usp=drive_link
