Top k events aggregation design Flashcards
(5 cards)
Design the HLD (high-level design) for top k events
https://drive.google.com/file/d/19PdYcJeWvjhp7LBhM0O2x2y7w8GgtNad/view?usp=drive_link
How will you aggregate the number of clicks?
All Events -> Map -> Aggregate -> Output
1. All events arrive: ad1, ad2, ad3, ad4
2. Events are mapped to nodes based on a partitioning criterion, e.g. Node 1 processes even ad IDs and Node 2 processes odd ad IDs
Node 1 will have ad2, ad4
Node 2 will have ad1, ad3
3. Each node then aggregates the events it received
4. The output is the click count per ad
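A minimal sketch of the Map -> Aggregate flow above, assuming ad IDs have the form "ad<number>" and that everything runs in a single process. The function names and the sample event list are hypothetical and only illustrate the even/odd partitioning.

```python
from collections import Counter, defaultdict

def ad_number(ad_id):
    """Return the numeric suffix of an ID such as 'ad3' (assumed ID format)."""
    return int(ad_id[2:])

def map_events(events):
    """Map step: route even ad IDs to node 1 and odd ad IDs to node 2."""
    partitions = defaultdict(list)
    for ad_id in events:
        node = 1 if ad_number(ad_id) % 2 == 0 else 2
        partitions[node].append(ad_id)
    return partitions

def aggregate(partition):
    """Aggregate step: count clicks per ad within one node's partition."""
    return Counter(partition)

# Hypothetical click stream; each entry is one click on that ad.
events = ["ad1", "ad2", "ad3", "ad4", "ad2", "ad1", "ad2"]
partitions = map_events(events)
counts_per_node = {node: aggregate(p) for node, p in partitions.items()}
```

Because every click for a given ad lands on the same node, each node's counts are complete for the ads it owns.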
How will you aggregate the top 100 items?
- Input events are mapped (partitioned) by ad_id
- Each aggregation node keeps a heap-like data structure to get its top N ads
- In the last step, a reduce node merges the per-node top N lists into the overall top N most-clicked ads every minute
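The per-node heap and the final reduce step can be sketched as follows. This is an illustration only: it uses Python's heapq.nlargest as the heap-based selection, and the function names and sample counts are hypothetical.

```python
import heapq
from collections import Counter

def local_top_n(counts, n):
    """Per-node step: pick the n most-clicked ads from this node's counts."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])

def reduce_top_n(per_node_tops, n):
    """Reduce step: merge the per-node top lists and keep the overall top n."""
    merged = [pair for top in per_node_tops for pair in top]
    return heapq.nlargest(n, merged, key=lambda kv: kv[1])

# Hypothetical one-minute counts held by two aggregation nodes.
node1_counts = Counter({"ad2": 5, "ad4": 1})
node2_counts = Counter({"ad1": 3, "ad3": 4})

n = 2
tops = [local_top_n(node1_counts, n), local_top_n(node2_counts, n)]
overall = reduce_top_n(tops, n)
```

Since events are partitioned by ad_id, each ad is counted on exactly one node, so merging the local top N lists yields the exact overall top N rather than an approximation.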
How will you filter, say by region or some other criterion, to get the top clicked items?
- Define the filtering criteria in advance and aggregate based on them
filter_id, region, ip, user_id
f001
- Add the filter ID as an additional column in the aggregated table
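One way to picture this is to key the aggregated counts by (filter_id, ad_id). The sketch below is an assumption-heavy illustration: the FILTERS table, the second filter f002, and the region values are invented for the example (the card does not say what f001 matches), and events are assumed to be plain dictionaries.

```python
from collections import Counter

# Hypothetical predefined filters; the criteria values are invented for illustration.
FILTERS = {
    "f001": {"region": "US"},
    "f002": {"region": "EU"},
}

def matching_filters(event):
    """Return the IDs of all predefined filters whose criteria the event satisfies."""
    return [
        filter_id
        for filter_id, criteria in FILTERS.items()
        if all(event.get(field) == value for field, value in criteria.items())
    ]

def aggregate_with_filters(events):
    """Count clicks per (filter_id, ad_id), mirroring the extra filter ID column."""
    counts = Counter()
    for event in events:
        for filter_id in matching_filters(event):
            counts[(filter_id, event["ad_id"])] += 1
    return counts

# Hypothetical click events.
events = [
    {"ad_id": "ad1", "region": "US"},
    {"ad_id": "ad1", "region": "US"},
    {"ad_id": "ad2", "region": "EU"},
]
filtered_counts = aggregate_with_filters(events)
```

A top N query for one filter then only needs the rows carrying that filter ID.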
How will you perform data recalculation?
Data recalculation steps
1. The recalculation service retrieves data from raw data storage
2. The retrieved data is sent to a dedicated aggregation service so that real-time processing is not impacted
3. The aggregated results are sent to the second message queue and then updated in the aggregation database
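The three steps above can be sketched in a single process, with a list standing in for raw data storage, queue.Queue for the second message queue, and a dictionary for the aggregation database. All names, the per-minute keying, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from queue import Queue

def retrieve_raw_events(raw_storage, start_minute, end_minute):
    """Step 1: read the raw events in the time window being recalculated."""
    return [e for e in raw_storage if start_minute <= e["minute"] < end_minute]

def recalc_aggregate(events):
    """Step 2: dedicated aggregation, kept separate from the real-time path."""
    counts = Counter()
    for e in events:
        counts[(e["minute"], e["ad_id"])] += 1
    return counts

def apply_results(queue, aggregation_db):
    """Step 3: drain the second queue and overwrite the stored aggregates."""
    while not queue.empty():
        key, count = queue.get()
        aggregation_db[key] = count

# Hypothetical raw storage and an aggregation table where minute 0 was undercounted.
raw_storage = [
    {"minute": 0, "ad_id": "ad1"},
    {"minute": 0, "ad_id": "ad1"},
    {"minute": 1, "ad_id": "ad2"},
]
aggregation_db = {(0, "ad1"): 1, (1, "ad2"): 1}

second_queue = Queue()
for item in recalc_aggregate(retrieve_raw_events(raw_storage, 0, 1)).items():
    second_queue.put(item)
apply_results(second_queue, aggregation_db)
```

Only the recalculated window is overwritten; aggregates for other minutes are left untouched.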
https://drive.google.com/file/d/13zoglahglw5-E-213uvVLtSjXxqUQLH1/view?usp=drive_link