News Feed System
Basics
- Your news feed consists of the timelines of all your friends or users that you are following
- Core algorithm: merge k sorted array/list/any
- Daily active users(DAU), monthly active users(MAU), query per second(QPS). Peak value and growth expectation
- Number of reads and writes
- Examples:
- 朋友圈
Pull model and Push model
Pull model
Get the first N tweets from K(all of your) following users, merge them, and then take the first N tweets.
Problem: Long waiting period. It takes K times blocking (阻塞型) DB read request to load the news feed.
Solution: Cache, for both timeline and news feed:
- Cache timeline: cache the most recent n tweets for each user (DB read –> cache read)
- Cache news feed: If read within x seconds from the last time, read from cache. If not, only merge the K tweets after a certain timestamp (such as since the last update)
Push mode
Store a news feed list in the database for each user. When a user publishes a new tweet, do a fanout to push the tweet to all the news feed lists of his/her followers. The fanout is performed asynchronously in another task.
Problem: Delayed display for some users. If we have a star user who has 100 million followers, the fanout will have 100 million write operations, which could take hours and is unacceptable for those unlucky followers.
Solution: Rank users by active level; find star users(who have a lot of followers) and use pull model for them.
Comparison
Item | pull | push |
---|---|---|
DB read | K (num of following) | 1 |
DB write | 1 | K (num of follower) |
Demand for resources | expensive (memory) | cheap (disk) |
Delay | low | high (to hear from stars) |
Active level | users post frequently | users post unfrequently |