Social Networking Timeline with Heterogeneous Backends
Overall
The three data sources are MySQL, HBase, and MongoDB. We need to load the given data into these databases and optimize them to meet the required performance.
Source code: https://github.com/PhoenixPan/CloudComputing/tree/master/Project3.4
Task1: Implementing Basic Login with MySQL on RDS
Dataset:
- users.csv [UserID, Password]
- userinfo.csv [UserID, Name, Profile Image URL]
Request format: GET /task1?id=[UserID]&pwd=[Password]
Response format: {"name":"my_name", "profile":"profile_image_url"}
If the ID and password combination is found in the database, return and display the user's profile image (as in the image below); otherwise, return and display an error message.
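A minimal sketch of this lookup, assuming two tables named `users` and `userinfo` that mirror the CSV files (the table and column names are hypothetical, not taken from the project). Python's built-in `sqlite3` stands in for MySQL on RDS here so that the example runs without a server; the parameterized join would be written the same way against MySQL.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory stand-in for the two tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, password TEXT)")
    conn.execute(
        "CREATE TABLE userinfo (user_id INTEGER PRIMARY KEY, name TEXT, profile TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'secret')")
    conn.execute(
        "INSERT INTO userinfo VALUES (1, 'Alice', 'http://example.com/alice.png')"
    )
    return conn

def login(conn, user_id, password):
    """Return the profile dict on a match, or None if the pair is not found."""
    row = conn.execute(
        "SELECT i.name, i.profile FROM users u "
        "JOIN userinfo i ON i.user_id = u.user_id "
        "WHERE u.user_id = ? AND u.password = ?",
        (user_id, password),  # bound parameters, never string concatenation
    ).fetchone()
    if row is None:
        return None
    return {"name": row[0], "profile": row[1]}

conn = make_demo_db()
print(login(conn, 1, "secret"))
print(login(conn, 1, "wrong"))
```

The caller would translate `None` into whatever error message the front end expects. Binding the request parameters through placeholders keeps them from being interpreted as SQL.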
Task2: Storing Social Graph using HBase
Dataset:
- links.csv [Followee, Follower]
Request format: GET /task2?id=[UserID]
Response format:
In HBase, we store the graph with the follower as the row key and its followees as columns (followee1, followee2, followee3, ...). To answer a request:
- Get all followees of the given user
- Extract the followees’ IDs
- Sort by name in ascending alphabetical order
- Find their profile images in the MySQL database, as we did in Task1, and return them
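The steps above can be sketched as follows. This is only an illustration: plain dictionaries stand in for the HBase row and the MySQL `userinfo` table, the sample data is made up, and the output shape (a list of name/profile pairs, mirroring Task1) is an assumption.

```python
# Stand-in for HBase: row key (follower ID) -> followee IDs stored as columns.
FOLLOWEES = {
    "1": ["3", "2"],
    "2": [],
}

# Stand-in for the MySQL userinfo table: user ID -> (name, profile image URL).
USERINFO = {
    "2": ("Bob", "http://example.com/bob.png"),
    "3": ("Alice", "http://example.com/alice.png"),
}

def followee_profiles(user_id):
    """Look up a user's followees and return their profiles sorted by name."""
    followee_ids = FOLLOWEES.get(user_id, [])
    profiles = [
        {"name": USERINFO[fid][0], "profile": USERINFO[fid][1]}
        for fid in followee_ids
        if fid in USERINFO  # skip IDs with no profile row
    ]
    return sorted(profiles, key=lambda p: p["name"])

print(followee_profiles("1"))
```

Storing all followees in a single row means one HBase read per request, followed by one batch of profile lookups.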
Eventually, we could display something like:
Task3: Build Homepage using MongoDB
Dataset: posts.csv [posts in JSON format]
Request format: GET /task3?id=[UserID]
Response format: {"posts":[{post1_json}, {post2_json}, ...]}
Find the posts that match the requested “uid”, sort them in ascending timestamp order, and return them as the response.
- Remove the _id field when getting JSON from MongoDB.
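A small sketch of the filter, sort, and field removal, written over an in-memory list so it runs without a MongoDB server. The field names `timestamp` and `pid` and the sample posts are assumptions; only `uid` and `_id` come from the task description.

```python
def posts_for_user(posts, uid):
    """Return the Task3 response body for one user."""
    matching = [p for p in posts if p.get("uid") == uid]
    # Ascending timestamp order; the strings below sort chronologically.
    matching.sort(key=lambda p: p["timestamp"])
    # Drop MongoDB's internal _id field from every post.
    cleaned = [{k: v for k, v in p.items() if k != "_id"} for p in matching]
    return {"posts": cleaned}

# Hypothetical sample documents.
POSTS = [
    {"_id": "a1", "pid": 2, "uid": 99, "timestamp": "2016-03-02 10:00:00"},
    {"_id": "a2", "pid": 1, "uid": 99, "timestamp": "2016-03-01 09:00:00"},
    {"_id": "a3", "pid": 3, "uid": 7, "timestamp": "2016-03-01 08:00:00"},
]

print(posts_for_user(POSTS, 99))
```

With a driver such as PyMongo, the same work could be pushed to the server with `find({"uid": uid}, {"_id": 0}).sort("timestamp", 1)`, again assuming the field is named `timestamp`; an index on `uid` would likely help the lookup.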
Example results:
Task4: Put Everything Together
Request format: http://backend-public-dns:8080/MiniSite/task4?id=99
Response format:
A single JSON object that includes the user name, the user profile, an array of followers, and an array of posts.
As the title suggests, we put the previous three tasks together. We need to:
- Get user profile
- Get all the followers
- Get the 30 most recent posts for each user, sorted by timestamp and then by post ID
- Return the 30 most recent posts among all the posts selected in step 3
- This is a complicated system; I cover the design of such a news feed system in a separate article.
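The merge in steps 3 and 4 can be sketched as below. The field names `timestamp` and `pid`, the helper names, and the newest-first output order are assumptions made for illustration; the real system fetches the per-user posts from MongoDB.

```python
import heapq

def recent(posts, limit=30):
    """Newest posts first, ordered by timestamp and then by post ID."""
    return heapq.nlargest(limit, posts, key=lambda p: (p["timestamp"], p["pid"]))

def news_feed(posts_by_user, user_ids, limit=30):
    """Take each user's newest posts, then keep the newest of the union."""
    candidates = []
    for uid in user_ids:
        candidates.extend(recent(posts_by_user.get(uid, []), limit))
    return recent(candidates, limit)

# Hypothetical data: user "a" has 40 older posts, user "b" has 5 newer ones.
posts_by_user = {
    "a": [{"pid": i, "uid": "a", "timestamp": i} for i in range(40)],
    "b": [{"pid": 100 + i, "uid": "b", "timestamp": 100 + i} for i in range(5)],
}
feed = news_feed(posts_by_user, ["a", "b"])
print(len(feed), feed[0]["pid"], feed[-1]["pid"])
```

Limiting each user to 30 posts first is safe because any post in the overall top 30 must also be in its author's own top 30, so the merge never has to look at more than 30 posts per user.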
Final Result:
Bonus Task: Basic Recommendation
Request format: http://backend-public-dns:8080/MiniSite/task5?id=<user_id>
Response format: 10 users that appeared most frequently
We were asked to implement a very simple yet successful recommendation model: collaborative filtering. Simply put, in a directed graph we need to find every qualified user R where min_distance(me, R) = 2. For example:
- Given:
User A follows {B, C, D}
User B follows {C, E, A}
User C follows {F, G}
User D follows {G, H}
- To recommend to A, we combine the followees of B, C, and D and get:
{A: 1, C: 1, E: 1, F: 1, G: 2, H: 1}
- Then we remove A’s direct followees {B, C, D} and A itself
- Eventually we have {G: 2, E: 1, F: 1, H: 1}
In our project, we will do:
- Find all followees (E) of the user
- Find all followees of those followees (EE)
- Store them (EE) in a HashMap that counts how often each one appears
- Remove the direct followees (E) and the user themself
- Return the 10 users that appear most frequently
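The steps above can be sketched in a few lines, using the A/B/C/D example as input. An in-memory dictionary stands in for the HBase social graph and a `Counter` plays the role of the HashMap; breaking ties by user ID is an assumption, since the description does not specify an order for equal counts.

```python
from collections import Counter

def recommend(graph, user, limit=10):
    """Rank users at distance 2 by how many of the user's followees follow them."""
    direct = set(graph.get(user, ()))
    counts = Counter()
    for followee in direct:
        # Count every followee of a followee (EE).
        counts.update(graph.get(followee, ()))
    # Drop direct followees (E) and the user themself.
    for skip in direct | {user}:
        counts.pop(skip, None)
    # Highest count first; ties broken by user ID (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [uid for uid, _ in ranked[:limit]]

graph = {
    "A": ["B", "C", "D"],
    "B": ["C", "E", "A"],
    "C": ["F", "G"],
    "D": ["G", "H"],
}
print(recommend(graph, "A"))  # ['G', 'E', 'F', 'H']
```

G ranks first because two of A's followees (C and D) follow G, while E, F, and H are each followed by only one.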
Final Result: