Solution 1: skipped
Brute force with three nested loops; time complexity O(n^3).
Solution 2: accepted 66%
Time: O(n^2), Space: O(1)
Two-pointer solution, similar to Question 015, 3Sum.
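As a reference, a minimal sketch of the two-pointer pattern from 3Sum that the accepted solution adapts (names and the zero-target variant are my own):

```python
def three_sum(nums):
    # Sort once, then fix an anchor and sweep two pointers toward each
    # other: O(n^2) time, O(1) extra space beyond the output.
    nums = sorted(nums)
    res = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # skip duplicate anchors
        lo, hi = i + 1, len(nums) - 1
        while lo < hi:
            s = nums[i] + nums[lo] + nums[hi]
            if s < 0:
                lo += 1          # sum too small: move left pointer right
            elif s > 0:
                hi -= 1          # sum too large: move right pointer left
            else:
                res.append([nums[i], nums[lo], nums[hi]])
                lo += 1
                while lo < hi and nums[lo] == nums[lo - 1]:
                    lo += 1      # skip duplicate second elements
                hi -= 1
    return res
```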
Pretty messy; needs to be improved.
Use -1 as a flag for an unbalanced subtree. In production code, using a separate variable is better practice.
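A sketch of the -1-flag approach (the `Node` class and function names are mine):

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(node):
    # Returns the subtree height, or -1 as the "unbalanced" flag.
    if node is None:
        return 0
    left = height(node.left)
    if left == -1:
        return -1  # propagate the flag without extra work
    right = height(node.right)
    if right == -1:
        return -1
    if abs(left - right) > 1:
        return -1  # this subtree is unbalanced
    return max(left, right) + 1

def is_balanced(root):
    # In production, returning a (height, balanced) pair avoids
    # overloading -1 with two meanings, as the note suggests.
    return height(root) != -1
```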
All commands are led by `ctrl+a`, followed by:

- `?`: check the entire reference
- `c`: create a new window
- `n` and `p`: next window / previous window
- `w`: see all windows
- `"`: go directly to the wanted window
- `A`: change the window's name (default: bash)
- `k`: kill this window
- `\`: quit screen and kill all windows
- `d`: detach from a screen session to the main console

From the shell:

- `screen -r`: return to the screen session
- `screen -ls`: check all screen sessions with their ids, then use `screen -r 1234` to attach a certain session
The 3 data sources are MySQL, HBase, and MongoDB. We need to load the given data into the databases and optimize them to achieve the required performance.
Source code: https://github.com/PhoenixPan/CloudComputing/tree/master/Project3.4
Dataset:
Request format: GET /task1?id=[UserID]&pwd=[Password]
Response format: {"name":"my_name", "profile":"profile_image_url"}
If the id and password combination is found in the database, return and display the user's profile image (as in the image below); otherwise, return and display an error message.
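A minimal sketch of the task1 lookup logic, using an in-memory sqlite3 stand-in for MySQL (the table and column names are assumptions, not the project's schema):

```python
import json
import sqlite3

def lookup_user(conn, user_id, pwd):
    # Parameterized query: never interpolate id/pwd into the SQL string.
    row = conn.execute(
        "SELECT name, profile FROM users WHERE id = ? AND pwd = ?",
        (user_id, pwd),
    ).fetchone()
    if row is None:
        return json.dumps({"error": "wrong id or password"})
    return json.dumps({"name": row[0], "profile": row[1]})
```

On a hit, the servlet would render the profile image URL; on a miss, the error message.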
Dataset:
links.csv [Followee, Follower]
Request format: GET /task2?id=[UserID]
Response format:
In HBase, we store the data as key: follower, columns: followee1, followee2, followee3, ...
Eventually, we could display something like:
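The row layout above can be sketched with a plain dict standing in for the HBase table, assuming each row of links.csv is a [followee, follower] pair per the dataset header:

```python
from collections import defaultdict

def build_follow_table(links):
    # Mirror the HBase layout: row key = follower, "columns" = the
    # followees of that user, in insertion order.
    table = defaultdict(list)
    for followee, follower in links:
        table[follower].append(followee)
    return table
```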
Dataset: posts.csv [posts in JSON format]
Request format: GET /task3?id=[UserID]
Response format: {"posts":[{post1_json}, {post2_json}, ...]}
Find posts that match the required `uid`, sort them in ascending timestamp order, and return them as the response. Remove the auto-generated `_id` field when getting JSON from MongoDB. Example results:
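The filter-strip-sort step can be sketched as follows (the `timestamp` field name is an assumption; `uid` and `_id` come from the task description):

```python
def get_posts(posts, uid):
    # Keep posts matching uid, drop Mongo's auto-generated "_id",
    # then sort ascending by timestamp.
    matched = [
        {k: v for k, v in p.items() if k != "_id"}
        for p in posts
        if p.get("uid") == uid
    ]
    return {"posts": sorted(matched, key=lambda p: p["timestamp"])}
```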
Request format: http://backend-public-dns:8080/MiniSite/task4?id=99
Response format:
One single JSON object includes user name, user profile, an array of followers, and an array of posts.
As the title suggests, we put the previous three tasks together. We need to:
Final Result:
Request format: http://backend-public-dns:8080/MiniSite/task5?id=<user_id>
Response format: the 10 users that appear most frequently
We were asked to implement a very simple yet effective recommendation model: collaborative filtering. Simply speaking, in a directed graph, we need to find all qualified users R where min_distance(me, R) = 2. For example:
In our project, we will do:
Final Result:
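The distance-2 rule plus the top-10 frequency ranking described above can be sketched as (here `follows` maps a user to the set of users they follow; the data structure is my assumption):

```python
from collections import Counter

def recommend(follows, me, k=10):
    # Candidates at distance exactly 2: followees of my followees,
    # excluding me and anyone I already follow directly.
    # Rank candidates by how many paths of length 2 reach them.
    direct = follows.get(me, set())
    counts = Counter()
    for friend in direct:
        for candidate in follows.get(friend, set()):
            if candidate != me and candidate not in direct:
                counts[candidate] += 1
    return [user for user, _ in counts.most_common(k)]
```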
Replication: keep multiple copies of data to improve GET performance and recover from database outages.
Partitioning (sharding): separate data among many nodes to improve PUT performance. Horizontal partitioning shards by rows; vertical partitioning shards by columns.
Strong consistency: create a lock shop `HashMap<String, ReentrantReadWriteLock> lockShop` to hold a lock for each key.
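A Python analogue of the `lockShop` idea (the note's version uses Java's `ReentrantReadWriteLock`; plain mutexes are used here for brevity):

```python
import threading

class LockShop:
    # One lock per key, so operations on different keys never block
    # each other; the shop itself is guarded so lock creation is safe.
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, key):
        with self._guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]
```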
Consistent hashing: assign keys to nodes evenly with fault tolerance. The hashing algorithm must return the same value for the same key at all times.
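A sketch of a consistent-hash ring with virtual nodes (md5 keeps the mapping stable across processes, which the requirement above demands; the replica count of 100 is an arbitrary choice of mine):

```python
import bisect
import hashlib

def _point(key):
    # Stable hash: Python's built-in hash() is randomized per process,
    # so use md5 to get the same value for the same key at all times.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each physical node gets many virtual points on the ring,
        # which spreads keys more evenly.
        self.ring = sorted(
            (_point(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or after the
        # key's hash, wrapping around the ring.
        i = bisect.bisect(self.points, _point(key)) % len(self.ring)
        return self.ring[i][1]
```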
Goal: compare the efficiency of three kinds of queries: `grep`, `awk`, and MySQL.
Install MySQL using:
Connect as the root user; type your password after `-p`:
Create a new user:
Grant partial or all privileges and log in again; here we grant all privileges:
Create a database and table that support full Unicode:
Use an index to dramatically decrease query time. `LIKE` matches a pattern case-insensitively; use `LIKE BINARY` for a case-sensitive match, which is required here. A binary match compares a character's ASCII code instead of its collated value, so it's an exact match. `release` is a reserved word in MySQL, so we have to write it as `songs.release` to escape it. Notice that we may use a backquoted `` `release` `` when we create the table, but `tableName.columnName` in a query. At the end of the query, we use `AS filtered`; otherwise, we will get the exception: Every derived table must have its own alias.
Log in with the `--local-infile` flag and then execute:
We benchmark the performance of file I/O under four settings: t1.micro and m3.large, magnetic disk and SSD.
First, we build an EMR cluster with one master and one core (slave) node, and SSH into the master node. You should be able to see your hard drive condition through these two commands.
(PS: Don't mistakenly SSH into the slave. You can find the instance ID under EMR.)
After you SSH into the master node, you can run the following command to verify that HDFS is healthy, as reported per datanode:
hadoop dfsadmin -report
Create a directory on the guest machine (where you are right now) and move the .csv file to that location:
Create a directory in HDFS to store the file
Put the file into HDFS. How to use `-put`: `-put /local/directory /hdfs/directory`
Create a new table in HBase by logging into the shell using:
Create a new table whose name is `songdata` and whose column family is `data`:
Exit the shell and get the data ready for input into HBase; you should see a progress bar showing the input progress, like:
[======> ]45%
Load the data into the table:
Now, scan the table to make sure the data has been loaded successfully; use Ctrl+C to stop scanning:
Note down the private IP and enter it into the Java file, then run the demo through these commands:
If you see a result of 96 rows, you can proceed to answer the questions now :)
Goal: explain and implement a cache.
In this project, we will create a simple cache using HashMap. Some suggested that we should use LinkedHashMap; why? When I tried it, I didn't see any improvement in performance. We were supposed to apply a sophisticated strategy, but I ended up with this simple one:
I tried to distinguish decrease and increase but didn't see any improvement either. Is there a better way to do this?
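On the LinkedHashMap question: in access order it gives an O(1) LRU eviction policy for free, and Python's OrderedDict plays the same role. A sketch of that strategy (not the one the project ended up using):

```python
from collections import OrderedDict

class LRUCache:
    # OrderedDict remembers insertion order; move_to_end + popitem
    # turn it into a least-recently-used cache with O(1) operations.
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```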
Goal: use APIs to programmatically control the AWS Load Balancer and Auto Scaling Group.
Elastic Load Balancer automatically divides traffic among connected instances and handles instance failures via health checks. The default distribution strategy is Round-Robin. If an instance is unhealthy, the load balancer stops routing requests to it.
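Round-Robin in miniature (a toy dispatcher to illustrate the strategy, not the ELB API):

```python
import itertools

def round_robin(instances):
    # Cycle through the (healthy) instances in a fixed order,
    # handing out the next one on every call.
    pool = itertools.cycle(instances)
    return lambda: next(pool)
```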
Auto Scaling Group automatically adds or removes identical resources by responding to changes in demands.
Auto Scaling Policy: the Auto Scaling Group reacts when user-specified thresholds (e.g. CPU utilization) are triggered (by CloudWatch). It maintains a desired number (between a specified minimum and maximum) of instances at all times. It can also scale on a specific schedule (e.g. more servers during the weekend). The Auto Scaling Group does health checks as well; if an instance is unhealthy, it launches new instances to replace it.
Auto Scaling Template: instance AMI, instance type, key pairs, security group, etc. The applications in the instances should run automatically when the instances are up.