Project 1.2 Parallel Programming using EMR
Goal: filter and analyze a large dataset using Hadoop MapReduce on AWS EMR.
MapReduce configuration
- Do not use periods (“.”) or, in general, any other non-alphanumeric characters in your bucket name (the bucket your mapper and reducer code is uploaded to); otherwise the EMR job might fail.
- To preserve your cluster, uncheck the “Terminate on failure” option and add steps manually in the EMR web console.
- Config:
jar -cvf Mapper.jar Mapper.class
jar -cvf Reducer.jar Reducer.class
javac -cp test.jar Main.java    # doesn't work: javac compiles .java sources, it does not run a class from the jar
java -cp test.jar Main          # works: runs the Main class found in test.jar
java -cp Mapper.jar Mapper
java -cp Reducer.jar Reducer
-files s3://ccproject0102/Mapper.jar,s3://ccproject0102/Reducer.jar
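
The commands above suggest the mapper and reducer are plain Java programs launched as `java -cp Mapper.jar Mapper`, reading records on stdin and writing to stdout. A minimal sketch of such a mapper follows. The record layout (whitespace-separated fields with the key in the first field), the blank-line filter, and the `mapLine` helper are assumptions for illustration only, not the project's actual format or logic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming-style mapper; the input layout is an assumption.
public class Mapper {

    // Turns one input line into a "key<TAB>1" record,
    // or returns null to filter the line out.
    public static String mapLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // drop blank lines
        }
        // Assumption: fields are whitespace-separated and the first field is the key.
        String key = trimmed.split("\\s+")[0];
        return key + "\t1";
    }

    public static void main(String[] args) throws IOException {
        // Read records from stdin and emit tab-separated key/value pairs on stdout.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String out = mapLine(line);
                if (out != null) {
                    System.out.println(out);
                }
            }
        }
    }
}
```

Before uploading the jars, a mapper like this can be checked locally by piping a small sample file (here a hypothetical sample.txt) through the same commands: cat sample.txt | java -cp Mapper.jar Mapper | sort | java -cp Reducer.jar Reducer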