Amazon Elastic MapReduce Overview

Amazon Elastic MapReduce (EMR) lets you run MapReduce jobs on Amazon's own infrastructure, which suits many types of large-scale, parallelizable computation. To get started you will need an Amazon Web Services account. From the top menu bar of the AWS Management Console, select Elastic MapReduce. There are a few steps to complete before your job starts:

  1. Define a Job Flow – Choose the type of application to run: a Hive Program, a Custom JAR, a Streaming job, or a Pig Program. Sample applications are also available.
  2. Specify Parameters – You will then be prompted for parameters such as the input location, output location, mapper, reducer, and any extra arguments. Input can be read from and output written to Amazon S3 directly, so you are not limited by the storage on the compute instances.
  3. Configure EC2 Instances – Configure the EC2 instances that will perform the computation by selecting an instance type and the number of instances to run.
  4. Specify Advanced Options – Configure any advanced options, such as the EC2 key pair, VPC subnet, S3 log path, debugging, and keep-alive settings.
  5. Bootstrap Actions – Optionally add bootstrap actions. If you choose to use them, you will be prompted for an action type, which you can then configure accordingly.
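Step 2 asks for a mapper and a reducer when the job type is Streaming. As a minimal sketch, assuming a word-count job written in Python (the script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical, not part of the console walkthrough), a streaming mapper and reducer read lines on standard input and write tab-separated key/value lines to standard output:

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word (Hadoop Streaming key/value format)."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word.lower()}\t1"


def reducer(records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    Hadoop sorts mapper output by key before the reducer runs, so all
    records for the same word arrive as consecutive lines.
    """
    # Skip blank lines, then split each record into (word, count).
    parsed = (r.rstrip("\n").split("\t", 1) for r in records if r.strip())
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: "wordcount.py map" or "wordcount.py reduce"; both read stdin.
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in stage(sys.stdin):
        print(out)
```

Locally, the pipeline can be approximated with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle phase.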
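The same five steps can also be expressed programmatically, since they correspond roughly to fields of the EMR `RunJobFlow` API. The following is a hedged sketch, assuming Python with boto3 and a recent EMR release. The function name, its defaults, the release label, the instance type, and the bucket paths are illustrative assumptions rather than values from the console. It only assembles the request for a Streaming job flow and does not contact AWS. The debugging option from step 4 is not shown.

```python
from typing import Any, Dict, List, Optional


def build_streaming_job_flow(
    name: str,
    mapper: str,
    reducer: str,
    input_uri: str,
    output_uri: str,
    files: List[str],
    instance_type: str = "m5.xlarge",  # assumed default instance type
    instance_count: int = 3,           # assumed: 1 master + 2 core nodes
    key_name: Optional[str] = None,
    subnet_id: Optional[str] = None,
    log_uri: Optional[str] = None,
    keep_alive: bool = False,
    bootstrap_script: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's EMR client run_job_flow call (hypothetical helper)."""
    # Step 3: instance type and number of instances.
    instances: Dict[str, Any] = {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": instance_count,
        # Step 4: keep-alive; False lets the cluster terminate once its steps finish.
        "KeepJobFlowAliveWhenNoSteps": keep_alive,
    }
    # Step 4: optional EC2 key pair and VPC subnet.
    if key_name:
        instances["Ec2KeyName"] = key_name
    if subnet_id:
        instances["Ec2SubnetId"] = subnet_id

    # Steps 1-2: a Streaming step with mapper, reducer, input and output locations.
    step = {
        "Name": f"{name} streaming step",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["hadoop-streaming", "-files", ",".join(files),
                     "-mapper", mapper, "-reducer", reducer,
                     "-input", input_uri, "-output", output_uri],
        },
    }

    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; choose a current one
        "Applications": [{"Name": "Hadoop"}],
        "Instances": instances,
        "Steps": [step],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR roles, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }
    if log_uri:  # Step 4: S3 log path.
        request["LogUri"] = log_uri
    if bootstrap_script:  # Step 5: optional script bootstrap action.
        request["BootstrapActions"] = [{
            "Name": "custom bootstrap",
            "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
        }]
    return request
```

With AWS credentials configured, the returned dictionary could be submitted as `boto3.client("emr").run_job_flow(**request)`, which responds with the new cluster's `JobFlowId`. Keeping request construction separate from submission makes the parameters easy to inspect before any instances are launched.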

Elastic MapReduce offers flexibility and potential cost savings for large-scale computation, since each job flow can be sized to the work at hand and shut down once it finishes.
