Friday, December 3, 2010

Simple MapReduce Program Example in Python

Given a set of integers, compute the sum of their squares.
(Assumption: you already know how to run a MapReduce program on an input file and how to check its output.)

mapper.py

#!/usr/bin/env python

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into numbers
    numbers = line.split()
    for number in numbers:
        num = int(number)
        square = num * num
        # emit a constant key (1) so every square reaches the same reducer
        print('%s\t%s' % (1, square))

reducer.py

#!/usr/bin/env python

import sys

# running total, initialized to zero
# (named "total" so it does not shadow the built-in sum())
total = 0
# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    try:
        # parse the "key<TAB>square" pair we got from mapper.py
        key, square = line.split('\t', 1)
        # convert square (currently a string) to int
        total = total + int(square)
    except ValueError:
        # the line had no tab or square was not a number,
        # so silently ignore/discard this line
        pass
# print the sum of squares
print('%s' % total)
___________________________________________________________________
input.txt
1 2 3 4


output/part-0000
30
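
To check the logic without a cluster, the same map, sort and reduce steps can be imitated in a single Python process. This is only a rough sketch: the function names map_squares, reduce_sum and run_local are made up for illustration and are not part of mapper.py or reducer.py, and the sort merely stands in for the shuffle step that the framework performs between the two scripts.

```python
def map_squares(lines):
    """Yield (key, square) pairs, mirroring what mapper.py prints."""
    for line in lines:
        for token in line.split():
            number = int(token)
            # constant key 1, as in mapper.py
            yield 1, number * number


def reduce_sum(pairs):
    """Add up the values of all pairs, mirroring reducer.py."""
    total = 0
    for _key, square in pairs:
        total += square
    return total


def run_local(lines):
    """Hypothetical helper: map, sort by key, then reduce."""
    # sorting by key stands in for the shuffle/sort phase
    shuffled = sorted(map_squares(lines), key=lambda pair: pair[0])
    return reduce_sum(shuffled)


print(run_local(["1 2 3 4"]))  # prints 30
```

For the input.txt above this gives 1 + 4 + 9 + 16 = 30, which matches the output file.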
