Big Data Study 61
map.py
#!/usr/bin/env python
import sys

# --- read every line from stdin ---
for line in sys.stdin:
    # --- remove leading and trailing whitespace ---
    line = line.strip()
    # --- split the line into words ---
    words = line.split()
    # --- emit one tab-delimited (word, 1) pair per word ---
    for word in words:
        print('%s\t%s' % (word, 1))
reduce.py
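#!/usr/bin/env python
import sys

# --- running total for each word ---
word2count = {}

for line in sys.stdin:
    line = line.strip()
    # --- skip lines that lack a tab or carry a non-integer count ---
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        continue
    # --- add the count to the word's total, starting from 0 ---
    word2count[word] = word2count.get(word, 0) + count

# --- output each word with its total in tab-delimited format ---
for word in word2count:
    print('%s\t%s' % (word, word2count[word]))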
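The two scripts can be checked without a cluster by imitating the pipeline in a single process. The block below is a minimal sketch and not part of the original post: the names `map_lines`, `reduce_records`, and `word_count` are hypothetical, and `sorted()` is assumed to stand in for the sort step that sits between the mapper and the reducer in a streaming job.

```python
def map_lines(lines):
    """Yield one 'word<TAB>1' record per word, mirroring what map.py prints."""
    for line in lines:
        for word in line.strip().split():
            yield '%s\t%s' % (word, 1)


def reduce_records(records):
    """Sum the counts per word, mirroring reduce.py; malformed records are skipped."""
    totals = {}
    for record in records:
        try:
            word, count = record.strip().split('\t', 1)
            count = int(count)
        except ValueError:
            # No tab in the record, or the count is not an integer.
            continue
        totals[word] = totals.get(word, 0) + count
    return totals


def word_count(lines):
    """Run map, sort, and reduce in one process (hypothetical helper)."""
    # Assumption: sorted() plays the role of the sort between the two stages.
    shuffled = sorted(map_lines(lines))
    return reduce_records(shuffled)


if __name__ == '__main__':
    sample = ['the quick brown fox', 'the lazy dog', 'the end']
    for word, total in sorted(word_count(sample).items()):
        print('%s\t%s' % (word, total))
```

Because this reducer keeps every word in a dictionary, it does not depend on its input being sorted. The sort is included only to keep the sketch close to the order of stages in the real pipeline.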