Big Data Hadoop Programming Course: WordCount in Pig
WordCount in Pig
-- load the input file, treating each line as a single chararray field
input_lines = LOAD '/data/README.txt' AS (line:chararray);
-- split each line into a bag of words, then flatten the bag so that each word becomes its own row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
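-- keep only tokens made up entirely of word characters (letters, digits, underscore);
-- MATCHES must match the whole string, so empty tokens and tokens with punctuation are dropped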
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
-- order the records by count, highest first
ordered_word_count = ORDER word_count BY count DESC;
-- write the result to the output directory (the job fails if it already exists)
STORE ordered_word_count INTO '/data/README-count';
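To check what the script should produce on a small sample, the same dataflow can be sketched in plain Python. This is only an illustration, not part of the Pig job: the function name `word_count` and the sample lines are hypothetical, and the delimiter set is an assumption meant to mirror TOKENIZE's defaults (space, double quote, comma, parentheses, asterisk). Pig does not define the order of records with equal counts, so only the position of the top entry is meaningful here.

```python
import re
from collections import Counter

# Assumption: mirrors TOKENIZE's default delimiters
# (space, double quote, comma, parentheses, asterisk).
DELIMITERS = re.compile(r'[ ",()*]+')
# ASCII-only \w, like Java's default regex behaviour behind MATCHES.
WORD = re.compile(r'\w+', re.ASCII)

def word_count(lines):
    """Hypothetical helper following the same steps as the Pig script."""
    # FOREACH ... GENERATE FLATTEN(TOKENIZE(line)): one token per row
    words = [tok for line in lines for tok in DELIMITERS.split(line) if tok]
    # FILTER ... BY word MATCHES '\\w+': the whole token must match
    filtered = [w for w in words if WORD.fullmatch(w)]
    # GROUP ... BY word, then COUNT per group
    counts = Counter(filtered)
    # ORDER ... BY count DESC, emitting (count, word) like the script
    return sorted(((n, w) for w, n in counts.items()),
                  key=lambda record: record[0], reverse=True)

# Made-up sample input, not the contents of /data/README.txt
sample = ["the quick fox", "the lazy dog, the end."]
result = word_count(sample)
for count, word in result:
    print(count, word)
```

Note that `end.` is dropped by the filter because of the trailing period, which is the same thing the `MATCHES '\\w+'` step does in the script.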