
Big Data Hadoop Programming Course: WordCount in Pig

행복한짱짱이 2017. 4. 3. 19:50




WordCount in Pig


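 -- load the input file, one record per line, into a single chararray field (assumes the lines contain no tab characters, the default field delimiter)

input_lines = LOAD '/data/README.txt' AS (line:chararray);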


 -- extract the words from each line into a Pig bag, then flatten the bag so that each word gets its own row

words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;


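 -- keep only tokens made up entirely of word characters (letters, digits, underscore); this drops whitespace-only tokens and tokens containing punctuation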

filtered_words = FILTER words BY word MATCHES '\\w+';


 -- create a group for each word

word_groups = GROUP filtered_words BY word;

 -- count the entries in each group

word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

 -- order the records by count

ordered_word_count = ORDER word_count BY count DESC;

 -- write the result to the file system (the output directory must not already exist)

STORE ordered_word_count INTO '/data/README-count';
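
To make it easier to see what each statement in the script computes, here is a minimal plain-Python sketch of the same pipeline. It is not Pig and not how Pig executes the script on a cluster. The function name `word_count` is hypothetical, and splitting on whitespace with `str.split()` is a simplifying assumption: Pig's TOKENIZE also splits on a few punctuation characters.

```python
import re
from collections import Counter


def word_count(lines):
    """Illustrative plain-Python analogue of the Pig script (not Pig itself)."""
    # TOKENIZE + FLATTEN: turn every line into tokens, one token per row.
    # Assumption: whitespace-only splitting, which is simpler than TOKENIZE.
    words = [token for line in lines for token in line.split()]

    # FILTER ... MATCHES '\\w+': the whole token must consist of word characters.
    # re.ASCII restricts \w to [a-zA-Z0-9_].
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w, flags=re.ASCII)]

    # GROUP ... BY word, then COUNT the entries in each group.
    counts = Counter(filtered_words)

    # ORDER ... BY count DESC, emitting (count, word) rows like the script does.
    rows = [(n, w) for w, n in counts.items()]
    return sorted(rows, key=lambda row: row[0], reverse=True)


if __name__ == "__main__":
    sample = ["the cat sat", "the cat ran, fast!", "the end"]
    for count, word in word_count(sample):
        print(count, word)
```

In this sketch, tokens such as `ran,` and `fast!` are dropped by the filter because they contain punctuation, which mirrors the full-match behavior of MATCHES in the script.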