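mrjob lets you write MapReduce jobs for Hadoop in Python. An mrjob program can be tested locally and can also be deployed to run on a Hadoop cluster.
(1) First, install the mrjob library in your Python virtual environment:
pip install mrjob
When it finishes, run pip list to check that the installation succeeded.
(2) Write the Python file: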
from mrjob.job import MRJob


class MRJobCount(MRJob):
    # Called once per input line; emits three partial counts for that line.
    def mapper(self, key, line):
        yield "chars_number", len(line)
        yield "words_number", len(line.split())
        yield "lines_number", 1

    # Called once per key; sums the partial counts from all mappers.
    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRJobCount.run()
(3) Prepare a test input file:
(4) Run the command and check the results. The counts are produced successfully.
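To get a feel for the numbers the job should report, the map, shuffle, and reduce phases can be imitated in plain Python without mrjob or Hadoop. This is only a rough sketch: the `simulate_job` helper and the sample lines are made up for illustration, and it assumes each line reaches the mapper without its trailing newline.

```python
from collections import defaultdict


def mapper(line):
    """Emit the same three key/value pairs as MRJobCount.mapper."""
    yield "chars_number", len(line)
    yield "words_number", len(line.split())
    yield "lines_number", 1


def reducer(key, values):
    """Sum all values collected for one key, like MRJobCount.reducer."""
    yield key, sum(values)


def simulate_job(lines):
    """Hypothetical helper: run map, shuffle, and reduce in memory."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # shuffle step: group values by key
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    sample = ["hello world", "hello mrjob"]  # made-up input lines
    print(simulate_job(sample))
```

For the two sample lines this gives 22 characters, 4 words, and 2 lines, which is the kind of total the real job should produce for its input file.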