Hadoop Study (5): A Summary of Hadoop Logs

With the earlier WordCount and AdLog examples we can already write simple MapReduce programs, but the first attempts inevitably hit problems: what do you do when AdLog fails to parse the JSON log structure, and how do you inspect the logs of a MapReduce run? That is the focus of this post: Hadoop's logs.

Understanding Hadoop's logs

Hadoop logs generally fall into two categories:

  • Logs produced by Hadoop's system services
  • Logs produced by MapReduce programs

Logs produced by Hadoop's system services

These are the logs written by the built-in services we start, such as the NameNode, DataNode, NodeManager, ResourceManager and HistoryServer. By default they are stored under ${HADOOP_HOME}/logs.

Strictly speaking, the daemon log directory is not set in mapred-site.xml: it defaults to ${HADOOP_HOME}/logs and is normally changed through the HADOOP_LOG_DIR (and YARN_LOG_DIR) environment variables in hadoop-env.sh / yarn-env.sh. The mapred.local.dir property shown below configures where MapReduce keeps its local intermediate data, not where the daemon logs go:

```xml
<configuration>
  <!-- Local directory for MapReduce intermediate data (not the daemon log directory) -->
  <property>
    <name>mapred.local.dir</name>
    <value>/home/yunyu/birdben_logs</value>
  </property>
</configuration>
```
| Service | Type | Log file name |
| --- | --- | --- |
| resourcemanager | YARN | yarn-${USER}-resourcemanager-${hostname}.log |
| nodemanager | YARN | yarn-${USER}-nodemanager-${hostname}.log |
| historyserver | MapReduce | mapred-${USER}-historyserver-${hostname}.log |
| secondarynamenode | HDFS | hadoop-${USER}-secondarynamenode-${hostname}.log |
| namenode | HDFS | hadoop-${USER}-namenode-${hostname}.log |
| datanode | HDFS | hadoop-${USER}-datanode-${hostname}.log |
Take the resourcemanager log as an example: yarn-${USER}-resourcemanager-${hostname}.log
${USER} is the user who started the resourcemanager process.
${hostname} is the hostname of the machine the resourcemanager runs on. When the log reaches a certain size (configurable in ${HADOOP_HOME}/etc/hadoop/log4j.properties) it is rolled into a new file; rolled files are named yarn-${USER}-resourcemanager-${hostname}.log.<number>, and the larger the number, the older the log. By default only the 20 most recent log files are kept.
```
-rw-rw-r-- 1 yunyu yunyu 8987088 Oct 27 11:24 yarn-yunyu-nodemanager-hadoop1.log
-rw-rw-r-- 1 yunyu yunyu     700 Oct 27 10:19 yarn-yunyu-nodemanager-hadoop1.out
-rw-rw-r-- 1 yunyu yunyu    2062 Oct 26 18:25 yarn-yunyu-nodemanager-hadoop1.out.1
-rw-rw-r-- 1 yunyu yunyu    2062 Oct 26 17:51 yarn-yunyu-nodemanager-hadoop1.out.2
-rw-rw-r-- 1 yunyu yunyu     700 Oct 25 16:18 yarn-yunyu-nodemanager-hadoop1.out.3
-rw-rw-r-- 1 yunyu yunyu    2062 Oct 23 17:54 yarn-yunyu-nodemanager-hadoop1.out.4
```
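The rolling behaviour comes from the RollingFileAppender configured in log4j.properties. The snippet below is a sketch of the relevant settings; the property names match the defaults shipped with Hadoop 2.x, but check them against your own ${HADOOP_HOME}/etc/hadoop/log4j.properties before relying on them:

```properties
# Roll the daemon log once it reaches this size, and keep at most 20 rolled files
hadoop.log.maxfilesize=256MB
hadoop.log.maxbackupindex=20

# The rolling file appender that the daemon logs go through
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
```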

Logs produced by MapReduce programs

Logs produced by MapReduce programs are further divided into two kinds:

  • Job run logs (job history logs)
  • Task run logs (Container logs)

Job run logs (job history logs)

Job run logs are produced by the MRAppMaster (the ApplicationMaster of a MapReduce job). They record in detail the job's start time and running time, each task's start time and running time, Counter values, and so on. This information is very useful for analysis: from the history records we can learn how many jobs succeed and fail per day, how many jobs each queue runs, and other statistics.

The ApplicationMaster of a MapReduce job also runs in a Container, always the one numbered 000001, e.g. container_1385051297072_0001_01_000001. It can be regarded as a special task, so it also has its own run log (a Container log) similar to those of Map Tasks and Reduce Tasks, but that is not the "job run log" described here.

Job run logs differ from the other log files in that these history records are stored on HDFS rather than on the local file system. The HDFS storage paths can be configured in mapred-site.xml, separately for jobs that are still running and for jobs that have completed.

mapred-site.xml
<!-- HDFS path for completed MapReduce job information -->
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>
<!-- HDFS path for running MapReduce job information -->
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<!-- Default HDFS staging path for MapReduce job information -->
<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
</property>
The job run logs are produced as follows (the commands sketched after this list show how to follow the files on HDFS):
- Step 1: The ResourceManager launches the job's ApplicationMaster. While it runs, the ApplicationMaster writes its logs to ${yarn.app.mapreduce.am.staging-dir}/yarn/.staging/job_XXXXX_XXX/, where yarn.app.mapreduce.am.staging-dir defaults to /tmp/hadoop-yarn/staging. That directory contains three files, ending in ".jhist", ".summary" and ".xml", which hold the job run log, the job summary and the job configuration respectively.
- Step 2: When all tasks have finished, the job is complete, and the ApplicationMaster copies the three files to ${mapreduce.jobhistory.intermediate-done-dir}/${username}, appending "_tmp" to the copied file names. mapreduce.jobhistory.intermediate-done-dir defaults to ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate.
- Step 3: The ApplicationMaster renames the copied files back to names ending in ".jhist", ".summary" and ".xml" (dropping the "_tmp" suffix).
- Step 4: A periodic scanning thread regularly moves the log files from done_intermediate to the done directory (configured with mapreduce.jobhistory.done-dir, default ${yarn.app.mapreduce.am.staging-dir}/history/done) and deletes the ".summary" file, whose information is all contained in the .jhist file.
- Step 5: The ApplicationMaster removes the ${yarn.app.mapreduce.am.staging-dir}/yarn/.staging/job_XXXXX_XXX/ directory.
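A quick way to watch this life cycle is to list the three locations on HDFS. The commands below are only a sketch: they assume the default yarn.app.mapreduce.am.staging-dir of /tmp/hadoop-yarn/staging and a user named yunyu, so adjust the paths to your own configuration:

```bash
# Step 1: while the job runs, the .jhist/.summary/.xml files sit in the staging directory
hdfs dfs -ls /tmp/hadoop-yarn/staging/yunyu/.staging/

# Steps 2-3: right after the job finishes, they appear under done_intermediate
hdfs dfs -ls /tmp/hadoop-yarn/staging/history/done_intermediate/yunyu/

# Step 4: the periodic scan finally moves them into the done directory (organized by date)
hdfs dfs -ls -R /tmp/hadoop-yarn/staging/history/done/
```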
By default, task run logs are stored only on each NodeManager's local disk. You can enable log aggregation so that finished tasks push their run logs to HDFS for centralized management and analysis.
### A few points to note
By default the NodeManager saves logs under yarn.nodemanager.log-dirs, whose default value is ${yarn.log.dir}/userlogs, i.e. the logs/userlogs directory under the Hadoop installation; to spread the disk load, this parameter is often set to multiple paths. Note also that the ApplicationMaster's own log lives in the same directory, because it too runs in a Container and is a special task; its directory is the one numbered 000001.
That is, the ApplicationMaster's log directory is named container_XXX_000001, while ordinary task log directories are named container_XXX_000002, container_XXX_000003, and so on.
Each container_XXX_00000X directory contains three log files: stdout, stderr and syslog (see the Java sketch after this list):
- stderr: error output
- stdout: console output; everything our MapReduce program prints with System.out.println ends up in this file
- syslog: logger output; everything our MapReduce program logs with logger.info ends up in this file
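To make the distinction concrete, here is a minimal sketch of a Mapper that writes to both destinations. The class name and log messages are made up for illustration; the point is the routing of System.out versus the logger described in the list above:

```java
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoggingWordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final Log LOG = LogFactory.getLog(LoggingWordCountMapper.class);
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Goes to the container's stdout file
    System.out.println("map input: " + value);
    // Goes to the container's syslog file (via the task's log4j configuration)
    LOG.info("processing line: " + value);
    for (String word : value.toString().split("\\s+")) {
      context.write(new Text(word), ONE);
    }
  }
}
```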
#### HistoryServer
Hadoop ships with a HistoryServer for viewing MapReduce job records, such as how many Maps and Reduces a job used, its submit time, start time and finish time.
The HistoryServer's addresses need to be configured in mapred-site.xml, and the service has to be started manually.
##### Starting the HistoryServer
$ mr-jobhistory-daemon.sh start historyserver
##### mapred-site.xml
```xml
<!-- IPC address of the MapReduce JobHistory Server; the default port is 10020 -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
</property>
<!-- Web UI address of the MapReduce JobHistory Server; the default port is 19888 -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
</property>
```
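After starting it, you can check that the JobHistoryServer process is up and then browse the job list in its web UI; the host name below is the hadoop1 machine from the configuration above:

```bash
# The HistoryServer should show up as a JobHistoryServer JVM
$ jps | grep JobHistoryServer
# The job list is then available at http://hadoop1:19888/jobhistory
```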
We can now look at the MapReduce job history on HDFS:
$ hdfs dfs -ls /data/history/done/2016/10/26/000000 Found 34 items -rwxrwx--- 2 yunyu supergroup 33487 2016-10-26 01:46 /data/history/done/2016/10/26/000000/job_1477390779880_0003-1477471590330-yunyu-wordcount-1477471605995-1-1-SUCCEEDED-default-1477471593992.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 01:46 /data/history/done/2016/10/26/000000/job_1477390779880_0003_conf.xml -rwxrwx--- 2 yunyu supergroup 33478 2016-10-26 01:48 /data/history/done/2016/10/26/000000/job_1477390779880_0004-1477471680580-yunyu-wordcount-1477471695328-1-1-SUCCEEDED-default-1477471684537.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 01:48 /data/history/done/2016/10/26/000000/job_1477390779880_0004_conf.xml -rwxrwx--- 2 yunyu supergroup 33472 2016-10-26 01:49 /data/history/done/2016/10/26/000000/job_1477390779880_0005-1477471725086-yunyu-wordcount-1477471740048-1-1-SUCCEEDED-default-1477471729179.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 01:49 /data/history/done/2016/10/26/000000/job_1477390779880_0005_conf.xml -rwxrwx--- 2 yunyu supergroup 33483 2016-10-26 01:53 /data/history/done/2016/10/26/000000/job_1477390779880_0006-1477471983695-yunyu-wordcount-1477471999289-1-1-SUCCEEDED-default-1477471987996.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 01:53 /data/history/done/2016/10/26/000000/job_1477390779880_0006_conf.xml -rwxrwx--- 2 yunyu supergroup 33479 2016-10-26 01:55 /data/history/done/2016/10/26/000000/job_1477390779880_0007-1477472135663-yunyu-wordcount-1477472150775-1-1-SUCCEEDED-default-1477472139406.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 01:55 /data/history/done/2016/10/26/000000/job_1477390779880_0007_conf.xml -rwxrwx--- 2 yunyu supergroup 33477 2016-10-26 02:42 /data/history/done/2016/10/26/000000/job_1477390779880_0008-1477474917310-yunyu-wordcount-1477474934028-1-1-SUCCEEDED-default-1477474921803.jhist -rwxrwx--- 2 yunyu supergroup 113716 2016-10-26 02:42 /data/history/done/2016/10/26/000000/job_1477390779880_0008_conf.xml -rwxrwx--- 2 yunyu supergroup 33487 2016-10-26 03:25 /data/history/done/2016/10/26/000000/job_1477477415810_0001-1477477521120-yunyu-wordcount-1477477540308-1-1-SUCCEEDED-default-1477477527364.jhist -rwxrwx--- 2 yunyu supergroup 113712 2016-10-26 03:25 /data/history/done/2016/10/26/000000/job_1477477415810_0001_conf.xml -rwxrwx--- 2 yunyu supergroup 33483 2016-10-26 03:51 /data/history/done/2016/10/26/000000/job_1477477415810_0002-1477479057999-yunyu-wordcount-1477479076500-1-1-SUCCEEDED-default-1477479062997.jhist -rwxrwx--- 2 yunyu supergroup 113712 2016-10-26 03:51 /data/history/done/2016/10/26/000000/job_1477477415810_0002_conf.xml -rwxrwx--- 2 yunyu supergroup 33476 2016-10-26 04:01 /data/history/done/2016/10/26/000000/job_1477477415810_0003-1477479645080-yunyu-wordcount-1477479661516-1-1-SUCCEEDED-default-1477479650248.jhist -rwxrwx--- 2 yunyu supergroup 113712 2016-10-26 04:01 /data/history/done/2016/10/26/000000/job_1477477415810_0003_conf.xml -rwxrwx--- 2 yunyu supergroup 33470 2016-10-26 04:36 /data/history/done/2016/10/26/000000/job_1477477415810_0004-1477481804256-yunyu-wordcount-1477481820774-1-1-SUCCEEDED-default-1477481809917.jhist -rwxrwx--- 2 yunyu supergroup 113638 2016-10-26 04:36 /data/history/done/2016/10/26/000000/job_1477477415810_0004_conf.xml -rwxrwx--- 2 yunyu supergroup 33519 2016-10-26 04:46 /data/history/done/2016/10/26/000000/job_1477477415810_0005-1477482378129-yunyu-wordcount-1477482394056-1-1-SUCCEEDED-default-1477482382630.jhist -rwxrwx--- 2 yunyu supergroup 113682 2016-10-26 04:46 
/data/history/done/2016/10/26/000000/job_1477477415810_0005_conf.xml -rwxrwx--- 2 yunyu supergroup 33518 2016-10-26 04:56 /data/history/done/2016/10/26/000000/job_1477477415810_0006-1477482967602-yunyu-wordcount-1477482983257-1-1-SUCCEEDED-default-1477482971751.jhist -rwxrwx--- 2 yunyu supergroup 113682 2016-10-26 04:56 /data/history/done/2016/10/26/000000/job_1477477415810_0006_conf.xml -rwxrwx--- 2 yunyu supergroup 33524 2016-10-26 05:05 /data/history/done/2016/10/26/000000/job_1477477415810_0007-1477483516841-yunyu-wordcount-1477483533885-1-1-SUCCEEDED-default-1477483521388.jhist -rwxrwx--- 2 yunyu supergroup 113682 2016-10-26 05:05 /data/history/done/2016/10/26/000000/job_1477477415810_0007_conf.xml -rwxrwx--- 2 yunyu supergroup 33519 2016-10-26 05:27 /data/history/done/2016/10/26/000000/job_1477477415810_0008-1477484838977-yunyu-wordcount-1477484854521-1-1-SUCCEEDED-default-1477484843086.jhist -rwxrwx--- 2 yunyu supergroup 113682 2016-10-26 05:27 /data/history/done/2016/10/26/000000/job_1477477415810_0008_conf.xml -rwxrwx--- 2 yunyu supergroup 33520 2016-10-26 19:21 /data/history/done/2016/10/26/000000/job_1477534790849_0001-1477534850971-yunyu-wordcount-1477534870748-1-1-SUCCEEDED-default-1477534857063.jhist -rwxrwx--- 2 yunyu supergroup 113699 2016-10-26 19:21 /data/history/done/2016/10/26/000000/job_1477534790849_0001_conf.xml -rwxrwx--- 2 yunyu supergroup 33521 2016-10-26 20:23 /data/history/done/2016/10/26/000000/job_1477534790849_0002-1477538573459-yunyu-wordcount-1477538590195-1-1-SUCCEEDED-default-1477538577756.jhist -rwxrwx--- 2 yunyu supergroup 113699 2016-10-26 20:23 /data/history/done/2016/10/26/000000/job_1477534790849_0002_conf.xml -rwxrwx--- 2 yunyu supergroup 33519 2016-10-26 20:24 /data/history/done/2016/10/26/000000/job_1477534790849_0003-1477538645546-yunyu-wordcount-1477538662701-1-1-SUCCEEDED-default-1477538650360.jhist -rwxrwx--- 2 yunyu supergroup 113699 2016-10-26 20:24 /data/history/done/2016/10/26/000000/job_1477534790849_0003_conf.xml
From the output above we can observe the following:
- (1) Job history records are stored in an HDFS directory;
- (2) Because there can be a great many of them, the records are organized into year/month/day subdirectories, which makes them easier to manage and find;
- (3) Each Hadoop job's history is stored in two files, with the suffixes *.jhist and *.xml; the *.jhist file holds the detailed information about the job, as shown below;
- (4) Every job's history is kept in its own separate file.
hdfs dfs -cat /data/history/done/2016/10/26/000000/job_1477534790849_0003-1477538645546-yunyu-wordcount-1477538662701-1-1-SUCCEEDED-default-1477538650360.jhist Avro-Json {"type":"record","name":"Event","namespace":"org.apache.hadoop.mapreduce.jobhistory","fields":[{"name":"type","type":{"type":"enum","name":"EventType","symbols":["JOB_SUBMITTED","JOB_INITED","JOB_FINISHED","JOB_PRIORITY_CHANGED","JOB_STATUS_CHANGED","JOB_QUEUE_CHANGED","JOB_FAILED","JOB_KILLED","JOB_ERROR","JOB_INFO_CHANGED","TASK_STARTED","TASK_FINISHED","TASK_FAILED","TASK_UPDATED","NORMALIZED_RESOURCE","MAP_ATTEMPT_STARTED","MAP_ATTEMPT_FINISHED","MAP_ATTEMPT_FAILED","MAP_ATTEMPT_KILLED","REDUCE_ATTEMPT_STARTED","REDUCE_ATTEMPT_FINISHED","REDUCE_ATTEMPT_FAILED","REDUCE_ATTEMPT_KILLED","SETUP_ATTEMPT_STARTED","SETUP_ATTEMPT_FINISHED","SETUP_ATTEMPT_FAILED","SETUP_ATTEMPT_KILLED","CLEANUP_ATTEMPT_STARTED","CLEANUP_ATTEMPT_FINISHED","CLEANUP_ATTEMPT_FAILED","CLEANUP_ATTEMPT_KILLED","AM_STARTED"]}},{"name":"event","type":[{"type":"record","name":"JobFinished","fields":[{"name":"jobid","type":"string"},{"name":"finishTime","type":"long"},{"name":"finishedMaps","type":"int"},{"name":"finishedReduces","type":"int"},{"name":"failedMaps","type":"int"},{"name":"failedReduces","type":"int"},{"name":"totalCounters","type":{"type":"record","name":"JhCounters","fields":[{"name":"name","type":"string"},{"name":"groups","type":{"type":"array","items":{"type":"record","name":"JhCounterGroup","fields":[{"name":"name","type":"string"},{"name":"displayName","type":"string"},{"name":"counts","type":{"type":"array","items":{"type":"record","name":"JhCounter","fields":[{"name":"name","type":"string"},{"name":"displayName","type":"string"},{"name":"value","type":"long"}]}}}]}}}]}},{"name":"mapCounters","type":"JhCounters"},{"name":"reduceCounters","type":"JhCounters"}]},{"type":"record","name":"JobInfoChange","fields":[{"name":"jobid","type":"string"},{"name":"submitTime","type":"long"},{"name":"launchTime","type":"long"}]},{"type":"record","name":"JobInited","fields":[{"name":"jobid","type":"string"},{"name":"launchTime","type":"long"},{"name":"totalMaps","type":"int"},{"name":"totalReduces","type":"int"},{"name":"jobStatus","type":"string"},{"name":"uberized","type":"boolean"}]},{"type":"record","name":"AMStarted","fields":[{"name":"applicationAttemptId","type":"string"},{"name":"startTime","type":"long"},{"name":"containerId","type":"string"},{"name":"nodeManagerHost","type":"string"},{"name":"nodeManagerPort","type":"int"},{"name":"nodeManagerHttpPort","type":"int"}]},{"type":"record","name":"JobPriorityChange","fields":[{"name":"jobid","type":"string"},{"name":"priority","type":"string"}]},{"type":"record","name":"JobQueueChange","fields":[{"name":"jobid","type":"string"},{"name":"jobQueueName","type":"string"}]},{"type":"record","name":"JobStatusChanged","fields":[{"name":"jobid","type":"string"},{"name":"jobStatus","type":"string"}]},{"type":"record","name":"JobSubmitted","fields":[{"name":"jobid","type":"string"},{"name":"jobName","type":"string"},{"name":"userName","type":"string"},{"name":"submitTime","type":"long"},{"name":"jobConfPath","type":"string"},{"name":"acls","type":{"type":"map","values":"string"}},{"name":"jobQueueName","type":"string"},{"name":"workflowId","type":["null","string"],"default":null},{"name":"workflowName","type":["null","string"],"default":null},{"name":"workflowNodeName","type":["null","string"],"default":null},{"name":"workflowAdjacencies","type":["null","string"],"default":null},{"name":"workflowTags","ty
pe":["null","string"],"default":null}]},{"type":"record","name":"JobUnsuccessfulCompletion","fields":[{"name":"jobid","type":"string"},{"name":"finishTime","type":"long"},{"name":"finishedMaps","type":"int"},{"name":"finishedReduces","type":"int"},{"name":"jobStatus","type":"string"},{"name":"diagnostics","type":["null","string"],"default":null}]},{"type":"record","name":"MapAttemptFinished","fields":[{"name":"taskid","type":"string"},{"name":"attemptId","type":"string"},{"name":"taskType","type":"string"},{"name":"taskStatus","type":"string"},{"name":"mapFinishTime","type":"long"},{"name":"finishTime","type":"long"},{"name":"hostname","type":"string"},{"name":"port","type":"int"},{"name":"rackname","type":"string"},{"name":"state","type":"string"},{"name":"counters","type":"JhCounters"},{"name":"clockSplits","type":{"type":"array","items":"int"}},{"name":"cpuUsages","type":{"type":"array","items":"int"}},{"name":"vMemKbytes","type":{"type":"array","items":"int"}},{"name":"physMemKbytes","type":{"type":"array","items":"int"}}]},{"type":"record","name":"ReduceAttemptFinished","fields":[{"name":"taskid","type":"string"},{"name":"attemptId","type":"string"},{"name":"taskType","type":"string"},{"name":"taskStatus","type":"string"},{"name":"shuffleFinishTime","type":"long"},{"name":"sortFinishTime","type":"long"},{"name":"finishTime","type":"long"},{"name":"hostname","type":"string"},{"name":"port","type":"int"},{"name":"rackname","type":"string"},{"name":"state","type":"string"},{"name":"counters","type":"JhCounters"},{"name":"clockSplits","type":{"type":"array","items":"int"}},{"name":"cpuUsages","type":{"type":"array","items":"int"}},{"name":"vMemKbytes","type":{"type":"array","items":"int"}},{"name":"physMemKbytes","type":{"type":"array","items":"int"}}]},{"type":"record","name":"TaskAttemptFinished","fields":[{"name":"taskid","type":"string"},{"name":"attemptId","type":"string"},{"name":"taskType","type":"string"},{"name":"taskStatus","type":"string"},{"name":"finishTime","type":"long"},{"name":"rackname","type":"string"},{"name":"hostname","type":"string"},{"name":"state","type":"string"},{"name":"counters","type":"JhCounters"}]},{"type":"record","name":"TaskAttemptStarted","fields":[{"name":"taskid","type":"string"},{"name":"taskType","type":"string"},{"name":"attemptId","type":"string"},{"name":"startTime","type":"long"},{"name":"trackerName","type":"string"},{"name":"httpPort","type":"int"},{"name":"shufflePort","type":"int"},{"name":"containerId","type":"string"},{"name":"locality","type":["null","string"],"default":null},{"name":"avataar","type":["null","string"],"default":null}]},{"type":"record","name":"TaskAttemptUnsuccessfulCompletion","fields":[{"name":"taskid","type":"string"},{"name":"taskType","type":"string"},{"name":"attemptId","type":"string"},{"name":"finishTime","type":"long"},{"name":"hostname","type":"string"},{"name":"port","type":"int"},{"name":"rackname","type":"string"},{"name":"status","type":"string"},{"name":"error","type":"string"},{"name":"counters","type":["null","JhCounters"],"default":null},{"name":"clockSplits","type":{"type":"array","items":"int"}},{"name":"cpuUsages","type":{"type":"array","items":"int"}},{"name":"vMemKbytes","type":{"type":"array","items":"int"}},{"name":"physMemKbytes","type":{"type":"array","items":"int"}}]},{"type":"record","name":"TaskFailed","fields":[{"name":"taskid","type":"string"},{"name":"taskType","type":"string"},{"name":"finishTime","type":"long"},{"name":"error","type":"string"},{"name":"failedDueToAttempt","type":["null","
string"]},{"name":"status","type":"string"},{"name":"counters","type":["null","JhCounters"],"default":null}]},{"type":"record","name":"TaskFinished","fields":[{"name":"taskid","type":"string"},{"name":"taskType","type":"string"},{"name":"finishTime","type":"long"},{"name":"status","type":"string"},{"name":"counters","type":"JhCounters"},{"name":"successfulAttemptId","type":["null","string"],"default":null}]},{"type":"record","name":"TaskStarted","fields":[{"name":"taskid","type":"string"},{"name":"taskType","type":"string"},{"name":"startTime","type":"long"},{"name":"splitLocations","type":"string"}]},{"type":"record","name":"TaskUpdated","fields":[{"name":"taskid","type":"string"},{"name":"finishTime","type":"long"}]}]}]} {"type":"AM_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.AMStarted":{"applicationAttemptId":"appattempt_1477534790849_0003_000001","startTime":1477538647394,"containerId":"container_1477534790849_0003_01_000001","nodeManagerHost":"hadoop1","nodeManagerPort":47596,"nodeManagerHttpPort":8042}}} {"type":"JOB_SUBMITTED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobSubmitted":{"jobid":"job_1477534790849_0003","jobName":"wordcount","userName":"yunyu","submitTime":1477538645546,"jobConfPath":"hdfs://hadoop1/tmp/hadoop-yarn/staging/yunyu/.staging/job_1477534790849_0003/job.xml","acls":{},"jobQueueName":"default","workflowId":{"string":""},"workflowName":{"string":""},"workflowNodeName":{"string":""},"workflowAdjacencies":{"string":""},"workflowTags":{"string":""}}}} {"type":"JOB_QUEUE_CHANGED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobQueueChange":{"jobid":"job_1477534790849_0003","jobQueueName":"default"}}} {"type":"JOB_INITED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobInited":{"jobid":"job_1477534790849_0003","launchTime":1477538650360,"totalMaps":1,"totalReduces":1,"jobStatus":"INITED","uberized":false}}} {"type":"JOB_INFO_CHANGED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobInfoChange":{"jobid":"job_1477534790849_0003","submitTime":1477538645546,"launchTime":1477538650360}}} {"type":"TASK_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskStarted":{"taskid":"task_1477534790849_0003_m_000000","taskType":"MAP","startTime":1477538650763,"splitLocations":"hadoop1,hadoop2"}}} {"type":"TASK_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskStarted":{"taskid":"task_1477534790849_0003_r_000000","taskType":"REDUCE","startTime":1477538650767,"splitLocations":""}}} {"type":"MAP_ATTEMPT_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStarted":{"taskid":"task_1477534790849_0003_m_000000","taskType":"MAP","attemptId":"attempt_1477534790849_0003_m_000000_0","startTime":1477538652833,"trackerName":"hadoop2","httpPort":8042,"shufflePort":13562,"containerId":"container_1477534790849_0003_01_000002","locality":{"string":"NODE_LOCAL"},"avataar":{"string":"VIRGIN"}}}} {"type":"MAP_ATTEMPT_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinished":{"taskid":"task_1477534790849_0003_m_000000","attemptId":"attempt_1477534790849_0003_m_000000_0","taskType":"MAP","taskStatus":"SUCCEEDED","mapFinishTime":1477538656711,"finishTime":1477538656933,"hostname":"hadoop2","port":34358,"rackname":"/default-rack","state":"map","counters":{"name":"COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":0},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: 
Number of bytes written","value":115420},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":194},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":0},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":0}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"MAP_INPUT_RECORDS","displayName":"Map input records","value":4},{"name":"MAP_OUTPUT_RECORDS","displayName":"Map output records","value":14},{"name":"MAP_OUTPUT_BYTES","displayName":"Map output bytes","value":141},{"name":"MAP_OUTPUT_MATERIALIZED_BYTES","displayName":"Map output materialized bytes","value":114},{"name":"SPLIT_RAW_BYTES","displayName":"Input split bytes","value":109},{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":14},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":0},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":83},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":490},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":216502272},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":666292224},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":120721408}]},{"name":"org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter","displayName":"File Input Format Counters ","counts":[{"name":"BYTES_READ","displayName":"Bytes Read","value":85}]}]},"clockSplits":[3941,11,12,11,11,12,11,11,12,11,11,12],"cpuUsages":[40,41,41,41,41,41,40,41,41,41,41,41],"vMemKbytes":[27111,81334,135557,189780,244003,298226,352449,406672,460895,515118,569341,623564],"physMemKbytes":[8809,26428,44047,61666,79285,96904,114523,132142,149761,167380,184999,202618]}}} {"type":"TASK_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskFinished":{"taskid":"task_1477534790849_0003_m_000000","taskType":"MAP","finishTime":1477538656933,"status":"SUCCEEDED","counters":{"name":"COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":0},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":115420},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":194},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":0},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read 
operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":0}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"MAP_INPUT_RECORDS","displayName":"Map input records","value":4},{"name":"MAP_OUTPUT_RECORDS","displayName":"Map output records","value":14},{"name":"MAP_OUTPUT_BYTES","displayName":"Map output bytes","value":141},{"name":"MAP_OUTPUT_MATERIALIZED_BYTES","displayName":"Map output materialized bytes","value":114},{"name":"SPLIT_RAW_BYTES","displayName":"Input split bytes","value":109},{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":14},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":0},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":83},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":490},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":216502272},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":666292224},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":120721408}]},{"name":"org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter","displayName":"File Input Format Counters ","counts":[{"name":"BYTES_READ","displayName":"Bytes Read","value":85}]}]},"successfulAttemptId":{"string":"attempt_1477534790849_0003_m_000000_0"}}}} {"type":"REDUCE_ATTEMPT_STARTED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskAttemptStarted":{"taskid":"task_1477534790849_0003_r_000000","taskType":"REDUCE","attemptId":"attempt_1477534790849_0003_r_000000_0","startTime":1477538659565,"trackerName":"hadoop2","httpPort":8042,"shufflePort":13562,"containerId":"container_1477534790849_0003_01_000003","locality":{"string":"OFF_SWITCH"},"avataar":{"string":"VIRGIN"}}}} {"type":"REDUCE_ATTEMPT_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.ReduceAttemptFinished":{"taskid":"task_1477534790849_0003_r_000000","attemptId":"attempt_1477534790849_0003_r_000000_0","taskType":"REDUCE","taskStatus":"SUCCEEDED","shuffleFinishTime":1477538662152,"sortFinishTime":1477538662173,"finishTime":1477538662652,"hostname":"hadoop2","port":34358,"rackname":"/default-rack","state":"reduce > reduce","counters":{"name":"COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":114},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":115365},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":0},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":72},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read 
operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":2}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":0},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":0},{"name":"REDUCE_INPUT_GROUPS","displayName":"Reduce input groups","value":9},{"name":"REDUCE_SHUFFLE_BYTES","displayName":"Reduce shuffle bytes","value":114},{"name":"REDUCE_INPUT_RECORDS","displayName":"Reduce input records","value":9},{"name":"REDUCE_OUTPUT_RECORDS","displayName":"Reduce output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"SHUFFLED_MAPS","displayName":"Shuffled Maps ","value":1},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":1},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":60},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":820},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":120524800},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":672628736},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":15728640}]},{"name":"Shuffle Errors","displayName":"Shuffle Errors","counts":[{"name":"BAD_ID","displayName":"BAD_ID","value":0},{"name":"CONNECTION","displayName":"CONNECTION","value":0},{"name":"IO_ERROR","displayName":"IO_ERROR","value":0},{"name":"WRONG_LENGTH","displayName":"WRONG_LENGTH","value":0},{"name":"WRONG_MAP","displayName":"WRONG_MAP","value":0},{"name":"WRONG_REDUCE","displayName":"WRONG_REDUCE","value":0}]},{"name":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter","displayName":"File Output Format Counters ","counts":[{"name":"BYTES_WRITTEN","displayName":"Bytes Written","value":72}]}]},"clockSplits":[2689,35,35,35,35,35,35,35,35,35,35,36],"cpuUsages":[68,68,69,68,68,69,68,68,69,68,68,69],"vMemKbytes":[27369,82107,136846,191584,246323,301062,355801,410539,465278,520017,574755,629494],"physMemKbytes":[4904,14712,24520,34328,44137,53945,63754,73561,83370,93179,102986,112795]}}} {"type":"TASK_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskFinished":{"taskid":"task_1477534790849_0003_r_000000","taskType":"REDUCE","finishTime":1477538662652,"status":"SUCCEEDED","counters":{"name":"COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":114},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":115365},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":0},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":72},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write 
operations","value":2}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":0},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":0},{"name":"REDUCE_INPUT_GROUPS","displayName":"Reduce input groups","value":9},{"name":"REDUCE_SHUFFLE_BYTES","displayName":"Reduce shuffle bytes","value":114},{"name":"REDUCE_INPUT_RECORDS","displayName":"Reduce input records","value":9},{"name":"REDUCE_OUTPUT_RECORDS","displayName":"Reduce output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"SHUFFLED_MAPS","displayName":"Shuffled Maps ","value":1},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":1},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":60},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":820},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":120524800},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":672628736},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":15728640}]},{"name":"Shuffle Errors","displayName":"Shuffle Errors","counts":[{"name":"BAD_ID","displayName":"BAD_ID","value":0},{"name":"CONNECTION","displayName":"CONNECTION","value":0},{"name":"IO_ERROR","displayName":"IO_ERROR","value":0},{"name":"WRONG_LENGTH","displayName":"WRONG_LENGTH","value":0},{"name":"WRONG_MAP","displayName":"WRONG_MAP","value":0},{"name":"WRONG_REDUCE","displayName":"WRONG_REDUCE","value":0}]},{"name":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter","displayName":"File Output Format Counters ","counts":[{"name":"BYTES_WRITTEN","displayName":"Bytes Written","value":72}]}]},"successfulAttemptId":{"string":"attempt_1477534790849_0003_r_000000_0"}}}} {"type":"JOB_FINISHED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobFinished":{"jobid":"job_1477534790849_0003","finishTime":1477538662701,"finishedMaps":1,"finishedReduces":1,"failedMaps":0,"failedReduces":0,"totalCounters":{"name":"TOTAL_COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":114},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":230785},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":194},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":72},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":6},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":2}]},{"name":"org.apache.hadoop.mapreduce.JobCounter","displayName":"Job Counters ","counts":[{"name":"TOTAL_LAUNCHED_MAPS","displayName":"Launched map tasks","value":1},{"name":"TOTAL_LAUNCHED_REDUCES","displayName":"Launched reduce 
tasks","value":1},{"name":"DATA_LOCAL_MAPS","displayName":"Data-local map tasks","value":1},{"name":"SLOTS_MILLIS_MAPS","displayName":"Total time spent by all maps in occupied slots (ms)","value":4100},{"name":"SLOTS_MILLIS_REDUCES","displayName":"Total time spent by all reduces in occupied slots (ms)","value":3087},{"name":"MILLIS_MAPS","displayName":"Total time spent by all map tasks (ms)","value":4100},{"name":"MILLIS_REDUCES","displayName":"Total time spent by all reduce tasks (ms)","value":3087},{"name":"VCORES_MILLIS_MAPS","displayName":"Total vcore-seconds taken by all map tasks","value":4100},{"name":"VCORES_MILLIS_REDUCES","displayName":"Total vcore-seconds taken by all reduce tasks","value":3087},{"name":"MB_MILLIS_MAPS","displayName":"Total megabyte-seconds taken by all map tasks","value":4198400},{"name":"MB_MILLIS_REDUCES","displayName":"Total megabyte-seconds taken by all reduce tasks","value":3161088}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"MAP_INPUT_RECORDS","displayName":"Map input records","value":4},{"name":"MAP_OUTPUT_RECORDS","displayName":"Map output records","value":14},{"name":"MAP_OUTPUT_BYTES","displayName":"Map output bytes","value":141},{"name":"MAP_OUTPUT_MATERIALIZED_BYTES","displayName":"Map output materialized bytes","value":114},{"name":"SPLIT_RAW_BYTES","displayName":"Input split bytes","value":109},{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":14},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":9},{"name":"REDUCE_INPUT_GROUPS","displayName":"Reduce input groups","value":9},{"name":"REDUCE_SHUFFLE_BYTES","displayName":"Reduce shuffle bytes","value":114},{"name":"REDUCE_INPUT_RECORDS","displayName":"Reduce input records","value":9},{"name":"REDUCE_OUTPUT_RECORDS","displayName":"Reduce output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":18},{"name":"SHUFFLED_MAPS","displayName":"Shuffled Maps ","value":1},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":1},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":143},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":1310},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":337027072},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":1338920960},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":136450048}]},{"name":"Shuffle Errors","displayName":"Shuffle Errors","counts":[{"name":"BAD_ID","displayName":"BAD_ID","value":0},{"name":"CONNECTION","displayName":"CONNECTION","value":0},{"name":"IO_ERROR","displayName":"IO_ERROR","value":0},{"name":"WRONG_LENGTH","displayName":"WRONG_LENGTH","value":0},{"name":"WRONG_MAP","displayName":"WRONG_MAP","value":0},{"name":"WRONG_REDUCE","displayName":"WRONG_REDUCE","value":0}]},{"name":"org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter","displayName":"File Input Format Counters ","counts":[{"name":"BYTES_READ","displayName":"Bytes Read","value":85}]},{"name":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter","displayName":"File Output Format Counters ","counts":[{"name":"BYTES_WRITTEN","displayName":"Bytes 
Written","value":72}]}]},"mapCounters":{"name":"MAP_COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":0},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":115420},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":194},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":0},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":0}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"MAP_INPUT_RECORDS","displayName":"Map input records","value":4},{"name":"MAP_OUTPUT_RECORDS","displayName":"Map output records","value":14},{"name":"MAP_OUTPUT_BYTES","displayName":"Map output bytes","value":141},{"name":"MAP_OUTPUT_MATERIALIZED_BYTES","displayName":"Map output materialized bytes","value":114},{"name":"SPLIT_RAW_BYTES","displayName":"Input split bytes","value":109},{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":14},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":0},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":83},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":490},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":216502272},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":666292224},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":120721408}]},{"name":"org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter","displayName":"File Input Format Counters ","counts":[{"name":"BYTES_READ","displayName":"Bytes Read","value":85}]}]},"reduceCounters":{"name":"REDUCE_COUNTERS","groups":[{"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[{"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":114},{"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":115365},{"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},{"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},{"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},{"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":0},{"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":72},{"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":3},{"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},{"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write 
operations","value":2}]},{"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[{"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":0},{"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":0},{"name":"REDUCE_INPUT_GROUPS","displayName":"Reduce input groups","value":9},{"name":"REDUCE_SHUFFLE_BYTES","displayName":"Reduce shuffle bytes","value":114},{"name":"REDUCE_INPUT_RECORDS","displayName":"Reduce input records","value":9},{"name":"REDUCE_OUTPUT_RECORDS","displayName":"Reduce output records","value":9},{"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":9},{"name":"SHUFFLED_MAPS","displayName":"Shuffled Maps ","value":1},{"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},{"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":1},{"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":60},{"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":820},{"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":120524800},{"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":672628736},{"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":15728640}]},{"name":"Shuffle Errors","displayName":"Shuffle Errors","counts":[{"name":"BAD_ID","displayName":"BAD_ID","value":0},{"name":"CONNECTION","displayName":"CONNECTION","value":0},{"name":"IO_ERROR","displayName":"IO_ERROR","value":0},{"name":"WRONG_LENGTH","displayName":"WRONG_LENGTH","value":0},{"name":"WRONG_MAP","displayName":"WRONG_MAP","value":0},{"name":"WRONG_REDUCE","displayName":"WRONG_REDUCE","value":0}]},{"name":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter","displayName":"File Output Format Counters ","counts":[{"name":"BYTES_WRITTEN","displayName":"Bytes Written","value":72}]}]}}}}
#### Task run logs (Container logs)
Task run logs (i.e. Container logs) include the ApplicationMaster log and the ordinary Task logs. By default they are stored under ${HADOOP_HOME}/logs/userlogs.
The ordinary Tasks are our own MapReduce programs, so these logs contain what our code writes through a logger and through System.out, e.g. logger.info("test"); and System.out.println("test");. They are stored under ${HADOOP_HOME}/logs/userlogs by default.
##### yarn-site.xml
```xml
<property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
</property>
```
As for Container logs: in Hadoop 2.x, tasks are managed in an application -> Container hierarchy, so all the log lines you see on the console when running a MapReduce program on the NameNode machine can also be found under ${HADOOP_HOME}/logs/userlogs on the corresponding DataNode/NodeManager machines.
```bash
# List the default log directory (one MapReduce program corresponds to one Application;
# the console output tells you which Application ran our MapReduce program)
$ ls ${HADOOP_HOME}/logs/userlogs
drwx--x--- 5 yunyu yunyu 4096 Nov 2 20:13 application_1478088725123_0001/
drwx--x--- 5 yunyu yunyu 4096 Nov 2 20:31 application_1478088725123_0002/
drwx--x--- 5 yunyu yunyu 4096 Nov 2 21:05 application_1478088725123_0003/
drwx--x--- 3 yunyu yunyu 4096 Nov 2 21:08 application_1478088725123_0004/
drwx--x--- 3 yunyu yunyu 4096 Nov 2 21:34 application_1478088725123_0006/
drwx--x--- 4 yunyu yunyu 4096 Nov 2 23:49 application_1478101603149_0001/
drwx--x--- 3 yunyu yunyu 4096 Nov 3 00:04 application_1478101603149_0002/
drwx--x--- 3 yunyu yunyu 4096 Nov 3 11:03 application_1478138258749_0001/

# Container logs of one Application (on a distributed cluster they may be spread across several DataNode machines)
$ ll application_1478088725123_0003/
total 20
drwx--x--- 5 yunyu yunyu 4096 Nov 2 21:05 ./
drwxr-xr-x 33 yunyu yunyu 4096 Nov 3 14:57 ../
drwx--x--- 2 yunyu yunyu 4096 Nov 2 21:04 container_1478088725123_0003_01_000001/
drwx--x--- 2 yunyu yunyu 4096 Nov 2 21:05 container_1478088725123_0003_01_000002/
drwx--x--- 2 yunyu yunyu 4096 Nov 2 21:05 container_1478088725123_0003_01_000003/

# Every container directory contains three log files:
# stderr : error output
# stdout : System.out.println console output of our own MapReduce program
# syslog : logger output; our MapReduce program's logger.info records go here
$ ll application_1478088725123_0003/container_1478088725123_0003_01_000001/*
-rw-rw-r-- 1 yunyu yunyu   760 Nov 2 21:05 application_1478088725123_0003/container_1478088725123_0003_01_000001/stderr
-rw-rw-r-- 1 yunyu yunyu     0 Nov 2 21:04 application_1478088725123_0003/container_1478088725123_0003_01_000001/stdout
-rw-rw-r-- 1 yunyu yunyu 34718 Nov 2 21:05 application_1478088725123_0003/container_1478088725123_0003_01_000001/syslog
```
On a distributed Hadoop cluster, the Container log files may be spread across several machines:
```bash
# The hadoop1 machine ran part of this Application's tasks; their logs are in container_1478101603149_0002_01_000001
$ ll application_1478101603149_0002/
total 12
drwx--x--- 3 yunyu yunyu 4096 Nov 3 00:04 ./
drwxr-xr-x 33 yunyu yunyu 4096 Nov 3 15:03 ../
drwx--x--- 2 yunyu yunyu 4096 Nov 3 00:04 container_1478101603149_0002_01_000001/

# The hadoop2 machine ran the rest; their logs are in container_1478101603149_0002_01_000002 and container_1478101603149_0002_01_000003
$ ll application_1478101603149_0002/
total 16
drwx--x--- 4 yunyu yunyu 4096 Nov 3 00:04 ./
drwxr-xr-x 36 yunyu yunyu 4096 Nov 3 15:03 ../
drwx--x--- 2 yunyu yunyu 4096 Nov 3 00:04 container_1478101603149_0002_01_000002/
drwx--x--- 2 yunyu yunyu 4096 Nov 3 00:04 container_1478101603149_0002_01_000003/
```
### Log aggregation
Keeping job and task logs scattered across the nodes makes centralized management and analysis difficult, so we can enable log aggregation. Once it is on, each task's logs are pushed to a directory on HDFS after the task finishes (the local copies are not deleted immediately; on HDFS the three files produced by each task, syslog, stderr and stdout, are merged into a single file, with an index recording where each part starts). We can then use the HistoryServer to view the Hadoop logs in one place.
This requires adding the following to yarn-site.xml:
```xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
```
Without this setting, viewing the logs directly from the HistoryServer fails with an error like:
Aggregation is not enabled. Try the nodemanager at hadoop1:39175
#### Log-aggregation configuration parameters
Log aggregation is YARN's centralized log management feature: it uploads the logs of finished Containers/tasks to HDFS, which reduces the load on the NodeManagers and provides a central place to store and analyze them. By default, Container/task logs stay on the individual NodeManagers; enabling aggregation requires some extra configuration (a sample yarn-site.xml fragment follows the parameter list below).
- (1) yarn.log-aggregation-enable — whether to enable log aggregation. Default: false
- (2) yarn.log-aggregation.retain-seconds — how long aggregated logs are kept on HDFS. Default: -1
- (3) yarn.log-aggregation.retain-check-interval-seconds — how often to check for and delete logs whose retention has expired; 0 or a negative value means one tenth of the retention time. Default: -1
- (4) yarn.nodemanager.remote-app-log-dir — the HDFS directory that logs are moved to when an application finishes (only effective when log aggregation is enabled). Default: /tmp/logs
- (5) yarn.nodemanager.remote-app-log-dir-suffix — the sub-directory name under the remote log directory (only effective when log aggregation is enabled); logs are moved to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Default: logs
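Putting these parameters together, a minimal yarn-site.xml fragment for log aggregation might look like the sketch below; the retention value is only an example, and it must be added on every NodeManager, which is exactly the mistake described next:

```xml
<!-- Enable log aggregation on every node -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Keep aggregated logs on HDFS for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
<!-- HDFS directory that finished application logs are moved to -->
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
</property>
```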
I made a fairly basic mistake here that kept producing results different from what I expected and puzzled me for quite a while. The symptom: I had enabled log aggregation (yarn.log-aggregation-enable set to true) only on the hadoop1 machine, yet when running my MapReduce job I still could not see my own System.out.println and logger.info output in the Container log files, and some Container log directories went missing. Normally there are three of them, container_XXX_000001, container_XXX_000002 and container_XXX_000003, but sometimes only container_XXX_000001 and container_XXX_000003 were there and container_XXX_000002 was gone. After re-running the MapReduce program several times I noticed that the container logs were in fact created, but were deleted very soon after the job finished. The cause turned out to be my own carelessness: log aggregation was configured only on hadoop1 and not on the other two Hadoop machines, so only hadoop1's Container logs were aggregated to HDFS, and once they had been aggregated the local log files on hadoop1 were deleted, which is why the container_XXX_000002 directory was missing.
```bash
# The aggregated stderr, stdout and syslog files can be seen on HDFS
$ hdfs dfs -ls /tmp/logs/yunyu/logs/application_1478159146498_0001
Found 1 items
-rw-r-----   2 yunyu supergroup      52122 2016-11-03 00:47 /tmp/logs/yunyu/logs/application_1478159146498_0001/hadoop1_35650
```
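Instead of reading the aggregated files with hdfs dfs, the yarn CLI can fetch and print them for a finished application; the application ID below is the one from the listing above:

```bash
# Dump the aggregated container logs (stdout/stderr/syslog) of the application
yarn logs -applicationId application_1478159146498_0001
```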

Summary

Here is a brief summary of what I found:

  • Q: How do I see the System.out.println output of a MapReduce program in Hadoop?
  • A: In the stdout file of the Container log directory, by default ${HADOOP_HOME}/logs/userlogs/application_XXXX/container_XXX_0000X/stdout

  • Q: How do I see the output of Log4j or another logging framework in Hadoop?

  • A: In the syslog file of the Container log directory, by default ${HADOOP_HOME}/logs/userlogs/application_XXXX/container_XXX_0000X/syslog

  • Q: Where are the log files stored?

  • A: By default under ${HADOOP_HOME}/logs/userlogs/

  • Q: How do I view Hadoop logs through the HistoryServer?

  • A: Set the yarn.log-aggregation-enable property to true in yarn-site.xml

References:

Docker in Practice (20): Importing and Exporting Docker Images

Ways to import and export Docker images

Recently the company needed a private Docker deployment: containers set up locally had to be deployed into a customer's environment. The problem was that the customer's servers cannot reach the public Internet, so images could not be built online; the only option was to move them by exporting and importing image files. Below is my summary of the ways to do this.

Docker provides two ways to export and import:

  • load/save: export and import images
    • docker save: export an image from the local image store to a file
    • docker load: import an image file into the local image store
  • import/export: export and import containers
    • docker export: export a snapshot of a container to a local file
    • docker import: import a container snapshot file into the local image store
  • The difference: a container snapshot file discards all history and metadata (it only keeps the container's state at that moment), while an image file keeps the complete history and is therefore larger; also, when importing from a container snapshot you can re-specify the tag and other metadata. I personally recommend save/load, because all the earlier image layers are preserved, at the cost of more disk space.

Docker image save/load

```bash
$ sudo docker images
REPOSITORY          TAG      IMAGE ID       CREATED         VIRTUAL SIZE
birdben/zookeeper   v1       20e4011b9286   2 minutes ago   1.658 GB
ubuntu              latest   37b164bb431e   7 days ago      126.6 MB
centos              7        d83a55af4e75   5 weeks ago     196.7 MB
centos              latest   d83a55af4e75   5 weeks ago     196.7 MB
birdben/jdk7        v1       25c2f0e69206   8 months ago    583.4 MB

# Export the birdben/zookeeper:v1 image to the file zookeeper_image.tar
$ sudo docker save birdben/zookeeper:v1 > zookeeper_image.tar

# Remove the existing birdben/zookeeper:v1 image
$ sudo docker rmi "birdben/zookeeper:v1"

# Import the zookeeper_image.tar image file
$ sudo docker load < zookeeper_image.tar

# List the images again; birdben/zookeeper:v1 is back
# Note: after loading, the image ID is the same as before
$ sudo docker images
REPOSITORY          TAG      IMAGE ID       CREATED         VIRTUAL SIZE
birdben/zookeeper   v1       20e4011b9286   6 minutes ago   1.658 GB
ubuntu              latest   37b164bb431e   7 days ago      126.6 MB
centos              7        d83a55af4e75   5 weeks ago     196.7 MB
centos              latest   d83a55af4e75   5 weeks ago     196.7 MB
birdben/jdk7        v1       25c2f0e69206   8 months ago    583.4 MB
```
# Inspecting the tree of birdben/zookeeper:v1 now shows that all of its previous image history is still there
$ sudo docker images --tree
├─3690474eb5b4 Virtual Size: 0 B │ └─b200b2d33d98 Virtual Size: 196.7 MB │ └─3fbd5972aaac Virtual Size: 196.7 MB │ └─d83a55af4e75 Virtual Size: 196.7 MB Tags: centos:7, centos:latest │ └─1df8e9ff4de7 Virtual Size: 196.7 MB │ └─b37af9ce019a Virtual Size: 196.7 MB │ └─7858b8d134c6 Virtual Size: 403.3 MB │ └─c872974343d2 Virtual Size: 403.3 MB │ └─d4c0e59dc712 Virtual Size: 403.3 MB │ └─30c3076be68f Virtual Size: 556.8 MB │ └─0e66c066e1de Virtual Size: 571 MB │ └─69f8ec0b7932 Virtual Size: 889.1 MB │ └─7cfcd6d4c6e7 Virtual Size: 911.4 MB │ └─c2bc26e11781 Virtual Size: 911.4 MB │ └─31d728531f9a Virtual Size: 911.4 MB │ └─6434457046ec Virtual Size: 911.4 MB │ └─651290e3ddef Virtual Size: 911.4 MB │ └─d99d028fca92 Virtual Size: 911.6 MB │ └─5d4d89731a7d Virtual Size: 911.6 MB │ └─a530df3b220c Virtual Size: 925.6 MB │ └─39381e27bf53 Virtual Size: 1.232 GB │ └─cda80cbe8e7f Virtual Size: 1.276 GB │ └─287a8cf1090c Virtual Size: 1.289 GB │ └─d5672fcec9a4 Virtual Size: 1.289 GB │ └─e63cb61422e1 Virtual Size: 1.289 GB │ └─aa8f6ecc78ca Virtual Size: 1.303 GB │ └─b44f1969877f Virtual Size: 1.303 GB │ └─d17184db904f Virtual Size: 1.303 GB │ └─7df628e7fd36 Virtual Size: 1.303 GB │ └─dfe01b409095 Virtual Size: 1.303 GB │ └─238718f45aa6 Virtual Size: 1.303 GB │ └─a678149d4c34 Virtual Size: 1.303 GB │ └─d3f7fb8e3bc2 Virtual Size: 1.658 GB │ └─ff152402a43c Virtual Size: 1.658 GB │ └─fdf82aa49b89 Virtual Size: 1.658 GB │ └─0dbfccd66315 Virtual Size: 1.658 GB │ └─4cae49ef2ecb Virtual Size: 1.658 GB Tags: birdben/zookeeper:v1

Docker image import/export

```bash
$ sudo docker images
REPOSITORY          TAG      IMAGE ID       CREATED          VIRTUAL SIZE
birdben/zookeeper   v1       20e4011b9286   11 seconds ago   1.658 GB
ubuntu              latest   37b164bb431e   7 days ago       126.6 MB
centos              7        d83a55af4e75   5 weeks ago      196.7 MB
centos              latest   d83a55af4e75   5 weeks ago      196.7 MB
birdben/jdk7        v1       25c2f0e69206   8 months ago     583.4 MB

$ sudo docker ps -a
CONTAINER ID   IMAGE                 COMMAND       CREATED         STATUS         PORTS                                            NAMES
f99771de17b0   20e4011b9286:latest   "/bin/bash"   6 seconds ago   Up 5 seconds   0.0.0.0:3306->3306/tcp, 0.0.0.0:8080->8080/tcp   birdben/zookeeper:v1

# Export the container whose ID is f99771de17b0
$ sudo docker export f99771de17b0 > container.tar.gz

# Remove the existing birdben/zookeeper:v1 image
$ sudo docker rmi "birdben/zookeeper:v1"

# Import the container file container.tar.gz
$ cat container.tar.gz | sudo docker import - birdben/zookeeper:v1

# Note: after the import, the image ID is different from the original
$ sudo docker images
REPOSITORY          TAG      IMAGE ID       CREATED         VIRTUAL SIZE
birdben/zookeeper   v1       e80c1046dc12   9 minutes ago   1.119 GB
ubuntu              latest   37b164bb431e   7 days ago      126.6 MB
centos              7        d83a55af4e75   5 weeks ago     196.7 MB
centos              latest   d83a55af4e75   5 weeks ago     196.7 MB
birdben/jdk7        v1       25c2f0e69206   8 months ago    583.4 MB
```
# Inspecting the tree of birdben/zookeeper:v1 now shows only the final image; the earlier image history is gone
$ sudo docker images --tree Warning: '--tree' is deprecated, it will be removed soon. See usage. ├─e80c1046dc12 Virtual Size: 1.119 GB Tags: birdben/zookeeper:v1 ├─f1b49dd0c243 Virtual Size: 126.6 MB │ └─008ecf8686ec Virtual Size: 126.6 MB │ └─fd74137ff5ae Virtual Size: 126.6 MB │ └─35371c8124e2 Virtual Size: 126.6 MB │ └─99dc4d8f603d Virtual Size: 126.6 MB │ └─37b164bb431e Virtual Size: 126.6 MB Tags: ubuntu:latest

References:

Flume Study (8): Parsing Custom Logs with Flume

Environment

  • JDK1.7.0_79
  • Flume1.6.0
  • Elasticsearch2.0.0

This post builds on the previous one, "Flume Study (7): Integrating Flume with Elasticsearch 2.x", and parses a custom log format.

The log format to parse

For reasons of space, here are just two typical log lines:

Log file format

[{"name":"rp.api.call","request":"GET /api/test/settings","status":"succeeded","uid":889,"did":13,"duid":"app001","ua":"Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)","device_id":"65768768252343","ip":"10.190.1.67","server_timestamp":1463713488740}]
[{"name":"rp.api.call","request":"GET /api/test/search","errorStatus":200,"errorCode":"0000","errorMsg":"操作成功","status":"failed","uid":889,"did":13,"duid":"app002","ua":"Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)","device_id":"4543657687878989","ip":"10.190.1.66","server_timestamp":1463650301701}]

As discussed in the previous post, Flume parses log formats mainly through interceptors, and interceptors support several types; here we use regex_extractor, i.e. regular-expression matching. The regular expression below matches both of the log formats above. The main difference between the two formats is that the errorStatus, errorCode and errorMsg fields may be missing: when nothing goes wrong, those three fields are simply not present.

Log-parsing regular expression

"name":(.*),"request":(.*),("errorStatus":(.*),)?("errorCode":(.*),)?("errorMsg":(.*),)?"status":(.*),"uid":(.*),"did":(.*),"duid":(.*),"ua":(.*),"device_id":(.*),"ip":(.*),"server_timestamp":([0-9]*)

However, configuring this regular expression as the interceptor's regex in Flume produces an error. My own analysis is that Flume's interceptor.serializers requires the fields and values produced by the regex split to be declared statically, so it cannot handle optional groups dynamically. (This analysis may not be entirely correct; if you have doubts, feel free to contact me to discuss it.)

My workaround was to write two interceptors for the two log formats, es_interceptor and es_error_interceptor, each with its own regular expression for one of the formats. This way interceptor.serializers can extract the corresponding fields and values from each log line according to its matching expression, and they can then be indexed into ES.

flume.conf configuration file

# The original regular expression: "name":(.*),"request":(.*),("errorStatus":(.*),)?("errorCode":(.*),)?("errorMsg":(.*),)?"status":(.*),"uid":(.*),"did":(.*),"duid":(.*),"ua":(.*),"device_id":(.*),"ip":(.*),"server_timestamp":([0-9]*)
agentX.sources.flume-avro-sink.interceptors = es_interceptor es_error_interceptor
agentX.sources.flume-avro-sink.interceptors.es_interceptor.type = regex_extractor
agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = "name":(.*),"request":(.*),"status":(.*),"uid":(.*),"did":(.*),"duid":(.*),"ua":(.*),"device_id":(.*),"ip":(.*),"server_timestamp":([0-9]*)
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s1.name = name
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s2.name = request
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s3.name = status
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s4.name = uid
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s5.name = did
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s6.name = duid
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s7.name = ua
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s8.name = device_id
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s9.name = ip
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s10.name = server_timestamp
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.type = regex_extractor
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.regex = "name":(.*),"request":(.*),"errorStatus":(.*),"errorCode":(.*),"errorMsg":(.*),"status":(.*),"uid":(.*),"did":(.*),"duid":(.*),"ua":(.*),"device_id":(.*),"ip":(.*),"server_timestamp":([0-9]*)
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s1.name = name
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s2.name = request
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s3.name = errorStatus
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s4.name = errorCode
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s5.name = errorMsg
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s6.name = status
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s7.name = uid
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s8.name = did
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s9.name = duid
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s10.name = ua
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s11.name = device_id
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s12.name = ip
agentX.sources.flume-avro-sink.interceptors.es_error_interceptor.serializers.s13.name = server_timestamp

ES的mapping如下

{
"mappings": {
"hb": {
"properties": {
"@fields": {
"properties": {
"uid": {
"type": "string"
},
"duid": {
"type": "string"
},
"status": {
"type": "string"
},
"request": {
"type": "string"
},
"name": {
"type": "string"
},
"errorCode": {
"type": "string"
},
"ua": {
"type": "string"
},
"did": {
"type": "string"
},
"errorMsg": {
"type": "string"
},
"device_id": {
"type": "string"
},
"server_timestamp": {
"type": "string"
},
"ip": {
"type": "string"
},
"errorStatus": {
"type": "string"
}
}
},
"@message": {
"type": "string"
}
}
}
}
}
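上面的mapping可以在创建索引的时候一起指定。下面是一个简单的示例(假设上面的mapping内容保存在mapping.json文件中,ES监听在本机的9200端口,索引名只是演示用):

# 创建索引并指定mapping
$ curl -XPUT 'http://localhost:9200/test_log_index-2016-09-03' -d @mapping.json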

ES索引中的日志信息

{
"_index": "test_log_index-2016-09-03",
"_type": "test",
"_id": "AVbvrWIPe8IcP1cQoXS2",
"_version": 1,
"_score": 1,
"_source": {
"@message": "[
{"name":"1","request":"GET /api/test/settings","status":"succeeded","uid":889,"did":13,"duid":"app001","ua":"Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)","device_id":,"ip":"10.190.1.67","server_timestamp":1463713488740}
]",
"@fields": {
"uid": "889",
"duid": ""app001"",
"status": ""succeeded"",
"name": ""1"",
"request": ""GET /api/test/settings"",
"did": "13",
"ua": ""Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)"",
"device_id": "",
"server_timestamp": "1463713488740",
"ip": ""10.190.1.67""
}
}
}
{
"_index": "test_log_index-2016-09-03",
"_type": "test",
"_id": "AVbvrWIPe8IcP1cQoXS3",
"_version": 1,
"_score": 1,
"_source": {
"@message": "[
{"name":"rp.api.call","request":"GET /api/test/search","errorStatus":200,"errorCode":"0000","errorMsg":"操作成功","status":"failed","uid":889,"did":13,"duid":"app001","ua":"Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)","device_id":"4543657687878989","ip":"10.190.1.66","server_timestamp":1463650301701}
]",
"@fields": {
"uid": "889",
"status": ""failed"",
"did": "13",
"device_id": ""4543657687878989"",
"errorMsg": ""操作成功"",
"errorStatus": "200",
"ip": ""10.190.1.66"",
"duid": ""app001"",
"request": ""GET /api/test/search"",
"name": ""rp.api.call"",
"errorCode": ""0000"",
"ua": ""Dalvik/2.1.0 (Linux; U; Android 6.0.1; MI NOTE LTE MIUI/6.5.12)"",
"server_timestamp": "1463650301701"
}
}
}

总结

个人觉得这样的做法并不理想,因为日志格式肯定会多种多样,如果每种日志格式都需要不同的正则表达式来处理,显得太过笨重和冗余。由于刚接触Flume,暂时没有发现更好的做法,日后发现更好的处理方式会再更新上来。

Docker实战(十九)Docker环境安装问题

环境描述

本地环境

Ubuntu14.04
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.2.1
Git commit (client): 7c8fca2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.2.1
Git commit (server): 7c8fca2
OS/Arch (server): linux/amd64

10.10.1.15测试环境

Ubuntu15.04
Client:
Version: 1.10.3
API version: 1.21
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:16:54 UTC 2015
OS/Arch: linux/amd64

Docker的安装和使用

本地环境安装

直接使用apt方式安装

$ apt-get update
$ apt-get install docker
# 注意:Ubuntu源里的docker包并不是Docker引擎,真正的Docker引擎对应的是docker.io这个包
$ apt-get install docker.io

10.10.1.15测试环境

使用apt方式安装时报错E: Sub-process /usr/bin/dpkg returned an error code (1),之后尝试了一些解决方式都没有成功,最后决定将docker卸载掉,重新按照官网的步骤安装,这次安装成功了

# 安装前先查看Linux内核版本,内核版本需要高于3.10
$ uname -r
3.19.0-68-generic
# Update your apt sources
$ sudo apt-get update
$ sudo apt-get install apt-transport-https ca-certificates
$ sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
# 创建并保存docker.list更新源文件
$ sudo vi /etc/apt/sources.list.d/docker.list
# 根据自己的系统版本选择不同的数据源
- On Ubuntu Precise 12.04 (LTS)
deb https://apt.dockerproject.org/repo ubuntu-precise main
- On Ubuntu Trusty 14.04 (LTS)
deb https://apt.dockerproject.org/repo ubuntu-trusty main
- Ubuntu Wily 15.10
deb https://apt.dockerproject.org/repo ubuntu-wily main
- Ubuntu Xenial 16.04 (LTS)
deb https://apt.dockerproject.org/repo ubuntu-xenial main
# 再次更新源
$ sudo apt-get update
# 删除旧的资源文件
$ sudo apt-get purge lxc-docker
# 验证apt的更新源是从正确的仓库获取
$ apt-cache policy docker-engine
# 安装Ubuntu必备的安装包linux-image-extra-*
$ sudo apt-get update
$ sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
# 安装Docker
$ sudo apt-get update
$ sudo apt-get install docker-engine
# 启动Docker服务
$ sudo service docker start
# 检查Docker版本
$ sudo docker version

遇到的问题和解决方法

Depends: libdevmapper1.02.1 (>= 2:1.02.99) but 2:1.02.90-2ubuntu1 is to be installed

$ sudo apt install docker-engine
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
docker-engine : Depends: libdevmapper1.02.1 (>= 2:1.02.99) but 2:1.02.90-2ubuntu1 is to be installed
E: Unable to correct problems, you have held broken packages.
# 遇到这个问题是因为更新源不正确:我们用的是Ubuntu15.04(vivid)版本,上面官网提供的数据源中并不包含这个版本
$ sudo lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid
# 这里在github上找到了解决方法,依次执行下面的命令修正docker.list中的数据源,然后重新安装
$ sudo sed -i '/wily/d' /etc/apt/sources.list.d/docker.list
$ sudo sed -i '/trusty/d' /etc/apt/sources.list.d/docker.list
$ sudo sed -i '/precise/d' /etc/apt/sources.list.d/docker.list
$ sudo apt-get update
$ sudo apt-get install docker-engine

Error response from daemon: client is newer than server (client API version: 1.22, server API version: 1.21)

$ docker version
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: linux/amd64
Error response from daemon: client is newer than server (client API version: 1.22, server API version: 1.21)
# 遇到这个问题的原因是Docker API version的版本号不一致导致的,这个我们需要添加一个环境变量来指定Docker API version的版本号
# 这里建议更改/etc/profile文件,而不是临时更改环境变量,修改/etc/profile之后需要source /etc/profile,如果要在sudo也生效,需要切换到root账号也source /etc/profile
$ export DOCKER_API_VERSION=1.21
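下面是一个简单的做法示例(假设使用bash):
$ echo 'export DOCKER_API_VERSION=1.21' | sudo tee -a /etc/profile
$ source /etc/profile
# 如果需要在root/sudo下也生效,切换到root账号后再source一次/etc/profile
$ docker version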

参考文章:

Zookeeper学习(二)Zookeeper集群环境搭建

ZooKeeper Cluster模式

/etc/hosts文件配置

172.17.0.51 zoo1
172.17.0.52 zoo2
172.17.0.53 zoo3

/var/zookeeper/myid文件配置

# zoo1的myid配置文件
1
# zoo2的myid配置文件
2
# zoo3的myid配置文件
3

Zookeeper配置文件

# the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
# tickTime 这个时间是作为 Zookeeper 服务器之间或客户端与服务器之间维持心跳的时间间隔,也就是每个tickTime时间就会发送一个心跳。
tickTime=2000
# the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
# dataDir 顾名思义就是Zookeeper保存数据的目录,默认情况下Zookeeper将写数据的日志文件也保存在这个目录里。
dataDir=/var/lib/zookeeper
# the port to listen for client connections
# clientPort 这个端口就是客户端(应用程序)连接Zookeeper服务器的端口,Zookeeper会监听这个端口接受客户端的访问请求。
clientPort=2181
# initLimit 这个配置项是用来配置Zookeeper接受客户端(这里所说的客户端不是用户连接Zookeeper 服务器的客户端,而是Zookeeper服务器集群中连接到Leader的Follower服务器)初始化连接时最长能忍受多少个心跳时间间隔数。当已经超过10个心跳的时间(也就是tickTime)长度后Zookeeper服务器还没有收到客户端的返回信息,那么表明这个客户端连接失败。总的时间长度就是 10*2000=20 秒。
initLimit=10
# syncLimit 这个配置项标识Leader与Follower之间发送消息,请求和应答时间长度,最长不能超过多少个tickTime的时间长度,总的时间长度就是 5*2000=10 秒。
syncLimit=5
# 第一个port是从机器(follower)连接到主机器(leader)的端口号,第二个port是进行leadership选举的端口号。
# 值得重点注意的一点是,所有三个机器都应该打开端口 2181、2888 和 3888。在本例中,端口 2181 由 ZooKeeper 客户端使用,用于连接到 ZooKeeper 服务器;端口 2888 由对等 ZooKeeper 服务器使用,用于互相通信;而端口 3888 用于领导者选举。您可以选择自己喜欢的任何端口。通常建议在所有 ZooKeeper 服务器上使用相同的端口。
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
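需要注意myid文件要放在zoo.cfg中dataDir指定的目录下。按照上面zoo.cfg里的dataDir配置,可以用类似下面的命令在每台机器上生成myid(这里只是一个简单示例,以zoo1为例,zoo2、zoo3分别写入2和3):

$ sudo mkdir -p /var/lib/zookeeper
$ echo 1 | sudo tee /var/lib/zookeeper/myid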

启动Zookeeper服务端

# 分别启动Hadoop1,Hadoop2,Hadoop3的Zookeeper服务
$ ./bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# 检查Hadoop1的Zookeeper服务状态(这里Hadoop1节点的zk是leader,Hadoop2和Hadoop3节点的zk是follower)
$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: leader
# 检查Hadoop2的Zookeeper服务状态
$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
# 检查Hadoop3的Zookeeper服务状态
$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

需要注意的地方

JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

确认下面几点,应该就能排查出问题。我就遇到过重启Docker容器后IP地址变化,导致/etc/hosts中的IP地址配置不正确的情况(也可以参考列表后面用四字命令检查服务状态的示例)

  • 确认/etc/hosts中是否有各个节点域名解析
  • 是否/var/zookeeper/myid有重复值
  • 集群模式只启动一台也会遇到该问题,最好等集群里其他机器都启动好之后再查看状态
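排查的时候也可以用ZooKeeper自带的四字命令快速确认服务是否正常。下面只是一个简单示例(假设机器上安装了nc,3.4.x版本默认支持这些命令):

# 服务正常时会返回imok
$ echo ruok | nc zoo1 2181
# stat可以看到当前节点是leader还是follower以及连接情况
$ echo stat | nc zoo1 2181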

启动Zookeeper Client端

# -server:client端连接的IP和端口号
$ ./bin/zkCli.sh -server 127.0.0.1:2181 Connecting to 127.0.0.1:2181 2016-09-30 01:51:22,268 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT 2016-09-30 01:51:22,271 [myid:] - INFO [main:Environment@100] - Client environment:host.name=hadoop1 2016-09-30 01:51:22,271 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_79 2016-09-30 01:51:22,272 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 2016-09-30 01:51:22,272 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/data/jdk1.7.0_79/jre 2016-09-30 01:51:22,272 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/data/zookeeper-3.4.8/bin/../build/classes:/data/zookeeper-3.4.8/bin/../build/lib/*.jar:/data/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/data/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/data/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/data/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/data/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/data/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/data/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/data/zookeeper-3.4.8/bin/../conf: 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA> 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.16.0-77-generic 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:user.name=yunyu 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/yunyu 2016-09-30 01:51:22,273 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/data/zookeeper-3.4.8 2016-09-30 01:51:22,275 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@71adff7c Welcome to ZooKeeper! 2016-09-30 01:51:22,299 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2016-09-30 01:51:22,303 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@876] - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session JLine support is enabled 2016-09-30 01:51:22,337 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x1577a41e9b90000, negotiated timeout = 30000 WATCHER:: WatchedEvent state:SyncConnected type:None path:null [zk: 127.0.0.1:2181(CONNECTED) 0]
# zkShell中输入help会提示出所有的命令参数
[zk: 127.0.0.1:2181(CONNECTED) 0] help
ZooKeeper host:port cmd args
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create path data acl
stat path [watch]
listquota path
history
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
deleteall path
setquota -n|-b val path
# 查看znode节点
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]
# 创建新的znode节点,关联到"my_data"
[zk: 127.0.0.1:2181(CONNECTED) 3] create /zk_test my_data
Created /zk_test
[zk: 127.0.0.1:2181(CONNECTED) 4] ls /
[zookeeper, zk_test]
# 验证/zk_test节点已经关联到"my_data"
[zk: 127.0.0.1:2181(CONNECTED) 5] get /zk_test
my_data
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x6 mtime = Mon Aug 29 20:42:40 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 7 numChildren = 0
# 修改/zk_test节点的数据关联
[zk: 127.0.0.1:2181(CONNECTED) 6] set /zk_test junk
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x7 mtime = Mon Aug 29 20:47:08 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 1 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 4 numChildren = 0
[zk: 127.0.0.1:2181(CONNECTED) 7] get /zk_test
junk
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x7 mtime = Mon Aug 29 20:47:08 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 1 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 4 numChildren = 0
# 删除/zk_test节点
[zk: 127.0.0.1:2181(CONNECTED) 8] delete /zk_test
[zk: 127.0.0.1:2181(CONNECTED) 9] ls /
[zookeeper]

OK,Zookeeper的cluster模式的配置就大功告成了 ^_^

参考文章:

Docker实战(十八)Docker安装Zookeeper集群环境

Dockerfile文件
############################################
# version : birdben/zookeeper_cluster:v1
# desc : 当前版本安装的zookeeper_cluster
############################################
# 设置继承自我们创建的 jdk7 镜像
FROM birdben/jdk7:v1
# 下面是一些创建者的基本信息
MAINTAINER birdben (191654006@163.com)
# 设置环境变量,所有操作都是非交互式的
ENV DEBIAN_FRONTEND noninteractive
# 添加 supervisord 的配置文件,并复制配置文件到对应目录下面。(supervisord.conf文件和Dockerfile文件在同一路径)
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# 设置 zookeeper 的环境变量,若读者有其他的环境变量需要设置,也可以在这里添加。
ENV ZOOKEEPER_HOME /software/zookeeper-3.4.8
ENV PATH ${ZOOKEEPER_HOME}/bin:$PATH
# 复制 zookeeper-3.4.8 文件到镜像中(zookeeper-3.4.8 文件夹要和Dockerfile文件在同一路径)
ADD zookeeper-3.4.8 /software/zookeeper-3.4.8
# 创建myid文件存储路径
RUN mkdir -p /var/zookeeper/myid
# 授权ZOOKEEPER_HOME路径给admin用户
RUN sudo chown -R admin /software/zookeeper-3.4.8
# 容器需要开放Zookeeper 2181, 2888, 3888端口
EXPOSE 2181
EXPOSE 2888
EXPOSE 3888
# 执行supervisord来同时执行多个命令,使用 supervisord 的可执行路径启动服务。
CMD ["/usr/bin/supervisord"]
Dockerfile源文件链接:

https://github.com/birdben/birdDocker/blob/master/zookeeper_cluster/Dockerfile

supervisor配置文件内容
# 配置文件包含目录和进程
# 第一段 supervsord 配置软件本身,使用 nodaemon 参数来运行。
# 第二段包含要控制的 2 个服务。每一段包含一个服务的目录和启动这个服务的命令。
[supervisord]
nodaemon=true
[program:sshd]
command=/usr/sbin/sshd -D
[program:zookeeper]
command=/bin/bash -c "exec ${ZOOKEEPER_HOME}/bin/zkServer.sh start-foreground"
配置${ZOOKEEPER_HOME}/conf/zoo.cfg(conf目录下默认并没有zoo.cfg文件,需要复制zoo_sample.cfg并改名为zoo.cfg)
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
# 第一个port是从机器(follower)连接到主机器(leader)的端口号,第二个port是进行leadership选举的端口号。
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
控制台终端
# 构建镜像
$ docker build -t "birdben/zookeeper_cluster:v1" .
# 启动Docker容器,这里分别对每个docker容器指定了不同的hostname
# 需要暴露2181客户端连接端口号,否则Docker容器外无法连接到zookeeper集群
$ sudo docker run -p 2181:2181 -p 2888:2888 -p 3888:3888 -h zoo1 --name zoo1 -t -i 'birdben/zookeeper_cluster:v1'
$ sudo docker run -p 2181:2181 -p 2888:2888 -p 3888:3888 -h zoo2 --name zoo2 -t -i 'birdben/zookeeper_cluster:v1'
$ sudo docker run -p 2181:2181 -p 2888:2888 -p 3888:3888 -h zoo3 --name zoo3 -t -i 'birdben/zookeeper_cluster:v1'
# 查询Docker容器对应的IP地址
$ sudo docker inspect --format='{{.NetworkSettings.IPAddress}}' ${CONTAINER_ID}
# 需要exec进入Docker容器配置myid和hosts文件
$ sudo docker exec -it ${CONTAINER_ID} /bin/bash
# 配置每个Docker容器的myid,对应zoo序号执行
$ echo 1 > /var/zookeeper/myid
$ echo 2 > /var/zookeeper/myid
$ echo 3 > /var/zookeeper/myid
# 配置每个Docker容器的/etc/hosts文件
172.17.0.51 zoo1
172.17.0.52 zoo2
172.17.0.53 zoo3
# 分别启动每个Docker容器中的zookeeper服务
$ ${ZOOKEEPER_HOME}/bin/zkServer.sh start
# 查看每个Docker容器的zookeeper运行状态
$ ${ZOOKEEPER_HOME}/bin/zkServer.sh status
# 下面是我查看每个zookeeper的状态,zoo2的Docker容器的zk是leader,zoo1和zoo3是follower
root@zoo1:/software/zookeeper-3.4.8/bin# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
root@zoo2:/software/zookeeper-3.4.8/bin# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: leader
root@zoo3:/software/zookeeper-3.4.8/bin# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower
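上面在容器里修改完/etc/hosts之后需要重启zookeeper服务。由于zookeeper是由supervisor托管的,可以用类似下面的方式重启(一个简单示例,如果supervisorctl连接不上,可能需要用-c参数指定supervisord.conf的路径):

$ sudo supervisorctl restart zookeeper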
使用zkCli.sh连接服务端进行操作
$ ./zkCli.sh -server 10.10.1.167:2181
Connecting to 10.10.1.167:2181
2016-09-02 11:01:56,761 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
2016-09-02 11:01:56,764 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost
2016-09-02 11:01:56,764 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_79
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/jre
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/Users/yunyu/dev/zookeeper-3.4.8/bin/../build/classes:/Users/yunyu/dev/zookeeper-3.4.8/bin/../build/lib/*.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../conf:
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/Users/yunyu/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/0h/jtjrr7g95mv2pt4ts1tgmzyh0000gn/T/
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Mac OS X
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=x86_64
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:os.version=10.11.5
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:user.name=yunyu
2016-09-02 11:01:56,766 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/Users/yunyu
2016-09-02 11:01:56,767 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/Users/yunyu/dev/zookeeper-3.4.8/bin
2016-09-02 11:01:56,767 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=10.10.1.167:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@5be7d8b1
Welcome to ZooKeeper!
2016-09-02 11:01:56,791 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 10.10.1.167/10.10.1.167:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2016-09-02 11:01:56,798 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@876] - Socket connection established to 10.10.1.167/10.10.1.167:2181, initiating session
2016-09-02 11:01:56,821 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 10.10.1.167/10.10.1.167:2181, sessionid = 0x156e8d804300000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 10.10.1.167:2181(CONNECTED) 0]
需要注意的地方
  • 因为我们zookeeper的启动方式是用的supervisor启动,但是Docker容器启动的时候,我们还不知道Docker容器的IP地址,无法指定hosts文件配置,所以我们要先进入到Docker容器指定好hosts文件配置,然后重新启动zookeeper服务
  • myid的配置也是每个Docker容器都不一样,最好跟hosts配置对应
  • 需要Docker容器外连接zookeeper集群需要在启动Docker容器时,指定一个Docker容器对外开放2181客户端连接端口号,否则Docker容器外无法连接到zookeeper集群
  • 如果查看zookeeper运行状态时提示下面的错误
  • 值得重点注意的一点是,所有三个机器都应该打开端口 2181、2888 和 3888。在本例中,端口 2181 由 ZooKeeper 客户端使用,用于连接到 ZooKeeper 服务器;端口 2888 由对等 ZooKeeper 服务器使用,用于互相通信;而端口 3888 用于领导者选举。您可以选择自己喜欢的任何端口。通常建议在所有 ZooKeeper 服务器上使用相同的端口。
JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

确认下面两点,应该就能排查出问题。我就遇到过重启Docker容器后IP地址变化,导致/etc/hosts中的IP地址配置不正确的情况

  • 确认/etc/hosts中是否有各个节点域名解析
  • 是否/var/zookeeper/myid有重复值

参考文章:

Zookeeper学习(一)Zookeeper环境搭建

Zookeeper安装

$ wget http://apache.fayea.com/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
$ tar -zxvf zookeeper-3.4.8.tar.gz
$ cd zookeeper-3.4.8
# 修改zookeeper配置文件
$ cp conf/zoo_sample.cfg conf/zoo.cfg

ZooKeeper Standalone模式

Zookeeper配置文件

# the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
# tickTime 这个时间是作为 Zookeeper 服务器之间或客户端与服务器之间维持心跳的时间间隔,也就是每个tickTime时间就会发送一个心跳。
tickTime=2000
# the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
# dataDir 顾名思义就是Zookeeper保存数据的目录,默认情况下Zookeeper将写数据的日志文件也保存在这个目录里。
dataDir=/var/lib/zookeeper
# the port to listen for client connections
# clientPort 这个端口就是客户端(应用程序)连接Zookeeper服务器的端口,Zookeeper会监听这个端口接受客户端的访问请求。
clientPort=2181
# initLimit 这个配置项是用来配置Zookeeper接受客户端(这里所说的客户端不是用户连接Zookeeper 服务器的客户端,而是Zookeeper服务器集群中连接到Leader的Follower服务器)初始化连接时最长能忍受多少个心跳时间间隔数。当已经超过10个心跳的时间(也就是tickTime)长度后Zookeeper服务器还没有收到客户端的返回信息,那么表明这个客户端连接失败。总的时间长度就是 10*2000=20 秒。
initLimit=10
# syncLimit 这个配置项标识Leader与Follower之间发送消息,请求和应答时间长度,最长不能超过多少个tickTime的时间长度,总的时间长度就是 5*2000=10 秒。
syncLimit=5

启动Zookeeper服务端

$ ./bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

检查Zookeeper启动状态

$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: standalone
# 查看zookeeper的PID
$ ps -ef | grep zookeeper
yunyu 16845 2068 0 11:06 pts/1 00:00:02 /usr/local/jdk1.7.0_79/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /software/zookeeper-3.4.8/bin/../build/classes:/software/zookeeper-3.4.8/bin/../build/lib/*.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/software/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/software/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/software/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/software/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/software/zookeeper-3.4.8/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
yunyu 21506 2748 0 11:31 pts/1 00:00:00 grep --color=auto zookeeper
# 查看一下在监听2181端口的PID
$ lsof -i:2181
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16845 yunyu 19u IPv6 50306 0t0 TCP *:2181 (LISTEN)

启动Zookeeper Client端

# -server:client端连接的IP和端口号
$ ./zkCli.sh -server 127.0.0.1:2181
# Zookeeper Client端控制台会有类似如下的输出信息
Connecting to 127.0.0.1:2181 2016-08-29 20:35:25,389 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT 2016-08-29 20:35:25,392 [myid:] - INFO [main:Environment@100] - Client environment:host.name=ubuntu 2016-08-29 20:35:25,393 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_79 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/software/jdk1.7.0_79/jre 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/software/zookeeper-3.4.8/bin/../build/classes:/software/zookeeper-3.4.8/bin/../build/lib/*.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/software/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/software/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/software/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/software/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/software/zookeeper-3.4.8/bin/../conf: 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 2016-08-29 20:35:25,396 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA> 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.16.0-77-generic 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:user.name=yunyu 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/yunyu 2016-08-29 20:35:25,397 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/software/zookeeper-3.4.8/bin 2016-08-29 20:35:25,398 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=127.0.0.1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@56606032 Welcome to ZooKeeper! 2016-08-29 20:35:25,416 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 2016-08-29 20:35:25,420 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@876] - Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session JLine support is enabled [zk: 127.0.0.1:2181(CONNECTING) 0] 2016-08-29 20:35:25,447 [myid:] - INFO [main-SendThread(127.0.0.1:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x156d969c5940002, negotiated timeout = 30000 WATCHER:: WatchedEvent state:SyncConnected type:None path:null [zk: 127.0.0.1:2181(CONNECTED) 0] [zk: 127.0.0.1:2181(CONNECTED) 0] [zk: 127.0.0.1:2181(CONNECTED) 0]
# zkShell中输入help会提示出所有的命令参数
[zk: 127.0.0.1:2181(CONNECTED) 0] help
ZooKeeper host:port cmd args
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create path data acl
stat path [watch]
listquota path
history
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
deleteall path
setquota -n|-b val path
# 查看znode节点
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]
# 创建新的znode节点,关联到"my_data"
[zk: 127.0.0.1:2181(CONNECTED) 3] create /zk_test my_data
Created /zk_test
[zk: 127.0.0.1:2181(CONNECTED) 4] ls /
[zookeeper, zk_test]
# 验证/zk_test节点已经关联到"my_data"
[zk: 127.0.0.1:2181(CONNECTED) 5] get /zk_test
my_data
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x6 mtime = Mon Aug 29 20:42:40 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 7 numChildren = 0
# 修改/zk_test节点的数据关联
[zk: 127.0.0.1:2181(CONNECTED) 6] set /zk_test junk
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x7 mtime = Mon Aug 29 20:47:08 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 1 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 4 numChildren = 0
[zk: 127.0.0.1:2181(CONNECTED) 7] get /zk_test
junk
cZxid = 0x6 ctime = Mon Aug 29 20:42:40 PDT 2016 mZxid = 0x7 mtime = Mon Aug 29 20:47:08 PDT 2016 pZxid = 0x6 cversion = 0 dataVersion = 1 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 4 numChildren = 0
# 删除/zk_test节点
[zk: 127.0.0.1:2181(CONNECTED) 8] delete /zk_test
[zk: 127.0.0.1:2181(CONNECTED) 9] ls /
[zookeeper]

检查Zookeeper Client连接后的进程状态

# 查看zookeeper的PID
$ ps -ef | grep zookeeper
yunyu@ubuntu:/software/zookeeper-3.4.8/bin$ ps -ef | grep zookeeper yunyu 16845 2068 0 11:06 pts/1 00:00:02 /usr/local/jdk1.7.0_79/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /software/zookeeper-3.4.8/bin/../build/classes:/software/zookeeper-3.4.8/bin/../build/lib/*.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/software/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/software/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/software/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/software/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/software/zookeeper-3.4.8/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /software/zookeeper-3.4.8/bin/../conf/zoo.cfg yunyu 17569 17564 0 11:09 pts/7 00:00:02 /usr/local/jdk1.7.0_79/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /software/zookeeper-3.4.8/bin/../build/classes:/software/zookeeper-3.4.8/bin/../build/lib/*.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/software/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/software/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/software/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/software/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/software/zookeeper-3.4.8/bin/../conf: org.apache.zookeeper.ZooKeeperMain -server 127.0.0.1:2181 yunyu 21506 2748 0 11:31 pts/1 00:00:00 grep --color=auto zookeeper
# 查看一下在监听2181端口的PID
yunyu@ubuntu:/software/zookeeper-3.4.8/bin$ lsof -i:2181
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16845 yunyu 19u IPv6 50306 0t0 TCP *:2181 (LISTEN)
java 16845 yunyu 20u IPv6 51041 0t0 TCP localhost:2181->localhost:40895 (ESTABLISHED)
java 17569 yunyu 13u IPv6 51020 0t0 TCP localhost:40895->localhost:2181 (ESTABLISHED)

OK,Zookeeper的standalone模式的配置就大功告成了 ^_^

参考文章:

Docker实战(十七)Docker安装Zookeeper环境

Zookeeper安装
# 下载Zookeeper
$ wget http://apache.fayea.com/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
# 解压Zookeeper压缩包
$ tar -zxvf zookeeper-3.4.8.tar.gz
# 需要修改下面Zookeeper相关的配置文件
Dockerfile文件
############################################
# version : birdben/zookeeper:v1
# desc : 当前版本安装的zookeeper
############################################
# 设置继承自我们创建的 jdk7 镜像
FROM birdben/jdk7:v1
# 下面是一些创建者的基本信息
MAINTAINER birdben (191654006@163.com)
# 设置环境变量,所有操作都是非交互式的
ENV DEBIAN_FRONTEND noninteractive
# 添加 supervisord 的配置文件,并复制配置文件到对应目录下面。(supervisord.conf文件和Dockerfile文件在同一路径)
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# 设置 zookeeper 的环境变量,若读者有其他的环境变量需要设置,也可以在这里添加。
ENV ZOOKEEPER_HOME /software/zookeeper-3.4.8
ENV PATH ${ZOOKEEPER_HOME}/bin:$PATH
# 复制 zookeeper-3.4.8 文件到镜像中(zookeeper-3.4.8 文件夹要和Dockerfile文件在同一路径)
ADD zookeeper-3.4.8 /software/zookeeper-3.4.8
# 创建myid文件存储路径
RUN mkdir -p /var/zookeeper/myid
# 授权ZOOKEEPER_HOME路径给admin用户
RUN sudo chown -R admin /software/zookeeper-3.4.8
# 容器需要开放Zookeeper 2181端口
EXPOSE 2181
# 执行supervisord来同时执行多个命令,使用 supervisord 的可执行路径启动服务。
CMD ["/usr/bin/supervisord"]
Dockerfile源文件链接:

https://github.com/birdben/birdDocker/blob/master/zookeeper/Dockerfile

supervisor配置文件内容
# 配置文件包含目录和进程
# 第一段 supervsord 配置软件本身,使用 nodaemon 参数来运行。
# 第二段包含要控制的 2 个服务。每一段包含一个服务的目录和启动这个服务的命令。
[supervisord]
nodaemon=true
[program:sshd]
command=/usr/sbin/sshd -D
[program:zookeeper]
command=/bin/bash -c "exec ${ZOOKEEPER_HOME}/bin/zkServer.sh start-foreground"
配置${ZOOKEEPER_HOME}/conf/zoo.cfg(conf目录下默认并没有zoo.cfg文件,需要复制zoo_sample.cfg并改名为zoo.cfg)
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
控制台终端
# 构建镜像
$ docker build -t "birdben/zookeeper:v1" .
# 运行已经构建好的镜像
$ docker run -p 9999:22 -p 2181:2181 -t -i "birdben/zookeeper:v1"
supervisor无法监控zookeeper
supervisor启动的程序必须是非daemon的启动方式,这里找到zookeeper的启动脚本${ZOOKEEPER_HOME}/bin/zkServer.sh,正常的参数可以选择start, status, stop等等。这里我们找到参数start-foreground,这个就是非守护进程的方式启动。所以supervisord.conf配置文件修改成下面的方式
${ZOOKEEPER_HOME}/bin/zkServer.sh start-foreground
使用zkCli.sh连接服务端进行操作
# 我们通过另一个zk客户端连接到Docker容器的zk服务端
$ ./zkCli.sh -server 10.10.1.167:2181
Connecting to 10.10.1.167:2181
2016-09-01 14:32:42,398 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.8--1, built on 02/06/2016 03:18 GMT
2016-09-01 14:32:42,402 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost
2016-09-01 14:32:42,402 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_79
2016-09-01 14:32:42,405 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-09-01 14:32:42,405 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/jre
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/Users/yunyu/dev/zookeeper-3.4.8/bin/../build/classes:/Users/yunyu/dev/zookeeper-3.4.8/bin/../build/lib/*.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/Users/yunyu/dev/zookeeper-3.4.8/bin/../conf:
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/Users/yunyu/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/var/folders/0h/jtjrr7g95mv2pt4ts1tgmzyh0000gn/T/
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Mac OS X
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=x86_64
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:os.version=10.11.5
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:user.name=yunyu
2016-09-01 14:32:42,406 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/Users/yunyu
2016-09-01 14:32:42,407 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/Users/yunyu/dev/zookeeper-3.4.8/bin
2016-09-01 14:32:42,408 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=10.10.1.167:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3823ed31
Welcome to ZooKeeper!
2016-09-01 14:32:42,441 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 10.10.1.167/10.10.1.167:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2016-09-01 14:32:42,453 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@876] - Socket connection established to 10.10.1.167/10.10.1.167:2181, initiating session
2016-09-01 14:32:42,492 [myid:] - INFO [main-SendThread(10.10.1.167:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 10.10.1.167/10.10.1.167:2181, sessionid = 0x156e4682ae30000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 10.10.1.167:2181(CONNECTED) 0]
# 查看znode节点
[zk: 10.10.1.167:2181(CONNECTED) 0] ls /
[zookeeper]
# 创建新的znode节点,关联到"my_data"
[zk: 10.10.1.167:2181(CONNECTED) 3] create /zk_test my_data
Created /zk_test
[zk: 10.10.1.167:2181(CONNECTED) 4] ls /
[zookeeper, zk_test]
遇到问题zkServer.sh status
# SSH登录到Docker容器,查看supervisor的状态
$ sudo supervisorctl status
sshd RUNNING pid 8, uptime 0:35:36
zookeeper RUNNING pid 9, uptime 0:35:36
# 查看zookeeper进程
$ ps -ef | grep zookeeper
root 9 1 0 06:20 ? 00:00:04 /software/jdk7/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /software/zookeeper-3.4.8/bin/../build/classes:/software/zookeeper-3.4.8/bin/../build/lib/*.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/slf4j-api-1.6.1.jar:/software/zookeeper-3.4.8/bin/../lib/netty-3.7.0.Final.jar:/software/zookeeper-3.4.8/bin/../lib/log4j-1.2.16.jar:/software/zookeeper-3.4.8/bin/../lib/jline-0.9.94.jar:/software/zookeeper-3.4.8/bin/../zookeeper-3.4.8.jar:/software/zookeeper-3.4.8/bin/../src/java/lib/*.jar:/software/zookeeper-3.4.8/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
admin 198 191 0 06:57 pts/0 00:00:00 grep zookeeper
# 使用zkServer.sh status查看zk的状态,发现提示zk并没有运行
$ ./zkServer.sh status
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
# 这个问题是因为zk需要依赖jdk环境,我们需要配置java环境变量,因为Dockerfile中的配置对于我们ssh的admin用户是不生效的,需要单独export一下
$ export JAVA_HOME=/software/jdk7
# 再次查看就OK了
$ ./zkServer.sh status
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
ZooKeeper JMX enabled by default
Using config: /software/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: standalone

参考文章:

Ubuntu安装Shadowsocks

安装Shadowsocks和Supervisor

sudo apt-get update
sudo apt-get install python-pip
sudo pip install --upgrade pip
sudo pip install shadowsocks
sudo apt-get install supervisor

添加Shadowsocks配置文件

$ vi /etc/shadowsocks.json
# 具体配置内容
{
"server":"0.0.0.0",
"server_port":443,
"local_address":"127.0.0.1",
"local_port":1080,
"password":"123456",
"timeout":500,
"method":"aes-256-cfb",
"fast_open":false
}
# 配置内容描述
server : 服务端监听的地址,服务端可填写 0.0.0.0
server_port : 服务端的端口
local_address : 本地端监听的地址
local_port : 本地端的端口
password : 用于加密的密码
timeout : 超时时间,单位秒
method : 默认"aes-256-cfb",建议chacha20或者rc4-md5,因为这两个速度快
fast_open : 是否使用 TCP_FASTOPEN, true / false(后面优化部分会打开系统的 TCP_FASTOPEN,所以这里填 true,否则填 false)

启动或停止Shadowsocks

$ ssserver -c /etc/shadowsocks.json -d start
$ ssserver -c /etc/shadowsocks.json -d stop
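启动之后可以简单确认一下服务端端口是否已经在监听。下面只是一个简单示例(这里假设使用上面配置中的443端口):

$ sudo netstat -lntp | grep 443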

检查Shadowsocks日志

$ tailf /var/log/shadowsocks.log

将shadowsocks加入开机启动

$ sudo vi /etc/rc.local
/usr/bin/python /usr/local/bin/ssserver -c /etc/shadowsocks.json -d start

添加supervisord配置文件

$ sudo vi /etc/supervisord.conf
# 添加配置
[program:shadowsocks]
command=ssserver -c /etc/shadowsocks.json
autostart=true
autorestart=true
user=root
stderr_logfile=/var/log/supervisor/supervisor.log
stopsignal=INT
[supervisord]

启动 supervisord

$ sudo supervisord -c /etc/supervisord.conf

检查supervisor日志

$ tailf /var/log/supervisor/supervisor.log

把 supervisor 加入开机启动进程

$ vi /etc/rc.local
# 添加配置
supervisord -c /etc/supervisord.conf

Flume学习(七)Flume整合Elasticsearch2.x

环境简介

  • JDK1.7.0_79
  • Flume1.6.0
  • Elasticsearch2.0.0

Flume不支持Elasticsearch2.x版本

目前官方Flume最新的版本是1.6.0,该版本只支持Elasticsearch1.7.x的版本,暂时不支持Elasticsearch2.x版本,因为Elasticsearch2.x版本做了比较大的改动,很多API都已经废弃不用了,具体可以参考下面的连接

第三方ElasticsearchSink2支持2.x版本

这里我找到了一个第三方开源的FlumeSink插件来支持Elasticsearch2.x版本

但是这个项目是使用Gradle编译打包的,所以下面先简单介绍下Gradle的安装和使用

我这里使用的是Mac,用brew安装Gradle很简单。但是Gradle是依赖于JVM的,需要先装好JDK

$ brew install gradle
$ gradle -v

然后从github下载ElasticsearchSink2的源代码,并且导入到idea中,然后执行下面gradle命令构建项目(下面这两个脚本是ElasticsearchSink2项目自带的),会在ElasticsearchSink2/build/libs/目录下生成对应的jar包

# 构建标准的jar包
$ ./gradlew build
# 构建包含Elasticsearch依赖的jar包
$ ./gradlew assembly

添加刚才构建好的elasticsearch-sink2-1.0.jar到Flume的classpath或者是Flume的lib目录,并删除Flume的lib目录下原有的guava-*.jar和jackson-core-*.jar。
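下面是一个简单的操作示例(这里假设Flume安装在${FLUME_HOME}目录,jar包名以实际构建出来的文件为准):

# 把构建好的sink jar包复制到Flume的lib目录
$ cp build/libs/elasticsearch-sink2-1.0.jar ${FLUME_HOME}/lib/
# 按照上面的说明删除Flume的lib目录下的guava和jackson-core
$ rm ${FLUME_HOME}/lib/guava-*.jar ${FLUME_HOME}/lib/jackson-core-*.jar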

具体的配置文件

Flume Collector端的flume_collector_es2.conf配置

agentX.sources = flume-avro-sink
agentX.channels = chX
agentX.sinks = flume-es-sink
agentX.sources.flume-avro-sink.channels = chX
agentX.sources.flume-avro-sink.type = avro
agentX.sources.flume-avro-sink.bind = 127.0.0.1
agentX.sources.flume-avro-sink.port = 41414
agentX.sources.flume-avro-sink.threads = 8
agentX.sources.flume-avro-sink.interceptors = es_interceptor
agentX.sources.flume-avro-sink.interceptors.es_interceptor.type = regex_extractor
#agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = (\"([^,^\"]+)\":\"([^:^\"]+)\")|(\"([^,^\"]+)\":([\\d]+))
#agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = (\\d):(\\d):(\\d):(\\d):(\\d):(\\d)
# mapping不正确没有匹配成功
#agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = (TIME:(.*?)),(HOSTNAME:(.*?)),(LI:(.*?)),(LU:(.*?)),(NU:(.*?)),(CMD:(.*?))
# mapping正确,数据匹配不正确包含了多余的字段名
#agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = (\"TIME\":(.*?)),(\"HOSTNAME\":(.*?)),(\"LI\":(.*?)),(\"LU\":(.*?)),(\"NU\":(.*?)),(\"CMD\":(.*?))
# mapping正确,数据也正确({}需要转义,转义符是\\)
agentX.sources.flume-avro-sink.interceptors.es_interceptor.regex = "TIME":(.*?),"HOSTNAME":(.*?),"LI":(.*?),"LU":(.*?),"NU":(.*?),"CMD":(.*?)
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers = s1 s2 s3 s4 s5 s6
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s1.name = aaa
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s2.name = bbb
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s3.name = s3
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s4.name = s4
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s5.name = s5
agentX.sources.flume-avro-sink.interceptors.es_interceptor.serializers.s6.name = s6
agentX.channels.chX.type = memory
agentX.channels.chX.capacity = 1000
agentX.channels.chX.transactionCapacity = 100
agentX.sinks.flume-es-sink.channel = chX
agentX.sinks.flume-es-sink.type = com.frontier45.flume.sink.elasticsearch2.ElasticSearchSink
# 每个事务写入多少个Event
agentX.sinks.flume-es-sink.batchSize = 100
agentX.sinks.flume-es-sink.hostNames = 127.0.0.1:9300
# 注意:indexName必须小写
agentX.sinks.flume-es-sink.indexName = command_index
agentX.sinks.flume-es-sink.indexType = logs
agentX.sinks.flume-es-sink.clusterName = ben-es
# ttl 的时间,过期了会自动删除文档,如果没有设置则永不过期,ttl使用integer或long型,单位可以是:ms (毫秒), s (秒), m (分), h (小时), d (天) and w (周)。例如:a1.sinks.k1.ttl = 5d则表示5天后过期。这里没用到
# agentX.sinks.flume-es-sink.ttl = 5d
agentX.sinks.flume-es-sink.serializer = com.frontier45.flume.sink.elasticsearch2.ElasticSearchLogStashEventSerializer

flume_collector_es2.conf中下面两项配置需要修改为ElasticsearchSink2提供的实现类

sink.type = com.frontier45.flume.sink.elasticsearch2.ElasticSearchSink
sink.serializer = com.frontier45.flume.sink.elasticsearch2.ElasticSearchLogStashEventSerializer

Flume Agent端的flume_agent_file.conf配置

agent3.sources = command-logfile-source
agent3.channels = ch3
agent3.sinks = flume-avro-sink
agent3.sources.command-logfile-source.channels = ch3
agent3.sources.command-logfile-source.type = exec
agent3.sources.command-logfile-source.command = tail -F /Users/yunyu/Downloads/command.log
agent3.channels.ch3.type = memory
agent3.channels.ch3.capacity = 1000
agent3.channels.ch3.transactionCapacity = 100
agent3.sinks.flume-avro-sink.channel = ch3
agent3.sinks.flume-avro-sink.type = avro
agent3.sinks.flume-avro-sink.hostname = 127.0.0.1
agent3.sinks.flume-avro-sink.port = 41414

启动ES服务

$ ./bin/elasticsearch -d

启动Flume Collector端

$ ./bin/flume-ng agent --conf ./conf/ -f conf/flume_collector_es2.conf -Dflume.root.logger=DEBUG,console -n agentX

启动Flume Agent端

$ ./bin/flume-ng agent --conf ./conf/ -f conf/flume_agent_file.conf -Dflume.root.logger=DEBUG,console -n agent3

启动Flume Agent可能会遇到如下的错误:

2016-08-27 14:25:58,045 (lifecycleSupervisor-1-1) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@76d45adb counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoSuchFieldError: LUCENE_5_3_1
at org.elasticsearch.Version.<clinit>(Version.java:279)
at org.elasticsearch.client.transport.TransportClient$Builder.build(TransportClient.java:131)
at com.frontier45.flume.sink.elasticsearch2.client.ElasticSearchTransportClient.openClient(ElasticSearchTransportClient.java:198)

出现上面的错误是因为lucene的版本不对,这里我开始尝试安装的是下面的几个版本:

  • Elasticsearch-2.2.2:没有选择Elasticsearch-2.2.2版本是因为ik分词插件没有对应的版本
  • Elasticsearch-2.2.1:没有选择Elasticsearch-2.2.1版本是因为报错没有找到LUCENE_5_3_1对应的Lucene-5.3.1版本;Elasticsearch-2.2.1的jar包依赖的是Lucene-5.4.1版本,所以找不到Lucene-5.3.1。这里可能是ElasticsearchSink2的问题,暂时先换成ElasticsearchSink2使用的2.0.0版本,后续再尝试单独升级Lucene-5.4.1试试。
  • Elasticsearch-2.0.0:ElasticsearchSink2使用的此版本

检查一下ES源码的pom文件就可以知道ES和Lucene的版本对应关系如下:

ES版本 Lucene版本 ik插件版本
Elasticsearch-2.2.2 Lucene-5.4.1
Elasticsearch-2.2.1 Lucene-5.4.1 1.8.1
Elasticsearch-2.0.0 Lucene-5.2.1 1.5.0
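也可以直接通过ES的REST接口确认正在运行的ES对应的Lucene版本。下面是一个简单示例(假设ES监听在本机9200端口):

$ curl http://localhost:9200/
# 返回的JSON中version.lucene_version字段就是当前ES使用的Lucene版本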

检查ES的索引数据

如下图所示,ES的mapping和索引数据都正确,说明我们使用ElasticsearchSink2的方式成功将Flume1.6.0采集的command.log日志数据写入到Elasticsearch2.0.0版本里

mapping

data

参考文章: