By default, the file input plugin ignores files whose last modification time is more than 86400 seconds (one day) old. Since the tutorial file may well be older than a day, we need to change this default and tell the plugin not to ignore old files.
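A minimal sketch of such a file input, assuming a hypothetical log path; ignore_older (86400 seconds by default) is the option that controls this cutoff:

input {
  file {
    path => "/path/to/tutorial.log"   # hypothetical path to the tutorial file
    start_position => "beginning"     # read the file from the start, not only new lines
    ignore_older => 864000            # raise the one-day default so older files are still processed
  }
}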
smallest or largest - (optional, default largest) If the consumer does not already have an established offset or offset is invalid, start with the earliest message present in the log (smallest) or after the last message in the log (largest).
reset_beginning
Value type is boolean
Default value is false
Reset the consumer group to start at the earliest message present in the log by clearing any offsets for the group stored in Zookeeper. This is destructive! Must be used in conjunction with auto_offset_reset ⇒ smallest
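A hedged sketch of a kafka input combining the two options (the Zookeeper address, topic, and consumer group below are hypothetical):

input {
  kafka {
    zk_connect => "localhost:2181"     # hypothetical Zookeeper address
    topic_id => "tutorial"             # hypothetical topic
    group_id => "logstash"             # hypothetical consumer group
    auto_offset_reset => "smallest"    # start from the earliest message when no valid offset exists
    reset_beginning => true            # destructive: clears this group's offsets stored in Zookeeper
  }
}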
analyze_wildcard: by default, wildcard terms in a query string are not analyzed; if this property is set to true, a best effort is made to analyze them as well. (Original text: By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.)
The relevant explanation from the official ES documentation follows:
Wildcards
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters:
qu?ck bro*
Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string "a* b* c*".
Warning
Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.
Wildcarded terms are not analyzed by default — they are lowercased (lowercase_expanded_terms defaults to true) but no further analysis is done, mainly because it is impossible to accurately analyze a word that is missing some of its letters. However, by setting analyze_wildcard to true, an attempt will be made to analyze wildcarded words before searching the term list for matching terms.
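As a hedged sketch, a query_string query that opts in to wildcard analysis could look like this (the index name and the local endpoint are assumptions):

$ curl -XPOST 'http://localhost:9200/myindex/_search?pretty' -d '
{
  "query": {
    "query_string": {
      "query": "qu?ck bro*",
      "analyze_wildcard": true
    }
  }
}'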
$ storm kill AdLog
Running: /usr/local/java/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/data/storm-0.10.2 -Dstorm.log.dir=/data/storm-0.10.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /data/storm-0.10.2/lib/storm-core-0.10.2.jar:/data/storm-0.10.2/lib/slf4j-api-1.7.7.jar:/data/storm-0.10.2/lib/clojure-1.6.0.jar:/data/storm-0.10.2/lib/disruptor-2.10.4.jar:/data/storm-0.10.2/lib/servlet-api-2.5.jar:/data/storm-0.10.2/lib/log4j-api-2.1.jar:/data/storm-0.10.2/lib/log4j-core-2.1.jar:/data/storm-0.10.2/lib/minlog-1.2.jar:/data/storm-0.10.2/lib/reflectasm-1.07-shaded.jar:/data/storm-0.10.2/lib/log4j-over-slf4j-1.6.6.jar:/data/storm-0.10.2/lib/asm-4.0.jar:/data/storm-0.10.2/lib/hadoop-auth-2.4.0.jar:/data/storm-0.10.2/lib/kryo-2.21.jar:/data/storm-0.10.2/lib/log4j-slf4j-impl-2.1.jar:/usr/local/storm/conf:/data/storm-0.10.2/bin backtype.storm.command.kill_topology AdLog
1331 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
1401 [main] INFO b.s.u.Utils - Using storm.yaml from resources
1954 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
1971 [main] INFO b.s.u.Utils - Using storm.yaml from resources
1987 [main] INFO b.s.thrift - Connecting to Nimbus at hadoop1:6627 as user:
1987 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
2024 [main] INFO b.s.u.Utils - Using storm.yaml from resources
2045 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2094 [main] INFO b.s.c.kill-topology - Killed topology: AdLog
In distributed mode, Storm runs on a cluster of machines. When you submit a topology to the master, you also submit the topology's code. The master is responsible for distributing your code and for assigning worker processes to your topology. If a worker process dies, the master node reassigns its tasks to other nodes. For how to run a topology on a cluster, see the article Running Topologies on a Production Cluster.
Nimbus: Run the command "bin/storm nimbus" under supervision on the master machine.
Supervisor: Run the command "bin/storm supervisor" under supervision on each worker machine. The supervisor daemon is responsible for starting and stopping worker processes on that machine.
UI: Run the Storm UI (a site you can access from the browser that gives diagnostics on the cluster and topologies) by running the command "bin/storm ui" under supervision. The UI can be accessed by navigating your web browser to http://{ui host}:8080.
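A minimal sketch of bringing the three daemons up by hand; in a real deployment each command should run under a supervision tool (e.g. daemontools or supervisord), as the docs require:

# On the master machine
$ bin/storm nimbus &
# On each worker machine
$ bin/storm supervisor &
# On the UI machine, then browse to http://{ui host}:8080
$ bin/storm ui &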
# Enter the project root directory
$ cd /Users/yunyu/workspace_git/birdHadoop
# Compile and package
$ mvn clean package
# Run the jar we just built with java -jar; the related steps are wrapped in a shell script
$ sh scripts/storm/runWordCount_Local.sh
#### The runWordCount_Local.sh script
#!/bin/bash
local_path=~/workspace_git/birdHadoop
local_inputfile_path=$local_path/inputfile/WordCount
local_outputfile_path=$local_path/outputfile/WordCount
if [ -f $local_inputfile_path/input_WordCount.bak ]; then
  # If the local .bak file exists, rename it to drop the .bak suffix
  echo "Renaming $local_inputfile_path/input_WordCount.bak"
  mv $local_inputfile_path/input_WordCount.bak $local_inputfile_path/input_WordCount
fi
if [ ! -d $local_outputfile_path ]; then
  # If the local output directory does not exist, create it
  echo "Creating directory $local_outputfile_path"
  mkdir -p $local_outputfile_path
else
  # If output files already exist locally, delete them
  echo "Deleting all files under $local_outputfile_path/*"
  rm -rf $local_outputfile_path/*
fi
# The jar's entry class must be specified in the Maven pom.xml
echo "Running birdHadoop.jar..."
java -jar $local_path/target/birdHadoop.jar $local_inputfile_path $local_outputfile_path
echo "Finished running birdHadoop.jar..."
Below is the output from the run:
$ sh scripts/storm/runWordCount_Local.sh
Renaming /Users/yunyu/workspace_git/birdHadoop/inputfile/WordCount/input_WordCount.bak
Deleting all files under /Users/yunyu/workspace_git/birdHadoop/outputfile/WordCount/*
Running birdHadoop.jar...
log4j:WARN No appenders could be found for logger (backtype.storm.utils.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WordCounter prepare out start
WordCounter clean out start
WordCounter result : hadoop 3
WordCounter result : hive 2
WordCounter result : logstash 1
WordCounter result : hbase 2
WordCounter result : flume 1
WordCounter result : kafka 2
WordCounter result : storm 1
WordCounter result : spark 1
WordCounter result : es 1
WordCounter clean out end
Finished running birdHadoop.jar...
Note that if the Storm jars are bundled into the topology jar, submission to the cluster fails with an exception like the following:
Caused by: java.io.IOException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/data/storm-0.10.2/lib/storm-core-0.10.2.jar!/defaults.yaml, jar:file:/home/yunyu/Downloads/birdHadoop/target/birdHadoop.jar!/defaults.yaml]
at backtype.storm.utils.Utils.getConfigFileInputStream(Utils.java:266)
at backtype.storm.utils.Utils.findAndReadConfigFile(Utils.java:220)
... 103 more
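One common fix, sketched under the assumption that the Maven build shown above produces birdHadoop.jar, is to mark storm-core as provided so the Storm jars stay out of the topology jar, then rebuild:

# In pom.xml, give storm-core "provided" scope (version matching the cluster above):
#
#   <dependency>
#     <groupId>org.apache.storm</groupId>
#     <artifactId>storm-core</artifactId>
#     <version>0.10.2</version>
#     <scope>provided</scope>
#   </dependency>
#
$ mvn clean package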
To submit a topology in cluster mode, package the code into a jar and run "storm jar birdHadoop.jar com.birdben.storm.demo.WordCountMain inputpath outputpath" on a machine in the Storm cluster. The topology then runs across different JVMs or physical machines and can be monitored in the Storm UI. In cluster mode, unlike with LocalCluster, the cluster cannot be controlled from code; in particular, it cannot be stopped programmatically.
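For example, submitting the WordCount topology from this post uses exactly the quoted command (inputpath and outputpath are placeholders):

$ storm jar birdHadoop.jar com.birdben.storm.demo.WordCountMain inputpath outputpath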
$ storm kill WordCount
Running: /usr/local/java/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/data/storm-0.10.2 -Dstorm.log.dir=/data/storm-0.10.2/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /data/storm-0.10.2/lib/kryo-2.21.jar:/data/storm-0.10.2/lib/servlet-api-2.5.jar:/data/storm-0.10.2/lib/hadoop-auth-2.4.0.jar:/data/storm-0.10.2/lib/minlog-1.2.jar:/data/storm-0.10.2/lib/storm-core-0.10.2.jar:/data/storm-0.10.2/lib/log4j-core-2.1.jar:/data/storm-0.10.2/lib/reflectasm-1.07-shaded.jar:/data/storm-0.10.2/lib/clojure-1.6.0.jar:/data/storm-0.10.2/lib/disruptor-2.10.4.jar:/data/storm-0.10.2/lib/log4j-over-slf4j-1.6.6.jar:/data/storm-0.10.2/lib/asm-4.0.jar:/data/storm-0.10.2/lib/log4j-slf4j-impl-2.1.jar:/data/storm-0.10.2/lib/slf4j-api-1.7.7.jar:/data/storm-0.10.2/lib/log4j-api-2.1.jar:/usr/local/storm/conf:/data/storm-0.10.2/bin backtype.storm.command.kill_topology WordCount
1467 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
1535 [main] INFO b.s.u.Utils - Using storm.yaml from resources
2180 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
2200 [main] INFO b.s.u.Utils - Using storm.yaml from resources
2227 [main] INFO b.s.thrift - Connecting to Nimbus at hadoop1:6627 as user:
2228 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
2251 [main] INFO b.s.u.Utils - Using storm.yaml from resources
2269 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
$ sudo sh -c 'echo deb http://pkg.jenkins.io/debian-stable binary/ > /etc/apt/sources.list.d/jenkins.list'
$ sudo apt-get update
$ sudo apt-get install jenkins
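The pkg.jenkins.io repository is signed, so if apt-get update complains about a missing key, the repository key can be imported first; the key URL below is an assumption based on the Jenkins packaging docs of that era:

$ wget -q -O - https://pkg.jenkins.io/debian-stable/jenkins.io.key | sudo apt-key add -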
The documentation also mentions some points to note:
This package installation will:
Setup Jenkins as a daemon launched on start. See /etc/init.d/jenkins for more details.
Create a jenkins user to run this service.
Direct console log output to the file /var/log/jenkins/jenkins.log. Check this file if you are troubleshooting Jenkins.
Populate /etc/default/jenkins with configuration parameters for the launch, e.g. JENKINS_HOME.
Set Jenkins to listen on port 8080. Access this port with your browser to start configuration.
# How to change the default port 8080
If your /etc/init.d/jenkins file fails to start Jenkins, edit /etc/default/jenkins to replace the line HTTP_PORT=8080 with HTTP_PORT=8081. Here 8081 was chosen, but you can use any other available port.
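A hedged one-liner for the same edit, assuming the Debian package layout described above:

$ sudo sed -i 's/^HTTP_PORT=8080/HTTP_PORT=8081/' /etc/default/jenkins
$ sudo /etc/init.d/jenkins restart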