Installing and Configuring Hadoop 2.2 + Hive 0.12 on Ubuntu

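Environment: Ubuntu Linux 12.04, 32-bit desktop edition; the default Linux user is hu.

1. Install Java

Download the JDK archive and extract it to the /usr/java directory.

Edit the environment variables in /etc/environment and /etc/profile.

Run source /etc/environment and source /etc/profile to apply the settings.

Run java -version to confirm that the Java environment is configured correctly.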
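The environment variables themselves are not listed above. A minimal sketch of the lines that could be appended to /etc/profile is given here; the directory name /usr/java/jdk is hypothetical, so substitute the directory the JDK archive actually extracted to.

```shell
# Hypothetical JDK directory; replace with the actual directory extracted under /usr/java.
export JAVA_HOME=/usr/java/jdk
# Put the JDK tools (java, javac, jps) ahead of anything else on the PATH.
export PATH="$JAVA_HOME/bin:$PATH"
```

After sourcing the file, java -version should report the JDK found under JAVA_HOME.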

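2. Install Hadoop

Download Hadoop 2.2 and create the directory /home/hduser.
Run tar zxf hadoop-2.2.0.tar.gz to extract the archive into /home/hduser. The steps below refer to the installation as /home/hduser/hadoop, so rename the extracted hadoop-2.2.0 directory (or symlink it) accordingly.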

3. Configure Hadoop

Edit /home/hduser/hadoop/etc/hadoop/hadoop-env.sh.
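The exact edit to hadoop-env.sh is not preserved in this page. The setting this file most commonly needs is JAVA_HOME, so the sketch below assumes that is the intended change. The helper name set_java_home and the JDK path in the usage comment are hypothetical.

```shell
# Set JAVA_HOME in a hadoop-env.sh style file: rewrite an existing
# "export JAVA_HOME=..." line, or append one if the file has none.
set_java_home() {
    env_file="$1"
    jdk_dir="$2"
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
    fi
}

# Usage (hypothetical JDK path):
# set_java_home /home/hduser/hadoop/etc/hadoop/hadoop-env.sh /usr/java/jdk
```

Hard-coding the path in hadoop-env.sh matters because the daemons are started over ssh, where variables from the login profile may not be set.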

Edit /home/hduser/hadoop/etc/hadoop/core-site.xml and add the following inside <configuration>:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8010</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Note: because the configuration points at /home/hduser/hadoop/tmp/, you must create that directory with mkdir /home/hduser/hadoop/tmp/, otherwise later steps will fail with an error.

Edit /home/hduser/hadoop/etc/hadoop/mapred-site.xml:

(1) mv /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml

(2) Add the following inside <configuration>:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
  </description>
</property>
<property>
  <name>mapred.map.tasks</name>
  <description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).</description>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
  </description>
</property>

Edit /home/hduser/hadoop/etc/hadoop/hdfs-site.xml and add the following inside <configuration>:

<property>
  <name>dfs.replication</name>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

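4. Run Hadoop

Before running Hadoop for the first time, the Hadoop file system must be initialized with the following commands:

$ cd /home/hduser/hadoop/bin
$ ./hdfs namenode -format

If the command succeeds, the log (in its last few lines) contains the following success message:

common.Storage: Storage directory /home/hduser/hadoop/tmp/hadoop-hduser/dfs/name has been successfully formatted.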

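Start HDFS with the following commands:

$ cd /home/hduser/hadoop/sbin/
$ ./start-dfs.sh

Note: this step prompts for the password several times. To avoid that, set up SSH trust (key-based login) beforehand.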
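The SSH trust mentioned in the note can be set up before running start-dfs.sh. The following is a minimal sketch for a single-node setup, assuming OpenSSH is installed, an RSA key with an empty passphrase is acceptable, and the key lives at the default location ~/.ssh/id_rsa. None of these choices come from the original text.

```shell
# Assumption: RSA key with an empty passphrase at the default location.
KEY="$HOME/.ssh/id_rsa"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a key pair only if none exists yet.
[ -f "$KEY" ] || ssh-keygen -q -t rsa -N "" -f "$KEY"

# Authorize the public key for logins to this same machine, avoiding duplicate entries.
touch "$HOME/.ssh/authorized_keys"
grep -qxF "$(cat "$KEY.pub")" "$HOME/.ssh/authorized_keys" \
    || cat "$KEY.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Running ssh localhost once afterwards should log in without a password prompt (it may still ask to confirm the host key the first time).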

hduser@ubuntu:~/hadoop/sbin$ jps
4266 SecondaryNameNode
4116 DataNode
4002 NameNode

Note: jps shows that three processes have started.

$ ./start-yarn.sh

hduser@ubuntu:~/hadoop/sbin$ jps
4688 NodeManager
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
4413 ResourceManager
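Checking the jps listing by eye can be replaced with a small script. The sketch below reads jps output on standard input and prints any of the five daemons shown above that are missing. The function name missing_daemons is hypothetical.

```shell
# Print the name of every expected Hadoop daemon that is absent
# from the jps output read on standard input.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <ClassName>"; compare the whole second field so that
        # SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage on a live node (assumes jps is on the PATH):
# jps | missing_daemons
```

An empty result means all five daemons are up; after start-dfs.sh alone, only ResourceManager and NodeManager should be reported as missing.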

5. View the Hadoop resource manager

Open http://192.168.128.129:8088/ in a browser, replacing 192.168.128.129 with your actual IP address.

6. Test Hadoop

$ cd /home/hduser
$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
$ cd hadoop
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -copyFromLocal /home/hduser/pg20417.txt /tmp
$ bin/hdfs dfs -ls /tmp
$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp /tmp-output

If everything works, the job prints its results, which you can follow in the screen output.
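To sanity-check the wordcount result, the same counts can be computed locally and compared against a few lines of the job output. This is only an approximation of the Hadoop example, assuming it splits words on whitespace and writes word, tab, count sorted by word. The function name local_wordcount is hypothetical.

```shell
# Count whitespace-separated words read from standard input and print
# "word<TAB>count" lines sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}

# Usage against the downloaded book:
# local_wordcount < /home/hduser/pg20417.txt | head
# The job output can be read with (assuming a single reducer output file):
# bin/hdfs dfs -cat /tmp-output/part-r-00000 | head
```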

7. Stop Hadoop

To stop Hadoop, run the following commands in order from the sbin directory:

$ ./stop-yarn.sh
$ ./stop-dfs.sh

Install Hive

Extract Hive to the /home/hduser/hive-0.12.0 directory.

Edit the environment variables in /etc/profile.

Run source /etc/profile to apply the environment variables.

3. Configuration files

The <HIVE_HOME>/conf directory contains four template files:

1	hive-default.xml.template
2	hive-env.sh.template
3	hive-exec-log4j.properties.template
4	hive-log4j.properties.template

Copy them to create the four configuration files, then customize the properties as needed:

$ cp hive-default.xml.template hive-site.xml
$ cp hive-env.sh.template hive-env.sh
$ cp hive-exec-log4j.properties.template hive-exec-log4j.properties
$ cp hive-log4j.properties.template hive-log4j.properties
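The four copies can also be made by one small function, which helps when the step has to be repeated on several machines. This is a sketch only. The function name copy_hive_templates is hypothetical, and it takes the conf directory as its argument.

```shell
# Create the four Hive configuration files from their templates.
# hive-default.xml.template is the only one whose target name differs
# (hive-site.xml); the others just drop the ".template" suffix.
copy_hive_templates() {
    conf_dir="$1"
    cp "$conf_dir/hive-default.xml.template" "$conf_dir/hive-site.xml"
    for name in hive-env.sh hive-exec-log4j.properties hive-log4j.properties; do
        cp "$conf_dir/$name.template" "$conf_dir/$name"
    done
}

# Usage:
# copy_hive_templates /home/hduser/hive-0.12.0/conf
```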

Note, however, that hive-default.xml.template in the official 0.12.0 release contains a bug at line 2000.
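The page does not preserve the details of the bug. The commonly reported problem at that line is a mismatched closing tag, <value>auth</auth>, which makes the copied hive-site.xml fail to parse. Assuming that is the bug meant here, a sketch of the fix follows. The function name fix_sasl_qop_tag is hypothetical, and you should inspect line 2000 of your own copy before applying it.

```shell
# Replace the mismatched closing tag "<value>auth</auth>" with a proper
# "<value>auth</value>" in the given Hive configuration file.
# Assumption: this is the line-2000 bug referred to in the text.
fix_sasl_qop_tag() {
    sed -i 's|<value>auth</auth>|<value>auth</value>|' "$1"
}

# Usage:
# sed -n '2000p' /home/hduser/hive-0.12.0/conf/hive-site.xml   # inspect first
# fix_sasl_qop_tag /home/hduser/hive-0.12.0/conf/hive-site.xml
```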

Run schematool -dbType derby -initSchema to initialize the metastore.

To view the metastore information after initialization, run schematool -dbType derby -info.

Modify the configuration file

Before the first run, set hive.metastore.schema.verification to false:

......
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

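Once the metastore has been created, edit ConnectionURL (javax.jdo.option.ConnectionURL) and change create=true to create=false, so that Hive does not try to recreate the metastore database every time it starts.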
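The ConnectionURL change can be made with sed rather than by hand. The sketch below assumes the URL in hive-site.xml still has the default Derby form ending in ;create=true. The function name disable_derby_create is hypothetical.

```shell
# Switch the Derby connection URL from ";create=true" to ";create=false"
# in the given hive-site.xml.
disable_derby_create() {
    sed -i 's|;create=true|;create=false|' "$1"
}

# Usage:
# disable_derby_create /home/hduser/hive-0.12.0/conf/hive-site.xml
# grep -n 'create=' /home/hduser/hive-0.12.0/conf/hive-site.xml   # verify the result
```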

Change bind.host (hive.server2.thrift.bind.host) from its original value, localhost, to the machine's IP address.

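Configure directories and permissions in HDFS

$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse

These directories are created in Hadoop HDFS, not in the local Linux file system. The -p flag creates missing parent directories and does not fail if a directory (such as /tmp from the earlier test) already exists.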

Test Hive by running hive on the command line.

Start hiveserver2

hive --service hiveserver2

Query data with the beeline tool bundled with Hive

Connect to Hive from Smartbi

First, replace the Hive JDBC driver in lib.
