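Cloudera CDH Big Data Integration Environment Setup, Part 1: Building an Installable ISO Image of Cloudera CDH 5.12.1 on Ubuntu 16.04 (xenial)
I. Build the installable CDH ISO file. (All operations are performed as root.)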
1. First, install the required tools on the host system.
    apt-get install squashfs-tools genisoimage
2. On the host (a physical or virtual machine running Ubuntu 16.04 or later), download the Ubuntu 16.04 ISO directly, or download it elsewhere and copy it into /tmp/cdh, then run the following commands.
    mkdir -p /tmp/cdh/iso
    cd /tmp/cdh
    wget http://cn.releases.ubuntu.com/xenial/ubuntu-16.04.3-server-amd64.iso
    mount -o loop ubuntu-16.04.3-server-amd64.iso iso
    cp -rp iso edit
    umount iso
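Before unpacking a downloaded ISO it can be worth checking that the download is intact. The sketch below assumes you have copied the expected SHA-256 digest from the SHA256SUMS file that Ubuntu release mirrors normally publish next to the ISO; the function name verify_sha256 is only an illustration and is not part of the original procedure.

```shell
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper)
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX,
# 2 when FILE does not exist, and 1 on a digest mismatch.
verify_sha256() {
    vs_file=$1
    vs_expected=$2
    [ -f "$vs_file" ] || return 2
    # sha256sum prints "<digest>  <name>"; keep only the digest field.
    vs_actual=$(sha256sum "$vs_file" | awk '{print $1}')
    [ "$vs_actual" = "$vs_expected" ]
}
```

A possible call would be `verify_sha256 ubuntu-16.04.3-server-amd64.iso "<digest from SHA256SUMS>" || echo "download is corrupt"`, run from /tmp/cdh.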
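3. Extract filesystem.squashfs into the current directory. The extracted directory is named squashfs-root; any additional files or software you need can be copied into the appropriate locations under it.
    unsquashfs edit/install/filesystem.squashfs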
4. Switch into the extracted filesystem to customize it. The three mount commands are run inside the chroot.
    chroot squashfs-root
    mount -t proc none /proc
    mount -t sysfs none /sys
    mount -t devpts none /dev/pts
5. Once inside the temporary filesystem, apply some basic settings first.
    hostname localhost
    source /etc/profile
    echo 'nameserver 8.8.8.8' > /etc/resolv.conf
    echo 'localhost' > /etc/hostname
    echo '127.0.0.1 localhost' > /etc/hosts
6. Install and configure the software that should be part of the new system.
    apt-get update -y
    apt-get install -y bzip2 subversion git autoconf automake libtool cmake libncurses5-dev
    apt-get install -y libssl-dev build-essential unzip vim ftp lsof nmap tcpdump wget curl
    apt-get install -y mysql-server ntp libsqlite3-0 libsqlite3-dev sqlite3 ntpdate rpcbind
    apt-get install -y lsb-base psmisc libsasl2-modules libsasl2-modules-gssapi-mit zlib1g
    apt-get install -y libxslt1.1 libfuse2 fuse openssh-server net-tools
7. Add the CDH key and download the official CDH repository list.
    curl -s https://archive.cloudera.com/cdh5/ubuntu/xenial/amd64/cdh/archive.key | sudo apt-key add -
    cd /etc/apt/sources.list.d
    wget http://archive.cloudera.com/cm5/ubuntu/xenial/amd64/cm/cloudera.list
8. Download and install CDH Manager.
    apt-get update
    apt-get install -y cloudera-manager-agent cloudera-manager-daemons
    apt-get install -y cloudera-manager-server cloudera-manager-server-db-2
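After CDH Manager is installed, a cloudera directory appears under /opt/. /opt/cloudera/parcel-repo/ holds the downloaded parcel files, which makes an offline installation possible; without them the installation needs network access. /opt/cloudera/csd/ holds the metadata jar files of third-party packages so that CDH Manager can find them in the background.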
9. Download the CDH packages. The cloudera-manager-xenial-cm5.12.1_amd64.tar.gz package does not need to be downloaded, because it has already been installed via apt.
    cd /opt/cloudera/parcel-repo
    wget http://archive.cloudera.com/cdh5/parcels/5.12.1/CDH-5.12.1-1.cdh5.12.1.p0.3-xenial.parcel
    wget http://archive.cloudera.com/cdh5/parcels/5.12.1/CDH-5.12.1-1.cdh5.12.1.p0.3-xenial.parcel.sha1
    # wget http://archive.cloudera.com/cm5/cm/5/cloudera-manager-xenial-cm5.12.1_amd64.tar.gz
    wget http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera1/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-xenial.parcel
    wget http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera1/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-xenial.parcel.sha1
    wget http://archive.cloudera.com/kafka/parcels/latest/KAFKA-2.2.0-1.2.2.0.p0.68-xenial.parcel
    wget http://archive.cloudera.com/kafka/parcels/latest/KAFKA-2.2.0-1.2.2.0.p0.68-xenial.parcel.sha1
    wget http://archive.cloudera.com/cdh5/parcels/5.12.1/manifest.json
After downloading, the .sha1 files must be renamed so that they end in .sha; otherwise verification fails and the parcels are downloaded again from the official site.
    mv CDH-5.12.1-1.cdh5.12.1.p0.3-xenial.parcel.sha1 CDH-5.12.1-1.cdh5.12.1.p0.3-xenial.parcel.sha
    mv SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-xenial.parcel.sha1 SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-xenial.parcel.sha
    mv KAFKA-2.2.0-1.2.2.0.p0.68-xenial.parcel.sha1 KAFKA-2.2.0-1.2.2.0.p0.68-xenial.parcel.sha
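When more parcels are added later, renaming each hash file by hand gets tedious. As a rough sketch, the rename can be done in a loop; the helper name rename_parcel_hashes is made up for this example.

```shell
# rename_parcel_hashes DIR  (hypothetical helper)
# Renames every "*.parcel.sha1" file in DIR to "*.parcel.sha".
rename_parcel_hashes() {
    rp_dir=$1
    for rp_file in "$rp_dir"/*.parcel.sha1; do
        # When the glob matches nothing it stays literal; skip that case.
        [ -e "$rp_file" ] || continue
        # Strip the trailing ".sha1" and append ".sha".
        mv -- "$rp_file" "${rp_file%.sha1}.sha"
    done
}
```

It would be called as `rename_parcel_hashes /opt/cloudera/parcel-repo`; the parcel files themselves and manifest.json are left untouched.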
Then download the CSD files for the third-party extensions.
    cd /opt/cloudera/csd
    wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.2.0.cloudera1.jar
    wget http://archive.cloudera.com/csds/kafka/KAFKA-1.2.0.jar
    chown cloudera-scm:cloudera-scm /opt/cloudera/ -R  # set the owner and group
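10. Configure some important services. Pay attention to the comments below (important).
    systemctl enable rpcbind  # must be enabled
    systemctl enable ntp  # must be enabled
    systemctl disable mysql  # disabled in the customized ISO; optional
    systemctl disable cloudera-scm-agent  # must be disabled, otherwise problems such as hosts not being found occur
    mv /etc/init.d/cloudera-scm-server /usr/local/bin/  # must be moved elsewhere, otherwise the installation fails or misbehaves
    ln -s /etc/init.d/cloudera-scm-agent /usr/local/bin/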
Modify the ntp configuration file, backing it up first.
    mv /etc/ntp.conf /etc/ntp.conf.default
    vim /etc/ntp.conf
Add the following content:
    driftfile /var/lib/ntp/drift
    restrict 127.0.0.1
    server 127.127.1.0 prefer
    fudge 127.127.1.0 stratum 4
    broadcast 192.168.0.255 ttl 4
    includefile /etc/ntp/crypto/pw
    keys /etc/ntp/keys
11. Modify the MySQL configuration; adjust the values to your own environment.
    vim /etc/mysql/mysql.conf.d/mysqld.cnf
Add the following to the [mysqld] section:
    interactive_timeout = 65535
    wait_timeout = 65535
    max_connections = 5000
    max_connect_errors = 6000
Comment out the restriction that only allows local connections:
    # bind-address = 127.0.0.1
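If you prefer not to open an editor inside the chroot, the bind-address line can also be commented out non-interactively. This is only a sketch: the function name is hypothetical, and it relies on GNU sed's -i option, which is what Ubuntu ships.

```shell
# comment_out_bind_address FILE  (hypothetical helper)
# Prefixes any active "bind-address = ..." line in FILE with "# ".
# Lines that are already commented out do not match, so the call
# can be repeated safely.
comment_out_bind_address() {
    cb_file=$1
    sed -i 's/^\([[:space:]]*\)\(bind-address[[:space:]]*=.*\)$/\1# \2/' "$cb_file"
}
```

For the file used in this step the call would be `comment_out_bind_address /etc/mysql/mysql.conf.d/mysqld.cnf`.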
Start MySQL. If a MySQL service is running on the host, stop it first.
    /etc/init.d/mysql start
The message "No directory, logging in with HOME=/" during startup can be ignored. Log in to MySQL and allow remote logins; the MySQL password is the one you entered during installation.
    root@localhost:/# mysql -uroot -p
    Enter password:
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 5
    Server version: 5.7.19-0ubuntu0.16.04.1 (Ubuntu)

    Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> use mysql
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    mysql> select user,host from user;
    +------------------+-----------+
    | user             | host      |
    +------------------+-----------+
    | debian-sys-maint | localhost |
    | mysql.session    | localhost |
    | mysql.sys        | localhost |
    | root             | localhost |
    +------------------+-----------+
    4 rows in set (0.00 sec)

    mysql> update user set host='%' where user='root';
    Query OK, 0 rows affected (0.01 sec)
    Rows matched: 1  Changed: 0  Warnings: 0
Then restart MySQL. If the restart fails, kill the remaining MySQL processes and start it again.
    /etc/init.d/mysql restart
12. Set up the JDK and environment variables.
First create custom directories to hold the software and scripts you need.
    mkdir -p /usr/local/kernelstudio/local  # third-party software
    mkdir -p /usr/local/kernelstudio/lib/jars  # third-party jars needed by the CDH cluster
    mkdir -p /usr/local/kernelstudio/sbin  # custom scripts
By default, a CDH Manager installed via apt only looks for the JDK under /usr/lib/jvm/java-7-oracle-cloudera. If you do not need Spark 2 (which requires JDK 1.8 or later), installing the JDK as follows is enough; this is the JDK 1.7 build provided by Cloudera.
    apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install oracle-j2sdk1.7
If you need Spark 2, or otherwise require JDK 1.8 or later, download the latest JDK from the official site, http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html, and place it in /usr/local/kernelstudio/local. The file downloaded here is jdk-8u144-linux-x64.tar.gz. Then extract it:
    cd /usr/local/kernelstudio/local
    tar xvf jdk-8u144-linux-x64.tar.gz
    rm -rf jdk-8u144-linux-x64.tar.gz
Edit a custom script to add the environment variables and other settings.
    vim /usr/local/kernelstudio/sbin/env.sh
Add the following content:
    #!/bin/sh

    alias ll='ls -alph'

    export KS_ROOT_DIR="/usr/local/kernelstudio"
    export KS_DIR_LOCAL="${KS_ROOT_DIR}/local"
    export KS_DIR_LIB="${KS_ROOT_DIR}/lib"
    export KS_DIR_SBIN="${KS_ROOT_DIR}/sbin"
    export KS_APP_JAR_LIB="${KS_ROOT_DIR}/lib/jars"

    # java
    export JAVA_HOME=${KS_DIR_LOCAL}/jdk1.8.0_144  # if the JDK was installed via apt, use /usr/lib/jvm/java-7-oracle-cloudera here
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$KS_APP_JAR_LIB:/usr/share/java
    export PATH=$JAVA_HOME/bin:$PATH
Then hook the script into the system environment.
    chmod +x /usr/local/kernelstudio/sbin/env.sh
    echo 'source /usr/local/kernelstudio/sbin/env.sh' >> /etc/profile
    echo 'source /usr/local/kernelstudio/sbin/env.sh' >> ~/.bashrc
    source ~/.bashrc  # apply the settings, then run java -version to check that the expected Java version is reported
If the JDK was not installed via apt, run the following commands:
    mkdir -p /usr/lib/jvm
    ln -s /usr/local/kernelstudio/local/jdk1.8.0_144 /usr/lib/jvm/java-7-oracle-cloudera
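Because CDH Manager only looks in that one directory, a dangling symlink or a typo in the JDK path is easy to miss until later steps fail. A minimal check, with a helper name invented for this example, could look like this:

```shell
# jdk_home_ok DIR  (hypothetical helper)
# Succeeds when DIR, following symlinks, contains an executable bin/java.
jdk_home_ok() {
    [ -x "$1/bin/java" ]
}
```

For example, `jdk_home_ok /usr/lib/jvm/java-7-oracle-cloudera || echo "JDK not found at the path CDH Manager expects"`.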
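13. Set up the jar search path for CDH Manager. If this step is skipped, importing the scm database fails with a driver-not-found error. CDH looks for database drivers under /usr/share/java by default, so the appropriate driver can be downloaded into that directory. Pay particular attention to the driver file name; if necessary, rename the file to the name given in the driver error message.
    cd /usr/share/cmf/lib
    ln -s ../common_jars/mysql-connector-java-5.1.15.jar .
    mkdir -p /usr/share/java
    ln -s /usr/share/cmf/common_jars/mysql-connector-java-5.1.15.jar /usr/share/java/mysql-connector-java.jar  # the link must have exactly this name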
14. Import the CDH Manager data and create the other required databases. Output like the following indicates success.
    root@localhost:/# /usr/share/cmf/schema/scm_prepare_database.sh mysql -uroot -p --scm-host localhost ks_scm ks_scm kernelstudio
    Enter database password:
    JAVA_HOME=/usr/local/kernelstudio/local/jdk1.8.0_144
    Verifying that we can write to /etc/cloudera-scm-server
    Creating SCM configuration file in /etc/cloudera-scm-server
    Executing: /usr/local/kernelstudio/local/jdk1.8.0_144/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
    2017-09-13 05:15:05,337 [main] INFO  com.cloudera.enterprise.dbutil.DbCommandExecutor - Successfully connected to database.
    All done, your SCM database is configured correctly!
Log in to MySQL and create the databases that CDH needs.
    create database ks_hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_report DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_activity DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_audit DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    create database ks_metadata DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
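Since the eight statements differ only in the database name, they can also be generated from a list. The sketch below is one way to do that; the function name is hypothetical, and IF NOT EXISTS is an addition of this example so that the statements can be re-run without errors.

```shell
# print_create_db_sql NAME...  (hypothetical helper)
# Prints one CREATE DATABASE statement per name, using the same
# character set and collation as the statements above.
print_create_db_sql() {
    for pc_name in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$pc_name"
    done
}
```

The output can be piped straight into the client, for instance `print_create_db_sql ks_hive ks_amon ks_hue ks_report ks_activity ks_oozie ks_audit ks_metadata | mysql -uroot -p`.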
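15. Write a kernel configuration script. CDH needs some kernel parameters to be adjusted at runtime; otherwise it reports warnings such as poor host health.
    vim /usr/local/kernelstudio/sbin/statup-initializer.sh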
Add the following content; adjust anything else as needed.
    #!/bin/sh
    ##################################################################################
    # This file is part of the kernelstudio cdh package.
    #
    # (c) 2014-2017 kernelstudio.com
    # @author libertyspy <supports@kernelstudio.com>
    # @link http://www.kernelstudio.com
    #
    # For the full copyright and license information, please view the LICENSE file
    # that was distributed with this source code.
    ##################################################################################

    # setup hostname
    KS_HOSTNAME='localhost'
    hostname $KS_HOSTNAME
    echo $KS_HOSTNAME > /etc/hostname

    # setup default locale
    export LC_ALL=C

    # setup dns
    echo 'nameserver 8.8.8.8' > /etc/resolv.conf
    echo 'nameserver 8.8.4.4' >> /etc/resolv.conf
    echo 'nameserver 114.114.114.114' >> /etc/resolv.conf
    echo 'nameserver 192.168.0.1' >> /etc/resolv.conf
    echo 'search lan' >> /etc/resolv.conf

    # disable transparent huge pages
    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
        echo never > /sys/kernel/mm/transparent_hugepage/enabled
    fi

    if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
        echo never > /sys/kernel/mm/transparent_hugepage/defrag
    fi

    # lower swappiness
    if test -f /proc/sys/vm/swappiness; then
        echo 10 > /proc/sys/vm/swappiness
    fi
Add it to /etc/profile:
    chmod +x /usr/local/kernelstudio/sbin/statup-initializer.sh
    echo 'source /usr/local/kernelstudio/sbin/statup-initializer.sh' >> /etc/profile
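On an installed machine you may want to confirm that the transparent huge page settings actually took effect. The kernel reports the active choice in square brackets, for example `always madvise [never]`. The helper below, whose name is made up for this sketch, extracts that bracketed value.

```shell
# thp_active_value FILE  (hypothetical helper)
# Prints the bracketed (active) choice from a sysfs file such as
# /sys/kernel/mm/transparent_hugepage/enabled.
thp_active_value() {
    # Capture the text between "[" and "]" and print only that.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

After the script above has run, `thp_active_value /sys/kernel/mm/transparent_hugepage/enabled` would be expected to print never.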
16. Configure SSH.
Allow root to log in remotely:
    sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/g' /etc/ssh/sshd_config
Disable DNS lookups in sshd:
    echo 'UseDNS no' >> /etc/ssh/sshd_config
Enable sshd at boot:
    systemctl enable ssh
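Set up passwordless SSH login. Do this only if you actually need it, because it carries a security risk: every system installed from this image shares the same key pair, so they can all log in to one another as root without a password.
    cd ~
    ssh-keygen -t rsa  # accept all defaults by pressing Enter
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    echo 'StrictHostKeyChecking no' >> /etc/ssh/ssh_config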
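Several steps in this guide append lines to configuration files with `>>`, so entering the chroot again and repeating a step leaves duplicate lines behind. A small guard, sketched here under a hypothetical name, appends a line only when the file does not already contain it.

```shell
# ensure_line FILE LINE  (hypothetical helper)
# Appends LINE to FILE unless an identical line is already present.
ensure_line() {
    el_file=$1
    el_line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string, no regex.
    grep -qxF -- "$el_line" "$el_file" 2>/dev/null || printf '%s\n' "$el_line" >> "$el_file"
}
```

For instance, `ensure_line /etc/ssh/sshd_config 'UseDNS no'` can be run any number of times and still leaves a single UseDNS line.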
16. Generate the list of all packages installed in the current system.
    dpkg-query -W --showformat='${Package} ${Version}\n' > /filesystem.manifest
17. Clean up the current system.
    rm -rf /tmp/*
    apt-get clean
    apt-get autoremove
18. Unmount the filesystems. If related processes are still running, kill them first. Then exit the chroot and return to the host system.
    umount /proc || umount -lf /proc
    umount /sys
    umount /dev/pts
    exit
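If one of the unmounts fails silently, the still-mounted /proc or /sys would be packed into the new squashfs image in the next step. Back on the host, the mounts made inside the chroot show up in /proc/mounts under /tmp/cdh/squashfs-root, so they can be checked there. The function below is a sketch with a made-up name; its optional second argument exists only so that it can be tried against a sample file.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Succeeds when MOUNTPOINT appears as a mount point in MOUNTS_FILE,
# which defaults to /proc/mounts.
is_mounted() {
    im_target=$1
    im_mounts=${2:-/proc/mounts}
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v t="$im_target" '$2 == t { found = 1 } END { exit (found ? 0 : 1) }' "$im_mounts"
}
```

A possible check before building the image: `for p in proc sys dev/pts; do is_mounted "/tmp/cdh/squashfs-root/$p" && echo "$p is still mounted"; done`.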
19. Build the customized squashfs filesystem.
    mv squashfs-root/filesystem.manifest edit/install/
    chmod +w edit/install/filesystem.manifest
    rm edit/install/filesystem.squashfs
    mksquashfs squashfs-root edit/install/filesystem.squashfs
20. Customize the automatic installation configuration.
Edit the installer boot menu configuration:
    vim edit/isolinux/txt.cfg
After the additions the file looks as follows (/cdrom/preseed/kernelstudio-ubuntu-server-autoinstall.seed is the name and path of the automated installation configuration file); the automatic installation is the default entry.
    default autoinstall
    label autoinstall
      menu label ^Auto install Kernelstudio CDH 5.12.1
      kernel /install/vmlinuz
      append file=/cdrom/preseed/kernelstudio-ubuntu-server-autoinstall.seed debian-installer/locale=en_US console-setup/layoutcode=us keyboard-configuration/layoutcode=us console-setup/ask_detect=false localechooser/translation/warn-light=true localechooser/translation/warn-severe=true initrd=/install/initrd.gz root=/dev/ram rw quiet

    label install
      menu label ^Install Ubuntu Server
      kernel /install/vmlinuz
      append file=/cdrom/preseed/ubuntu-server.seed vga=788 initrd=/install/initrd.gz quiet ---
    label maas
      menu label ^Install MAAS Region Controller
      kernel /install/vmlinuz
      append modules=maas-region-udeb vga=788 initrd=/install/initrd.gz quiet ---

    label maasrack
      menu label ^Install MAAS Rack Controller
      kernel /install/vmlinuz
      append modules=maas-rack-udeb vga=788 initrd=/install/initrd.gz quiet ---
    label check
      menu label ^Check disc for defects
      kernel /install/vmlinuz
      append MENU=/bin/cdrom-checker-menu vga=788 initrd=/install/initrd.gz quiet ---
    label memtest
      menu label Test ^memory
      kernel /install/mt86plus
    label hd
      menu label ^Boot from first hard disk
      localboot 0x80
21. Edit the automated installation configuration.
    vim edit/preseed/kernelstudio-ubuntu-server-autoinstall.seed
After the additions the file looks as follows; customize it as needed.
    d-i auto-install/enable boolean true

    #locale
    d-i debian-installer/locale string en_US
    d-i debian-installer/language string en
    d-i debian-installer/country string us
    d-i localechooser/supported-locales multiselect en_US.UTF-8, zh_CN.UTF-8

    #keyboard
    d-i console-setup/ask_detect boolean false
    d-i console-configuration/layoutcode string us
    d-i keyboard-configuration/modelcode string SKIP

    #network
    d-i netcfg/choose_interface select auto
    d-i netcfg/dhcp_failed note
    d-i netcfg/dhcp_options select Do not configure the network at this time
    d-i netcfg/get_hostname string localhost
    d-i netcfg/get_domain string localhost
    d-i netcfg/wireless_wep string

    # Mirror
    d-i mirror/protocol string http
    d-i mirror/country string china
    d-i mirror/http/hostname string mirrors.163.com
    d-i mirror/http/directory string /ubuntu
    d-i mirror/http/proxy string

    # Clock and time zone setup
    d-i clock-setup/utc boolean false
    d-i time/zone string Asia/Shanghai

    # partition: LVM is used here to make later expansion easier
    d-i partman-auto/method string lvm
    d-i partman-lvm/device_remove_lvm boolean true
    d-i partman-md/device_remove_md boolean true
    d-i partman-lvm/confirm boolean true
    d-i partman-auto-lvm/guided_size string max
    d-i partman-auto/choose_recipe select atomic
    d-i partman/confirm_write_new_label boolean true
    d-i partman/choose_partition select finish
    # Manual confirmation takes place before the automatically prepared partitions are written to disk, to avoid data loss
    d-i partman/confirm boolean true
    d-i partman/confirm_nooverwrite boolean true

    #user
    # Allow root login and set the root password to kernelstudio
    d-i passwd/root-login boolean true
    d-i passwd/root-password password kernelstudio
    d-i passwd/root-password-again password kernelstudio
    # Create a new user whose username and password are both kernelstudio
    d-i passwd/make-user boolean true
    d-i passwd/user-fullname string kernelstudio
    d-i passwd/username string kernelstudio
    d-i passwd/user-password password kernelstudio
    d-i passwd/user-password-again password kernelstudio
    d-i user-setup/allow-password-weak boolean true
    d-i user-setup/encrypt-home boolean false

    #package
    tasksel tasksel/first multiselect none
    d-i pkgsel/include string openssh-server build-essential
    d-i pkgsel/upgrade select none
    d-i pkgsel/install-language-support boolean true
    d-i pkgsel/language-packs multiselect en
    d-i pkgsel/update-policy select none
    # popularity-contest popularity-contest/participate boolean false
    d-i pkgsel/updatedb boolean true

    #grub
    d-i grub-installer/skip boolean false
    d-i lilo-installer/skip boolean true
    d-i grub-installer/grub2_instead_of_grub_legacy boolean true
    d-i grub-installer/only_debian boolean true
    d-i grub-installer/with_other_os boolean true

    # Finish
    d-i finish-install/keep-consoles boolean true
    d-i finish-install/reboot_in_progress note
    d-i cdrom-detect/eject boolean true
    d-i debian-installer/exit/halt boolean false
    d-i debian-installer/exit/poweroff boolean false
21. Regenerate the md5sum file.
    cd edit/
    rm -rf md5sum.txt
    find -type f -print0 | xargs -0 md5sum | grep -v isolinux/boot.cat | tee md5sum.txt
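Before building the ISO, the regenerated list can be checked against the files it describes. This is only a sketch with a hypothetical function name. It skips any entry for md5sum.txt itself, on the assumption that such an entry may have been hashed while the file was still being written by tee.

```shell
# verify_md5_manifest DIR  (hypothetical helper)
# Re-checks every file listed in DIR/md5sum.txt and returns
# non-zero when any listed file is missing or has changed.
verify_md5_manifest() {
    (
        cd "$1" || exit 2
        # Drop a possible entry for md5sum.txt itself, then let
        # md5sum read the remaining list from standard input.
        grep -v '[ *]\./md5sum\.txt$' md5sum.txt | md5sum --quiet -c -
    )
}
```

From /tmp/cdh the call would be `verify_md5_manifest edit`; with --quiet, md5sum reports only the files that fail.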
22. If no further customization is needed, the final ISO file can now be generated. The working directory is still edit. Note that the trailing dot (.) in the command below must not be omitted.
    mkisofs -D -r -V "Kernelstudio CDH 5.12.1" -cache-inodes -J -l -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o ../kernelstudio-cdh-5.12.1-ubuntu-server-amd64-16.04.iso .
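23. To switch from the temporary filesystem back to the host system, perform step 18. Conversely, to switch from the host system into the temporary filesystem, perform step 4 and, optionally, step 5. The finished ISO file can be downloaded here: kernelstudio-cdh-5.12.1-ubuntu-server-amd64-16.04.iso
Note: in the downloadable prebuilt ISO, all login passwords are kernelstudio.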