# Docker - Ubuntu - Hadoop 2.9.1 - R Installation

`#` marks a command run as root; `$` marks a command run as the hduser user.

##### Install Docker

##### Install Ubuntu
- Install from a Docker image
- Connect to the Ubuntu container and verify it is running

##### Install required packages

##### Install Java 8 (OpenJDK)

Append to /etc/profile (the `/usr/lib/jvm/java8` path assumes a symlink pointing at the OpenJDK 8 install directory):

```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
```

- source /etc/profile
- java -version

##### Set up the Hadoop account
- apt-get install sudo
- addgroup hadoop
- adduser --ingroup hadoop hduser
- adduser hduser sudo
- groups hduser

##### Install and configure SSH
- # apt-get install ssh
- # apt-get install openssh-server
- # which ssh sshd
- # su hduser
- $ ssh-keygen -t rsa
- $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- $ sudo service ssh start
- $ ssh localhost
- answer `yes` at the host-key prompt
- $ exit

##### Install Protocol Buffers
- apt-get install autoconf automake libtool curl make g++ unzip
- # wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
- # tar xvfz protobuf-2.5.0.tar.gz && cd protobuf-2.5.0
- # ./configure
- # make
- # make install
- # ldconfig
- # protoc --version

##### Download and extract Hadoop 2
- $ cd ~
- $ wget "http://mirror.apache-kr.org/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz"
- $ sudo mkdir /usr/local/hadoop
- $ sudo mv hadoop* /usr/local/hadoop
- $ sudo chown -R hduser:hadoop /usr/local/hadoop
- $ cd /usr/local/hadoop
- $ tar xvfz hadoop-2.9.1.tar.gz
- $ ln -s hadoop-2.9.1 hadoop
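The `ln -s` step gives the distribution a version-independent path, so later configuration can refer to `hadoop` instead of `hadoop-2.9.1`. A minimal sketch of that layout in a scratch directory (paths are illustrative only):

```shell
#!/bin/sh
# Illustrative sketch: versioned directory plus a version-independent symlink.
set -e
base=$(mktemp -d)
mkdir "$base/hadoop-2.9.1"
ln -s hadoop-2.9.1 "$base/hadoop"   # relative link, as in the step above
readlink "$base/hadoop"             # prints: hadoop-2.9.1
```

After an upgrade, only the link needs repointing (for example `ln -sfn hadoop-2.9.2 hadoop`) and every path under `/usr/local/hadoop/hadoop` keeps working.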

##### Edit the Hadoop configuration files
- ./etc/hadoop/hadoop-env.sh

```
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
export HADOOP_PID_DIR=/usr/local/hadoop/hadoop/pids
```

- vim masters

```
localhost
```

- vim slaves

```
localhost
```

- core-site.xml

```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9010</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>
```

- hdfs-site.xml

```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/data/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/usr/local/hadoop/data/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/data/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>localhost:50070</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>localhost:50090</value>
    </property>

</configuration>
```
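The directories named in hdfs-site.xml must exist and be writable by hduser, or the format/start steps below can fail with permission errors. A sketch that pre-creates them, using a scratch prefix here instead of the real `/usr/local/hadoop`:

```shell
#!/bin/sh
# Pre-create the DFS data directories listed in hdfs-site.xml.
# BASE is a scratch prefix for illustration; the real one is /usr/local/hadoop.
set -e
BASE=$(mktemp -d)
for d in tmp data/dfs/namenode data/dfs/namesecondary data/dfs/datanode; do
    mkdir -p "$BASE/$d"
done
ls "$BASE/data/dfs"   # datanode  namenode  namesecondary
```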

- mapred-site.xml (Hadoop 2.x ships this as mapred-site.xml.template; copy it first: cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml)

```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```

- yarn-env.sh
 - no change needed if JAVA_HOME is already set in /etc/profile or ~/.bashrc

- yarn-site.xml

```
<configuration>                                                    

    <property>                                                         
        <name>yarn.nodemanager.aux-services</name>                         
        <value>mapreduce_shuffle</value>                                   
    </property>                                                        
    <property>                                                         
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name> 
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>             
    </property>                                                        
    <property>                                                         
        <name>yarn.nodemanager.local-dirs</name>                           
        <value>/usr/local/hadoop/data/yarn/nm-local-dir</value>            
    </property>                                                        
    <property>                                                         
        <name>yarn.resourcemanager.fs.state-store.uri</name>               
        <value>/usr/local/hadoop/data/yarn/system/rmstore</value>          
    </property>
    <property>                                                         
        <name>yarn.resourcemanager.hostname</name>                         
        <value>localhost</value>                                           
    </property>                                                        
    <property>                                                         
        <name>yarn.web-proxy.address</name>                                
        <value>0.0.0.0:8089</value>                                        
    </property>                                                        
</configuration>                                                   
```

- Format the namenode: ./bin/hdfs namenode -format
- Start HDFS: ./sbin/start-dfs.sh
- Start YARN: ./sbin/start-yarn.sh
- Check from a browser (install a text browser first): apt-get install w3m
- w3m "http://localhost:50070"

##### Run a Hadoop example
- ./bin/hdfs dfs -mkdir /user
- ./bin/hdfs dfs -mkdir /user/hadoop
- ./bin/hdfs dfs -mkdir /user/hadoop/conf
- Upload a file to HDFS: ./bin/hdfs dfs -put etc/hadoop/hadoop-env.sh /user/hadoop/conf/
- Run the example jar: ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /user/hadoop/conf/ output
- Check the output stored in HDFS: ./bin/hdfs dfs -cat output/part-r-00000 | tail -5
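What the wordcount job computes can be imitated locally with plain shell tools, which helps sanity-check the HDFS output: the map phase emits one word per line, the shuffle groups equal words, and the reduce phase counts them.

```shell
#!/bin/sh
# Local imitation of the wordcount example (no Hadoop involved).
printf 'hadoop yarn hdfs\nhadoop hdfs\nhadoop\n' |
    tr -s ' ' '\n' |   # map: split lines into one word per line
    sort |             # shuffle: bring identical words together
    uniq -c            # reduce: count each distinct word
# → 3 hadoop, 2 hdfs, 1 yarn
```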

##### Install RHIPE (connects R and Hadoop)
- Environment settings, consolidated
- /etc/profile

```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
export PKG_CONFIG_PATH=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
```

- /root/.bashrc

```
export JAVA_HOME=/usr/lib/jvm/java8
export PATH=$PATH:$JAVA_HOME/bin
export CLASS_PATH="."
```

- /home/hduser/.bashrc

```
export JAVA_HOME=/usr/lib/jvm/java8
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_LIBS=`hdfs classpath | tr -d '*'`
```
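`hdfs classpath` prints a colon-separated classpath in which library directories end in a `*` wildcard; `tr -d '*'` strips those so HADOOP_LIBS holds plain directory paths. A quick illustration with a made-up classpath string (placeholder paths, not real `hdfs classpath` output):

```shell
#!/bin/sh
# Illustration of the tr -d '*' step; the input string is a placeholder.
cp='/usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/*'
echo "$cp" | tr -d '*'
# → /usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/
```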

- (It would likely suffice to set all of these in /etc/profile.)
- Add environment variables to /etc/R/Renviron

```
HADOOP_HOME=/usr/local/hadoop/hadoop
HADOOP_BIN=/usr/local/hadoop/hadoop/bin
HADOOP_CONF_DIR=/usr/local/hadoop/hadoop/etc/hadoop
```

- Install R
- # apt-get install r-base

##### Working in R
- R CMD javareconf (re-detects the Java paths for R)
- update.packages()
- install.packages("rJava")
- reference: http://ririsdata.blogspot.com/2016/10/rubuntu-java-rjava.html
- install.packages("testthat")
- wget http://ml.stat.purdue.edu/rhipebin/Rhipe_0.75.2_hadoop-2.tar.gz
- apt-get install pkg-config
- R CMD INSTALL Rhipe_0.75.2_hadoop-2.tar.gz
- library(Rhipe)
- rhinit()