尽管 CDH 已经大大的方便了 HADOOP 的管理, 它的官方文档的已经很详尽,但有时候遇到的一些报错还是不知所云,一些 bug 和新的特性不能使用.这时就需要阅读,甚至修改 CDH 组件的源码. CDH 的组件跟社区的版本不是一一对应的,包含了自己的补丁和特性.
下面总结了获取 CDH 源码的方法.
1 从 GitHub 仓库
首先想到的都是从 GitHub 上Cloudera 页面去找的源代码,这也是最理想的情况.因为 Git 仓库保留了代码所有的版本信息,用户可以维护自己的源码仓库,并同步官方仓库.
到目前为止,还没有以下的源码:
以下的源码比较齐全:
2 从 Maven 仓库
CDH 的 maven 仓库中提供了每个组件的源码.通过修改下面地址中的组件名称和版本号,现在即可. https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-hdfs/2.5.0-cdh5.3.0/hadoop-hdfs-2.5.0-cdh5.3.0-sources.jar
https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-common/2.5.0-cdh5.3.0/hadoop-common-2.5.0-cdh5.3.0-sources.jar
3 从 tar包仓库
3.1 CDH5
在这个链接 https://archive.cloudera.com/cdh5/cdh/5/ 看到的目录都是以 CDH 的组件的版本来命名。复制目录名 填充http://archive.cloudera.com/cdh5/cdh/5/*******.gz
例如用hadoop-2.6.0-cdh5.14.0
填充得到 http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.0.tar.gz
用hive-1.1.0-cdh5.14.0
填充得到 http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.14.0.tar.gz
参考:
解压 tarball 可以发现,里面既包含编译好的二进制,也包括源码:
├── LICENSE.txt
├── NOTICE.txt
├── README.txt
├── bin
├── bin-mapreduce1
├── cloudera
├── etc
├── examples
├── examples-mapreduce1
├── include
├── lib
├── libexec
├── sbin
├── share
└── src
├── BUILDING.txt
├── LICENSE.txt
├── NOTICE.txt
├── README.txt
├── build
├── dev-support
├── hadoop-assemblies
├── hadoop-build-tools
├── hadoop-client
├── hadoop-common-project
├── hadoop-dist
├── hadoop-hdfs-project
├── hadoop-mapreduce-project
├── hadoop-mapreduce1-project
├── hadoop-maven-plugins
├── hadoop-minicluster
├── hadoop-project
├── hadoop-project-dist
├── hadoop-tools
├── hadoop-yarn-project
└── pom.xml
42 directories, 88 files
3.2 CDH6
Tarball installation of Cloudera Manager is deprecated as of Cloudera Manager 5.9.0 and will be removed in version 6.0.0.
在https://archive.cloudera.com/cdh6/ 可以发现, CDH6 不再提供 tarball. 只提供 parcel 和二进制包(yum apt)两种安装方式.
参考:https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_download.html