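Kubernetes clusters are usually built with kubeadm (the officially recommended tool), from binaries, or with minikube. This article uses kubeadm.

Base environment configuration

Prepare four virtual machines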

Kubernetes Master01 192.168.100.11 kube-master-01 master
Kubernetes Minion01 192.168.100.12 kube-minion-01 minion
Kubernetes Minion02 192.168.100.13 kube-minion-02 minion
Kubernetes Minion03 192.168.100.14 kube-minion-03 minion

Configure the hosts file

cat >> /etc/hosts<<EOF
192.168.100.11  kube-master-01
192.168.100.12  kube-minion-01
192.168.100.13  kube-minion-02
192.168.100.14  kube-minion-03
EOF
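
Running the heredoc above a second time appends the same four entries again. A minimal sketch of an idempotent variant follows. The helper name add_host_entry and the HOSTS_FILE variable are hypothetical, and the demo writes to a scratch file instead of /etc/hosts.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME (hypothetical helper)
# Appends "IP  NAME" to FILE unless NAME already appears there as a whole word.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s  %s\n' "$ip" "$name" >> "$file"
    fi
}

# Assumption: a scratch file stands in for /etc/hosts.
# On a real node, export HOSTS_FILE=/etc/hosts before running this as root.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
add_host_entry "$HOSTS_FILE" 192.168.100.11 kube-master-01
add_host_entry "$HOSTS_FILE" 192.168.100.12 kube-minion-01
add_host_entry "$HOSTS_FILE" 192.168.100.13 kube-minion-02
add_host_entry "$HOSTS_FILE" 192.168.100.14 kube-minion-03
cat "$HOSTS_FILE"
```

Because each entry is checked before it is written, the script can be re-run on a node without creating duplicates.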

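Set the hostname

sudo hostnamectl set-hostname kube-master-01

Use each node's own hostname from the table above (kube-master-01 is shown for the master).

Disable the system firewall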

systemctl stop firewalld
systemctl disable firewalld

Disable swap

swapoff -a
echo "swapoff -a" >>/etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local

Note: alternatively, disable swap permanently at boot by editing /etc/fstab and commenting out the swap partition line.
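
The /etc/fstab edit can also be scripted. The sketch below is one possible way to do it, not the only one: comment_out_swap is a hypothetical helper, the two fstab entries are illustrative, and the demo runs on a scratch file so nothing on the host is changed.

```shell
#!/usr/bin/env bash
# comment_out_swap FILE (hypothetical helper)
# Prefixes every active line that has a whitespace-delimited "swap" field with '#'.
# The original file is kept as FILE.bak.
comment_out_swap() {
    sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Demo on a scratch copy; these entries are made up for illustration.
FSTAB="$(mktemp)"
printf '%s\n' \
    '/dev/mapper/centos-root /    xfs  defaults 0 0' \
    '/dev/mapper/centos-swap swap swap defaults 0 0' > "$FSTAB"
comment_out_swap "$FSTAB"
cat "$FSTAB"
```

On a real node the call would be comment_out_swap /etc/fstab, run as root, after checking the backup location is acceptable.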

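Disable SELinux

To disable it temporarily, run setenforce 0. To disable it permanently, edit /etc/selinux/config (vi /etc/selinux/config)
and change SELINUX=enforcing to SELINUX=disabled; the change takes effect after a reboot. The equivalent command is:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

Let iptables see bridged IPv4/IPv6 traffic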

echo "1" | sudo tee /proc/sys/net/ipv4/ip_forward > /dev/null
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

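Run sysctl --system to apply the settings. If the net.bridge.* keys are reported as missing, load the br_netfilter kernel module first (modprobe br_netfilter).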
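
To confirm that the three settings are active, the values can be read back from /proc/sys. This is a small sketch under the assumption that each key should read 1; check_sysctl is a hypothetical helper and takes the root directory as a parameter so it can be tried against a test directory.

```shell
#!/usr/bin/env bash
# check_sysctl ROOT KEY WANT (hypothetical helper)
# Maps KEY to a path under ROOT (dots become slashes) and compares its content with WANT.
check_sysctl() {
    local root="$1" key="$2" want="$3" path got
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "MISSING $key"
        return 1
    fi
    got="$(cat "$path")"
    if [ "$got" = "$want" ]; then
        echo "OK $key = $got"
    else
        echo "BAD $key = $got (want $want)"
        return 1
    fi
}

# Report on the keys written to /etc/sysctl.d/k8s.conf; keep going if one fails.
for key in net.bridge.bridge-nf-call-ip6tables \
           net.bridge.bridge-nf-call-iptables \
           net.ipv4.ip_forward; do
    check_sysctl /proc/sys "$key" 1 || true
done
```

A MISSING line for the net.bridge keys usually means the bridge netfilter module is not loaded.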

Synchronize the system time

yum -y install ntp
systemctl start ntpd
systemctl enable ntpd

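Cluster environment configuration

Install Docker

Configure the repository: wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

Install the docker-ce container service: yum -y install docker-ce-18.06.1.ce-3.el7
Check the Docker version with docker --version and the detailed information with docker info.

Enable Docker at boot and start it: systemctl enable docker && systemctl start docker
Modify the Docker daemon parameters: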

cat  > /etc/docker/daemon.json  <<EOF
{
     "registry-mirrors": ["https://yywkvob3.mirror.aliyuncs.com"],
     "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF


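Modify Docker's systemd unit file, docker.service:
Under the [Service] section, add: ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT
After the change, restart the service with: systemctl daemon-reload && systemctl restart docker

Install the Kubernetes components

Configure the repository: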

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

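Install the components: yum install -y kubelet kubeadm kubectl
This installs the newest packages in the repository; the rest of this article assumes v1.16.0.

Configure and start kubelet

Configure the pause image that kubelet uses (pulled from a mirror in China and retagged as k8s.gcr.io later in this article) and kubelet's cgroup driver.
The cgroup driver must match Docker's, which docker info reports: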

cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd --pod-infra-container-image=k8s.gcr.io/pause:3.1 --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
EOF
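
Since kubelet and Docker must use the same cgroup driver, it can help to compare the two configuration files before starting kubelet. The following is a rough sketch that only inspects the files written in this article; the three function names are hypothetical, and docker info remains the authoritative source for what Docker actually runs with.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the cgroup driver named in each config file (empty if none).
docker_cgroup_driver() {
    grep -o 'native\.cgroupdriver=[a-z]*' "$1" 2>/dev/null | head -n 1 | cut -d= -f2
}
kubelet_cgroup_driver() {
    grep -o -e '--cgroup-driver=[a-z]*' "$1" 2>/dev/null | head -n 1 | cut -d= -f2
}

# drivers_match DAEMON_JSON KUBELET_ENV
# Prints both values and succeeds only if they are set and identical.
drivers_match() {
    local d k
    d="$(docker_cgroup_driver "$1")"
    k="$(kubelet_cgroup_driver "$2")"
    echo "docker=${d:-unset} kubelet=${k:-unset}"
    [ -n "$d" ] && [ "$d" = "$k" ]
}

drivers_match /etc/docker/daemon.json /etc/sysconfig/kubelet \
    || echo "cgroup drivers differ or are not set"
```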

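Reload systemd so the configuration takes effect: systemctl daemon-reload
Enable kubelet at boot: systemctl enable kubelet

Configure the master node

Create the initialization script on the master node: vi /etc/kubernetes/kubeadm-init.sh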

kubeadm init \
--kubernetes-version=v1.16.0 \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=192.168.100.11

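Make the script executable: chmod +x /etc/kubernetes/kubeadm-init.sh

Initialize the master node

Initialization pulls images from k8s.gcr.io, which is blocked in mainland China, so we pull them manually from a mirror in China.
First, run kubeadm config images list to see which images have to be pulled.

Create the image-pull script on the master node: vi /etc/kubernetes/kubeadm-pull.sh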

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.16.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.16.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.16.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.15-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.6.2
docker pull quay.io/coreos/flannel:v0.11.0-amd64

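Make the script executable: chmod +x /etc/kubernetes/kubeadm-pull.sh
Run the image-pull script: /etc/kubernetes/kubeadm-pull.sh

Once the images are pulled, retag them under k8s.gcr.io so that initialization finds them locally instead of failing to pull.
Create the image-tagging script on the master node: vi /etc/kubernetes/kubeadm-tags.sh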

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.16.0 k8s.gcr.io/kube-apiserver:v1.16.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.16.0 k8s.gcr.io/kube-controller-manager:v1.16.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.16.0 k8s.gcr.io/kube-scheduler:v1.16.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0 k8s.gcr.io/kube-proxy:v1.16.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.15-0 k8s.gcr.io/etcd:3.3.15-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.6.2 k8s.gcr.io/coredns:1.6.2
docker tag quay.io/coreos/flannel:v0.11.0-amd64 k8s.gcr.io/flannel:v0.11.0

docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.16.0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.16.0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.16.0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.15-0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.6.2
docker rmi quay.io/coreos/flannel:v0.11.0-amd64

Make the script executable: chmod +x /etc/kubernetes/kubeadm-tags.sh
Run the image-tagging script: /etc/kubernetes/kubeadm-tags.sh
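
The pull and tag scripts above repeat the same three commands for every image. As a sketch of a more compact alternative, the commands can be generated from a list. gen_mirror_cmds is a hypothetical helper; it only prints the docker commands so they can be reviewed first, and it covers the images hosted on the Aliyun mirror (flannel comes from a different registry and is left out).

```shell
#!/usr/bin/env bash
# gen_mirror_cmds MIRROR TARGET IMAGE... (hypothetical helper)
# For each IMAGE, prints the pull, tag, and rmi commands that move it
# from the MIRROR registry path to the TARGET registry name.
gen_mirror_cmds() {
    local mirror="$1" target="$2" image
    shift 2
    for image in "$@"; do
        printf 'docker pull %s/%s\n' "$mirror" "$image"
        printf 'docker tag %s/%s %s/%s\n' "$mirror" "$image" "$target" "$image"
        printf 'docker rmi %s/%s\n' "$mirror" "$image"
    done
}

MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
gen_mirror_cmds "$MIRROR" k8s.gcr.io \
    kube-apiserver:v1.16.0 \
    kube-controller-manager:v1.16.0 \
    kube-scheduler:v1.16.0 \
    kube-proxy:v1.16.0 \
    pause:3.1 \
    etcd:3.3.15-0 \
    coredns:1.6.2
```

After checking the printed commands, the same call can be piped to sh to execute them.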

Now run /etc/kubernetes/kubeadm-init.sh to initialize the master.
Note: if initialization fails, reset with the following commands and try again:

kubeadm reset
rm -rf /var/lib/cni/ $HOME/.kube/config

When initialization succeeds, the output ends with a join command like this:

kubeadm join 192.168.100.11:6443 --token orpb71.4ntdi3oq3ct9fmap \
    --discovery-token-ca-cert-hash sha256:c392f20abfc6f58da1140a7112a68bf29e68322bb96397c2ffdb7589079bc512

The command above is what the other nodes use to join the cluster. Save it; it is needed later.
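
A join command that was copied with a broken line or a truncated value fails on the worker nodes, so a quick shape check before distributing it can save time. The sketch below assumes the usual kubeadm formats (a bootstrap token of 6 characters, a dot, and 16 characters, and a sha256: prefix followed by 64 hex digits); valid_join_args is a hypothetical helper and the values are the ones printed in this article.

```shell
#!/usr/bin/env bash
# valid_join_args TOKEN HASH (hypothetical helper)
# Succeeds if TOKEN looks like a bootstrap token and HASH like a CA certificate hash.
valid_join_args() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' || return 1
    printf '%s\n' "$2" | grep -Eq '^sha256:[0-9a-f]{64}$' || return 1
}

TOKEN='orpb71.4ntdi3oq3ct9fmap'
HASH='sha256:c392f20abfc6f58da1140a7112a68bf29e68322bb96397c2ffdb7589079bc512'
if valid_join_args "$TOKEN" "$HASH"; then
    # Only prints the command; run it on each worker node.
    echo "kubeadm join 192.168.100.11:6443 --token $TOKEN --discovery-token-ca-cert-hash $HASH"
else
    echo "token or hash looks truncated" >&2
fi
```

Bootstrap tokens expire after 24 hours by default. If the saved command no longer works, kubeadm token create --print-join-command on the master prints a fresh one.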
To manage the cluster with kubectl on the master, run the following commands:

rm -rf $HOME/.kube
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

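Use kubectl get nodes to view the newly initialized master node:

The master node is reported as NotReady because no network plugin is installed yet, so we configure flannel:
Download the flannel manifest:
wget -P /etc/kubernetes/conf https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Edit the downloaded kube-flannel.yml: remove the parts you do not need and specify the network interface.

Start flannel: kubectl apply -f /etc/kubernetes/conf/kube-flannel.yml

For security reasons, a cluster initialized with kubeadm does not schedule Pods on the master node. To let the master take part in the workload, run:
kubectl taint nodes --all node-role.kubernetes.io/master-

Join the worker nodes

First, pull the images on each node: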

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker pull quay.io/coreos/flannel:v0.11.0-amd64

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0 k8s.gcr.io/kube-proxy:v1.16.0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag quay.io/coreos/flannel:v0.11.0-amd64 k8s.gcr.io/flannel:v0.11.0

docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.16.0
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker rmi quay.io/coreos/flannel:v0.11.0-amd64

After every node has pulled the images, run the join command saved earlier on each of them:

kubeadm join 192.168.100.11:6443 --token orpb71.4ntdi3oq3ct9fmap \
    --discovery-token-ca-cert-hash sha256:c392f20abfc6f58da1140a7112a68bf29e68322bb96397c2ffdb7589079bc512

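Deploy the Kubernetes web UI (Dashboard)

Download the manifest from the official Kubernetes Dashboard repository on GitHub:
wget -P /etc/kubernetes/conf https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml

Manually pull the image from the Aliyun registry on every node: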

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-dashboard-amd64:v1.10.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-dashboard-amd64:v1.10.1 k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-dashboard-amd64:v1.10.1

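Change the Service type in the manifest to NodePort.

Start the web UI: kubectl apply -f /etc/kubernetes/conf/kubernetes-dashboard.yaml

Check the status of the dashboard Pod: kubectl get pods -n kube-system

Check the port mapping: kubectl get svc -n kube-system

Then open https://192.168.100.11:31080/ in a browser.

The login page offers two sign-in methods, kubeconfig and token. Both are configured below.
Create a YAML file for the dashboard user:
vi /etc/kubernetes/conf/kubernetes-dashboard-user.yaml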

# Create Dashboard Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard-admin-user
  namespace: kube-system
---
# Create ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: dashboard-admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: dashboard-admin-user
  namespace: kube-system


Then run kubectl apply -f /etc/kubernetes/conf/kubernetes-dashboard-user.yaml
When it completes, view the token with:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep dashboard-admin-user | awk '{print $1}')
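
The describe output contains several fields besides the token. If only the token value is wanted, for example to copy it into the login page, it can be filtered out with awk. This is a small sketch: extract_token is a hypothetical helper, and the sample describe output in the demo is made up.

```shell
#!/usr/bin/env bash
# extract_token (hypothetical helper)
# Reads `kubectl describe secret` output on stdin and prints the value of the "token:" field.
extract_token() {
    awk '$1 == "token:" { print $2 }'
}

# Demo with illustrative input; on a real cluster, pipe the kubectl command into extract_token.
printf '%s\n' \
    'Name:         dashboard-admin-user-token-abcde' \
    'Type:  kubernetes.io/service-account-token' \
    '' \
    'Data' \
    '====' \
    'ca.crt:     1025 bytes' \
    'namespace:  11 bytes' \
    'token:      eyJhbGciOiJSUzI1NiJ9.example.signature' | extract_token
```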

Paste the token into the login page to sign in:

eyJhbGciOiJSUzI1NiIsImtpZCI6IkF3QmIxYmVOYUcweXIxODVTdXhxYmZaZG5aQ2FFTzVod2V3bDlzUS1XeFkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tcTZwejgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZmM4MGU3MTYtODc3Ny00MmZjLTk2MjQtYmU0NWY5YTI5MjZmIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.NZ7zswny6DO1VkUbXB54b2CFZyNz-IB0nVX9yGOgJP8scAcFU5f6Mvg6AeFnT5Tmw6vdm_B6aXuJouAQEhDwVsYqpa3sI0zzyAfequYs5utWwz_R96gWCLBsrktKNxBpQG2r6JawzWOC3P-vdt1YYgN9jpU5gLo3uyyg0wKYM7KemSPmevqAncXUrm73N-L-4ubKRnYHjuJey1EVnzlSBe0_brV_KRrF5jFiy7Te3ziTmQUa4Z_wgK_yQ_eUoOEMIyu2qNlNfTEr6qdqqczCQo879EXGW4boTHopGQsjlSoI-GUbmrhA9H3H597qKbhmz7cgfA_6lgHpsOeSrBWi0g


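Logging in at this point fails with the error: the server could not find the requested resource

Check the Pod log (substitute your own Pod name): kubectl logs kubernetes-dashboard-7c54d59f66-lcz2g -n kube-system

The log shows that the lookup of the heapster service failed. The dashboard depends on heapster to display chart data, so we deploy heapster:
Download the heapster YAML files: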

wget -P /etc/kubernetes/heapster https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
wget -P /etc/kubernetes/heapster https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/heapster.yaml
wget -P /etc/kubernetes/heapster https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
wget -P /etc/kubernetes/heapster https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml


From the /etc/kubernetes/heapster directory, check which images the manifests need:

cat grafana.yaml | grep image
cat heapster.yaml | grep image
cat influxdb.yaml | grep image


Manually pull and retag the heapster images:

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-grafana-amd64:v5.0.4

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.4 k8s.gcr.io/heapster-amd64:v1.5.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2 k8s.gcr.io/heapster-influxdb-amd64:v1.5.2
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-grafana-amd64:v5.0.4 k8s.gcr.io/heapster-grafana-amd64:v5.0.4

docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.4
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.5.2
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-grafana-amd64:v5.0.4

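Modify the YAML files:
Newer Kubernetes versions changed the API groups, so in the four YAML files above change apiVersion: extensions/v1beta1 to apiVersion: apps/v1 wherever it appears.
Because kubelet only serves HTTPS requests on port 10250, change - --source=kubernetes:https://kubernetes.default in heapster.yaml to: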

- --source=kubernetes:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
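
Both text substitutions can be applied with sed instead of by hand. The sketch below is one way to do it under the assumption that the manifests still contain the exact strings quoted above; patch_heapster_manifest is a hypothetical helper, and the demo works on a scratch file with three illustrative lines.

```shell
#!/usr/bin/env bash
# patch_heapster_manifest FILE (hypothetical helper)
# 1. Switches apiVersion: extensions/v1beta1 to apps/v1.
# 2. Extends the kubelet source URL; '&' is escaped because it is special in a sed replacement.
# The original file is kept as FILE.bak.
patch_heapster_manifest() {
    sed -i.bak \
        -e 's|apiVersion: extensions/v1beta1|apiVersion: apps/v1|' \
        -e 's|--source=kubernetes:https://kubernetes.default$|--source=kubernetes:https://kubernetes.default?kubeletHttps=true\&kubeletPort=10250\&insecure=true|' \
        "$1"
}

# Demo on a scratch file with made-up content.
MANIFEST="$(mktemp)"
printf '%s\n' \
    'apiVersion: extensions/v1beta1' \
    'kind: Deployment' \
    '        - --source=kubernetes:https://kubernetes.default' > "$MANIFEST"
patch_heapster_manifest "$MANIFEST"
cat "$MANIFEST"
```

The source pattern is anchored at the end of the line, so running the helper twice does not append the query string again. The selector still has to be added to each Deployment by hand.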


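In the Deployments in the YAML files above, add a selector under spec whose matchLabels match the labels of the Pod template.

Then deploy from the directory that holds the heapster manifests: kubectl apply -f .
Note: if the deployment fails, roll back with kubectl delete -f .
When the deployment succeeds, the heapster Pods are listed as Running.

Next, generate the kubeconfig file used for the config-based login:
## Decode the token stored in the secret from base64 and keep it in a variable (use the secret name from your own cluster)
DASH_TOKEN=$(kubectl get secret -n kube-system dashboard-admin-token-q6pz8 -o jsonpath='{.data.token}' | base64 -d)
## Create a cluster entry
kubectl config set-cluster cluster-admin --server=https://192.168.100.11:6443 --kubeconfig=/etc/kubernetes/conf/dashbord-admin.conf
## Create a cluster user that references the service account's token
kubectl config set-credentials dashboard-admin-user --token=$DASH_TOKEN --kubeconfig=/etc/kubernetes/conf/dashbord-admin.conf
## Create a context that names the cluster and the cluster user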

kubectl config set-context dashboard-admin-user@cluster-admin --cluster=cluster-admin --user=dashboard-admin-user \
    --kubeconfig=/etc/kubernetes/conf/dashbord-admin.conf

## Set the current context
kubectl config use-context dashboard-admin-user@cluster-admin --kubeconfig=/etc/kubernetes/conf/dashbord-admin.conf
After that, signing in with either the token or the generated file succeeds.

Verify the cluster state

Use kubectl get nodes -n kube-system -owide to list the nodes

Use kubectl get pods -n kube-system -owide to list the Pods

Use kubectl get svc -n kube-system -owide to list the Services
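
For a quick scripted check of node health, the node list can be summarized with awk. This is only a sketch: count_not_ready is a hypothetical helper, the sample output is made up, and a status such as Ready,SchedulingDisabled is counted as not ready because it differs from exactly "Ready".

```shell
#!/usr/bin/env bash
# count_not_ready (hypothetical helper)
# Reads `kubectl get nodes` output on stdin, skips the header row,
# and prints how many nodes have a STATUS other than exactly "Ready".
count_not_ready() {
    awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Demo with illustrative input; on a real cluster use: kubectl get nodes | count_not_ready
printf '%s\n' \
    'NAME             STATUS     ROLES    AGE   VERSION' \
    'kube-master-01   Ready      master   10m   v1.16.0' \
    'kube-minion-01   NotReady   <none>   2m    v1.16.0' | count_not_ready
```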

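Troubleshooting

Error while initializing the cluster

Problem: kubeadm init fails with the error /proc/sys/net/ipv4/ip_forward contents are not set to 1
Solution: echo "1" | sudo tee /proc/sys/net/ipv4/ip_forward

Nodes still NotReady after installing the network plugin

Problem: after flannel is installed, the node is still NotReady and coredns stays Pending; systemctl status kubelet shows that the cause is a CNI version problem with flannel.
Solution: vi /etc/cni/net.d/10-flannel.conflist and add "cniVersion": "0.2.0",

Version error when deploying heapster

Problem: deploying heapster fails with no matches for kind "Deployment" in version "extensions/v1beta1"
![](https://oscimg.oschina.net/oscnet/9a00ca3e6bd3de71d55b81a9fe78df30fea.jpg)
Solution: newer Kubernetes versions changed the API groups; change extensions/v1beta1 to apiVersion: apps/v1 in the affected YAML files.

Selector error when deploying heapster

Problem: deploying heapster fails with missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec

Solution: in the heapster YAML files above, add a selector under the Deployment spec whose matchLabels match the labels of the Pod template.

Dashboard access error

Problem: after the dashboard is deployed, accessing it returns the server could not find the requested resource: 404
Solution: the API of the newly installed Kubernetes 1.16.0 is not supported by this dashboard version. Either wait for a dashboard release that supports it, or downgrade to 1.15.x for now.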