54 - Kubernetes Cluster Resource Monitoring - Monitoring Metrics and Approach
Cluster resource monitoring
1. Monitoring metrics
- Cluster monitoring
  - Node resource utilization (for node1/node2: how much CPU each node has used and how much is left)
  - Node count (e.g. 3 nodes busy and 1 idle - the number of nodes itself needs to be watched)
  - Running pods (how many pods each node is running)
- Pod monitoring
  - Container metrics (how many containers are running inside the pods)
  - Application metrics (to see the complete picture of the current cluster)
2. Monitoring platform setup: Prometheus + Grafana
(1) Prometheus
- Open source
- Monitoring, alerting, and a built-in time-series database (scraped data is stored in Prometheus's own TSDB)
- Periodically scrapes the state of monitored components over HTTP (the scrape interval is configurable; the config below uses 15 seconds)
- No complex integration is required: components only need to expose an HTTP endpoint
- Prometheus pulls the status data of monitored components on a schedule (e.g. current resource utilization) over HTTP
- A whole setup (say one master plus two worker nodes) can be monitored through the HTTP interface alone, with no extra development
- Especially well suited to virtualized environments such as Docker containers and virtual machines
(2) Grafana
Grafana is the visualization component: Prometheus scrapes lots of state data, and Grafana presents it on dashboards.
- Open-source data analytics and visualization tool
- Supports many data sources (MySQL, InfluxDB, Prometheus, ...)
Data flow: the running cluster produces all kinds of data (node CPU, node count, pod and container metrics); Prometheus scrapes it periodically and stores it; Grafana then reads the data from Prometheus, analyzes it, and displays it with its visualization tools.
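To make the pull model concrete, here is a small sketch (the node-exporter port 9100 and the Prometheus NodePort 30003 come from the yaml files deployed below; <node-ip> is a placeholder for any node's address):
# any component only needs to expose plain-text metrics over HTTP; Prometheus pulls them on a schedule
curl http://<node-ip>:9100/metrics | head
# Prometheus itself can then be queried through its HTTP API, e.g. the 1-minute load per node
curl 'http://<node-ip>:30003/api/v1/query?query=node_load1'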
55 - Kubernetes Cluster Resource Monitoring - Building the Monitoring Platform
Building the monitoring platform
In Grafana, point it at Prometheus as the data source, choose a display template, and then access the dashboards through Grafana's IP and port.
Step 1: deploy Prometheus + Grafana
1. You can download the Prometheus binary package directly if the machine can reach the internet, or
2. deploy it straight from yaml files:
- configmap.yaml         # Prometheus configuration
- prometheus.deploy.yml  # Deployment
- prometheus.svc.yml     # Service exposing the port
- rbac-setup.yaml        # permissions
- node-exporter.yaml
- First, look through these yaml files.
# vim rbac-setup.yaml
# Assign Prometheus a role with the required permissions; by default it is not allowed to access cluster resources, so it gets a ClusterRole and a binding.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]   # permissions on the core resources
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount   # the service account Prometheus uses for access
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
# vim configmap.yaml
Prometheus's own configuration (scrape jobs, intervals, etc.) lives in this ConfigMap:
apiVersion: v1
kind: ConfigMap          # a ConfigMap: unencrypted, used for configuration files
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service            # match against Services in the cluster
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod                # match against Pods in the current cluster
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)    # regex captures the address and the annotated port
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
# vim prometheus.deploy.yaml
---
apiVersion: apps/v1beta2     # later changed to apps/v1 (see the deploy step below)
kind: Deployment             # deploy Prometheus as a Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: kube-system     # Prometheus goes into the kube-system namespace, alongside the node monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.0.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        # resource limits
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      volumes:               # data and config are mounted from volumes
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
# vim prometheus.svc.yml
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  type: NodePort    # expose the port outside the cluster for access
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 30003
  selector:
    app: prometheus
# vim node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet   # a DaemonSet runs node-exporter on every node, current and newly joined, so the monitoring keeps running everywhere as the data changes
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter
        name: node-exporter
        ports:
        - containerPort: 9100
          protocol: TCP
          name: http
---
# the Service exposes node-exporter outside the cluster for access
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9100
    nodePort: 31672
    protocol: TCP
  type: NodePort
  selector:
    k8s-app: node-exporter
Deploy the DaemonSet
[root@k8smaster ~]# mkdir pg
[root@k8smaster ~]# rm -rf pg
[root@k8smaster ~]# mkdir pgmonitor
[root@k8smaster ~]# cd pgmonitor
[root@k8smaster pgmonitor]# ls
grafana  node-exporter.yaml  prometheus
[root@k8smaster pgmonitor]# kubectl create -f node-exporter.yaml
service/node-exporter created
error: unable to recognize "node-exporter.yaml": no matches for kind "DaemonSet" in version "extensions/v1beta1"
# API version mismatch - switch it to the GA apps/v1 version
[root@k8smaster pgmonitor]# vim node-exporter.yaml
# change: apiVersion: apps/v1
[root@k8smaster pgmonitor]# kubectl create -f node-exporter.yaml
error: error validating "node-exporter.yaml": error validating data: ValidationError(DaemonSet.spec): missing required field "selector" in io.k8s.api.apps.v1.DaemonSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
# DaemonSet.spec is missing the selector field - add it under the first spec:
[root@k8smaster pgmonitor]# vim node-exporter.yaml
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
[root@k8smaster pgmonitor]# kubectl create -f node-exporter.yaml
daemonset.apps/node-exporter created
The Service "node-exporter" is invalid: spec.ports[0].nodePort: Invalid value: 31672: provided port is already allocated
# leftovers from the earlier failed attempt - delete them and recreate
[root@k8smaster pgmonitor]# kubectl delete -f node-exporter.yaml
daemonset.apps "node-exporter" deleted
service "node-exporter" deleted
[root@k8smaster pgmonitor]# kubectl create -f node-exporter.yaml
daemonset.apps/node-exporter created
service/node-exporter created
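A quick sanity check once the DaemonSet is up (a sketch; the 31672 NodePort comes from node-exporter.yaml above, <node-ip> is any node's address):
kubectl get daemonset node-exporter -n kube-system               # DESIRED/READY should equal the number of nodes
curl -s http://<node-ip>:31672/metrics | grep node_cpu | head    # raw CPU counters exported by node-exporter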
Deploy the other yaml files
Deploy rbac-setup.yaml
[root@master pgmonitor]# cd prometheus/
[root@master prometheus]# ls
configmap.yaml  prometheus.deploy.yml  prometheus.svc.yml  rbac-setup.yaml
[root@master prometheus]# kubectl create -f rbac-setup.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
Deploy configmap.yaml
[root@master prometheus]# kubectl create -f configmap.yaml
configmap/prometheus-config created
[root@master prometheus]# kubectl create -f prometheus.deploy.yml
error: unable to recognize "prometheus.deploy.yml": no matches for kind "Deployment" in version "apps/v1beta2"
# the apiVersion is no longer served - change it
[root@master prometheus]# vim prometheus.deploy.yml
Change: apiVersion: apps/v1
[root@k8smaster prometheus]# kubectl create -f prometheus.deploy.yml
deployment.apps/prometheus created
Deploy prometheus.svc.yml
[root@master prometheus]# kubectl create -f prometheus.svc.yml
service/prometheus created
[root@master prometheus]# kubectl get pods -n kube-system    # check; everything runs in kube-system by default
NAME READY STATUS RESTARTS AGE
prometheus-7486bf7f4b-625gs 1/1 Running 0 3m35s
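Optionally confirm that Prometheus has discovered its scrape targets before moving on to Grafana (a sketch; 30003 is the NodePort from prometheus.svc.yml, <node-ip> is any node's address):
kubectl get svc prometheus -n kube-system                         # should show 9090:30003/TCP
curl -s http://<node-ip>:30003/api/v1/targets | grep '"health"'   # discovered targets should report "up"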
Step 2: deploy Grafana
First look at each of these 3 files:
grafana-deploy.yaml
apiVersion: apps/v1
kind: Deployment   # deploy Grafana as a Deployment
metadata:
  name: grafana-core
  namespace: kube-system
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
      component: core
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      containers:
      - image: grafana/grafana:4.2.0
        name: grafana-core
        imagePullPolicy: IfNotPresent
        # env:
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
          # The following env variables set up basic auth with the default admin user and admin password.
          - name: GF_AUTH_BASIC_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "false"
          # - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          #   value: Admin
          # does not really work, because of template variables in exported dashboards:
          # - name: GF_DASHBOARDS_JSON_ENABLED
          #   value: "true"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
          # initialDelaySeconds: 30
          # timeoutSeconds: 1
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
grafana-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system   # also lives in the kube-system namespace
  labels:
    app: grafana
    component: core
spec:
  type: NodePort   # expose the port
  ports:
  - port: 3000
  selector:
    app: grafana
    component: core
grafana-ing.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: kube-system
spec:
  rules:
  - host: k8s.grafana
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000   # exposed port
[root@k8smaster prometheus]# cd ..
[root@master pgmonitor]# cd grafana/
[root@master grafana]# ls
grafana-deploy.yaml  grafana-ing.yaml  grafana-svc.yaml
[root@master grafana]# kubectl create -f grafana-deploy.yaml
error: unable to recognize "grafana-deploy.yaml": no matches for kind "Deployment" in version "extensions/v1beta1"
# the apiVersion needs to be changed
[root@master grafana]# vim grafana-deploy.yaml
Change: apiVersion: apps/v1
Add the following under the first spec:
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
      component: core
[root@k8smaster grafana]# kubectl create -f grafana-deploy.yaml
deployment.apps/grafana-core created
[root@master grafana]# kubectl create -f grafana-svc.yaml
service/grafana created
[root@master grafana]# kubectl create -f grafana-ing.yaml
ingress.extensions/grafana created
[root@master grafana]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
grafana-core-768b6bf79c-srmlt 1/1 Running 0 2m29s
prometheus-7486bf7f4b-625gs 1/1 Running 0 26m
Step 3: open Grafana, configure the data source (Prometheus) and import a display template
Check the services first:
[root@k8smaster grafana]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.104.198.92 <none> 3000:31959/TCP 12h
prometheus NodePort 10.99.212.51 <none> 9090:30003/TCP 12h
Detailed view:
[root@master grafana]# kubectl get svc -n kube-system -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
grafana NodePort 10.104.198.92 <none> 3000:31959/TCP 12h app=grafana,component=core
prometheus NodePort 10.99.212.51 <none> 9090:30003/TCP 12h app=prometheus
Access Grafana through the NodePort shown above:
192.168.5.4:31959
The default username and password are both admin.
Configure the data source: choose Prometheus.
Set the dashboard template used to display the data: import dashboard ID 315.
(315 is a fixed dashboard ID.)
The name can be changed to whatever you like.
Select the data source you just created (named mydb here).
The monitoring data is now visible.
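As an alternative to the UI steps above, the Prometheus data source can also be created through Grafana's HTTP API - a sketch, assuming the default admin/admin credentials, the Grafana address used above, and the in-cluster service name prometheus in kube-system:
curl -u admin:admin -H "Content-Type: application/json" \
  -X POST http://192.168.5.4:31959/api/datasources \
  -d '{"name":"mydb","type":"prometheus","url":"http://prometheus.kube-system.svc:9090","access":"proxy"}'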
Cleanup: the first command below removes the resources a yaml file created; the second deletes the yaml file itself.
kubectl delete -f xxx.yaml
rm xxx.yaml
--------------------------------------------------------------------------------------------------------------------------------
Problem with the single-master cluster above: if the master goes down, the cluster can no longer be reached through the nodes.
You need at least two masters, one acting as standby, so that when a node goes down the cluster keeps working - high availability.
56 - Kubernetes Cluster Setup - Building a Highly Available Cluster (overview of the approach)
The virtual IP is not tied to a specific node (clients always talk to the VIP first, whether they end up on master1 or master2). It is just an extra address on the same network; the load balancer behind it forwards traffic to master1 or master2 and can also check the masters' state.
The load balancer checks whether the master nodes are healthy. keepalived is used for this: it can check the masters' state and manage the virtual IP, so a load-balancing server is built with it below.
With keepalived alone (no haproxy), the VIP has to be bound to a single node: keepalived creates a virtual interface carrying the VIP on whichever master currently holds it, and while that node is healthy every request lands on it.
Adding haproxy spreads the requests evenly across the master nodes, so all masters share the load; without it, all the request pressure is pushed onto whichever node currently holds the VIP. haproxy therefore adds load balancing on top of the failover that keepalived provides.
keepalived has two jobs:
1. Configure the virtual IP
2. Check the state of the master nodes
haproxy can also be replaced with nginx; the process is similar (see the sketch below). Behind haproxy sit the usual control-plane components.
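A minimal sketch of the nginx equivalent, assuming nginx is built with the stream module and reusing the same apiserver addresses and 16443 entry port as the haproxy configuration later in these notes:
cat > /etc/nginx/nginx.conf <<'EOF'
events {}
stream {
    upstream k8s-apiserver {
        server 47.109.31.67:6443;     # master1 apiserver
        server 47.109.23.137:6443;    # master2 apiserver
    }
    server {
        listen 16443;                 # same entry port as the haproxy frontend
        proxy_pass k8s-apiserver;
    }
}
EOF
systemctl restart nginx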
57 - Kubernetes Cluster Setup - Building a Highly Available Cluster (initialization and keepalived deployment)
On the master nodes:
1. Deploy keepalived
2. Deploy haproxy
3. Run the initialization (kubeadm init)
4. Install docker and the network plugin
On the worker node:
Join the cluster
Install docker
Install the network plugin
# turn off the firewall
systemctl stop firewalld
systemctl disable firewalld
# disable selinux
sed -i 's/enforcing/disabled/' /etc/selinux/config  # permanent
setenforce 0  # temporary
# disable swap
swapoff -a  # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanent
# set the hostname according to the plan
hostnamectl set-hostname <hostname>
# add hosts entries on the masters (do this on both master1 and master2)
cat >> /etc/hosts << EOF
192.168.44.158 master.k8s.io k8s-vip
<Alibaba Cloud public IP> master01.k8s.io master1
<Alibaba Cloud public IP> master02.k8s.io master2
<Alibaba Cloud public IP> node01.k8s.io node1
EOF
# pass bridged IPv4 traffic to the iptables chains
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system  # apply
# time sync
yum install ntpdate -y
ntpdate time.windows.com
Deploy keepalived on all master nodes
Install the required packages and keepalived:
yum install -y conntrack-tools libseccomp libtool-ltdl
yum install -y keepalived
Configure the master nodes
master1 configuration
cat > /etc/keepalived/keepalived.conf <<EOF
! Configuration File for keepalived

global_defs {
   router_id k8s
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 3
    weight -2
    fall 10
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 250
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ceb1b3ec013d66163d6ab
    }
    virtual_ipaddress {
        47.108.237.230
    }
    track_script {
        check_haproxy
    }
}
EOF
master2 configuration
cat > /etc/keepalived/keepalived.conf <<EOF
! Configuration File for keepalived

global_defs {
   router_id k8s
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 3
    weight -2
    fall 10
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ceb1b3ec013d66163d6ab
    }
    virtual_ipaddress {
        47.108.237.230
    }
    track_script {
        check_haproxy
    }
}
EOF
Start and check
Run on both master nodes:
# start keepalived
$ systemctl start keepalived.service
# enable at boot
$ systemctl enable keepalived.service
# check the service status
$ systemctl status keepalived.service
After starting, check the NIC on master1 to confirm the VIP is attached:
ip a s eth0    # the NIC on these hosts is eth0; the original tutorial used ens33
The transcript below is from master1; master2 is set up the same way (with its own keepalived.conf).
[root@iZ2vceh9faycach0mrzkh9Z ~]# systemctl stop firewalld [root@iZ2vceh9faycach0mrzkh9Z ~]# systemctl disable firewalld [root@iZ2vceh9faycach0mrzkh9Z ~]# sed -i 's/enforcing/disabled/' /etc/selinux/config [root@iZ2vceh9faycach0mrzkh9Z ~]# setenforce 0 setenforce: SELinux is disabled [root@iZ2vceh9faycach0mrzkh9Z ~]# swapoff -a [root@iZ2vceh9faycach0mrzkh9Z ~]# sed -ri 's/.*swap.*/#&/' /etc/fstab [root@iZ2vceh9faycach0mrzkh9Z ~]# hostnamectl set-hostname master1 [root@iZ2vceh9faycach0mrzkh9Z ~]# hostname master1 [root@iZ2vceh9faycach0mrzkh9Z ~]# cat >> /etc/hosts << EOF > 47.108.237.230 master.k8s.io k8s-vip > 阿里云公网ip master01.k8s.io master1 > 阿里云公网ip master02.k8s.io master2 > 阿里云公网ip node01.k8s.io node1 > EOF [root@iZ2vceh9faycach0mrzkh9Z ~]# cat > /etc/sysctl.d/k8s.conf << EOF > net.bridge.bridge-nf-call-ip6tables = 1 > net.bridge.bridge-nf-call-iptables = 1 > EOF [root@iZ2vceh9faycach0mrzkh9Z ~]# sysctl --system [root@iZ2vceh9faycach0mrzkh9Z ~]# yum install ntpdate -y [root@iZ2vceh9faycach0mrzkh9Z ~]# ntpdate time.windows.com 13 Jun 19:03:12 ntpdate[1578]: adjust time server 20.189.79.72 offset -0.006970 sec [root@iZ2vceh9faycach0mrzkh9Z ~]# yum install -y conntrack-tools libseccomp libtool-ltdl Complete! [root@iZ2vceh9faycach0mrzkh9Z ~]# yum install -y keepalived [root@iZ2vceh9faycach0mrzkh9Z ~]# ifconfig #查看网卡 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 172.31.197.185 netmask 255.255.240.0 broadcast 172.31.207.255 inet6 fe80::216:3eff:fe03:4ff8 prefixlen 64 scopeid 0x20<link> ether 00:16:3e:03:4f:f8 txqueuelen 1000 (Ethernet) RX packets 70543 bytes 103055816 (98.2 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 7790 bytes 1040858 (1016.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10<host> loop txqueuelen 1000 (Local Loopback) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 [root@iZ2vceh9faycach0mrzkh9Z ~]# cat > /etc/keepalived/keepalived.conf <<EOF > ! Configuration File for keepalived > > global_defs { > router_id k8s > } > > vrrp_script check_haproxy { > script "killall -0 haproxy" > interval 3 > weight -2 > fall 10 > rise 2 > } > > vrrp_instance VI_1 { > state MASTER > interface eth0 > virtual_router_id 51 > priority 250 > advert_int 1 > authentication { > auth_type PASS > auth_pass ceb1b3ec013d66163d6ab > } > virtual_ipaddress { > 47.108.237.230 > } > track_script { > check_haproxy > } > > } > EOF [root@iZ2vceh9faycach0mrzkh9Z ~]# systemctl start keepalived.service [root@iZ2vceh9faycach0mrzkh9Z ~]# systemctl enable keepalived.service Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service. [root@iZ2vceh9faycach0mrzkh9Z ~]# systemctl status keepalived.service [root@iZ2vceh9faycach0mrzkh9Z ~]# ip a s eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 00:16:3e:03:4f:f8 brd ff:ff:ff:ff:ff:ff inet 172.31.197.185/20 brd 172.31.207.255 scope global dynamic eth0 valid_lft 315357692sec preferred_lft 315357692sec inet 47.108.237.230/32 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::216:3eff:fe03:4ff8/64 scope link valid_lft forever preferred_lft forever [root@iZ2vceh9faycach0mrzkh9Z ~]#
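A simple way to verify failover (a sketch): stop keepalived on whichever master currently holds the VIP and watch the address move to the backup.
# on master1 (state MASTER, priority 250) - the VIP 47.108.237.230 is currently on eth0
systemctl stop keepalived
# on master2 (state BACKUP, priority 200) the VIP should now show up:
ip a s eth0
# restart keepalived on master1 and the higher priority claims the VIP back
systemctl start keepalived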
Deploy haproxy
Install
yum install -y haproxy
Configure
Both master nodes get exactly the same configuration. It declares the two master apiservers as backends and makes haproxy listen on port 16443, so 16443 becomes the cluster's entry point.
cat > /etc/haproxy/haproxy.cfg << EOF
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.*    /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000
#---------------------------------------------------------------------
# kubernetes apiserver frontend which proxys to the backends
#---------------------------------------------------------------------
frontend kubernetes-apiserver
    mode                 tcp
    bind                 *:16443
    option               tcplog
    default_backend      kubernetes-apiserver
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend kubernetes-apiserver
    mode        tcp
    balance     roundrobin
    server      master01.k8s.io   47.109.29.143:6443 check
    server      master02.k8s.io   47.109.22.78:6443 check
#---------------------------------------------------------------------
# collection haproxy statistics message
#---------------------------------------------------------------------
listen stats
    bind                 *:1080
    stats auth           admin:awesomePassword
    stats refresh        5s
    stats realm          HAProxy\ Statistics
    stats uri            /admin?stats
EOF
Start and check
Start on both masters:
# enable at boot
$ systemctl enable haproxy
# start haproxy
$ systemctl start haproxy
# check the status
$ systemctl status haproxy
Check the listening ports:
netstat -lntup|grep haproxy
The operations below are needed on both master nodes.
[root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum install -y haproxy [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# cat > /etc/haproxy/haproxy.cfg << EOF > #--------------------------------------------------------------------- > # Global settings > #--------------------------------------------------------------------- > global > # to have these messages end up in /var/log/haproxy.log you will > # need to: > # 1) configure syslog to accept network log events. This is done > # by adding the '-r' option to the SYSLOGD_OPTIONS in > # /etc/sysconfig/syslog > # 2) configure local2 events to go to the /var/log/haproxy.log > # file. A line like the following can be added to > # /etc/sysconfig/syslog > # > # local2.* /var/log/haproxy.log > # > log 127.0.0.1 local2 > > chroot /var/lib/haproxy > pidfile /var/run/haproxy.pid > maxconn 4000 > user haproxy > group haproxy > daemon > > # turn on stats unix socket > stats socket /var/lib/haproxy/stats > #--------------------------------------------------------------------- > # common defaults that all the 'listen' and 'backend' sections will > # use if not designated in their block > #--------------------------------------------------------------------- > defaults > mode http > log global > option httplog > option dontlognull > option http-server-close > option forwardfor except 127.0.0.0/8 > option redispatch > retries 3 > timeout http-request 10s > timeout queue 1m > timeout connect 10s > timeout client 1m > timeout server 1m > timeout http-keep-alive 10s > timeout check 10s > maxconn 3000 > #--------------------------------------------------------------------- > # kubernetes apiserver frontend which proxys to the backends > #--------------------------------------------------------------------- > frontend kubernetes-apiserver > mode tcp > bind *:16443 > option tcplog > default_backend kubernetes-apiserver > #--------------------------------------------------------------------- > # round robin balancing between the various backends > #--------------------------------------------------------------------- > backend kubernetes-apiserver > mode tcp > balance roundrobin > server master01.k8s.io 47.109.31.67:6443 check > server master02.k8s.io 47.109.23.137:6443 check > #--------------------------------------------------------------------- > # collection haproxy statistics message > #--------------------------------------------------------------------- > listen stats > bind *:1080 > stats auth admin:awesomePassword > stats refresh 5s > stats realm HAProxy\ Statistics > stats uri /admin?stats > EOF [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# systemctl enable haproxy Created symlink from /etc/systemd/system/multi-user.target.wants/haproxy.service to /usr/lib/systemd/system/haproxy.service. [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# systemctl start haproxy [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# systemctl status haproxy ● haproxy.service - HAProxy Load Balancer Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled) Active: active (running) since Sun 2021-06-13 21:20:33 CST; 8s ago Main PID: 2449 (haproxy-systemd) CGroup: /system.slice/haproxy.service ├─2449 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid ├─2450 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds └─2451 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds Jun 13 21:20:33 master1 systemd[1]: Started HAProxy Load Balancer. 
Jun 13 21:20:33 master1 haproxy-systemd-wrapper[2449]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/h Jun 13 21:20:33 master1 haproxy-systemd-wrapper[2449]: [WARNING] 163/212033 (2450) : config : 'option forwardfor' ignored for frontend 'kubernete Jun 13 21:20:33 master1 haproxy-systemd-wrapper[2449]: [WARNING] 163/212033 (2450) : config : 'option forwardfor' ignored for backend 'kubernetes [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# netstat -lntup|grep haproxy tcp 0 0 0.0.0.0:1080 0.0.0.0:* LISTEN 2451/haproxy tcp 0 0 0.0.0.0:16443 0.0.0.0:* LISTEN 2451/haproxy udp 0 0 0.0.0.0:47890 0.0.0.0:* 2450/haproxy
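Once the control plane has been initialized (later in these notes), the 16443 entry point can be checked end to end - a sketch, run from any host that can resolve master.k8s.io:
# haproxy is already listening (see the netstat output above); after kubeadm init the proxied apiserver should answer:
curl -k https://master.k8s.io:16443/version
# the stats page configured above is served on port 1080 (user admin, password awesomePassword):
#   http://<master-ip>:1080/admin?stats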
Install Docker/kubeadm/kubelet on all nodes
Kubernetes's default CRI (container runtime) is Docker, so install Docker first.
Install Docker
# install the gcc toolchain via yum (the machine needs internet access)
yum -y install gcc
yum -y install gcc-c++
# 1. remove old docker versions
yum remove docker \
           docker-client \
           docker-client-latest \
           docker-common \
           docker-latest \
           docker-latest-logrotate \
           docker-logrotate \
           docker-engine
# 2. required packages
yum install -y yum-utils
# 3. set up the package repository
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
# the Alibaba Cloud mirror is recommended instead
yum-config-manager \
    --add-repo \
    http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# 4. refresh the yum package index
yum makecache fast
# 5. install docker (docker-ce is the community edition, ee is the enterprise edition)
yum install -y docker-ce docker-ce-cli containerd.io
# 6. start docker
systemctl start docker
# 7. check the installation with: docker version
# 8. configure a registry mirror
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://g6yrjrwf.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
# docker info
Add the Alibaba Cloud YUM repository
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install kubeadm, kubelet and kubectl
Versions change frequently, so pin the version explicitly:
$ yum install kubelet-1.20.7 kubeadm-1.20.7 kubectl-1.20.7 -y
$ systemctl enable kubelet
These steps are required on master1, master2 and node1:
[root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum -y install gcc [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum -y install gcc-c++ [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum remove docker \ > docker-client \ > docker-client-latest \ > docker-common \ > docker-latest \ > docker-latest-logrotate \ > docker-logrotate \ > docker-engine [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum install -y yum-utils [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum-config-manager \ > --add-repo \ > http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo Loaded plugins: fastestmirror adding repo from: http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo grabbing file http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo to /etc/yum.repos.d/docker-ce.repo repo saved to /etc/yum.repos.d/docker-ce.repo [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum makecache fast Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile base docker-ce-stable epel extras updates (1/2): docker-ce-stable/7/x86_64/updateinfo (2/2): docker-ce-stable/7/x86_64/primary_db Metadata Cache Created [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum install -y docker-ce docker-ce-cli containerd.io [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# systemctl start docker [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# sudo mkdir -p /etc/docker [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# sudo tee /etc/docker/daemon.json <<-'EOF' > { > "registry-mirrors": ["https://g6yrjrwf.mirror.aliyuncs.com"] > } > EOF { "registry-mirrors": ["https://g6yrjrwf.mirror.aliyuncs.com"] } [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# sudo systemctl daemon-reload [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# sudo systemctl restart docker [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# $ cat > /etc/yum.repos.d/kubernetes.repo << EOF > [kubernetes] > name=Kubernetes > baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 > enabled=1 > gpgcheck=0 > repo_gpgcheck=0 > gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg > EOF -bash: $: command not found [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# vim /etc/docker/daemon.json [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# cat > /etc/yum.repos.d/kubernetes.repo << EOF > [kubernetes] > name=Kubernetes > baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64 > enabled=1 > gpgcheck=0 > repo_gpgcheck=0 > gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg > EOF [root@iZ2vc96g79oqyzqf8xj5l3Z ~]# yum install kubelet-1.20.7 kubeadm-1.20.7 kubectl-1.20.7 -y
Deploy the Kubernetes master
Create the kubeadm configuration file
Work on the master that currently holds the VIP - here that is master2.
ip a s eth0    # use this command to see which master holds the VIP and therefore where to run the init
$ mkdir /usr/local/kubernetes/manifests -p
$ cd /usr/local/kubernetes/manifests/
$ vi kubeadm-config.yaml
apiServer:
  certSANs:
    - master1
    - master2
    - master.k8s.io
    - 47.109.20.140
    - 47.109.31.67
    - 47.109.23.137
    - 127.0.0.1
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: "master.k8s.io:16443"
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.20.7
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.1.0.0/16
scheduler: {}
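Before running the real init, the configuration can be sanity-checked with a dry run (a sketch; both are standard kubeadm subcommands):
kubeadm config images pull --config kubeadm-config.yaml   # pre-pull the images from the aliyun mirror
kubeadm init --config kubeadm-config.yaml --dry-run       # validate the config without changing the node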
Run on the master2 node:
$ kubeadm init --config kubeadm-config.yaml
Error encountered:
W0613 21:46:45.433570   15099 common.go:77] your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta1". Please use 'kubeadm config migrate --old-config old.yaml --new-config new.yaml', which will write the new, similar spec using a newer API version.
this version of kubeadm only supports deploying clusters with the control plane version >= 1.19.0. Current version: v1.16.3
To see the stack trace of this error execute with --v=5 or higher
Fix: edit kubeadm-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta2
kubernetesVersion: v1.20.7   # must match the version you actually installed
If you end up editing lots of files and things only get worse, run kubeadm reset and then repeat the steps below from the start; that usually succeeds.
[root@master2 ~]# cd /etc/kubernetes/manifests [root@master2 manifests]# ls etcd.yaml kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml [root@master2 manifests]# vim kube-controller-manager.yaml [root@master2 manifests]# vim kube-scheduler.yaml [root@master2 manifests]# kubectl get cs [root@master2 manifests]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE coredns-7f89b7bc75-bpj72 1/1 Running 0 10h coredns-7f89b7bc75-z62hl 1/1 Running 0 10h etcd-master2 1/1 Running 0 10h kube-apiserver-master2 1/1 Running 0 10h kube-controller-manager-master2 1/1 Running 0 2m10s kube-flannel-ds-xxmlj 1/1 Running 0 10h kube-proxy-7f5h8 1/1 Running 0 10h kube-scheduler-master2 0/1 Running 0 50s [root@master2 manifests]# kubectl get pods -n kube-system NAME READY STATUS RESTARTS AGE coredns-7f89b7bc75-bpj72 1/1 Running 0 10h coredns-7f89b7bc75-z62hl 1/1 Running 0 10h etcd-master2 1/1 Running 0 10h kube-apiserver-master2 1/1 Running 0 10h kube-controller-manager-master2 1/1 Running 0 2m33s kube-flannel-ds-xxmlj 1/1 Running 0 10h kube-proxy-7f5h8 1/1 Running 0 10h kube-scheduler-master2 1/1 Running 0 73s [root@master2 manifests]# mkdir flannel [root@master2 manifests]# cd flannel [root@master2 flannel]# wget -c https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml [root@master2 flannel]# kubectl apply -f kube-flannel.yml [root@master2 flannel]# ssh root@47.109.31.67 mkdir -p /etc/kubernetes/pki/etcd root@47.109.31.67's password: [root@master2 flannel]# scp /etc/kubernetes/admin.conf root@47.109.31.67:/etc/kubernetes root@47.109.31.67's password: admin.conf 100% 5566 10.9MB/s 00:00 [root@master2 flannel]# scp /etc/kubernetes/pki/{ca.*,sa.*,front-proxy-ca.*} root@47.109.31.67:/etc/kubernetes/pki root@47.109.31.67's password: ca.crt 100% 1066 2.1MB/s 00:00 ca.key 100% 1675 3.5MB/s 00:00 sa.key 100% 1679 3.6MB/s 00:00 sa.pub 100% 451 1.2MB/s 00:00 front-proxy-ca.crt 100% 1078 2.4MB/s 00:00 front-proxy-ca.key 100% 1679 3.8MB/s 00:00 [root@master2 flannel]# scp /etc/kubernetes/pki/etcd/ca.* root@47.109.31.67:/etc/kubernetes/pki/etcd root@47.109.31.67's password: ca.crt 100% 1058 2.1MB/s 00:00 ca.key [root@master2 flannel]# mkdir /usr/local/kubernetes/manifests -p [root@master2 flannel]# cd /usr/local/kubernetes/manifests/ [root@master2 manifests]# vim kubeadm-config.yaml [root@master2 manifests]# kubeadm init --config kubeadm-config.yaml [init] Using Kubernetes version: v1.20.7 [preflight] Running pre-flight checks [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/ [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. 
Latest validated version: 19.03 [preflight] Pulling images required for setting up a Kubernetes cluster [preflight] This might take a minute or two, depending on the speed of your internet connection [preflight] You can also perform this action in beforehand using 'kubeadm config images pull' [certs] Using certificateDir folder "/etc/kubernetes/pki" [certs] Generating "ca" certificate and key [certs] Generating "apiserver" certificate and key [certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master.k8s.io master1 master2] and IPs [10.1.0.1 172.31.197.188 47.109.20.140 47.109.31.67 47.109.23.137 127.0.0.1] [certs] Generating "apiserver-kubelet-client" certificate and key [certs] Generating "front-proxy-ca" certificate and key [certs] Generating "front-proxy-client" certificate and key [certs] Generating "etcd/ca" certificate and key [certs] Generating "etcd/server" certificate and key [certs] etcd/server serving cert is signed for DNS names [localhost master2] and IPs [172.31.197.188 127.0.0.1 ::1] [certs] Generating "etcd/peer" certificate and key [certs] etcd/peer serving cert is signed for DNS names [localhost master2] and IPs [172.31.197.188 127.0.0.1 ::1] [certs] Generating "etcd/healthcheck-client" certificate and key [certs] Generating "apiserver-etcd-client" certificate and key [certs] Generating "sa" key and public key [kubeconfig] Using kubeconfig folder "/etc/kubernetes" [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "admin.conf" kubeconfig file [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "kubelet.conf" kubeconfig file [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "controller-manager.conf" kubeconfig file [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "scheduler.conf" kubeconfig file [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Starting the kubelet [control-plane] Using manifest folder "/etc/kubernetes/manifests" [control-plane] Creating static Pod manifest for "kube-apiserver" [control-plane] Creating static Pod manifest for "kube-controller-manager" [control-plane] Creating static Pod manifest for "kube-scheduler" [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests" [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s [apiclient] All control plane components are healthy after 12.007521 seconds [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace [kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster [upload-certs] Skipping phase. 
Please see --upload-certs [mark-control-plane] Marking the node master2 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)" [mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule] [bootstrap-token] Using token: rijbv2.50zq7e6zkpixxcg4 [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key [addons] Applied essential addon: CoreDNS [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [addons] Applied essential addon: kube-proxy Your Kubernetes control-plane has initialized successfully! To start using your cluster, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config Alternatively, if you are the root user, you can run: export KUBECONFIG=/etc/kubernetes/admin.conf You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: https://kubernetes.io/docs/concepts/cluster-administration/addons/ You can now join any number of control-plane nodes by copying certificate authorities and service account keys on each node and then running the following as root: kubeadm join master.k8s.io:16443 --token rijbv2.50zq7e6zkpixxcg4 \ --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d \ --control-plane Then you can join any number of worker nodes by running the following on each as root: kubeadm join master.k8s.io:16443 --token rijbv2.50zq7e6zkpixxcg4 \ --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d [root@master2 manifests]#
Follow the prompts to configure the environment so the kubectl tool works:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get nodes
$ kubectl get pods -n kube-system
Also save the following from the output; it is needed shortly:
kubeadm join master.k8s.io:16443 --token txwxys.lpyfg7ze218akqtw \ --discovery-token-ca-cert-hash sha256:f2275499d9e26a7ce76745138dfa10fdfc336cf1bee22314c26989fe6deb4372 \ --control-plane
Check the cluster status
kubectl get cs
kubectl get pods -n kube-system
Problem encountered:
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Unhealthy   Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
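The transcript further up shows kube-scheduler.yaml and kube-controller-manager.yaml being edited but not the exact change; the usual fix for this error (an assumption here) is to comment out the --port=0 flag in both static pod manifests so the health endpoints on 10251/10252 come back:
cd /etc/kubernetes/manifests
vim kube-scheduler.yaml            # comment out the line:  - --port=0
vim kube-controller-manager.yaml   # comment out the line:  - --port=0
# kubelet notices the changed static pod manifests and restarts both pods on its own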
Result after commenting it out:
[root@master2 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Healthy     ok
etcd-0               Healthy     {"health":"true"}
[root@master2 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
Install the cluster network
Fetch the flannel yaml from the official address and run it on the master holding the VIP (master2 in this transcript):
mkdir flannel
cd flannel
wget -c https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Install the flannel network
kubectl apply -f kube-flannel.yml
Check
kubectl get pods -n kube-system
[root@master2 manifests]# mkdir flannel [root@master2 manifests]# cd flannel [root@master2 flannel]# wget -c https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml --2021-06-14 09:55:17-- https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ... Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 4813 (4.7K) [text/plain] Saving to: ‘kube-flannel.yml’ 100%[======================================================================================================================================================>] 4,813 8.77KB/s in 0.5s 2021-06-14 09:55:18 (8.77 KB/s) - ‘kube-flannel.yml’ saved [4813/4813] [root@master2 flannel]# kubectl apply -f kube-flannel.yml podsecuritypolicy.policy/psp.flannel.unprivileged created clusterrole.rbac.authorization.k8s.io/flannel created clusterrolebinding.rbac.authorization.k8s.io/flannel created serviceaccount/flannel created configmap/kube-flannel-cfg created daemonset.apps/kube-flannel-ds created [root@master2 flannel]# kubectl get pods -n kube-system
Join master1 to the cluster
Copy the certificates and related files
Copy the certificates and related files from master2 (where kubeadm init ran) to master1 (47.109.31.67):
# ssh root@47.109.31.67 mkdir -p /etc/kubernetes/pki/etcd
# scp /etc/kubernetes/admin.conf root@47.109.31.67:/etc/kubernetes
# scp /etc/kubernetes/pki/{ca.*,sa.*,front-proxy-ca.*} root@47.109.31.67:/etc/kubernetes/pki
# scp /etc/kubernetes/pki/etcd/ca.* root@47.109.31.67:/etc/kubernetes/pki/etcd
[root@master2 flannel]# ssh root@47.109.31.67 mkdir -p /etc/kubernetes/pki/etcd root@47.109.31.67's password: [root@master2 flannel]# scp /etc/kubernetes/admin.conf root@47.109.31.67:/etc/kubernetes root@47.109.31.67's password: admin.conf 100% 5570 8.7MB/s 00:00 [root@master2 flannel]# scp /etc/kubernetes/pki/{ca.*,sa.*,front-proxy-ca.*} root@47.109.31.67:/etc/kubernetes/pki root@47.109.31.67's password: ca.crt 100% 1066 2.2MB/s 00:00 ca.key 100% 1675 3.6MB/s 00:00 sa.key 100% 1675 3.7MB/s 00:00 sa.pub 100% 451 1.1MB/s 00:00 front-proxy-ca.crt 100% 1078 2.6MB/s 00:00 front-proxy-ca.key 100% 1675 4.1MB/s 00:00 [root@master2 flannel]# scp /etc/kubernetes/pki/etcd/ca.* root@47.109.31.67:/etc/kubernetes/pki/etcd root@47.109.31.67's password: ca.crt 100% 1058 2.3MB/s 00:00 ca.key
Errors hit when joining master1:
[root@master1 kubernetes]# kubeadm join master.k8s.io:16443 --token rijbv2.50zq7e6zkpixxcg4 --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d --control-plane [preflight] Running pre-flight checks [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/ [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 19.03 error execution phase preflight: [preflight] Some fatal errors occurred: [ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...` To see the stack trace of this error execute with --v=5 or higher [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" error execution phase kubelet-start: failed to create directory "/var/lib/kubelet": mkdir /var/lib/kubelet: not a directory
Solution:
[root@master1 kubernetes]# rm -f /var/lib/kubelet    # remove the stray file blocking the kubelet directory
# find the cause of the error, fix it, then reset and redo the related steps
[root@master1 kubernetes]# kubeadm reset             # reset this node's kubeadm state
# after the reset, re-run these commands on master2 to copy the certificates again:
# ssh root@47.109.31.67 mkdir -p /etc/kubernetes/pki/etcd
# scp /etc/kubernetes/admin.conf root@47.109.31.67:/etc/kubernetes
# scp /etc/kubernetes/pki/{ca.*,sa.*,front-proxy-ca.*} root@47.109.31.67:/etc/kubernetes/pki
# scp /etc/kubernetes/pki/etcd/ca.* root@47.109.31.67:/etc/kubernetes/pki/etcd
Successful run:
[root@master1 kubernetes]# kubeadm join master.k8s.io:16443 --token rijbv2.50zq7e6zkpixxcg4 --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d --control-plane [preflight] Running pre-flight checks [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/ [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 19.03 [preflight] Reading configuration from the cluster... [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml' [preflight] Running pre-flight checks before initializing the new control plane instance [preflight] Pulling images required for setting up a Kubernetes cluster [preflight] This might take a minute or two, depending on the speed of your internet connection [preflight] You can also perform this action in beforehand using 'kubeadm config images pull' [certs] Using certificateDir folder "/etc/kubernetes/pki" [certs] Generating "etcd/server" certificate and key [certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.31.197.187 127.0.0.1 ::1] [certs] Generating "etcd/peer" certificate and key [certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.31.197.187 127.0.0.1 ::1] [certs] Generating "etcd/healthcheck-client" certificate and key [certs] Generating "apiserver-etcd-client" certificate and key [certs] Generating "apiserver" certificate and key [certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master.k8s.io master1 master2] and IPs [10.1.0.1 172.31.197.187 47.109.20.140 47.109.31.67 47.109.23.137 127.0.0.1] [certs] Generating "apiserver-kubelet-client" certificate and key [certs] Generating "front-proxy-client" certificate and key [certs] Valid certificates and keys now exist in "/etc/kubernetes/pki" [certs] Using the existing "sa" key [kubeconfig] Generating kubeconfig files [kubeconfig] Using kubeconfig folder "/etc/kubernetes" [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf" [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "controller-manager.conf" kubeconfig file [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address [kubeconfig] Writing "scheduler.conf" kubeconfig file [control-plane] Using manifest folder "/etc/kubernetes/manifests" [control-plane] Creating static Pod manifest for "kube-apiserver" [control-plane] Creating static Pod manifest for "kube-controller-manager" [control-plane] Creating static Pod manifest for "kube-scheduler" [check-etcd] Checking that the etcd cluster is healthy [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Starting the kubelet [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap... [etcd] Announced new etcd member joining to the existing etcd cluster [etcd] Creating static Pod manifest for "etcd" [etcd] Waiting for the new etcd member to join the cluster. 
This can take up to 40s [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace [mark-control-plane] Marking the node master1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)" [mark-control-plane] Marking the node master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule] This node has joined the cluster and a new control plane instance was created: * Certificate signing request was sent to apiserver and approval was received. * The Kubelet was informed of the new secure connection details. * Control plane (master) label and taint were applied to the new node. * The Kubernetes control plane instances scaled up. * A new etcd member was added to the local/stacked etcd cluster. To start administering your cluster from this node, you need to run the following as a regular user: mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config Run 'kubectl get nodes' to see this node join the cluster. [root@master1 kubernetes]# mkdir -p $HOME/.kube [root@master1 kubernetes]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config [root@master1 kubernetes]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Check on the master2 node:
[root@master2 flannel]# kubectl get nodes NAME STATUS ROLES AGE VERSION master1 Ready control-plane,master 10m v1.20.7 master2 Ready control-plane,master 79m v1.20.7 [root@master2 flannel]# kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system coredns-7f89b7bc75-qcwzx 1/1 Running 0 79m kube-system coredns-7f89b7bc75-w2x25 1/1 Running 0 79m kube-system etcd-master1 1/1 Running 0 11m kube-system etcd-master2 1/1 Running 0 79m kube-system kube-apiserver-master1 1/1 Running 0 11m kube-system kube-apiserver-master2 1/1 Running 0 79m kube-system kube-controller-manager-master1 1/1 Running 0 11m kube-system kube-controller-manager-master2 1/1 Running 1 79m kube-system kube-flannel-ds-hn8bc 1/1 Running 0 46m kube-system kube-flannel-ds-tf6kn 1/1 Running 0 11m kube-system kube-proxy-722xp 1/1 Running 0 79m kube-system kube-proxy-dxfdx 1/1 Running 0 11m kube-system kube-scheduler-master1 1/1 Running 0 11m kube-system kube-scheduler-master2 1/1 Running 1 79m
Join a Kubernetes worker node
Run on node1
To add a new node to the cluster, run the kubeadm join command printed by kubeadm init:
kubeadm join master.k8s.io:16443 --token utehrv.1bcxxsecjilacgf5 --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d
Error encountered:
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://master.k8s.io:16443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp: lookup master.k8s.io on 100.100.2.138:53: no such host
# get a token
> kubeadm token create
# get the discovery-token-ca-cert-hash
> openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
# or use this single command instead - although it still did not fix the error above
# kubeadm token create --print-join-command
Problem encountered:
[WARNING Hostname]: hostname "node1" could not be reached
[WARNING Hostname]: hostname "node1": lookup node1 on 100.100.2.138:53: no such host
Solution:
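The fix is not recorded in the notes; a likely remedy (an assumption) is to make the node1 hostname resolvable on node1 itself, mirroring the /etc/hosts entries added on the masters:
hostnamectl set-hostname node1
cat >> /etc/hosts << EOF
<node1 public IP> node01.k8s.io node1
EOF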
Error encountered:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Solution
mkdir /etc/docker
# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF
mkdir -p /etc/systemd/system/docker.service.d
# Restart Docker
systemctl daemon-reload
systemctl restart docker
Error encountered (the worker node failed to join the masters; the fixes found online did not solve it either):
[root@node1 ~]# kubeadm join master.k8s.io:16443 --token k5rf6v.ybl0dndn1l3xs5if --discovery-token-ca-cert-hash sha256:7b34954f4a26987b9e56da871607a694581120142ae2814c313f50e9c77efc9d [preflight] Running pre-flight checks [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 19.03 error execution phase preflight: couldn't validate the identity of the API Server: Get "https://master.k8s.io:16443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp: lookup master.k8s.io on 100.100.2.138:53: no such host To see the stack trace of this error execute with --v=5 or higher
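The notes say this was never resolved, but the error itself is a name-resolution failure: node1 cannot resolve master.k8s.io. A likely cause (an assumption) is that the /etc/hosts entries added on the masters were never added on node1; a sketch of the check and fix:
# on node1
ping -c 1 master.k8s.io          # currently fails with "unknown host"
cat >> /etc/hosts << EOF
47.108.237.230 master.k8s.io k8s-vip
EOF
# then retry the kubeadm join command printed on master2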
Check the tokens on master2:
[root@master2 flannel]# kubeadm token list TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS cjqgba.30wm6rmr5sug6avq 23h 2021-06-15T11:00:32+08:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token rijbv2.50zq7e6zkpixxcg4 21h 2021-06-15T09:21:44+08:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token rwnmlr.06ncpu0wtko1q50v 23h 2021-06-15T11:12:10+08:00 authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
Reapply the cluster network because a new node was added; these commands are run on master2.
Check the status
kubectl get node
kubectl get pods --all-namespaces
Test the Kubernetes cluster
Create a pod in the cluster and verify that it runs (again on master2):
$ kubectl create deployment nginx --image=nginx
$ kubectl expose deployment nginx --port=80 --type=NodePort
$ kubectl get pod,svc
The service can also be reached through the virtual IP.
Access address: http://NodeIP:Port
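A final end-to-end check (a sketch; the NodePort value comes from the kubectl get pod,svc output above and is shown as <port> here):
kubectl get svc nginx                 # note the NodePort mapped to port 80
curl http://47.108.237.230:<port>     # access through the virtual IP
curl http://<any-node-ip>:<port>      # or through any node's address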
Bilibili course for these notes: k8s教程由浅入深-尚硅谷 (Shang Silicon Valley Kubernetes course on Bilibili)
https://www.bilibili.com/video/BV1GT4y1A756?p=55&spm_id_from=pageDriver&vd_source=b8d03deb535c0310c92cb2a2bcaa3a28