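1. Introduction
kops is a tool for creating production-grade Kubernetes clusters. It has been nicknamed "Kubernetes the Easy Way", and you can think of it as kubectl for clusters. It is a command-line tool that helps you create, destroy, upgrade, and maintain production-grade, highly available Kubernetes clusters. AWS is currently officially supported, GCE and OpenStack are in beta, and VMware is in alpha.
This article was written on March 17, 2020; the stable release is 1.16 and the latest is 1.17-beta.1. As of that date, kops is still hosted under the kubernetes organization on GitHub, which makes it, in effect, Kubernetes' official support for AWS, even as AWS develops its own products, EKS and Fargate.
From the Kubernetes side, kops is being developed as a one-command deployment tool that also targets other cloud platforms, public and private. But other forces are at work as well: in my experience with AWS solutions architects (SAs), they will still recommend EKS to customers whenever they get the chance (EKS has only just launched in China, while Fargate has not yet landed in the China regions).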
2. Features
Below is the feature list from the GitHub README, with my explanation of each item.
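- Automates the provisioning of Kubernetes clusters in AWS, OpenStack and GCE
  Provisioning is automated: run the command and go have a cup of tea.
- Deploys Highly Available (HA) Kubernetes Masters
  You can deploy a highly available cluster, that is, one with multiple master nodes.
- Built on a state-sync model for dry-runs and automatic idempotency
  kops works by syncing a desired state to the cloud, which makes dry-runs (previewing changes without applying them) and automatic idempotency possible. Idempotency means that applying the same operation repeatedly gives the same result as applying it once: re-running kops update cluster against an unchanged configuration changes nothing and creates no duplicate resources. kubeadm also contains code built around this idea, for anyone who wants to dig deeper.
- Ability to generate Terraform
  kops can generate Terraform code. Terraform is an Infrastructure as Code (IaC) tool; our architect course covers implementing infrastructure in code, and Terraform is a very handy tool for that. If you have no experience with it, think of it as roughly analogous to an Ansible playbook.
- Supports custom Kubernetes add-ons
  Custom Kubernetes add-ons are supported, CoreDNS for example.
- Command line autocompletion
  Command-line autocompletion, which works just like Bash completion.
- YAML Manifest Based API Configuration
  Configuration is written as YAML manifests.
- Templating and dry-run modes for creating Manifests
  Templates and dry-runs are supported when creating manifests. This differs from the dry-run above: this one tests the manifest files, while the earlier one is a dry-run against cloud resources.
- Choose from eight different CNI Networking providers out-of-the-box
  Eight different CNI network plugins are available out of the box.
- Supports upgrading from kube-up
  Clusters that were created with kube-up can be upgraded to kops.
- Capability to add containers, as hooks, and files to nodes via a cluster manifest
  When creating a cluster, you can add custom containers (run as hooks on the nodes) and files to the nodes through the cluster manifest.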
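Idempotency is easy to get wrong, so here is a minimal shell sketch of the idea. It is a generic illustration and says nothing about how kops is implemented internally; the helper name ensure_line and the sample line are made up for this example.

```shell
#!/bin/sh
# Idempotent helper: append a line to a file only if it is not already there.
# Generic illustration of idempotency; NOT taken from kops.
ensure_line() {
  line=$1
  file=$2
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

state_file=$(mktemp)
ensure_line "node-count=2" "$state_file"
ensure_line "node-count=2" "$state_file"   # second run changes nothing
wc -l < "$state_file"                      # the file still holds exactly one line
```

Running the helper once or ten times leaves the file in the same state, which is the property kops aims for when the same configuration is applied repeatedly.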
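3. Installing a Cluster on AWS
3.1. Configure the AWS CLI and kubectl
First you need an AWS account, with the AWS CLI installed on your local machine.
You also need the kubectl command; see the official Kubernetes installation instructions.
3.2. Install kops
I am on macOS, so installing with brew is the most convenient option.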
$ brew update && brew install kops
On Linux:
curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
chmod +x kops-linux-amd64
sudo mv kops-linux-amd64 /usr/local/bin/kops
On Windows, download the binary from the kops releases page on GitHub and click through the usual steps.
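Before moving on, it can help to confirm that all three tools are actually on PATH. The following is a small sketch; the function name missing_tools is my own and not part of any of these CLIs.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools aws kubectl kops)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on a single line.
  echo "not installed yet:" $missing
else
  echo "all tools found"
fi
```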
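3.3. Configure AWS Route53
- Note: Route53 is not available in the China regions, although it is expected soon. However, kops already supports gossip-based clusters: as long as the cluster name ends in .k8s.local, the DNS check is skipped. If you want to use Route53, refer to the official documentation.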
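The naming rule can be checked before creating anything. This sketch only mirrors the rule as stated above (a name ending in .k8s.local); it is not how kops performs the check, and the function name is hypothetical.

```shell
#!/bin/sh
# Return success when the cluster name ends in .k8s.local (gossip-based).
is_gossip_name() {
  case "$1" in
    ?*.k8s.local) return 0 ;;
    *)            return 1 ;;
  esac
}

if is_gossip_name "clusters.prod.k8s.local"; then
  echo "gossip-based name: no Route53 hosted zone needed"
fi
```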
3.4. Create an S3 bucket to store the cluster state
$ aws s3 mb s3://clusters.prod.k8s.local
make_bucket: clusters.prod.k8s.local
Set the environment variable in preparation for the steps below
export KOPS_STATE_STORE=s3://clusters.prod.k8s.local
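Every later kops command relies on this variable, so a quick sanity check can save a confusing error. The sketch below assumes an S3 state store, as in this walkthrough; check_state_store is a name invented for the example.

```shell
#!/bin/sh
# Verify that KOPS_STATE_STORE is set and looks like an s3:// URL.
check_state_store() {
  case "${KOPS_STATE_STORE:-}" in
    s3://?*) echo "state store: $KOPS_STATE_STORE" ;;
    "")      echo "KOPS_STATE_STORE is not set" >&2; return 1 ;;
    *)       echo "expected an s3:// URL, got: $KOPS_STATE_STORE" >&2; return 1 ;;
  esac
}

KOPS_STATE_STORE=s3://clusters.prod.k8s.local
check_state_store
```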
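3.5. Create the cluster configuration
This step does not actually create the cluster. It only records the cluster configuration in the state store, in the spirit of the dry-run from the feature list; no AWS resources exist until the update command in 3.6 runs.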
kops create cluster \
--name=clusters.prod.k8s.local \
--zones=ap-south-1c \
--master-count=3 \
--master-size="t3.large" \
--node-count=2 \
--node-size="t3.large" \
--networking=calico \
--ssh-public-key="~/.ssh/id_rsa.pub"
List the clusters
kops get cluster
Edit the cluster
kops edit cluster clusters.prod.k8s.local
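To look at the generated manifest without writing anything to the state store, kops create cluster accepts --dry-run together with an output format. The wrapper below is a sketch; the function name and its two-argument interface are my own choices, and it needs kops plus the environment set up earlier to run for real.

```shell
#!/bin/sh
# Print the cluster manifest kops would generate, without saving it.
# $1: cluster name, $2: comma-separated availability zones
preview_cluster_manifest() {
  kops create cluster \
    --name="$1" \
    --zones="$2" \
    --dry-run \
    --output=yaml
}

# Example (requires kops and AWS credentials):
# preview_cluster_manifest clusters.prod.k8s.local ap-south-1c > cluster.yaml
```

Saving the output to a file gives you something to review or keep in version control before committing to the real create step.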
3.6. Create the cluster on AWS
kops update cluster clusters.prod.k8s.local --yes
.
.
.
Cluster is starting. It should be ready in a few minutes.
Suggestions:
* validate cluster: kops validate cluster
* list nodes: kubectl get nodes --show-labels
* ssh to the master: ssh -i ~/.ssh/id_rsa admin@api.clusters.prod.k8s.local
* the admin user is specific to Debian. If not using Debian please use the appropriate user based on your OS.
* read about installing addons at: https://github.com/kubernetes/kops/blob/master/docs/operations/addons.md.
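The output suggests running kops validate cluster, and the cluster takes a few minutes to become ready. A simple polling loop, sketched below, saves re-running the command by hand. The function name, the default of 30 attempts, and the 20-second delay are arbitrary choices for this example.

```shell
#!/bin/sh
# Poll "kops validate cluster" until it succeeds or the attempts run out.
# $1: cluster name, $2: max attempts (default 30), $3: seconds between tries (default 20)
wait_for_cluster() {
  name=$1
  attempts=${2:-30}
  delay=${3:-20}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kops validate cluster --name "$name"; then
      echo "cluster $name is ready"
      return 0
    fi
    echo "attempt $i/$attempts: not ready yet" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "cluster $name did not become ready" >&2
  return 1
}

# Example (requires kops and KOPS_STATE_STORE):
# wait_for_cluster clusters.prod.k8s.local
```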
Check the instances
aws ec2 describe-instances --query "Reservations[*].Instances[*].{PublicIP:PublicIpAddress,Name:Tags[?Key=='Name']|[0].Value,Status:State.Name}" --filters Name=instance-state-name,Values=running --output table
---------------------------------------------------------------------------------------
| DescribeInstances |
+-------------------------------------------------------+-----------------+-----------+
| Name | PublicIP | Status |
+-------------------------------------------------------+-----------------+-----------+
| master-ap-south-1c-2.masters.clusters.prod.k8s.local | 13.235.54.117 | running |
| master-ap-south-1c-3.masters.clusters.prod.k8s.local | 13.234.10.56 | running |
| master-ap-south-1c-1.masters.clusters.prod.k8s.local | 13.235.53.61 | running |
| None | 52.66.233.129 | running |
| nodes.clusters.prod.k8s.local | 13.235.53.80 | running |
| nodes.clusters.prod.k8s.local | 13.235.214.165 | running |
+-------------------------------------------------------+-----------------+-----------+
Log in to a master and check the status
admin@ip-172-20-37-171:~$ sudo -i
root@ip-172-20-37-171:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-8b55685cc-5w2rh 1/1 Running 0 10m
kube-system calico-node-47pk5 1/1 Running 0 9m3s
kube-system calico-node-fvxgn 1/1 Running 0 8m51s
kube-system calico-node-pbp42 1/1 Running 0 8m58s
kube-system calico-node-vmkbc 1/1 Running 0 9m54s
kube-system calico-node-w88kr 1/1 Running 0 10m
kube-system dns-controller-5769c5f8b6-v5bpt 1/1 Running 0 10m
kube-system etcd-manager-events-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 0 8m58s
kube-system etcd-manager-events-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 0 7m59s
kube-system etcd-manager-events-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 0 9m18s
kube-system etcd-manager-main-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 0 9m10s
kube-system etcd-manager-main-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 0 8m57s
kube-system etcd-manager-main-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 0 9m21s
kube-system kops-controller-lffj8 1/1 Running 0 8m43s
kube-system kops-controller-mq4x7 1/1 Running 0 9m1s
kube-system kops-controller-nv2g5 1/1 Running 0 9m13s
kube-system kube-apiserver-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 3 8m43s
kube-system kube-apiserver-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 4 7m42s
kube-system kube-apiserver-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 2 9m38s
kube-system kube-controller-manager-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 0 9m7s
kube-system kube-controller-manager-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 0 8m18s
kube-system kube-controller-manager-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 0 9m39s
kube-system kube-dns-autoscaler-594dcb44b5-6vjwx 1/1 Running 0 10m
kube-system kube-dns-b84c667f4-4jmhx 3/3 Running 0 8m24s
kube-system kube-dns-b84c667f4-jfp7h 3/3 Running 0 10m
kube-system kube-proxy-ip-172-20-35-53.ap-south-1.compute.internal 1/1 Running 0 8m48s
kube-system kube-proxy-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 0 9m22s
kube-system kube-proxy-ip-172-20-45-175.ap-south-1.compute.internal 1/1 Running 0 8m10s
kube-system kube-proxy-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 0 7m53s
kube-system kube-proxy-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 0 8m48s
kube-system kube-scheduler-ip-172-20-37-171.ap-south-1.compute.internal 1/1 Running 0 9m
kube-system kube-scheduler-ip-172-20-53-146.ap-south-1.compute.internal 1/1 Running 0 8m34s
kube-system kube-scheduler-ip-172-20-55-51.ap-south-1.compute.internal 1/1 Running 0 8m44s
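-
Note 1: The AMI (machine image) the cluster uses, debian-jessie, is not available in the China regions. To use it you would have to copy it over from a global region; alternatively, CoreOS can be used instead.
-
Note 2: When a cluster is created in the China regions, it pulls from the gcr.io image registry, which is very slow. There are two approaches. The first is to set up a stand-in Docker registry: pull the images, push them to it, and add a fake DNS record that points gcr.io at that registry. The second is to configure a custom image registry. The images we need are:
gcr.io/google_containers/etcd
gcr.io/google_containers/pause-amd64
gcr.io/google_containers/cluster-proportional-autoscaler-amd64
gcr.io/google_containers/k8s-dns-kube-dns-amd64
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64
gcr.io/google_containers/k8s-dns-sidecar-amd64
gcr.io/google_containers/kubedns-amd64
gcr.io/google_containers/k8s-dns-dnsmasq-amd64
gcr.io/google_containers/dnsmasq-metrics-amd64
gcr.io/google_containers/exechealthz-amd64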
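Either approach starts with copying the images into a registry you control. The sketch below only prints the docker commands so they can be reviewed first; registry.example.com is a hypothetical registry, the function names are invented, and image tags are left out because the list above does not give them.

```shell
#!/bin/sh
# Hypothetical private registry; replace with your own.
MIRROR=registry.example.com

# Map gcr.io/<path> to <mirror>/<path>.
mirror_name() {
  printf '%s/%s\n' "$1" "${2#gcr.io/}"
}

# Print (not run) the pull, tag, and push commands for each image.
print_mirror_commands() {
  mirror=$1
  shift
  for image in "$@"; do
    target=$(mirror_name "$mirror" "$image")
    echo "docker pull $image"
    echo "docker tag $image $target"
    echo "docker push $target"
  done
}

print_mirror_commands "$MIRROR" \
  gcr.io/google_containers/etcd \
  gcr.io/google_containers/pause-amd64
```

Run the pull on a machine that can reach gcr.io and the push on one that can reach the mirror; piping the printed lines to sh executes them once they look right.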
3.7. Delete the cluster
kops delete cluster clusters.prod.k8s.local --yes