
Operator Basics: Getting Started with Operator Development

Developing custom resource types and controllers with the Operator pattern.

Before You Begin

Let me state one thing up front: this is not a tutorial.

Why write this series?

Out of love for Kubernetes, and having grown a little tired of routine day-to-day usage and operations, I spent some spare time reading parts of the Kubernetes source code and books on Operator programming. It all felt fresh, and I wanted to try it for myself. This series is simply the notes I took as a beginner learning Operators, guided by the official Kubebuilder documentation and the book 《kubernetes operator 进阶》.

Operator

An Operator is a way of packaging, deploying, and managing a Kubernetes application.

Simply put, an Operator is a cloud-native extension made up of Kubernetes custom resources (defined via a CRD, Custom Resource Definition) and a controller. It encapsulates the deployment steps and the commands that would otherwise be run over and over during operations, so that operators only need to care about the application's configuration and desired running state, instead of spending effort on deployment and error-prone manual commands.

An example makes this easier to grasp. With a Redis Operator, for instance, you only care about size and config; the deployment process is taken care of:

apiVersion: redis.io/v1beta1
kind: RedisCluster
metadata:
  name: my-release
spec:
  size: 3
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  config:
    maxclients: "10000"

Or a microservice application you developed yourself:

apiVersion: custom.ops/v1
kind: MicroServiceDeploy
metadata:
  name: ms-sample-v1s0
spec:
  msName: "ms-sample"                     # microservice name
  fullName: "ms-sample-v1s0"              # microservice instance name
  version: "1.0"                          # microservice instance version
  path: "v1"                              # major version of the instance; this string appears in the instance's domain name
  image: "just an image url"              # image address of the instance
  replicas: 3                             # number of replicas of the instance
  autoscaling: true                       # whether autoscaling is enabled for this microservice
  needAuth: true                          # whether tenant basic auth is required to access this instance
  config: "password=88888888"             # runtime configuration of the instance
  creationTimestamp: "1535546718115"      # creation timestamp of the instance
  resourceRequirements:                   # machine resources required by the instance
    limits:                               # maximum resources the instance may use
      cpu: "2"
      memory: 4Gi
    requests:                             # minimum resources the instance requires
      cpu: "2"
      memory: 4Gi
  idle: false                             # whether the instance is idle
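
Every field in a spec like the one above ultimately maps to a field on a Go struct inside the Operator. As a preview of what we will build later with Kubebuilder, here is a rough, purely illustrative sketch of how such a spec could be declared in Go; the type and field names are guesses based on the YAML, and the real operator behind this example may define them differently:

// Illustrative only: a spec struct guessed from the YAML above.
// corev1 is assumed to be an alias for k8s.io/api/core/v1.
type MicroServiceDeploySpec struct {
	MsName               string                      `json:"msName,omitempty"`
	FullName             string                      `json:"fullName,omitempty"`
	Version              string                      `json:"version,omitempty"`
	Path                 string                      `json:"path,omitempty"`
	Image                string                      `json:"image,omitempty"`
	Replicas             int32                       `json:"replicas,omitempty"`
	Autoscaling          bool                        `json:"autoscaling,omitempty"`
	NeedAuth             bool                        `json:"needAuth,omitempty"`
	Config               string                      `json:"config,omitempty"`
	CreationTimestamp    string                      `json:"creationTimestamp,omitempty"`
	ResourceRequirements corev1.ResourceRequirements `json:"resourceRequirements,omitempty"`
	Idle                 bool                        `json:"idle,omitempty"`
}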

Choosing the Tools

A good tool is half the battle.

There are several options, such as Operator SDK and Kubebuilder; this series uses Kubebuilder.

We will also use Kind to quickly spin up a Kubernetes cluster for development.

Kubebuilder depends on specific Go versions, and different Kubernetes cluster versions call for different Kubebuilder versions, so pick the ones that match your environment.

The environment used here

  • kubernetes: 1.20.15

  • KubeBuilder: 3.2.0

  • golang: 1.16.15

Environment Setup

Prepare the Go and Kubebuilder environments.

  1. Download Go 1.16.15 from the official Go website and install it; I won't go into detail here.

  2. Download Kubebuilder 3.2.0 from the official releases and install it; I won't go into detail here.

  3. Use Kind to quickly deploy a Kubernetes cluster (see the official docs).

    Bring up a cluster of a specific version with the config below: cat kind.yaml (see the official GitHub repo for more options)

    
    # three node (two workers) cluster config
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      image: kindest/node:v1.20.15@sha256:a32bf55309294120616886b5338f95dd98a2f7231519c7dedcec32ba29699394
    - role: worker
      image: kindest/node:v1.20.15@sha256:a32bf55309294120616886b5338f95dd98a2f7231519c7dedcec32ba29699394
    - role: worker
      image: kindest/node:v1.20.15@sha256:a32bf55309294120616886b5338f95dd98a2f7231519c7dedcec32ba29699394
    

    kind create cluster --config kind.yaml --name dev01

A Simple Example

Initialize a simple project as a demonstration.

Initialize the project

Initialize the project with kubebuilder and see what it prepares for us.

mkdir application-operator

cd application-operator

kubebuilder init --domain=isekiro.com \
 --repo=github.com/isekiro/application-operator \
 --owner=isekiro

Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.10.0
go: downloading google.golang.org/appengine v1.6.7
Update dependencies:
$ go mod tidy
go: downloading github.com/stretchr/testify v1.7.0
go: downloading go.uber.org/goleak v1.1.10
go: downloading github.com/benbjohnson/clock v1.1.0
go: downloading github.com/Azure/go-autorest/autorest/mocks v0.4.1
go: downloading gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f
go: downloading github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e
go: downloading golang.org/x/lint v0.0.0-20210508222113-6edffad5e616
go: downloading github.com/kr/text v0.2.0
go: downloading golang.org/x/tools v0.1.2
Next: define a resource with:
$ kubebuilder create api

After running the init command, let's look at the directories and files it created.

├── config
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   └── rbac
│       ├── auth_proxy_client_clusterrole.yaml
│       ├── auth_proxy_role_binding.yaml
│       ├── auth_proxy_role.yaml
│       ├── auth_proxy_service.yaml
│       ├── kustomization.yaml
│       ├── leader_election_role_binding.yaml
│       ├── leader_election_role.yaml
│       ├── role_binding.yaml
│       └── service_account.yaml
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
└── PROJECT

We have not created an API yet, so what has been generated here is mainly RBAC- and controller-manager-related YAML.

Create the API

Create the API scaffolding and its fields.

kubebuilder create api --group apps \
--version v1 \
--kind Application

Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/application_types.go
controllers/application_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/controller-tools/cmd/controller-gen@v0.7.0
go get: added sigs.k8s.io/controller-tools v0.7.0
/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

After the command runs, it asks whether to create the Resource and the Controller; answer y to both.

Now let's see what has been added to the directory.

├── api
│   └── v1
│       ├── application_types.go
│       ├── groupversion_info.go
│       └── zz_generated.deepcopy.go
├── bin
│   └── controller-gen
├── config
│   ├── crd
│   │   ├── kustomization.yaml
│   │   ├── kustomizeconfig.yaml
│   │   └── patches
│   │       ├── cainjection_in_applications.yaml
│   │       └── webhook_in_applications.yaml
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   ├── rbac
│   │   ├── application_editor_role.yaml
│   │   ├── application_viewer_role.yaml
│   │   ├── auth_proxy_client_clusterrole.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role_binding.yaml
│   │   ├── leader_election_role.yaml
│   │   ├── role_binding.yaml
│   │   └── service_account.yaml
│   └── samples
│       └── apps_v1_application.yaml
├── controllers
│   ├── application_controller.go
│   └── suite_test.go
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
└── PROJECT

We can see the new api and controllers directories, a samples directory, and some additional CRD- and controller-related YAML under config.

Implementing the CRD

Let's look at the code behind the CRD: the relevant structs live in api/v1/application_types.go.

// ApplicationSpec defines the desired state of Application
type ApplicationSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of Application. Edit application_types.go to remove/update
	Foo string `json:"foo,omitempty"`
}

// ApplicationStatus defines the observed state of Application
type ApplicationStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// Application is the Schema for the applications API
type Application struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ApplicationSpec   `json:"spec,omitempty"`
	Status ApplicationStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// ApplicationList contains a list of Application
type ApplicationList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []Application `json:"items"`
}

Delete the Foo field and replace it with Replicas and Template.

type ApplicationSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	Replicas int32                  `json:"replicas,omitempty"`
	Template corev1.PodTemplateSpec `json:"template,omitempty"`
}
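
Because Template references corev1.PodTemplateSpec, the import block of api/v1/application_types.go also needs the core/v1 package. A minimal sketch of the relevant imports; the corev1 alias is the usual convention rather than something the scaffold adds for you:

import (
	corev1 "k8s.io/api/core/v1"                   // provides PodTemplateSpec
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" // already present in the scaffold
)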

After changing the fields in application_types.go, run make, as the scaffold comments suggest, to regenerate the code.

# We imported a new package, so run go mod tidy to resolve the dependencies
go mod tidy

# Then run make to regenerate the code
make

/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go build -o bin/manager main.go

# make manifests regenerates the CRD YAML manifests
make manifests

/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
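
For reference, the controller-gen object:headerFile=... step above is what rewrites zz_generated.deepcopy.go for the changed spec. The regenerated method should look roughly like this sketch; the exact output can differ slightly between controller-gen versions:

// Sketch of the regenerated deepcopy for the new spec: Replicas is copied by
// the plain struct assignment, while Template contains pointers and slices,
// so controller-gen emits a dedicated DeepCopyInto call for it.
func (in *ApplicationSpec) DeepCopyInto(out *ApplicationSpec) {
	*out = *in
	in.Template.DeepCopyInto(&out.Template)
}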

Finally, run make install to install the CRD into the cluster.

make install

/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/kustomize/kustomize/v3@v3.8.7
go get: added sigs.k8s.io/kustomize/kustomize/v3 v3.8.7
/data/goproject/src/github.com/isekiro/application-operator/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/applications.apps.isekiro.com created

Use kubectl to check whether the CRD was installed successfully.

kubectl get crd
NAME                                  CREATED AT
applications.apps.isekiro.com         2023-02-05T08:22:08Z

kubectl get applications
No resources found in default namespace.

Now let's edit config/samples/apps_v1_application.yaml:

vim config/samples/apps_v1_application.yaml
apiVersion: apps.isekiro.com/v1
kind: Application
metadata:
  name: application-sample
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:1.16.1

# Apply it to the cluster
kubectl apply -f config/samples/apps_v1_application.yaml 
application.apps.isekiro.com/application-sample created

# Check that it has been created
kubectl get applications
NAME                 AGE
application-sample   3s

As you can see, the Kubernetes cluster now recognizes the custom resource and returns it in queries. However, all this does is store the YAML in the cluster's etcd; nothing actually happens yet, because we have not implemented a controller to apply any control logic to these resources.

Implementing the controller

Implement a controller to reconcile the custom resource objects.

Open the controllers directory and you will find application_controller.go; the controller logic is implemented inside the Reconcile method.

func (r *ApplicationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here

	return ctrl.Result{}, nil
}
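
Besides the empty Reconcile, the generated file also contains a SetupWithManager method that registers this reconciler with the manager and makes it watch Application objects. It looks roughly like this (appsv1 being the alias the scaffold uses for our api/v1 package):

// SetupWithManager wires the reconciler into the manager: every create,
// update, or delete of an Application enqueues a reconcile request.
func (r *ApplicationReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Application{}).
		Complete(r)
}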

Now let's implement the logic that creates the Pods.

func (r *ApplicationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := log.FromContext(ctx)

	// get the Application
	app := &appsv1.Application{}
	if err := r.Get(ctx, req.NamespacedName, app); err != nil {
		// If the object is not found in the cache it has already been deleted; that is expected, so return an empty result without requeuing.
		if errors.IsNotFound(err) {
			l.Info("the Application is not found")
			return ctrl.Result{}, nil
		}
		// Any error other than IsNotFound is a real failure; log it and requeue.
		l.Error(err, "failed to get the Application")
		return ctrl.Result{RequeueAfter: 1 * time.Minute}, err
	}

	// create pods
	for i := 0; i < int(app.Spec.Replicas); i++ {
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name:      fmt.Sprintf("%s-%d", app.Name, i),
				Namespace: app.Namespace,
				Labels:    app.Labels,
			},
			Spec: app.Spec.Template.Spec,
		}

		if err := r.Create(ctx, pod); err != nil {
			l.Error(err, "failed to create Pod")
			return ctrl.Result{RequeueAfter: 1 * time.Minute}, err
		}
		l.Info(fmt.Sprintf("the Pod (%s) has created", pod.Name))
	}

	l.Info("all pods has created")
	return ctrl.Result{}, nil
}
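
The body above pulls in a few packages on top of what the scaffold already imports (context, ctrl, log, and our own appsv1 API package). Roughly, the additions to the import block of application_controller.go are:

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"                   // Pod type
	"k8s.io/apimachinery/pkg/api/errors"          // errors.IsNotFound
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" // ObjectMeta
)

Note that this first version blindly creates Replicas Pods on every reconcile; once the Pods exist, later reconciles will get AlreadyExists errors from r.Create. That is fine for this experiment, but it is exactly the gap a real controller closes by comparing the desired state with what is already in the cluster.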

Run make run ENABLE_WEBHOOKS=false to run the controller locally and see it work.

/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/data/goproject/src/github.com/isekiro/application-operator/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
2023-02-05T16:44:17.711+0800    INFO    controller-runtime.metrics      metrics server is starting to listen    {"addr": ":8080"}
2023-02-05T16:44:17.714+0800    INFO    setup   starting manager
2023-02-05T16:44:17.714+0800    INFO    starting metrics server {"path": "/metrics"}
2023-02-05T16:44:17.715+0800    INFO    controller.application  Starting EventSource    {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "source": "kind source: /, Kind="}
2023-02-05T16:44:17.715+0800    INFO    controller.application  Starting Controller     {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application"}
2023-02-05T16:44:17.817+0800    INFO    controller.application  Starting workers        {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "worker count": 1}
2023-02-05T16:44:17.828+0800    INFO    controller.application  the Pod (application-sample-0) has created      {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "name": "application-sample", "namespace": "default"}
2023-02-05T16:44:17.832+0800    INFO    controller.application  the Pod (application-sample-1) has created      {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "name": "application-sample", "namespace": "default"}
2023-02-05T16:44:17.844+0800    INFO    controller.application  the Pod (application-sample-2) has created      {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "name": "application-sample", "namespace": "default"}
2023-02-05T16:44:17.844+0800    INFO    controller.application  all pods has created    {"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "name": "application-sample", "namespace": "default"}

As you can see, the controller picked up the custom resource and, following our logic, created the corresponding number of Pods.

Deploying the controller

Deploy the controller to run inside the cluster. You will run into a few issues along the way; they are covered in the Note at the end of this section.

Build the image

make docker-build IMG=application-operator:v0.1

Load the image into the Kind cluster

kind load docker-image application-operator:v0.1 --name dev01

Deploy the controller

make deploy IMG=application-operator:v0.1

2023-02-05T08:57:29.327Z	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2023-02-05T08:57:29.329Z	INFO	setup	starting manager
2023-02-05T08:57:29.330Z	INFO	starting metrics server	{"path": "/metrics"}
I0205 08:57:29.330489       1 leaderelection.go:248] attempting to acquire leader lease application-operator-system/2e281282.isekiro.com...
I0205 08:57:29.343129       1 leaderelection.go:258] successfully acquired lease application-operator-system/2e281282.isekiro.com
2023-02-05T08:57:29.343Z	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","namespace":"application-operator-system","name":"2e281282.isekiro.com","uid":"9ed00deb-c9d3-485e-abb6-620c58f67c8e","apiVersion":"v1","resourceVersion":"542935"}, "reason": "LeaderElection", "message": "application-operator-controller-manager-59487657df-wt4q9_73560857-a7c0-4c03-9098-324336ec2bc9 became leader"}
2023-02-05T08:57:29.343Z	DEBUG	events	Normal	{"object": {"kind":"Lease","namespace":"application-operator-system","name":"2e281282.isekiro.com","uid":"084087d5-5279-4119-9a3a-c0b416c28c58","apiVersion":"coordination.k8s.io/v1","resourceVersion":"542936"}, "reason": "LeaderElection", "message": "application-operator-controller-manager-59487657df-wt4q9_73560857-a7c0-4c03-9098-324336ec2bc9 became leader"}
2023-02-05T08:57:29.343Z	INFO	controller.application	Starting EventSource	{"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "source": "kind source: /, Kind="}
2023-02-05T08:57:29.343Z	INFO	controller.application	Starting Controller	{"reconciler group": "apps.isekiro.com", "reconciler kind": "Application"}
2023-02-05T08:57:29.444Z	INFO	controller.application	Starting workers	{"reconciler group": "apps.isekiro.com", "reconciler kind": "Application", "worker count": 1}

Note

Issue 1: go mod download in the Dockerfile fails to fetch some modules for network reasons; modify the Dockerfile as follows.

RUN go env -w GOPROXY=https://goproxy.cn,direct && \
    go mod download

Issue 2: gcr.io/distroless/static:nonroot cannot be pulled for similar reasons; replace it with kubeimages/distroless-static:latest.

Issue 3: the gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0 image cannot be pulled either; pull kubesphere/kubebuilder/kube-rbac-proxy:v0.8.0 instead and fix it up with docker tag.

Cleanup

Remove the controller: make undeploy

Remove the CRD: make uninstall

Wrapping Up

Having played through this basic Operator end to end, I have an idea: imitate CronJob and use what I've learned to build a MySQL backup Operator, deploy it into the cluster, and have it run backup tasks on a schedule.