k8s Dynamic Autoscaling Deployment

The Horizontal Pod Autoscaler (HPA) automatically adjusts the replica count of a Deployment, ReplicaSet, StatefulSet, or similar resource based on average CPU utilization, average memory utilization, or any other custom metric you specify, scaling the workload out and in so that its size tracks the actual load. The HPA does not apply to objects that cannot be scaled, such as a DaemonSet.
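For a quick feel of the API, an HPA can be created in one line with kubectl autoscale (the nginx Deployment name here is only a placeholder):

# Keep the "nginx" Deployment between 1 and 10 replicas,
# targeting 50% average CPU utilization.
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10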

My k8s cluster is managed with KubeSphere. After configuring an HPA I found it was not taking effect, and the documentation pointed out that metrics-server must be deployed first to collect resource metrics:

You can see that every TARGETS column shows <unknown> as the current value, followed by its 80% or 100% threshold.

[root@master knativetest]# kubectl get hpa -A
NAMESPACE          NAME                       REFERENCE                             TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
istio-system       cluster-local-gateway      Deployment/cluster-local-gateway      <unknown>/80%    1         5         1          33d
istio-system       istio-ingressgateway      Deployment/istio-ingressgateway       <unknown>/80%    4         5         4          33d
istio-system       istiod                     Deployment/istiod                     <unknown>/80%    1         5         1          33d
knative-serving    activator                  Deployment/activator                  <unknown>/100%   1         20        1          33d
tekton-pipelines   tekton-pipelines-webhook   Deployment/tekton-pipelines-webhook   <unknown>/100%   1         5         1          7d21h
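Before digging into individual HPAs, it is worth checking whether the resource metrics API is being served at all. On a cluster without metrics-server, checks like the following come back empty or fail (generic sanity checks, not taken from the original logs):

# no v1beta1.metrics.k8s.io APIService is registered yet
kubectl get apiservices | grep metrics
# and "kubectl top" has no data source
kubectl top nodes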

Take a look at the HPA details:

[root@master knativetest]# kubectl describe hpa/istio-ingressgateway -n istio-system
Name:                   istio-ingressgateway
Namespace:              istio-system
Labels:                 app=istio-ingressgateway
                        install.operator.istio.io/owning-resource=unknown
                        install.operator.istio.io/owning-resource-namespace=istio-system
                        istio=ingressgateway
                        istio.io/rev=default
                        operator.istio.io/component=IngressGateways
                        operator.istio.io/managed=Reconcile
                        operator.istio.io/version=1.8.0
                        release=istio
Annotations:            <none>
CreationTimestamp:      Fri, 20 Nov 2020 17:13:57 +0800
Reference:              Deployment/istio-ingressgateway
Metrics:                ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:           1
Max replicas:           5
Deployment pods:        1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                     From                       Message
  ----     ------                   ----                    ----                       -------
  Warning  FailedGetResourceMetric  19m (x1734 over 5d18h)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedGetResourceMetric  4m37s (x13 over 16m)    horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

The error says the HPA cannot fetch the workload's metrics from the metrics API.
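You can reproduce this directly by querying the same aggregated API the HPA controller uses; without metrics-server the request fails (an illustrative check):

# The HPA controller reads pod metrics from this aggregated API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"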

So we need to install metrics-server first.

The metrics-server manifest (metrics-server.yaml):

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        image: registry.aliyuncs.com/google_containers/metrics-server:v0.4.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

Deploy it:

kubectl apply -f metrics-server.yaml
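Once applied, it is worth verifying that metrics-server is running and that metrics actually flow before re-checking the HPAs (generic commands, output omitted):

# metrics-server should be Running in kube-system
kubectl get pods -n kube-system -l k8s-app=metrics-server
# the aggregated API should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io
# and node metrics should now be returned
kubectl top nodes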

After deployment, check the HPA status again:

naison@P_CAIWFENG-MB0 knativetest % kubectl get hpa -A
NAMESPACE          NAME                       REFERENCE                             TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
istio-system       cluster-local-gateway      Deployment/cluster-local-gateway      12%/80%          1         5         1          33d
istio-system       istio-ingressgateway       Deployment/istio-ingressgateway       11%/80%          4         5         4          33d
istio-system       istiod                     Deployment/istiod                     0%/80%           1         5         1          33d
knative-serving    activator                  Deployment/activator                  <unknown>/100%   1         20        1          33d
tekton-pipelines   tekton-pipelines-webhook   Deployment/tekton-pipelines-webhook   3%/100%          1         5         1          7d21h

One HPA still has no metrics. The official HPA documentation explains that if a container does not declare a resource request, the HPA cannot compute utilization for it and will not take effect, as the sketch below shows.
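As a sketch (names and values are illustrative), the target Deployment's containers need a resources.requests block like this for a CPU-utilization HPA to work:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hap-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hap-nginx
  template:
    metadata:
      labels:
        app: hap-nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 500m      # the denominator the HPA divides actual usage by
            memory: 128Mi
          limits:
            cpu: "1"
            memory: 256Mi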

Now create an HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hap-nginx
spec:
  maxReplicas: 10 # scale out to at most 10 pods
  minReplicas: 1  # keep at least 1 pod
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 40 # scale out when average CPU utilization rises above 40%, scale in when it falls below
        # for an absolute quantity (e.g. memory), use instead:
        # averageValue: 40 (with type: AverageValue)
        type: Utilization
    type: Resource
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hap-nginx
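Apply it and watch the status (the file name is arbitrary; hap-nginx follows the manifest above):

kubectl apply -f hap-nginx-hpa.yaml
kubectl get hpa hap-nginx -w   # watch TARGETS/REPLICAS change under load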

Test it:
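One simple way to drive load, assuming a Service named hap-nginx exposes the Deployment on port 80 (this mirrors the load-generator pattern from the Kubernetes docs, not the exact tool used here):

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hap-nginx; done"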

During load testing, the HPA scaled out a pile of pods almost immediately. Earlier benchmarks had put the optimal size at about 4 pods, but this jumped to 8 in no time, so something was clearly wrong.

The HPA scaling formula:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

In plain terms:

desired pod count = round up[ current pod count * ( current metric value / target metric value ) ]

Note that the metric here is per pod, and CPU utilization is measured against the container's resource request.
Checking the Deployment's container resources, the CPU request was only 100m. The denominator was far too small: a pod needed to use only a sliver of CPU for its utilization to blow past the target and trigger a scale-out. Setting the request to a reasonable size fixes it.
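Plugging in illustrative numbers (they match the symptom above but are not measured values) shows how a tiny request inflates utilization:

# cpu request = 100m, target averageUtilization = 40%
# each pod actually uses ~80m of CPU:
#   currentMetricValue = 80m / 100m = 80%
# with 4 pods running:
#   desiredReplicas = ceil[4 * (80 / 40)] = 8
# raising the request to 500m gives 80m / 500m = 16% < 40%, so no scale-out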
