References:
https://mp.weixin.qq.com/s/D8efjj9ZhLyEu7zEqWvJiQ
https://stackoverflow.com/questions/71860152/actuator-health-endpoint-returns-out-of-service-when-all-groups-are-up
https://docs.spring.io/spring-boot/docs/2.6.x/reference/htmlsingle/#actuator.endpoints.kubernetes-probes
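This article uses K8s + Spring Boot to achieve zero-downtime releases: health checks + rolling updates + graceful shutdown + autoscaling + Prometheus monitoring + configuration separation (image reuse).
Configuration
Health checks
- Health check types: readiness probe + liveness probe
- Probe mechanisms: exec (run a command inside the container), tcpSocket (check that a port is open), httpGet (call an HTTP endpoint)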
Application side
Project dependencies (pom.xml)
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
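Define the management port, base path and exposed endpoints (application.yaml)
```yaml
management:
  server:
    port: 50000            # serve actuator endpoints on a separate management port
  endpoint:
    health:
      probes:
        enabled: true      # enable the liveness and readiness health groups
  endpoints:
    web:
      base-path: /actuator # base-path belongs under web, not under exposure
      exposure:
        include: health    # endpoints exposed over HTTP
```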
This exposes the /actuator/health/readiness and /actuator/health/liveness endpoints, which can be called as follows:
```text
http://127.0.0.1:50000/actuator/health            -> returns the overall health information
http://127.0.0.1:50000/actuator/health/readiness  -> returns the readiness group
http://127.0.0.1:50000/actuator/health/liveness   -> returns the liveness group
```
Operations side
K8s deployment template (deployment.yaml)
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: {APP_NAME}
        image: {IMAGE_URL}
        imagePullPolicy: Always
        ports:
        - containerPort: {APP_PORT}
        - name: management-port
          containerPort: 50000
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: management-port
          initialDelaySeconds: 90
          periodSeconds: 30
          timeoutSeconds: 30
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: management-port
          initialDelaySeconds: 90
          periodSeconds: 30
          timeoutSeconds: 30
          successThreshold: 1
          failureThreshold: 3
```
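To make the httpGet probe semantics concrete, here is a small JDK-only sketch. It is an illustration rather than how the kubelet is implemented: the class ProbeDemo and its methods are hypothetical names, and the embedded server merely stands in for the Spring Boot management port. The one rule it relies on is that an HTTP probe counts any status from 200 up to (but not including) 400 as success.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative only: a fake readiness endpoint plus a probe-like client. */
final class ProbeDemo {

    /** Starts a stand-in health endpoint on an ephemeral port: 503 until ready, then 200. */
    static HttpServer startFakeApp(AtomicBoolean ready) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/actuator/health/readiness", exchange -> {
                exchange.sendResponseHeaders(ready.get() ? 200 : 503, -1); // -1: no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mimics an httpGet probe: success means 200 <= status < 400. */
    static boolean probe(int port, String path) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://127.0.0.1:" + port + path).openConnection();
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status >= 200 && status < 400;
        } catch (IOException e) {
            return false; // connection errors and timeouts count as failures
        }
    }

    public static void main(String[] args) {
        AtomicBoolean ready = new AtomicBoolean(false);
        HttpServer server = startFakeApp(ready);
        int port = server.getAddress().getPort();
        System.out.println("before ready: " + probe(port, "/actuator/health/readiness"));
        ready.set(true);
        System.out.println("after ready:  " + probe(port, "/actuator/health/readiness"));
        server.stop(0);
    }
}
```

In the template above, a pod is only added to the Service endpoints once the readiness probe succeeds, and it is restarted after failureThreshold consecutive liveness failures.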
Rolling updates
Kubernetes rolls out a new version with the RollingUpdate strategy. Zero-downtime releases additionally require health checks, so that traffic is only routed to pods that are ready.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  replicas: {REPLICAS}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
```
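The effect of maxSurge and maxUnavailable can be worked out by hand. The sketch below is only an illustration (RollingUpdateBounds and its methods are hypothetical names, not part of any Kubernetes client). It assumes the documented rounding rules: a percentage maxSurge is rounded up, a percentage maxUnavailable is rounded down, and absolute numbers are used as they are.

```java
/** Illustrative helper (hypothetical names): bounds on pod counts during a RollingUpdate. */
final class RollingUpdateBounds {

    /** Most pods that may exist at once: replicas + maxSurge. */
    static int maxTotalPods(int replicas, int maxSurge) {
        return replicas + maxSurge;
    }

    /** Fewest pods that must stay available: replicas - maxUnavailable. */
    static int minAvailablePods(int replicas, int maxUnavailable) {
        return Math.max(0, replicas - maxUnavailable);
    }

    /** A percentage maxSurge is rounded up to a whole pod. */
    static int surgeFromPercent(int replicas, int percent) {
        return (replicas * percent + 99) / 100;
    }

    /** A percentage maxUnavailable is rounded down to a whole pod. */
    static int unavailableFromPercent(int replicas, int percent) {
        return (replicas * percent) / 100;
    }

    public static void main(String[] args) {
        int replicas = 3; // assumed value for {REPLICAS}
        System.out.println("max pods during rollout: " + maxTotalPods(replicas, 1));
        System.out.println("min available pods:      " + minAvailablePods(replicas, 1));
    }
}
```

Note that with a single replica and maxUnavailable: 1 the minimum available count is 0, so a short outage is possible; the combined template later in this article sets maxUnavailable: 0 instead.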
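Graceful shutdown
In K8s, application-level graceful shutdown must be in place before you rely on rolling upgrades; otherwise a rolling upgrade still affects the business. Graceful shutdown lets the application finish its threads and release its connections before the service stops.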
Application side
Project dependencies (pom.xml)
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
Define the management port, base path and exposed endpoints (application.yaml)
```yaml
spring:
  application:
    name: <xxx>
  profiles:
    active: @profileActive@            # Maven resource-filtering placeholder
  lifecycle:
    timeout-per-shutdown-phase: 30s    # maximum wait for in-flight work during shutdown

server:
  port: 8080
  shutdown: graceful                   # stop accepting new requests, let active ones finish

management:
  server:
    port: 50000
  endpoint:
    shutdown:
      enabled: true
    health:
      probes:
        enabled: true
  endpoints:
    web:
      base-path: /actuator             # base-path belongs under web, not under exposure
      exposure:
        include: health,shutdown
```
This exposes the /actuator/shutdown endpoint, which is called as follows:
```shell
curl -X POST 127.0.0.1:50000/actuator/shutdown
```
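The settings server.shutdown: graceful and timeout-per-shutdown-phase: 30s follow a familiar pattern: stop accepting new work, give in-flight work a bounded grace period, then stop regardless. The following is a JDK-only analogy of that pattern built on an ExecutorService. It is not Spring Boot's implementation, and GracefulStopDemo and stopGracefully are names made up for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative analogy of "graceful shutdown with a timeout" (hypothetical names). */
final class GracefulStopDemo {

    /**
     * Stops accepting work, waits up to the grace period, then forces a stop.
     * Returns true if all in-flight tasks finished within the grace period.
     */
    static boolean stopGracefully(ExecutorService pool, long graceMillis) {
        pool.shutdown(); // reject new tasks, keep running the submitted ones
        try {
            if (pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        pool.shutdownNow(); // grace period over: interrupt whatever is still running
        return false;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(100); // simulated in-flight request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                completed.incrementAndGet();
            });
        }
        boolean drained = stopGracefully(pool, 2_000);
        System.out.println("drained=" + drained + ", completed=" + completed.get());
    }
}
```

The grace period plays the role of timeout-per-shutdown-phase: work that finishes in time completes normally, and anything slower is cut off.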
Operations side
Make sure the Dockerfile template installs curl; otherwise the curl command is not available in the container.
```dockerfile
FROM openjdk:8-jdk-alpine

ARG JAR_FILE
ARG WORK_PATH="/app"
ARG EXPOSE_PORT=8080

ENV JAVA_OPTS="" \
    JAR_FILE=${JAR_FILE}

# Set the container time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone
# Switch to a mirror and install curl
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl

COPY target/$JAR_FILE $WORK_PATH/

WORKDIR $WORK_PATH

EXPOSE $EXPOSE_PORT

# exec replaces the shell, so the java process receives signals such as SIGTERM directly
ENTRYPOINT exec java $JAVA_OPTS -jar $JAR_FILE
```
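K8s deployment template (deployment.yaml)
Note: in testing, a Java project can omit the preStop hook. Kubernetes sends SIGTERM to the container when it terminates a pod, and with server.shutdown: graceful Spring Boot already shuts down gracefully on that signal.
If you do use the hook, the image must include curl, and the application's management port (50000) must not be exposed to the public internet.
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: {APP_NAME}
        image: {IMAGE_URL}
        imagePullPolicy: Always
        ports:
        - containerPort: {APP_PORT}
        - containerPort: 50000
        lifecycle:
          preStop:
            exec:
              command: ["curl", "-XPOST", "127.0.0.1:50000/actuator/shutdown"]
```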
Autoscaling
Set resource requests and limits on the pod, then create a HorizontalPodAutoscaler (HPA).
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  template:
    spec:
      containers:
      - name: {APP_NAME}
        image: {IMAGE_URL}
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 0.5
            memory: 1Gi
          requests:
            cpu: 0.15
            memory: 300Mi
---
kind: HorizontalPodAutoscaler
# On Kubernetes 1.23+ use autoscaling/v2; v2beta2 was removed in 1.26
apiVersion: autoscaling/v2beta2
metadata:
  name: {APP_NAME}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  minReplicas: {REPLICAS}
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
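To get a feel for how the HPA above reacts, the documented scaling rule can be written out as a few lines of code. This is a simplified sketch under stated assumptions: HpaMath is a hypothetical class, and the real controller additionally applies a tolerance band and stabilization windows that are ignored here. Utilization is measured against the CPU request (0.15 CPU in the manifest above), not the limit.

```java
/** Illustrative sketch of the documented HPA scaling rule (hypothetical class name). */
final class HpaMath {

    /**
     * desired = ceil(currentReplicas * currentUtilization / targetUtilization),
     * clamped to [minReplicas, maxReplicas]. Utilization is a percentage of the
     * container's CPU request, not of its limit.
     */
    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas, int maxReplicas) {
        int desired = (int) Math.ceil(currentReplicas * currentUtilization / targetUtilization);
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // Assumed scenario: 2 pods averaging 120% of their CPU request, target 50%, bounds 2..6.
        System.out.println(desiredReplicas(2, 120.0, 50.0, 2, 6));
    }
}
```

Because the target is a percentage of the request, a low request such as 0.15 CPU makes the autoscaler scale out early; raising the request or the target makes it less eager.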
Prometheus integration
Application side
Project dependencies (pom.xml)
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
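Define the management port, base path and exposed endpoints (application.yaml)
```yaml
management:
  server:
    port: 50000
  metrics:
    tags:
      application: ${spring.application.name}  # tag every metric with the application name
  endpoints:
    web:
      base-path: /actuator                     # base-path belongs under web, not under exposure
      exposure:
        include: metrics,prometheus
```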
This exposes the /actuator/metrics and /actuator/prometheus endpoints, which can be called as follows:
```text
http://127.0.0.1:50000/actuator/metrics
http://127.0.0.1:50000/actuator/prometheus
```
Operations side
deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        prometheus.io/port: "50000"
        prometheus.io/path: /actuator/prometheus
        prometheus.io/scrape: "true"
```
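Configuration separation
Approach: mount an external configuration file through a ConfigMap and select the active profile at run time.
Benefits: configuration is kept out of the image, which avoids leaking sensitive information; the same image is reused across environments, which speeds up delivery.
Generate the ConfigMap from a file
```shell
# --dry-run=client renders the manifest locally without creating the object
kubectl create cm -n <namespace> <APP_NAME> --from-file=application-test.yaml --dry-run=client -oyaml > configmap.yaml

kubectl apply -f configmap.yaml
```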
Mount the ConfigMap and select the active profile
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  template:
    spec:
      containers:
      - name: {APP_NAME}
        image: {IMAGE_URL}
        imagePullPolicy: Always
        env:
          - name: SPRING_PROFILES_ACTIVE
            value: test
        volumeMounts:
          - name: conf
            mountPath: "/app/config"
            readOnly: true
      volumes:
      - name: conf
        configMap:
          name: {APP_NAME}
```
Combined configuration
Application side
Project dependencies (pom.xml)
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Define the management port, base path and exposed endpoints (application.yaml)
```yaml
spring:
  application:
    name: project-sample
  profiles:
    active: @profileActive@            # Maven resource-filtering placeholder
  lifecycle:
    timeout-per-shutdown-phase: 30s    # maximum wait for in-flight work during shutdown

server:
  port: 8080
  shutdown: graceful                   # stop accepting new requests, let active ones finish

management:
  server:
    port: 50000
  metrics:
    tags:
      application: ${spring.application.name}
  endpoint:
    shutdown:
      enabled: true
    health:
      probes:
        enabled: true
  endpoints:
    web:
      base-path: /actuator             # base-path belongs under web, not under exposure
      exposure:
        include: health,shutdown,metrics,prometheus
```
Operations side
Make sure the Dockerfile template installs curl; otherwise the curl command is not available in the container.
```dockerfile
FROM openjdk:8-jdk-alpine

ARG JAR_FILE
ARG WORK_PATH="/app"
ARG EXPOSE_PORT=8080

ENV JAVA_OPTS="" \
    JAR_FILE=${JAR_FILE}

# Set the container time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone
# Switch to a mirror and install curl
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl

COPY target/$JAR_FILE $WORK_PATH/

WORKDIR $WORK_PATH

EXPOSE $EXPOSE_PORT

# exec replaces the shell, so the java process receives signals such as SIGTERM directly
ENTRYPOINT exec java $JAVA_OPTS -jar $JAR_FILE
```
K8s deployment template (deployment.yaml)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {APP_NAME}
  labels:
    app: {APP_NAME}
spec:
  selector:
    matchLabels:
      app: {APP_NAME}
  replicas: {REPLICAS}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      name: {APP_NAME}
      labels:
        app: {APP_NAME}
      annotations:
        timestamp: {TIMESTAMP}
        prometheus.io/port: "50000"
        prometheus.io/path: /actuator/prometheus
        prometheus.io/scrape: "true"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - {APP_NAME}
              topologyKey: "kubernetes.io/hostname"
      terminationGracePeriodSeconds: 30
      containers:
      - name: {APP_NAME}
        image: {IMAGE_URL}
        imagePullPolicy: Always
        ports:
        - containerPort: {APP_PORT}
        - name: management-port
          containerPort: 50000
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: management-port
          initialDelaySeconds: 90
          periodSeconds: 30
          timeoutSeconds: 30
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: management-port
          initialDelaySeconds: 90
          periodSeconds: 30
          timeoutSeconds: 30
          successThreshold: 1
          failureThreshold: 3
        resources:
          limits:
            cpu: 0.5
            memory: 1Gi
          requests:
            cpu: 0.1
            memory: 200Mi
        env:
          - name: TZ
            value: Asia/Shanghai
---
kind: HorizontalPodAutoscaler
# On Kubernetes 1.23+ use autoscaling/v2; v2beta2 was removed in 1.26
apiVersion: autoscaling/v2beta2
metadata:
  name: {APP_NAME}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {APP_NAME}
  minReplicas: {REPLICAS}
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
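Known issue
The application contained code that ran while(true){...} inside CommandLineRunner.run, executing a block of code in an endless loop.
This causes a problem: the runner never returns. Spring Boot marks the application as ready to accept traffic only after every CommandLineRunner has returned, so the /readiness health check always responds with HTTP status 503.
Solution: run the while(true) loop in a separate child thread.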
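A minimal sketch of that fix, written against the JDK only so it runs without Spring on the classpath. The static run method below is a stand-in for CommandLineRunner.run(String...); BackgroundLoop, ticks and the 10 ms sleep are assumptions made for illustration. The point is that run starts the loop on its own thread and returns immediately, so startup can complete.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: moves an endless loop off the startup thread. */
final class BackgroundLoop {

    /** Counts loop iterations; a placeholder for whatever the real loop does. */
    static final AtomicLong ticks = new AtomicLong();

    /** Stand-in for CommandLineRunner.run(String...): starts the loop and returns at once. */
    static Thread run(String... args) {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                ticks.incrementAndGet(); // placeholder for the real work
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // leave the loop on shutdown
                }
            }
        }, "background-loop");
        worker.setDaemon(true); // do not keep the JVM alive when the application exits
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = run();
        System.out.println("run() returned; startup can continue");
        Thread.sleep(100);
        worker.interrupt(); // simulate shutdown
        worker.join();
        System.out.println("loop ran at least once: " + (ticks.get() > 0));
    }
}
```

Checking the interrupt flag instead of looping on a bare while(true) also gives the loop a way to stop when the application shuts down.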