Requirements

One shared environment, two microservices serviceA and serviceB, each with two versions: original and v1.

Call chain: serviceA -> serviceB

Specifically, there are the following cases:

  1. If both serviceA and serviceB have a v1 version

    serviceA(v1) -> serviceB(v1)

  2. If serviceA has a v1 version but serviceB does not

    serviceA(v1) -> serviceB(original)

  3. If serviceA has no v1 version but serviceB does

    serviceA(original) -> serviceB(v1)

Technical Approach

Traffic Coloring

What is traffic coloring

In a metadata center (here, effectively our k8s cluster), maintain the list of services belonging to each environment; at the traffic entry point, tag each request; and at the base-framework layer, parse the tag, pass it through, and route between services accordingly.

In practice, this usually means adding identifiers such as the target environment or user to our HTTP requests, so that requests can be classified and forwarded based on these tags.
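
As a minimal sketch of how a service can propagate the tag (the header name project-version follows this article's convention; createTaggedClient is an illustrative helper, not an axios API), an axios request interceptor can copy the tag from the inbound request onto every outbound call:

const axios = require('axios');

// Build an axios instance that forwards the coloring tag of the request
// currently being handled (incomingHeaders) to all outbound calls.
function createTaggedClient(incomingHeaders) {
  const client = axios.create();
  client.interceptors.request.use((config) => {
    const tag = incomingHeaders['project-version'];
    if (tag) {
      // Propagate the tag so the whole downstream chain sees the same version.
      config.headers['project-version'] = tag;
    }
    return config;
  });
  return client;
}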

Why traffic coloring

  1. Different services can share a single environment

  2. A specific service can be debugged locally without blocking the normal operation of the others

In short: lower cost, more efficient testing, and controllable environment governance.

istio - routing control

istio version: 1.14.1

Routing is one of the most important and most frequently used traffic-management features. In Istio, dynamic routing is usually configured through two API resources: VirtualService and DestinationRule.

Virtual Service:

  • Defines routing rules and matches requests
  • Describes where a matching request should go

Destination Rule:

  • Defines subsets and policies
  • Describes how requests are handled once they reach the destination

Our approach: add a request header (project-version) and have every request forward this header to the next service, so the tag travels along the entire call chain.

The whole flow then looks like this:

Technical Implementation (including test setup)

Function                                              Tech stack
CI/CD                                                 gitlab-runner
Traffic control (routing rules)                       istio (VirtualService/DestinationRule)
The two microservices (serviceA, serviceB)            nodejs, koa
External access for serviceA                          istio ingressgateway
Management of VirtualService/DestinationRule          Rancher
Tag propagation (passing the request header)          axios

Test flow walkthrough

Prepare two nodejs microservices (testaaa and testbbb)

testaaa simply forwards the header it receives from upstream to the downstream service

router.get('/', async (ctx) => {
  const header = ctx.headers['project-version'];
  // ctx.rest is the project's HTTP client helper (axios-based, per the stack above)
  const res = await ctx.rest.get(`${bbbUrl}`, {}, {
    headers: { 'project-version': header || '-' }
  });
  ctx.body = res;
});

testbbb returns the current pod's version number as the response to testaaa

router.get('/', async (ctx) => {
  ctx.body = process.env.PROJECT_VERSION;
});

Prepare the resource configuration files for both services

Dockerfile (identical for both projects)

FROM node:12.22.0-alpine3.12 as build
WORKDIR /user/src/app
RUN set -eux \
    && sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories \
    && apk add --no-cache curl gcc g++ make linux-headers python2 python3 python3-dev
COPY package.json package-lock.json ./
RUN npm install

FROM node:12.22.0-alpine3.12 as runtime
WORKDIR /user/src/app
RUN set -eux \
    && apk add --no-cache tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && apk del tzdata
COPY --from=build /user/src/app/node_modules ./node_modules/
COPY . .
CMD [ "npm", "run", "start" ]

testaaa, deployment.yaml

apiVersion: v1
kind: Service
metadata:
  name: testaaa
  namespace: test-istio
spec:
  selector:
    app: testaaa
  ports:
    - port: 31000
      targetPort: 31000
      # Best to declare the protocol, either via appProtocol or by naming the port <protocol>[-<suffix>]; see https://istio.io/latest/zh/docs/ops/configuration/traffic-management/protocol-selection/
      appProtocol: HTTP
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testaaa-${CI_COMMIT_REF_NAME}
  namespace: test-istio
  labels:
    app: testaaa
    version: ${CI_COMMIT_REF_NAME}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testaaa
      version: ${CI_COMMIT_REF_NAME}
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        # This annotation controls per workload whether istio takes over the pod
        sidecar.istio.io/inject: 'true'
      labels:
        app: testaaa
        version: ${CI_COMMIT_REF_NAME}
    spec:
      containers:
        - image: $REGISTRY_ADDRESS/${NODE_ENV}/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
          env:
            - name: NODE_ENV
              value: development
            - name: PROJECT_VERSION
              value: ${CI_COMMIT_REF_NAME}
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 31000
          readinessProbe:
            tcpSocket:
              port: 31000
          name: testaaa
          ports:
            - containerPort: 31000
      dnsPolicy: ClusterFirst
      restartPolicy: Always

testbbb, deployment.yaml

apiVersion: v1
kind: Service
metadata:
  name: testbbb
  namespace: test-istio
spec:
  selector:
    app: testbbb
  ports:
    - port: 32000
      targetPort: 32000
      # The protocol must be declared, either via appProtocol or by naming the port <protocol>[-<suffix>]
      appProtocol: HTTP
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testbbb-${CI_COMMIT_REF_NAME}
  namespace: test-istio
  labels:
    app: testbbb
    version: ${CI_COMMIT_REF_NAME}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testbbb
      version: ${CI_COMMIT_REF_NAME}
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        # Takes precedence over the namespace-level injection setting
        sidecar.istio.io/inject: 'true'
        # sidecar.istio.io/logLevel: debug
      labels:
        app: testbbb
        version: ${CI_COMMIT_REF_NAME}
    spec:
      containers:
        - image: $REGISTRY_ADDRESS/${NODE_ENV}/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
          env:
            - name: NODE_ENV
              value: development
            - name: PROJECT_VERSION
              value: ${CI_COMMIT_REF_NAME}
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 32000
          readinessProbe:
            tcpSocket:
              port: 32000
          name: testbbb
          ports:
            - containerPort: 32000
      dnsPolicy: ClusterFirst
      restartPolicy: Always

Because the istio sidecar must run in every pod to manage its traffic, automatic sidecar injection must be enabled for the namespace [required]:

# enable istio automatic sidecar injection
kubectl label namespace test-istio istio-injection=enabled

Prepare the gitlab-ci configuration file and deploy the release and release-v1 versions of both projects

stages:
  - build
  - deploy

##############################################
build-release:
  stage: build
  variables:
    IMAGE: release/${CI_PROJECT_NAME}:v${CI_PIPELINE_ID}
  script:
    - echo "Building application..."
    - echo "$REGISTRY_PASSWORD" | sudo docker login -u ${REGISTRY_USERNAME} --password-stdin ${REGISTRY_ADDRESS}
    - echo "registry login success"
    - sudo docker build -t ${REGISTRY_ADDRESS}/${IMAGE} .
    - sudo docker push ${REGISTRY_ADDRESS}/${IMAGE}
    - echo "docker push && push success"
  tags:
    - build-runner
  only:
    - release
    - /^release-.*/

deploy-release:
  stage: deploy
  variables:
    NODE_ENV: release
  script:
    - echo "Deploying application..."
    - envsubst < deployment.yaml > deployment_new.yaml
    - ssh -p ${RELEASE_CI_PORT} -tt ${RELEASE_CI_USER}@${RELEASE_CI_IP} "[ -d ${DEPLOY_PATH} ] && echo ok || mkdir -p ${DEPLOY_PATH}"
    - scp deployment_new.yaml ${RELEASE_CI_USER}@${RELEASE_CI_IP}:${DEPLOY_PATH}/deployment_${CI_PROJECT_NAME}.yaml
    - ssh -p ${RELEASE_CI_PORT} -tt ${RELEASE_CI_USER}@${RELEASE_CI_IP} "cd ${DEPLOY_PATH} && kubectl apply -f deployment_${CI_PROJECT_NAME}.yaml"
    - echo "Application successfully deployed."
  tags:
    - back-release
  only:
    - release
    - /^release-.*/

The result looks like this:

Configure the ingress so that testaaa is reachable from outside (the next step wires it into the VirtualService)

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway
  namespace: test-istio
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

Prepare each service's DestinationRule and VirtualService

  • testaaa

Manages all of this service's subsets (versions)

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: testaaa
  namespace: test-istio
spec:
  host: testaaa.test-istio.svc.cluster.local
  subsets:
    - labels:
        version: release
      name: release
    - labels:
        version: release-v1
      name: release-v1

Manages this service's routing rules (traffic control)

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testaaa
  namespace: test-istio
spec:
  gateways:
    - gateway
  hosts:
    - "*"
  http:
    - match:
        - headers:
            project-version:
              exact: release-v1
      route:
        - destination:
            host: testaaa.test-istio.svc.cluster.local
            subset: release-v1
    - route:
        - destination:
            host: testaaa.test-istio.svc.cluster.local
            subset: release
  • testbbb
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testbbb
  namespace: test-istio
spec:
  hosts:
    - testbbb.test-istio.svc.cluster.local
  http:
    - match:
        - headers:
            project-version:
              exact: release-v1
      route:
        - destination:
            host: testbbb.test-istio.svc.cluster.local
            subset: release-v1
    - route:
        - destination:
            host: testbbb.test-istio.svc.cluster.local
            subset: release
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: testbbb
  namespace: test-istio
spec:
  host: testbbb.test-istio.svc.cluster.local
  subsets:
    - labels:
        version: release
      name: release
    - labels:
        version: release-v1
      name: release-v1

After all the services are deployed, you will find that every pod in this namespace now carries one extra container and one extra init container.
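
A quick spot check with kubectl (the pod name is a placeholder):

kubectl -n test-istio get pod <testaaa-pod-name> \
  -o jsonpath='{.spec.initContainers[*].name} / {.spec.containers[*].name}'
# With injection enabled this prints something like: istio-init / testaaa istio-proxy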

Test access (note: this test traffic originates outside the cluster, but it first passes through the istio ingressgateway, so the routing rules of testaaa and testbbb are guaranteed to take effect)

First, get the ingress IP and port:
[root@k3s-release-server1 ~]# kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'
31380

# With 2 ingressgateway replicas, .items would yield both {.items[0].status.hostIP} and {.items[1].status.hostIP}; the upstream can load-balance across them
[root@k3s-release-server1 ~]# kubectl get pod -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}'
10.1.4.5
A request without the header returns the original version (release):
[root@k3s-release-server1 ~]# curl "http://10.1.4.5:31380/" -w '\n'
release
[root@k3s-release-server1 ~]#
With the release-v1 header, the result is the release-v1 version:
[root@k3s-release-server1 ~]# curl "http://10.1.4.5:31380/" --header 'project-version: release-v1' -w '\n'
release-v1
[root@k3s-release-server1 ~]#
With a header naming a version that was never deployed, the result falls back to the original version (release):
[root@k3s-release-server1 ~]# curl "http://10.1.4.5:31380/" --header 'project-version: release-v3' -w '\n'
release
[root@k3s-release-server1 ~]#
Next, deploy a release-v2 version of testaaa with its DestinationRule and VirtualService configured, leaving testbbb unchanged; a request carrying the release-v2 header returns the original version (release):
[root@k3s-release-server1 ~]# curl "http://10.1.4.5:31380/" --header 'project-version: release-v2' -w '\n'
release
[root@k3s-release-server1 ~]#
If we instead deploy a release-v3 version of testbbb with its DestinationRule and VirtualService configured, leaving testaaa unchanged, a request carrying the release-v3 header returns the release-v3 version:
[root@k3s-release-server1 ~]# curl "http://10.1.4.5:31380/" --header 'project-version: release-v3' -w '\n'
release-v3
[root@k3s-release-server1 ~]#

Call-chain graph of the testaaa service

Call-chain graph of the testbbb service

An aside: if a VirtualService omits the gateways field, it defaults to mesh, meaning all sidecars inside the cluster, so the VirtualService's rules then apply to in-cluster traffic.
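
For example, to make a single VirtualService apply both at the ingress gateway and inside the mesh, list both explicitly. A sketch reusing this article's testaaa resources (for gateway traffic, the request's authority must still match hosts):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testaaa
  namespace: test-istio
spec:
  gateways:
    - gateway # requests entering through the ingress gateway
    - mesh    # requests from in-cluster sidecars (the implicit default when gateways is omitted)
  hosts:
    - testaaa.test-istio.svc.cluster.local
  http:
    - route:
        - destination:
            host: testaaa.test-istio.svc.cluster.local
            subset: release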

Recap

The test behavior above only holds when the upstream request also enters from the current namespace.

If testaaa is accessed from outside this namespace without going through the gateway, testaaa's rules do not take effect, but testbbb's rules still do.

If testbbb is accessed directly from outside this namespace, testbbb's rules do not take effect.

If testbbb is accessed directly from within this namespace, testbbb's rules do take effect.

Accessing a Service directly from a host machine does not trigger the rules: istio's routing rules are enforced by the requesting side's sidecar. Access the Service from a pod with an injected sidecar (or make the request flow through one) for the traffic rules to take effect.
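
A quick way to verify this, following this article's naming (and assuming curl exists in the image): run curl from inside an injected pod, where the client-side sidecar evaluates the rules.

# From an injected pod in test-istio, the header-based routing applies:
kubectl -n test-istio exec deploy/testaaa-release -- \
  curl -s "http://testbbb.test-istio.svc.cluster.local:32000/" --header 'project-version: release-v1'
# Expected output: release-v1

# The same curl run directly on a node bypasses any client sidecar, so it
# load-balances across all testbbb pods regardless of the header.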

Problem 1

This scheme covers the vast majority of our multi-branch parallel development scenarios, but a few cases remain problematic, such as third-party callbacks: unless we can control the callback's request payload, callbacks can only land on the original version.

Update

We later solved this with Istio's EnvoyFilter; a sample follows.

It targets the workload whose pod label is app: orange-gateway-a (i.e. the gateway), listens on its inbound traffic, and inspects the :authority and project-version headers to decide whether a project-version header needs to be added.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: header-envoy-filter
  namespace: sopei-biz
spec:
  workloadSelector:
    labels:
      app: orange-gateway-a
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.lua
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
            inlineCode: |
              function envoy_on_request(request_handle)
                local authority = request_handle:headers():get(":authority")
                local version_header = request_handle:headers():get("project-version")
                if authority == "aaa.com" then
                  if version_header == nil then
                    request_handle:headers():add("project-version", "release-aaa")
                  end
                elseif authority == 'bbb.com' then
                  if version_header == nil then
                    request_handle:headers():add("project-version", "release-bbb")
                  end
                end
              end

Problem 2

Istio uses Envoy as its data plane to forward HTTP requests, and Envoy by default requires HTTP/1.1 or HTTP/2; when a client speaks HTTP/1.0, Envoy returns 426 Upgrade Required.

Our gateway is built on openresty, which in turn is built on nginx, and nginx's default proxied HTTP version is 1.0, so the eventual fix was setting proxy_http_version 1.1; in the gateway.
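
A minimal sketch of the relevant location block in the openresty/nginx gateway (the upstream name is illustrative):

location / {
    proxy_http_version 1.1;          # Envoy rejects HTTP/1.0 with 426 Upgrade Required
    proxy_set_header Connection "";  # clear the hop-by-hop header so keep-alive works
    proxy_pass http://istio_ingressgateway_upstream;
}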

Problem 3

https://github.com/istio/istio/issues/41709

https://discuss.istio.io/t/nginx-proxy-pass-to-istio-ingress-gateway-404/4330/3

When nginx acts as the service gateway, proxy_set_header Host xxx; may be configured in front of the proxied service. Be aware that this affects the routing of downstream requests, because a VirtualService decides whether its rules apply based on hosts.

So when proxy_pass is combined with an upstream block, configure proxy_set_header Host <service-name>; in nginx; when it is not, proxy_set_header Host $proxy_host works.
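
A sketch of the two variants (all names are illustrative):

# With an upstream block, set Host to a value the VirtualService's hosts will match:
upstream istio_gw {
    server 10.1.4.5:31380;
}
server {
    listen 80;
    location / {
        proxy_set_header Host testaaa.test-istio.svc.cluster.local;
        proxy_pass http://istio_gw;
    }
}

# Without an upstream block, the host derived from proxy_pass is enough:
#     proxy_set_header Host $proxy_host;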

Problem 4

References:

https://github.com/istio/istio/issues/41826

https://github.com/envoyproxy/envoy/issues/14981

https://blog.csdn.net/luo15242208310/article/details/96480095

Intermittent access timeouts, followed by 503s.

Page response message: upstream connect error or disconnect/reset before headers. reset reason: connection termination

This is a frequent sight in istio, with many possible causes. I never pinned down the trigger this time; following the call graph kiali provides showed the problem arose where external traffic reached the service gateway, so I inspected the service gateway's envoy logs:

{
  "downstream_local_address": "10.42.2.122:80",
  "bytes_received": 0,
  "route_name": "default",
  "downstream_remote_address": "10.42.1.0:0",
  "upstream_cluster": "inbound|80||",
  "upstream_local_address": "127.0.0.6:48839",
  "upstream_transport_failure_reason": null,
  "connection_termination_details": null,
  "duration": 3825,
  "x_forwarded_for": "175.162.8.253,10.42.1.0",
  "path": "/xxxxxx/api/weapp/v2.0/products?product_type=BEST_SELL",
  "start_time": "2022-11-08T07:23:09.887Z",
  "requested_server_name": "outbound_.80_.release-k3s_.orange-gateway-a.sopei-biz.svc.cluster.local",
  "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.3 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1 wechatdevtools/1.06.2210310 MicroMessenger/8.0.5 Language/zh_CN webview/",
  "upstream_host": "10.42.2.122:80",
  "method": "GET",
  "protocol": "HTTP/1.1",
  "bytes_sent": 95,
  "response_code_details": "upstream_reset_before_response_started{connection_termination}",
  "response_flags": "UC",
  "authority": "xxxxxx",
  "upstream_service_time": null,
  "request_id": "717b6d28-3cfc-40a1-88c2-dec26f9b54b8",
  "response_code": 503
}

The key phrase is upstream_reset_before_response_started{connection_termination}.

As issue 14981 puts it, one possible explanation for this class of problem is that the upstream server closed the connection just as the proxy started sending the request; it can help to learn how long the upstream connection had been open before the first request was sent.

Roughly: the request reached the service gateway, the downstream API service took too long to respond, and the gateway closed the connection (this endpoint genuinely was slow).

The eventual fix:

On a hunch, I put an ingressgateway layer in front of the service gateway and retested; the problem went away.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: gateway
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http-orange
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orange-gateway
spec:
  gateways:
    - gateway
  hosts:
    - "*"
  http:
    - route:
        - destination:
            host: orange-gateway-a
            subset: release-k3s
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: orange-gateway-a
spec:
  host: orange-gateway-a
  subsets:
    - labels:
        version: release-k3s
      name: release-k3s

Problem 4 update

The 503 problem can generally be mitigated in the following four ways (a sketch of options (1)-(3) follows the list):

(1) Set an error-retry policy via HTTPRetry (attempts, perTryTimeout, retryOn) in the VirtualService (note: plain Envoy requires a timeout to be set alongside, i.e. the total retry time must stay below the timeout; in Istio, set HttpRoute.timeout as well);

(2) Set HTTPSettings.idleTimeout in the DestinationRule to cap how long idle connections stay cached in envoy's connection pool;

(3) Set HTTPSettings.maxRequestsPerConnection in the DestinationRule to 1 (disables keep-alive, so connections are not reused, at a performance cost);

(4) Raise tomcat's connectionTimeout (server.connectionTimeout in Springboot) to lengthen the web container's idle-connection timeout.
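
A sketch of options (1)-(3) applied to this article's testbbb service (the timeout values are illustrative, not recommendations):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: testbbb
  namespace: test-istio
spec:
  hosts:
    - testbbb.test-istio.svc.cluster.local
  http:
    - timeout: 15s # (1) overall route timeout; total retry time must fit inside it
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: connect-failure,reset,503
      route:
        - destination:
            host: testbbb.test-istio.svc.cluster.local
            subset: release
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: testbbb
  namespace: test-istio
spec:
  host: testbbb.test-istio.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s            # (2) evict pooled connections idle longer than this
        maxRequestsPerConnection: 1 # (3) disable connection reuse (costs performance)
  subsets:
    - labels:
        version: release
      name: release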

For more on troubleshooting 503s in Istio, see:

[English] Istio: 503's with UC's and TCP Fun Times

[Chinese] Istio: 503, UC and TCP