[FR] Enhance self-learning collection of http2/gRPC header key values #8242

Open
3 tasks done
Fancyki1 opened this issue Sep 28, 2024 · 7 comments

@Fancyki1

Fancyki1 commented Sep 28, 2024

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

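Requirement: DeepFlow v6.4 implemented high-performance decoding of HTTP2 compressed headers with eBPF kprobe, automatically learning the compression dictionaries of both communicating parties. In practice, however, collected custom headers get lost, misordered, or overwritten. I would like to solve custom-header matching by relying only on the collected value.
Source article: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/
Limitations:

  1. For HTTP2 long-lived connections that already exist before deepflow-agent starts, the dynamic dictionary table entries that already exist cannot be decoded.
  2. With cBPF, packet loss, retransmission, and reordering on the network can make the restored compressed headers inaccurate (eBPF kprobe does not have this limitation).
  3. In actual testing, v6.5 may misorder the compression dictionary, so collected keys and values no longer correspond.

Problem description: because the compression dictionary may be misordered, collected keys and values do not match. Observed behavior with the following configuration: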

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
       -  field-name: "x-custom-msg"
       -  field-name: "x-custom-data"

Send an HTTP2/gRPC request:

:authority: www.xxxx.com
:method: POST
:path: /list?aid=6383&sdk_version=5.1.18_zip&device_platform=web&zip=1
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br, zstd
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
content-encoding: gzip
content-length: 5368
content-type: application/json; charset=utf-8
origin: https://www.xxxx.com
priority: u=1, i
referer: https://www.xxxx.com/
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0.0
x-custom-code: 200
x-custom-msg: success
x-custom-data: {"test": "data"}

Technical background: https://kiosk007.top/post/http-2-0-header-compression/
The HTTP2 index table consists of the static table (RFC 7541) and the dynamic table.
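
As a rough illustration of why a passive decoder can attach a value to the wrong key, here is a toy model of the RFC 7541 dynamic table. It assumes only what the RFC states: 61 static entries, dynamic indices starting at 62, and the newest entry at the lowest dynamic index. The `DynamicTable` class and the "observer" are hypothetical, eviction and size updates are left out, and this is one possible cause rather than a confirmed diagnosis of DeepFlow's behavior.

```python
from collections import deque

STATIC_TABLE_SIZE = 61  # RFC 7541 Appendix A defines 61 static entries


class DynamicTable:
    """Toy HPACK dynamic table: newest entry first, no size-based eviction."""

    def __init__(self):
        self.entries = deque()

    def insert(self, name, value):
        # New entries go to the front, so every older entry's index grows by one.
        self.entries.appendleft((name, value))

    def lookup(self, index):
        # Dynamic entries are addressed right after the static table (62, 63, ...).
        return self.entries[index - STATIC_TABLE_SIZE - 1]


sender = DynamicTable()
observer = DynamicTable()  # hypothetical passive decoder watching the connection

headers = [("x-custom-code", "200"), ("x-custom-msg", "success"), ("x-custom-data", "{}")]
for name, value in headers:
    sender.insert(name, value)
    # Assumption for the demo: the observer never sees the x-custom-msg insertion.
    if name != "x-custom-msg":
        observer.insert(name, value)

# A later request refers to index 63 instead of repeating the header name.
sender_name = sender.lookup(63)[0]
observer_name = observer.lookup(63)[0]
print(sender_name, observer_name)  # x-custom-msg x-custom-code

# Index 64 is valid for the sender but unknown to the observer.
try:
    observer_oldest = observer.lookup(64)[0]
except IndexError:
    observer_oldest = "unknown"
print(sender.lookup(64)[0], observer_oldest)  # x-custom-code unknown
```

In this toy run a single missed insertion is enough for the observer to resolve index 63 to `x-custom-code` while the sender means `x-custom-msg`.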

Where the server code writes to the database:

deepflow\server\ingester\flow_log\log_data\l7_flow_log.go

// AttributeNames and AttributeValues are parallel arrays.
// They map one-to-one as key => value: AttributeNames[i] => AttributeValues[i]
h.AttributeNames = append(h.AttributeNames, l.ExtInfo.AttributeNames...)
h.AttributeValues = append(h.AttributeValues, l.ExtInfo.AttributeValues...)
h.MetricsNames = append(h.MetricsNames, l.ExtInfo.MetricsNames...)
h.MetricsValues = append(h.MetricsValues, l.ExtInfo.MetricsValues...)

Examples of stored results:

# Case 1: correct, seen in a minority of requests
AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
# Case 2: incorrect, seen in a large number of requests
# x-custom-msg is overwritten by x-custom-code; the index table is parsed out of order
AttributeNames = ["rpc_services","x-custom-code","x-custom-code","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
# x-custom-code is overwritten by x-custom-data; the index table is parsed out of order
AttributeNames = ["rpc_services","x-custom-data","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
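
A small sanity check makes the case 2 symptom easy to spot in stored rows. The sketch below is not part of DeepFlow: the helper names are hypothetical, and it only assumes the one-to-one `AttributeNames[i] => AttributeValues[i]` pairing described above.

```python
from collections import Counter


def pair_attributes(names, values):
    """Pair the parallel arrays; a length mismatch means the row is unusable."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return list(zip(names, values))


def duplicated_names(names):
    """Names that appear more than once, i.e. one key overwrote another."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def missing_names(names, expected):
    """Configured field names that never showed up in the row."""
    return [name for name in expected if name not in names]


expected = ["x-custom-code", "x-custom-msg", "x-custom-data"]
names = ["rpc_services", "x-custom-code", "x-custom-code", "x-custom-data"]
values = ["xxx", "200", "success", '{"test": "data"}']

pairs = pair_attributes(names, values)
print(duplicated_names(names))         # ['x-custom-code']
print(missing_names(names, expected))  # ['x-custom-msg']
```

A duplicated name together with a missing configured name matches the overwrite pattern shown in case 2.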

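Proposed solution:
Idea: since self-learning of the HTTP2 header index table still has shortcomings, start from distinctive values instead and fill in the keys through configuration.

Start with a simple, general scenario, delimiter handling. Define one header:

# The header key x-custom-content carries no meaning of its own; when Wireshark and DeepFlow fail to learn it, the key shows up as unknown
# Dedicated string delimiter: !#!
x-custom-content: "200!#!success!#!{\"test\": \"data\"}"
# The protocol parser may actually yield: unknown:"200!#!success!#!{\"test\": \"data\"}"

Add a configuration (several variants are possible here; this one follows actual testing):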

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
          match-value-rule: "!#!"
          field-value-index: 0
       -  field-name: "x-custom-msg"
          match-value-rule: "!#!"
          field-value-index: 1
       -  field-name: "x-custom-data"
          match-value-rule: "!#!"
          field-value-index: 2

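Because such special delimiters rarely occur in ordinary values, any header value that can be split by the delimiter into at least 2 parts is completed according to the matching rules and the predefined keys.

The completed result is identical to a correct self-learned header result, and it is stable: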

AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
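
The delimiter rule can be sketched as follows. The keys `match-value-rule` and `field-value-index` come from the configuration proposed in this issue and are not existing DeepFlow options; the function name and the rule representation are assumptions made for illustration.

```python
def apply_delimiter_rules(raw_value, rules):
    """Split one header value by each rule's delimiter and pick the configured part."""
    names, values = [], []
    for rule in rules:
        parts = raw_value.split(rule["match-value-rule"])
        # Only values that split into at least 2 parts are treated as a match.
        if len(parts) < 2:
            continue
        index = rule["field-value-index"]
        if index < len(parts):
            names.append(rule["field-name"])
            values.append(parts[index])
    return names, values


rules = [
    {"field-name": "x-custom-code", "match-value-rule": "!#!", "field-value-index": 0},
    {"field-name": "x-custom-msg", "match-value-rule": "!#!", "field-value-index": 1},
    {"field-name": "x-custom-data", "match-value-rule": "!#!", "field-value-index": 2},
]

# The key was not learned, so only the value is available.
raw_value = '200!#!success!#!{"test": "data"}'
names, values = apply_delimiter_rules(raw_value, rules)
print(names)   # ['x-custom-code', 'x-custom-msg', 'x-custom-data']
print(values)  # ['200', 'success', '{"test": "data"}']
```

A value without the delimiter, such as `gzip`, yields no fields, so ordinary headers are left untouched.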

Additional scenario: regex matching (field-redundancy approach)

# The header keys are ordinary HTTP2 dynamic table fields; the parsed key itself carries no meaning
# Each value redundantly repeats its own key as a prefix
x-custom-code: "x-custom-code:200"
x-custom-msg: "x-custom-msg:success"
x-custom-data: "x-custom-data:{\"test\": \"data\"}"
# The protocol parser may actually yield:
# unknown: "x-custom-code:200"
# unknown: "x-custom-msg:success"
# unknown: "x-custom-data:{\"test\": \"data\"}"

Add a configuration:

static_config:
   l7-protocol-advanced-features:
     extra-log-fields:
       http2:
       -  field-name: "x-custom-code"
          match-value-rule: "^x-custom-code:(.*)"
          field-value-index: 0
       -  field-name: "x-custom-msg"
          match-value-rule: "^x-custom-msg:(.*)"
          field-value-index: 0
       -  field-name: "x-custom-data"
          match-value-rule: "^x-custom-data:(.*)"
          field-value-index: 0

Example pseudocode for the processing:

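import re

# Raw header value whose key could not be learned reliably
input_string = "x-custom-msg:success"
# The capture group holds everything after the redundant key prefix
pattern = r"^x-custom-msg:(.*)"

match = re.match(pattern, input_string)

if match:
    result = match.group(1)
    print("Match succeeded!")
    print("Extracted content:", result)  # success
else:
    print("Match failed")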

Result after matching and parsing:

# x-custom-code: "200"
# x-custom-msg: "success"
# x-custom-data: "{\"test\": \"data\"}"
AttributeNames = ["rpc_services","x-custom-code","x-custom-msg","x-custom-data"]
AttributeValues = ["xxx","200","success","{\"test\": \"data\"}"]
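
Extending the single-pattern example to the whole rule list might look like the sketch below. It assumes that each raw value is tested against the rules in order and that the first capture group is the extracted value; how `field-value-index: 0` should map onto capture groups is not specified in this proposal, so that choice is an assumption, as are the function and variable names.

```python
import re


def apply_regex_rules(raw_values, rules):
    """Recover (names, values) from header values whose keys were not learned."""
    compiled = [(rule["field-name"], re.compile(rule["match-value-rule"])) for rule in rules]
    names, values = [], []
    for raw in raw_values:
        for field_name, pattern in compiled:
            match = pattern.match(raw)
            if match:
                names.append(field_name)
                values.append(match.group(1))  # assumption: first capture group
                break  # the first matching rule wins
    return names, values


rules = [
    {"field-name": "x-custom-code", "match-value-rule": r"^x-custom-code:(.*)"},
    {"field-name": "x-custom-msg", "match-value-rule": r"^x-custom-msg:(.*)"},
    {"field-name": "x-custom-data", "match-value-rule": r"^x-custom-data:(.*)"},
]

# Values as the parser might see them, with every key reported as unknown.
raw_values = [
    "x-custom-code:200",
    "x-custom-msg:success",
    'x-custom-data:{"test": "data"}',
    "gzip",
]
names, values = apply_regex_rules(raw_values, rules)
print(names)   # ['x-custom-code', 'x-custom-msg', 'x-custom-data']
print(values)  # ['200', 'success', '{"test": "data"}']
```

Because each value carries its own key, the result does not depend on the order in which the headers were decoded.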

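Note: carrying the data in HTTP2 static table fields (user-agent, server) makes DeepFlow's collection much more stable, but the corresponding server code then has to be modified. Repurposing static table fields does not conform to the protocol standard and is insecure, so please see whether dynamic table handling can be made compatible with custom HTTP2 headers.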
@sharang

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@sharang sharang self-assigned this Sep 30, 2024
@sharang
Member

sharang commented Oct 15, 2024

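@Fancyki1 The approach you describe is quite good. It amounts to defining a specification for http/grpc header injection: by relying on the distinctiveness of the value, everything that needs to be injected is placed in a single value.

Let's think about how to promote this practice at the specification level.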

@gbling

gbling commented Nov 11, 2024

Does this problem also occur in versions below 6.4?

@Fancyki1
Author

Fancyki1 commented Nov 12, 2024

@gbling It is all in the source article: https://www.deepflow.io/blog/zh/053-high-performance-decoding-of-http2-compressed-headers-using-ebpf-kprobe/
Versions before 6.4 do not support this feature at all.

@gbling

gbling commented Nov 13, 2024

@Fancyki1 I would like to confirm one more thing: does the same happen with the HTTP1.1 protocol?

@Fancyki1
Author

@gbling For http1.1 you can do the parsing with a wasm plugin; this feature is not needed.

@gbling

gbling commented Nov 13, 2024

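@Fancyki1 Here is the situation: when testing distributed tracing, we correlate traces through a custom http_log_x_request_id. Internal calls all use http1.1, and some traces come out incomplete. I want to clarify whether this feature only takes effect for HTTP2/gRPC or whether it also takes effect for http1.1.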

@Fancyki1
Author

Fancyki1 commented Nov 13, 2024

@gbling Please read the documentation more closely; it is all written there:

    ## Configuration to extract the customized header fields of HTTP, HTTP2, GRPC protocol etc
    #extra-log-fields:
    ## for example:
    ## http:
    ## - field-name: "user-agent"
    ## - field-name: "cookie"
    #  http: []
    #  http2: []

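With a version >v6.4, configuring http enables this for http1.1. http1.1 has no HTTP2 index table, so it does not suffer from misordered or incomplete collection; just use it directly. You also need to be clear about what you want to achieve. If it is distributed tracing, this feature has little to do with it. If you want to use it to check whether every request in a trace carries http_log_x_request_id, then it can help with troubleshooting.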
