- Start Date: 2023-02-01
- RFC Type: feature
- RFC PR: #71
- RFC Status: approved
This RFC makes it possible to continue a trace across process boundaries (think of one program starting another one). Environment variables serve as the carrier: an SDK should be able to fetch tracing data from environment variables on init() to continue a trace started in another process.
The Sentry "Cron Monitors" feature suggests that one can add something like this to the crontab:
10 * * * * sentry-cli monitor run <monitor_id> -- python my_cron_job.py
The sentry-cli call starts a trace (so it is the head of trace) and the Sentry SDK in my_cron_job.py should connect its transactions to that trace.
Trace propagation means passing trace information from one service (the first service, or head of trace) to a second service, so that the second service can create transactions that are attached to the trace created in the first service.
In most cases we use HTTP as the carrier for trace information. The information is propagated through two HTTP headers (sentry-trace and baggage).
A few integrations (for example job queues) use queue-specific metadata fields for propagating trace information. Those implementations only propagate sentry-trace and possibly the legacy tracestate (which will be removed), but NOT baggage, so some information is lost.
The trace information can also be injected into rendered HTML as an HTML tag.
See Appendix A if you want to know how this works in the Python SDK.
We will implement Option A.
In the develop docs a specification of this RFC will be added and afterwards it will be first implemented in Python.
When the SDK starts up it reads the environment variable SENTRY_USE_ENVIRONMENT.
If SENTRY_USE_ENVIRONMENT is set to false (or rather one of false|False|FALSE|n|no|No|NO|off|Off|OFF|0) then the SDK should not read tracing information from the environment. Otherwise (if SENTRY_USE_ENVIRONMENT is NOT set, or is set to any value not in the list of falsy values) the SDK reads tracing information from the environment. So the default behaviour is to try to read tracing information from the environment.
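The check described above could be sketched as follows. This is a minimal illustration, not the SDK's implementation; the function name should_use_environment and the FALSY_VALUES constant are hypothetical, and only the list of falsy spellings comes from this RFC.

```python
import os

# Falsy spellings as listed in this RFC. Any other value (or an unset
# variable) means "read tracing information from the environment".
FALSY_VALUES = frozenset(
    ["false", "False", "FALSE", "n", "no", "No", "NO", "off", "Off", "OFF", "0"]
)


def should_use_environment(environ=None):
    """Return True unless SENTRY_USE_ENVIRONMENT holds a listed falsy value.

    `environ` is a hypothetical parameter that makes the check testable;
    it defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SENTRY_USE_ENVIRONMENT")
    return value is None or value not in FALSY_VALUES
```

Matching against the exact spellings (rather than lowercasing the value) keeps the sketch faithful to the list above; an SDK may choose to be more lenient.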
The following environment variables are read:
- SENTRY_BAGGAGE (similar to the baggage HTTP header)
- SENTRY_TRACE (similar to the sentry-trace HTTP header)
The environment variables contain the same strings that the respective HTTP headers would contain. The SDK parses the string values from the environment variables and stores the information in the propagation context on the scope.
To successfully attach a transaction to an existing trace, at least SENTRY_TRACE must have data.
For dynamic sampling not to break, SENTRY_BAGGAGE needs to include all information of the dynamic sampling context. The tracing information parsed from it must not be changed.
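As an illustration of the parsing step, the sketch below reads both variables into a plain dictionary. It assumes the usual sentry-trace layout of a 32-character trace id, a 16-character span id and an optional sampled flag, and a comma-separated baggage string. All function names and the shape of the returned dictionary are hypothetical; a real SDK would store the result in its propagation context instead.

```python
import re
from urllib.parse import unquote

# Assumed layout: <32 hex trace id>-<16 hex span id>[-<0 or 1>]
_SENTRY_TRACE_RE = re.compile(r"^\s*([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?\s*$")


def parse_sentry_trace(value):
    """Parse a sentry-trace string; return None if it is missing or malformed."""
    match = _SENTRY_TRACE_RE.match(value or "")
    if match is None:
        return None
    trace_id, parent_span_id, sampled = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "parent_sampled": None if sampled is None else sampled == "1",
    }


def parse_baggage(value):
    """Return the sentry- prefixed baggage items with the prefix stripped."""
    items = {}
    for item in (value or "").split(","):
        key, sep, val = item.strip().partition("=")
        if sep and key.startswith("sentry-"):
            items[unquote(key[len("sentry-"):])] = unquote(val)
    return items


def propagation_context_from_environ(environ):
    """Build a context dict from SENTRY_TRACE and SENTRY_BAGGAGE, if present."""
    context = parse_sentry_trace(environ.get("SENTRY_TRACE"))
    if context is None:
        return None  # nothing to continue; the caller is the head of trace
    # The dynamic sampling context is taken over unchanged.
    context["dynamic_sampling_context"] = parse_baggage(environ.get("SENTRY_BAGGAGE"))
    return context
```

Returning None when SENTRY_TRACE is absent mirrors the rule that at least SENTRY_TRACE must have data to attach to an existing trace.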
If SENTRY_USE_ENVIRONMENT is set to true and no information can be found in SENTRY_BAGGAGE or SENTRY_TRACE, then the current process is the head of trace and a dynamic sampling context should be created.
See Unified Propagation Mechanism for details.
See Dynamic Sampling Context Payloads to see what data needs to be included in the dynamic sampling context.
The integrations that patch functions used for spawning new processes (StdlibIntegration in Python) should be changed so that they take the parsed tracing information (if any) and serialize it to the environment variables (SENTRY_BAGGAGE, SENTRY_TRACE) for the newly spawned process. The variable SENTRY_USE_ENVIRONMENT should also be set to true so that the receiving process picks up the information.
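The sending side could look roughly like the sketch below, which uses only the standard library's subprocess module. The helper names environ_with_trace and spawn_with_trace are hypothetical and do not reflect how StdlibIntegration is actually implemented; the already-serialized sentry_trace and baggage strings are assumed to come from the SDK.

```python
import os
import subprocess


def environ_with_trace(sentry_trace, baggage, base_environ=None):
    """Return a copy of the environment with tracing variables added.

    `sentry_trace` and `baggage` are the serialized header-style strings.
    """
    env = dict(os.environ if base_environ is None else base_environ)
    env["SENTRY_TRACE"] = sentry_trace
    if baggage:
        env["SENTRY_BAGGAGE"] = baggage
    # Tell the child explicitly to pick up the tracing information.
    env["SENTRY_USE_ENVIRONMENT"] = "true"
    return env


def spawn_with_trace(args, sentry_trace, baggage, **kwargs):
    """Run a child process that inherits the current trace via its environment."""
    env = environ_with_trace(sentry_trace, baggage, kwargs.pop("env", None))
    return subprocess.run(args, env=env, **kwargs)
```

Copying the environment instead of mutating os.environ keeps the parent process unaffected, so several children can be spawned with different span ids.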
While initializing, the SDK checks for the environment variables SENTRY_TRACING_FILE_BAGGAGE and SENTRY_TRACING_FILE_SENTRY_TRACE. Each of those variables might contain a file path that points to a file containing the tracing information.
For example, SENTRY_TRACING_FILE_SENTRY_TRACE=/dir1/file1 indicates that the sentry-trace value should be read from the file located at /dir1/file1.
Note: the actual file object might also be, for example, a named pipe.
After the values are read from the files, the same considerations as in Option A apply.
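For comparison with Option A, reading a value through one of these file variables might be sketched like this. The helper read_tracing_file is hypothetical, and treating unreadable or empty files as "no tracing information" is an assumption of this sketch rather than something the RFC specifies.

```python
import os


def read_tracing_file(variable, environ=None):
    """Read the tracing value from the file named by an environment variable.

    Returns None if the variable is unset, or the file is unreadable or empty.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(variable)
    if not path:
        return None
    try:
        # Note: if the path is a named pipe, opening it for reading blocks
        # until a writer opens the pipe.
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip() or None
    except OSError:
        return None
```

The returned string would then go through the same parsing as the SENTRY_TRACE or SENTRY_BAGGAGE value in Option A.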
Main differences from Option A:
- Using files, tracing information can be passed not only from a parent process to a child process, but also between processes that don't have direct parent-child relationships.
- Files used for transport can be updated by any other process (as far as filesystem permissions allow), and the SDK can technically re-read the file multiple times during the lifetime of the host application. However, I don't have a good use case for this right now.
- Message passing via files is generally more cumbersome and might have performance/reliability drawbacks, depending on the type of file (plain file, named pipe) and the backing filesystem.
- Because the serialization format of the baggage (and thus the dynamic sampling context) stays the same and only the carrier is new, it should be 100% compatible with other integrations and will not break dynamic sampling. So: No drawbacks.
- Should we create a new transaction_info.source for this kind of transaction, which spans one execution of a process?
- We probably need a flag on a transaction that says "do not emit me" but do consider me sampled (to avoid charging customers for unnecessary transactions). This is out of scope for this RFC, but will be handled in a separate RFC.
How trace propagation currently works in the Python SDK:
When a request from another service is received a transaction is created. (So each request-response cycle is a transaction). The process of creating a transaction that attaches to an existing trace received from the calling service is done as follows:
- Data from the baggage header is parsed. If it contains values with a sentry- prefix, the parsed baggage is frozen (marked as immutable).
- Data from the sentry-trace header is parsed (trace_id, parent_span_id and parent_sampled). If this header exists and contains values, the baggage parsed in the previous step is frozen (marked as immutable).
- DEPRECATED and will be removed soon: data from the tracestate header is parsed (sentry_tracestate and third_party_tracestate).
- With all this information a new transaction is created.
- If an outgoing request is made, this tracing information is attached to the request as HTTP headers and thus propagated.
- The option propagate_traces can turn off the propagation of traces (it is not yet implemented in all integrations, so some integrations ignore it).
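The freezing rule from the first two steps can be illustrated with a small sketch. The Baggage class and baggage_from_headers function here are hypothetical stand-ins written for this appendix; they are not the Python SDK's actual classes, and the tracestate step is left out.

```python
class Baggage:
    """Hypothetical stand-in for the SDK's baggage object (illustration only)."""

    def __init__(self, sentry_items=None):
        self.sentry_items = dict(sentry_items or {})
        self.mutable = True

    def freeze(self):
        """Mark the baggage as immutable."""
        self.mutable = False

    def set(self, key, value):
        if not self.mutable:
            raise RuntimeError("baggage is frozen")
        self.sentry_items[key] = value


def baggage_from_headers(headers):
    """Parse the baggage header and freeze it according to the steps above."""
    items = {}
    for item in headers.get("baggage", "").split(","):
        key, sep, value = item.strip().partition("=")
        if sep and key.startswith("sentry-"):
            items[key[len("sentry-"):]] = value
    baggage = Baggage(items)
    # Freeze if sentry- values were found, or if a sentry-trace header
    # exists and has a value (an upstream service is the head of trace).
    if items or headers.get("sentry-trace"):
        baggage.freeze()
    return baggage
```

A frozen baggage signals that the dynamic sampling context was decided upstream; only when nothing was received does the current service stay free to populate it.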
This should adhere to the Unified Propagation Mechanism.
Currently every integration itself takes care of parsing tracing information and attaching it to Transactions or outgoing requests.