Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update expensify_prod branch #1982

Merged
merged 23 commits into from
Nov 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
1a05b3a
changing message when rejecting crash command
danieldoglas Nov 21, 2024
cf42859
adding custom logic for accountID and AuthEmail, probably will refact…
danieldoglas Nov 21, 2024
2680e3b
whitelisting email as a valid paramter to log
danieldoglas Nov 21, 2024
f361615
removing confusing log line
danieldoglas Nov 21, 2024
e4ce90c
adding log line on signal handler in run command
danieldoglas Nov 21, 2024
e2b0618
adding new method that can be reused in auth
danieldoglas Nov 21, 2024
0ac1dea
adding method with default value
danieldoglas Nov 21, 2024
2ec759b
adding getLogParams to all crash-indulcing logs
danieldoglas Nov 21, 2024
b2515dd
adding requestID to the default params
danieldoglas Nov 21, 2024
4065792
improving logs
danieldoglas Nov 21, 2024
94ca829
renaming method so we don't conflict with Auth, changing logs to aler…
danieldoglas Nov 21, 2024
e999b6d
style change
danieldoglas Nov 21, 2024
24fd532
addressing review comments
danieldoglas Nov 22, 2024
f598bad
making the command log everything that was in the request
danieldoglas Nov 22, 2024
285ab2b
adding new keys
danieldoglas Nov 22, 2024
9257a8f
changing die function to return a string that we can log
danieldoglas Nov 22, 2024
3dafa7c
Merge branch 'main' into dsilva_improvingLogsForCrashes
danieldoglas Nov 22, 2024
89ed70c
whitelisting badcoomand test parameters
danieldoglas Nov 22, 2024
8f23d98
Update BedrockCommand.cpp
danieldoglas Nov 25, 2024
eb0c80b
addressing comments
danieldoglas Nov 25, 2024
2398375
Merge branch 'main' into dsilva_improvingLogsForCrashes
danieldoglas Nov 25, 2024
bfd68c4
addressing more comments
danieldoglas Nov 26, 2024
762abd3
Merge pull request #1973 from Expensify/dsilva_improvingLogsForCrashes
danieldoglas Nov 26, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions BedrockServer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -484,6 +484,7 @@ void BedrockServer::sync()
// this, which would probably make this message malformed. This is the best we can do.
SSetSignalHandlerDieFunc([&](){
_clusterMessenger->runOnAll(_generateCrashMessage(command));
return addLogParams("CRASHING from BedrockServer::sync, command:" + command->request.methodLine, command->request.nameValueMap);
});

// And now we'll decide how to handle it.
Expand Down Expand Up @@ -579,7 +580,9 @@ void BedrockServer::sync()
}
} while (!_syncNode->shutdownComplete() || BedrockCommand::getCommandCount());

SSetSignalHandlerDieFunc([](){SWARN("Dying in shutdown");});
SSetSignalHandlerDieFunc([](){
return "Dying in shutdown";
});

// If we forced a shutdown mid-transaction (this can happen, if, for instance, we hit our graceful timeout between
// getting a `BEGIN_TRANSACTION` and `COMMIT_TRANSACTION`) then we need to roll back the existing transaction and
Expand Down Expand Up @@ -667,7 +670,7 @@ void BedrockServer::worker(int threadId)
try {
// Set a signal handler function that we can call even if we die early with no command.
SSetSignalHandlerDieFunc([&](){
SWARN("Die function called early with no command, probably died in `commandQueue.get`.");
return "Die function called early with no command, probably died in `commandQueue.get`.";
});

// Get the next one.
Expand Down Expand Up @@ -712,6 +715,7 @@ void BedrockServer::runCommand(unique_ptr<BedrockCommand>&& _command, bool isBlo
// signals, like SIGSEGV, this function will be called.
SSetSignalHandlerDieFunc([&](){
_clusterMessenger->runOnAll(_generateCrashMessage(command));
return addLogParams("CRASHING from BedrockServer::runCommand, command:" + command->request.methodLine, command->request.nameValueMap);
});

// If we dequeue a status or control command, handle it immediately.
Expand All @@ -722,7 +726,7 @@ void BedrockServer::runCommand(unique_ptr<BedrockCommand>&& _command, bool isBlo
// Check if this command would be likely to cause a crash
if (_wouldCrash(command)) {
// If so, make a lot of noise, and respond 500 without processing it.
SALERT("CRASH-INDUCING COMMAND FOUND: " << command->request.methodLine);
SALERT("REJECTING CRASH-INDUCING COMMAND, command:" + command->request.methodLine, command->request.nameValueMap);
command->response.methodLine = "500 Refused";
command->complete = true;
_reply(command);
Expand Down
14 changes: 12 additions & 2 deletions libstuff/SLog.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,17 +43,22 @@ void SLogStackTrace(int level) {
// If the param name is not in this whitelist, we will log <REDACTED> in addLogParams.
static const set<string> PARAMS_WHITELIST = {
"accountID",
"authEmail",
"accountIDs",
"attendees",
"bankAccountID",
"cardData",
"cardID",
"clientUpdateID",
"command",
"companyName",
"companyWebsite",
"Connection",
"Content-Length",
"currentTime",
"domainAccountID",
"domainName",
"email",
"errorMessage",
"feed",
"feedCountry",
Expand All @@ -65,10 +70,13 @@ static const set<string> PARAMS_WHITELIST = {
"invoice",
"isUnique",
"key",
"lastIP",
"logParam",
"nvpName",
"policyAccountID",
"policyID",
"reportID",
"requestID",
"requestTimestamp",
"secondaryLogin",
"shouldCompleteOnboarding",
Expand All @@ -78,10 +86,12 @@ static const set<string> PARAMS_WHITELIST = {
"timeDiff",
"token",
"transactionID",
"type"
"type",
"userID",
"secondaryLogin"
};

string addLogParams(string&& message, const map<string, string>& params) {
string addLogParams(string&& message, const STable& params) {
if (params.empty()) {
return message;
}
Expand Down
23 changes: 16 additions & 7 deletions libstuff/SSignal.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@
#include <unistd.h>
#include <format>

thread_local function<void()> SSignalHandlerDieFunc;
void SSetSignalHandlerDieFunc(function<void()>&& func) {
thread_local function<string()> SSignalHandlerDieFunc;
void SSetSignalHandlerDieFunc(function<string()>&& func) {
SSignalHandlerDieFunc = move(func);
}

Expand Down Expand Up @@ -78,7 +78,7 @@ void SClearSignals() {

void SInitializeSignals() {
// Our default die function does nothing.
SSignalHandlerDieFunc = [](){};
SSignalHandlerDieFunc = [](){ return ""; };

// Clear the thread-local signal number.
_SSignal_threadCaughtSignalNumber = 0;
Expand Down Expand Up @@ -246,16 +246,25 @@ void _SSignal_StackTrace(int signum, siginfo_t *info, void *ucontext) {
}
free(frame);
}
close(fd);

// Done.
free(callstack);

// Call our die function and then reset it.
SWARN("Calling DIE function.");
SSignalHandlerDieFunc();
SSignalHandlerDieFunc = [](){};
string logMessage = SSignalHandlerDieFunc();
if (!logMessage.empty()) {
SALERT(logMessage);
}
SSignalHandlerDieFunc = [](){ return ""; };
SWARN("DIE function returned.");

// Finish writing the crash file with the request details if it exists
if (fd != -1 && !logMessage.empty()) {
logMessage += "\n";
write(fd, logMessage.c_str(), strlen(logMessage.c_str()));
}
close(fd);

if (SQLiteNode::KILLABLE_SQLITE_NODE) {
SWARN("Killing peer connections.");
SQLiteNode::KILLABLE_SQLITE_NODE->kill();
Expand Down
8 changes: 6 additions & 2 deletions libstuff/libstuff.h
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,11 @@ extern void* SSIGNAL_NOTIFY_INTERRUPT;
// Initialize libstuff on every thread before calling any of its functions
void SInitialize(string threadName = "", const char* processName = 0);

void SSetSignalHandlerDieFunc(function<void()>&& func);
// This function sets a lambda that will be executed while the process is being killed for any reason
// (e.g. it crashed). Since we usually add logs in the lambda function, we'll also need to return the log as a
// string so we can write that log in the crash file. We do that to guaurantee we'll have the log message
// instantly available in the crash file instead of depending on rsyslog, which can be late.
void SSetSignalHandlerDieFunc(function<string()>&& func);

// --------------------------------------------------------------------------
// Assertion stuff
Expand Down Expand Up @@ -235,7 +239,7 @@ void SSyslogSocketDirect(int priority, const char* format, ...);
// Atomic pointer to the syslog function that we'll actually use. Easy to change to `syslog` or `SSyslogSocketDirect`.
extern atomic<void (*)(int priority, const char *format, ...)> SSyslogFunc;

string addLogParams(string&& message, const map<string, string>& params = {});
string addLogParams(string&& message, const STable& params = {});

// **NOTE: rsyslog default max line size is 8k bytes. We split on 7k byte boundaries in order to fit the syslog line prefix and the expanded \r\n to #015#012
#define SWHEREAMI SThreadLogPrefix + "(" + basename((char*)__FILE__) + ":" + SToStr(__LINE__) + ") " + __FUNCTION__ + " [" + SThreadLogName + "] "
Expand Down