[pfcp] adding logic to detect pfcp association failures and reestablish sessions
spencersevilla committed Jan 19, 2024
1 parent a55503a commit 2f998cd
Showing 11 changed files with 76 additions and 5 deletions.
9 changes: 9 additions & 0 deletions README.md
@@ -23,3 +23,12 @@ The files `{service}_start_time` (e.g. `/tmp/open5gs/mme_start_time`) are compri
Each service also has its own subdirectory (e.g. `/tmp/open5gs/mme/`) that exports other information/values. The files with `num` at the front (e.g. `/tmp/open5gs/mme/num_enbs`) contain a single number that indicates how many of that item the service is connected to. The files with `list` at the front (e.g. `/tmp/open5gs/mme/list_enbs`) contain more information about each of those connected items. Each item has its own line (i.e. the number of lines in `list_enbs` should equal `num_enbs`), and each line has a series of key-value pairs separated by spaces (e.g. `ip:10.0.0.2 tac:2`).

The reason I chose to use files is because it's Unix-esque, it's completely agnostic to any specific network monitoring platform, and it should be easy to write a shim for whatever platform you want to integrate against. If you or your org have specific metrics you would like to see exposed, please reach out to me (or just issue a PR) and I will add them.
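As a rough illustration of such a shim, the sketch below pulls one value out of a single line of a `list_*` file. The function name `metrics_line_get` is hypothetical and not part of this repository; the only thing taken from the description above is the space-separated `key:value` line format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of this repo): look up `key` in one line of
 * a list_* file, e.g. "ip:10.0.0.2 tac:2", and copy its value into `out`.
 * Returns 0 on success, -1 if the key is absent or the value does not fit. */
int metrics_line_get(const char *line, const char *key,
                     char *out, size_t out_len)
{
    assert(line != NULL && key != NULL && out != NULL && out_len > 0);

    size_t key_len = strlen(key);
    const char *p = line;

    while (*p != '\0') {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\n')
            p++;
        if (*p == '\0')
            break;

        /* The current token spans [p, end). */
        const char *end = p;
        while (*end != '\0' && *end != ' ' && *end != '\n')
            end++;

        /* Match "key:" at the start of the token; everything after the
         * first colon is the value, so values may contain colons. */
        if ((size_t)(end - p) > key_len &&
            strncmp(p, key, key_len) == 0 && p[key_len] == ':') {
            size_t val_len = (size_t)(end - p) - key_len - 1;
            if (val_len >= out_len)
                return -1;
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p = end;
    }
    return -1;
}
```

A monitoring shim could read `list_enbs` line by line (for example with `fgets`) and call this helper once per field it wants to export.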

## Crash-Proof PFCP
In our specific deployment context, we lean heavily on the CUPS split and locate the user-plane server (UPS: sgwu and upf) far from the rest of the core, the control-plane server (CPS: mme, sgwc, and smf). We often see UPS nodes crash, drop messages, or lose communication for extended periods of time, so we needed to substantially harden the PFCP interface against these conditions. This hardening takes two main forms, detailed below:

##### Idempotent PFCP Operations:
Wherever possible, we have changed PFCP messages/operations from an action-based model to a state-based one, on the assumption that the CPS is always authoritative over the UPS. In this design, the UPS should use messages to infer the state that the CPS wants, and silently recover into this state if able. For Session Establishment messages, this means that if the UPS already has a session for the given SEID, it should simply wipe out the preexisting session, create a new one according to the CPS details, and return `OGS_PFCP_CAUSE_REQUEST_ACCEPTED`. Similarly, for Session Deletion messages where the SEID does not exist, the UPS can return `OGS_PFCP_CAUSE_REQUEST_ACCEPTED` without doing anything.
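The state-based behavior described above can be sketched with a toy session table. This is a minimal illustration and not the UPS code from this repository: every `toy_` name is hypothetical, `TOY_CAUSE_REQUEST_ACCEPTED` merely stands in for `OGS_PFCP_CAUSE_REQUEST_ACCEPTED`, and a single `rules` integer stands in for the session's real rule set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_SESSIONS 8

/* Stand-ins for PFCP cause values; the names are illustrative only. */
typedef enum {
    TOY_CAUSE_REQUEST_ACCEPTED,
    TOY_CAUSE_NO_RESOURCES
} toy_cause_t;

typedef struct {
    bool in_use;
    uint64_t seid;   /* session key chosen by the control plane */
    uint32_t rules;  /* placeholder for the session's rule set */
} toy_session_t;

typedef struct {
    toy_session_t slots[TOY_MAX_SESSIONS];
} toy_ups_t;

static toy_session_t *toy_find(toy_ups_t *ups, uint64_t seid)
{
    for (size_t i = 0; i < TOY_MAX_SESSIONS; i++)
        if (ups->slots[i].in_use && ups->slots[i].seid == seid)
            return &ups->slots[i];
    return NULL;
}

static toy_session_t *toy_find_free(toy_ups_t *ups)
{
    for (size_t i = 0; i < TOY_MAX_SESSIONS; i++)
        if (!ups->slots[i].in_use)
            return &ups->slots[i];
    return NULL;
}

/* Establishment is idempotent: an existing session with the same SEID is
 * wiped and rebuilt from the request instead of being reported as an error. */
toy_cause_t toy_establish(toy_ups_t *ups, uint64_t seid, uint32_t rules)
{
    assert(ups != NULL);
    toy_session_t *s = toy_find(ups, seid);
    if (s == NULL)
        s = toy_find_free(ups);
    if (s == NULL)
        return TOY_CAUSE_NO_RESOURCES;

    memset(s, 0, sizeof(*s));   /* drop any stale state */
    s->in_use = true;
    s->seid = seid;
    s->rules = rules;
    return TOY_CAUSE_REQUEST_ACCEPTED;
}

/* Deletion is idempotent: deleting an unknown SEID is still "accepted". */
toy_cause_t toy_delete(toy_ups_t *ups, uint64_t seid)
{
    assert(ups != NULL);
    toy_session_t *s = toy_find(ups, seid);
    if (s != NULL)
        memset(s, 0, sizeof(*s));
    return TOY_CAUSE_REQUEST_ACCEPTED;
}

size_t toy_count(const toy_ups_t *ups)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_MAX_SESSIONS; i++)
        if (ups->slots[i].in_use)
            n++;
    return n;
}
```

Because both handlers converge on the state the request describes, the control plane can safely retry either message after a timeout without first checking what the user plane currently holds.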

##### PFCP Session Re-Establishment:
When a UPS re-associates with the CPS after a period of lost connectivity, their respective states may have diverged substantially. We handle this with a reassociation process in which (1) the CPS sends a Session Set Deletion message, instructing the UPS to delete all of its current sessions, and then (2) the CPS re-sends a Session Establishment message for each active session. This creates more chatter over the wire than you might expect, but it works well in our context.
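The two-step reassociation can be sketched as a function that plans the messages the CPS would send. This is an illustration under simplifying assumptions, not the code from this commit: the `toy_` types are hypothetical, and messages are written to an output array instead of being sent over PFCP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TOY_MSG_SESSION_SET_DELETION,
    TOY_MSG_SESSION_ESTABLISHMENT
} toy_msg_type_t;

typedef struct {
    toy_msg_type_t type;
    uint64_t seid;      /* 0 for the set-deletion message */
} toy_msg_t;

typedef struct {
    int node_id;        /* user-plane node serving this session */
    uint64_t seid;
    bool established;   /* only established sessions are re-sent */
} toy_cp_session_t;

/* Plan the messages for one reassociation with `node_id`:
 * step 1 is a single set-deletion, step 2 is one establishment per
 * established session bound to that node. Returns the number of
 * messages written; the plan is truncated at `out_cap`. */
size_t toy_plan_reassociation(const toy_cp_session_t *sessions, size_t n,
                              int node_id, toy_msg_t *out, size_t out_cap)
{
    assert(sessions != NULL || n == 0);
    assert(out != NULL && out_cap > 0);

    size_t used = 0;

    out[used].type = TOY_MSG_SESSION_SET_DELETION;
    out[used].seid = 0;
    used++;

    for (size_t i = 0; i < n && used < out_cap; i++) {
        if (sessions[i].node_id != node_id || !sessions[i].established)
            continue;
        out[used].type = TOY_MSG_SESSION_ESTABLISHMENT;
        out[used].seid = sessions[i].seid;
        used++;
    }
    return used;
}
```

Sending the deletion first means the UPS starts from an empty table, so the establishment messages that follow fully define its state regardless of what it held before the outage.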
2 changes: 2 additions & 0 deletions lib/pfcp/context.h
@@ -114,6 +114,8 @@ typedef struct ogs_pfcp_node_s {

ogs_pfcp_up_function_features_t up_function_features;
int up_function_features_len;

bool already_associated;
} ogs_pfcp_node_t;

typedef enum {
20 changes: 20 additions & 0 deletions src/sgwc/pfcp-path.c
@@ -322,6 +322,26 @@ int sgwc_pfcp_send_session_establishment_request(
return rv;
}

int sgwc_pfcp_resend_established_sessions(ogs_pfcp_node_t *node)
{
    sgwc_ue_t *sgwc_ue = NULL;
    sgwc_sess_t *sess = NULL;

    /* Ask the peer to delete all sessions it currently holds. */
    ogs_pfcp_cp_send_session_set_deletion_request(node, NULL);

    /* Re-send an establishment request for each session on this node. */
    ogs_list_for_each(&sgwc_self()->sgw_ue_list, sgwc_ue) {
        ogs_list_for_each(&sgwc_ue->sess_list, sess) {
            if (sess->pfcp_node == node && sess->pfcp_established) {
                /* Forget the peer's old SEID before re-establishing. */
                sess->sgwu_sxa_seid = 0;
                sgwc_pfcp_send_session_establishment_request(
                        sess, NULL, NULL, 0);
            }
        }
    }
    return OGS_OK;
}

int sgwc_pfcp_send_session_modification_request(
sgwc_sess_t *sess, ogs_gtp_xact_t *gtp_xact,
ogs_pkbuf_t *gtpbuf, uint64_t flags)
1 change: 1 addition & 0 deletions src/sgwc/pfcp-path.h
@@ -35,6 +35,7 @@ int sgwc_pfcp_send_bearer_to_modify_list(
int sgwc_pfcp_send_session_establishment_request(
sgwc_sess_t *sess, ogs_gtp_xact_t *gtp_xact, ogs_pkbuf_t *gtpbuf,
uint64_t flags);
int sgwc_pfcp_resend_established_sessions(ogs_pfcp_node_t *node);

int sgwc_pfcp_send_session_modification_request(
sgwc_sess_t *sess, ogs_gtp_xact_t *gtp_xact,
7 changes: 7 additions & 0 deletions src/sgwc/pfcp-sm.c
@@ -194,6 +194,11 @@ void sgwc_pfcp_state_associated(ogs_fsm_t *s, sgwc_event_t *e)
ogs_error("PFCP restoration");
}

if (node->already_associated) {
sgwc_pfcp_resend_established_sessions(node);
}
node->already_associated = true;

stats_update_sgwc_pfcp_nodes();

break;
@@ -285,13 +290,15 @@ void sgwc_pfcp_state_associated(ogs_fsm_t *s, sgwc_event_t *e)
OGS_PORT(&node->addr));
ogs_pfcp_cp_handle_association_setup_request(node, xact,
&message->pfcp_association_setup_request);
sgwc_pfcp_resend_established_sessions(node);
break;
case OGS_PFCP_ASSOCIATION_SETUP_RESPONSE_TYPE:
ogs_warn("PFCP[RSP] has already been associated [%s]:%d",
OGS_ADDR(&node->addr, buf),
OGS_PORT(&node->addr));
ogs_pfcp_cp_handle_association_setup_response(node, xact,
&message->pfcp_association_setup_response);
sgwc_pfcp_resend_established_sessions(node);
break;
case OGS_PFCP_SESSION_ESTABLISHMENT_RESPONSE_TYPE:
if (!message->h.seid_presence) ogs_error("No SEID");
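The `already_associated` check in the first hunk above separates the very first association with a node from any later re-association. A minimal sketch of that pattern, using a hypothetical `toy_node_t` and a counter in place of the real resend call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool already_associated;  /* set after the first successful association */
    unsigned resend_count;    /* stand-in for "sessions were re-sent" */
} toy_node_t;

/* Called each time the node enters the associated state.
 * Returns true when this entry is a re-association, i.e. when the
 * established sessions would need to be re-sent. */
bool toy_on_associated_entry(toy_node_t *node)
{
    assert(node != NULL);

    bool resend = node->already_associated;
    if (resend)
        node->resend_count++;

    node->already_associated = true;
    return resend;
}
```

On the first entry there is nothing to restore, so no resend is triggered; every later entry implies the peer may have lost state.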
4 changes: 2 additions & 2 deletions src/sgwc/sxa-handler.c
@@ -279,8 +279,6 @@ void sgwc_sxa_handle_session_establishment_response(

ogs_assert(sess);

sess->pfcp_established = true;

ogs_debug(" SGW_S5C_TEID[0x%x] PGW_S5C_TEID[0x%x]",
sess->sgw_s5c_teid, sess->pgw_s5c_teid);

@@ -353,6 +351,8 @@ void sgwc_sxa_handle_session_establishment_response(
/* Setup GTP Node */
OGS_SETUP_GTP_NODE(sess, pgw);

sess->pfcp_established = true;

/* Check Indication */
if (create_session_request->indication_flags.presence &&
create_session_request->indication_flags.data &&
3 changes: 2 additions & 1 deletion src/smf/context.h
@@ -425,7 +425,8 @@ typedef struct smf_sess_s {

ogs_gtp_node_t *gnode;
ogs_pfcp_node_t *pfcp_node;

bool pfcp_established;

smf_ue_t *smf_ue;

bool n1_released;
2 changes: 2 additions & 0 deletions src/smf/gsm-sm.c
@@ -718,9 +718,11 @@ void smf_gsm_state_operational(ogs_fsm_t *s, smf_event_t *e)

switch (e->h.id) {
case OGS_FSM_ENTRY_SIG:
sess->pfcp_established = true;
break;

case OGS_FSM_EXIT_SIG:
sess->pfcp_established = false;
break;

case SMF_EVT_GN_MESSAGE:
25 changes: 23 additions & 2 deletions src/smf/pfcp-path.c
@@ -400,6 +400,7 @@ int smf_5gc_pfcp_send_session_establishment_request(

memset(&h, 0, sizeof(ogs_pfcp_header_t));
h.type = OGS_PFCP_SESSION_ESTABLISHMENT_REQUEST_TYPE;

/*
* 7.2.2.4.2 Conditions for Sending SEID=0 in PFCP Header
@@ -430,7 +431,7 @@ int smf_5gc_pfcp_send_session_establishment_request(
* over N4 towards another SMF or another PFCP entity in the SMF
* as specified in clause 5.22.2 and clause 5.22.3.
*/
- h.seid = sess->upf_n4_seid;
+ h.seid = 0;

n4buf = smf_n4_build_session_establishment_request(h.type, sess, xact);
if (!n4buf) {
@@ -575,6 +576,7 @@ int smf_epc_pfcp_send_session_establishment_request(

memset(&h, 0, sizeof(ogs_pfcp_header_t));
h.type = OGS_PFCP_SESSION_ESTABLISHMENT_REQUEST_TYPE;

/*
* 7.2.2.4.2 Conditions for Sending SEID=0 in PFCP Header
@@ -605,7 +607,7 @@ int smf_epc_pfcp_send_session_establishment_request(
* over N4 towards another SMF or another PFCP entity in the SMF
* as specified in clause 5.22.2 and clause 5.22.3.
*/
- h.seid = sess->upf_n4_seid;
+ h.seid = 0;

n4buf = smf_n4_build_session_establishment_request(h.type, sess, xact);
if (!n4buf) {
@@ -625,6 +627,25 @@ int smf_epc_pfcp_send_session_establishment_request(
return rv;
}

int smf_epc_pfcp_resend_established_sessions(ogs_pfcp_node_t *node)
{
    smf_ue_t *smf_ue = NULL;
    smf_sess_t *sess = NULL;

    /* Ask the peer to delete all sessions it currently holds. */
    ogs_pfcp_cp_send_session_set_deletion_request(node, NULL);

    /* Re-send an establishment request for each session on this node. */
    ogs_list_for_each(&smf_self()->smf_ue_list, smf_ue) {
        ogs_list_for_each(&smf_ue->sess_list, sess) {
            if (sess->pfcp_node == node && sess->pfcp_established) {
                smf_epc_pfcp_send_session_establishment_request(
                        sess, NULL, 0);
            }
        }
    }
    return OGS_OK;
}

int smf_epc_pfcp_send_all_pdr_modification_request(
smf_sess_t *sess, void *gtp_xact, ogs_pkbuf_t *gtpbuf,
uint64_t flags, uint8_t gtp_pti, uint8_t gtp_cause)
1 change: 1 addition & 0 deletions src/smf/pfcp-path.h
@@ -48,6 +48,7 @@ int smf_5gc_pfcp_send_session_deletion_request(

int smf_epc_pfcp_send_session_establishment_request(
smf_sess_t *sess, void *gtp_xact, uint64_t flags);
int smf_epc_pfcp_resend_established_sessions(ogs_pfcp_node_t *node);
int smf_epc_pfcp_send_all_pdr_modification_request(
smf_sess_t *sess, void *gtp_xact, ogs_pkbuf_t *gtpbuf,
uint64_t flags, uint8_t gtp_pti, uint8_t gtp_cause);
7 changes: 7 additions & 0 deletions src/smf/pfcp-sm.c
@@ -207,6 +207,11 @@ void smf_pfcp_state_associated(ogs_fsm_t *s, smf_event_t *e)
ogs_error("PFCP restoration");
}

if (node->already_associated) {
smf_epc_pfcp_resend_established_sessions(node);
}
node->already_associated = true;

stats_update_smf_pfcp_nodes();

break;
@@ -300,13 +305,15 @@ void smf_pfcp_state_associated(ogs_fsm_t *s, smf_event_t *e)
OGS_PORT(&node->addr));
ogs_pfcp_cp_handle_association_setup_request(node, xact,
&message->pfcp_association_setup_request);
smf_epc_pfcp_resend_established_sessions(node);
break;
case OGS_PFCP_ASSOCIATION_SETUP_RESPONSE_TYPE:
ogs_warn("PFCP[RSP] has already been associated [%s]:%d",
OGS_ADDR(&node->addr, buf),
OGS_PORT(&node->addr));
ogs_pfcp_cp_handle_association_setup_response(node, xact,
&message->pfcp_association_setup_response);
smf_epc_pfcp_resend_established_sessions(node);
break;
case OGS_PFCP_SESSION_ESTABLISHMENT_RESPONSE_TYPE:
if (!message->h.seid_presence) ogs_error("No SEID");