Data synth #560

Open: wants to merge 9 commits into base: master

webcoderz

initial PR for people data synthesis

setup.py (outdated, resolved)
for _ in range(num_calls):
    if len(non_affiliated_people) > 1:
        caller, callee = non_affiliated_people.sample(n=2, replace=False)['phone_number'].values
        self.add_call_log(call_logs_df, caller, callee, start_date)
Contributor @lmeyerov, Mar 26, 2024

It can be nice to determine some overlapping social networks ahead of time, rather than use random connections.

Check out igraph's SBM and forest fire generators, which can dictate who connects to whom for N samples.

E.g., given N users and a target of E relationships, I think you can get [(a, b), (x, y), ...] from one of igraph's generators, and then turn those into call-log generator calls.
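
A minimal sketch of the idea above, under stated assumptions: it uses a small plain-Python stochastic block model instead of igraph, so `sample_sbm_edges`, `edges_to_call_pairs`, the probabilities, and the phone numbers are all hypothetical stand-ins. One of igraph's generators could replace the sampler; the point is only that an edge list is fixed first and the call pairs are derived from it.

```python
import itertools
import random


def sample_sbm_edges(block_sizes, p_in, p_out, seed=None):
    """Sample undirected edges from a simple stochastic block model.

    Nodes are numbered 0..N-1, block by block. Two nodes in the same block
    are linked with probability p_in, otherwise with probability p_out.
    """
    rng = random.Random(seed)
    # Map each node index to the index of the block it belongs to.
    block_of = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    edges = []
    for a, b in itertools.combinations(range(len(block_of)), 2):
        p = p_in if block_of[a] == block_of[b] else p_out
        if rng.random() < p:
            edges.append((a, b))
    return edges, block_of


def edges_to_call_pairs(edges, phone_numbers, rng):
    """Turn each edge into a (caller, callee) pair with a random direction."""
    pairs = []
    for a, b in edges:
        if rng.random() < 0.5:
            a, b = b, a
        pairs.append((phone_numbers[a], phone_numbers[b]))
    return pairs


# Hypothetical data: 30 people in three communities of 10.
phones = [f"555-{i:04d}" for i in range(30)]
edges, block_of = sample_sbm_edges([10, 10, 10], p_in=0.4, p_out=0.02, seed=7)
call_pairs = edges_to_call_pairs(edges, phones, random.Random(7))
```

Each pair in `call_pairs` could then be passed to whatever adds a call log row, so repeated calls follow the pre-picked network rather than fresh random pairs.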

Author

Good idea, will look into this.

leader_calls = int(num_affiliated_calls * leader_call_percentage)
gang_calls = num_affiliated_calls - leader_calls

# Generate intra-gang calls
Contributor

neato

i think this is similar to above... maybe for each gang:

  • generate a general social network within the gang
  • inject the leader or hierarchy or cell structure you want
  • add some random overlap between the gang, other gangs, and unaffiliated

Contributor

(+ comment on burners)

Author

Yeah, I was thinking about how to do this with respect to the intra-gang call logs. For example, only a leader would likely call another gang's leaders.


return call_logs_df

def generate_affiliated_call_logs(
Contributor

is this modeling phenomena like burner phones vs regular? i'm not sure how that'd look

Author

Oh no, this is gang-affiliated call logs versus regular ones. Burners would be interesting, though I'm not sure how to model that. Maybe with similar profiles: someone calls numbers x, y, z on their main phone, then realizes they slipped up, switches to a burner, and calls the same numbers?

Contributor

yeah i think you'd want the distributions to mirror

so like your simulation secretly tracks who has what burner when, and has burners call one another most of the time, and occasionally the slipup of calling a main

df = pd.DataFrame(records)
return df

def generate_call_logs(
Contributor

should there be a notion of tracked phone numbers?

  • most folks slowly rotate their main phone
  • some folks quickly rotate their burners, which are primarily used for calling other burners? something like that

Contributor

(and burners are / aren't associated with people, e.g., sometimes known, sometimes not?)

Author

Yeah, I was thinking maybe I'd do something similar to the whereabouts, which tie a date to the person at that address.
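
One possible shape for that, sketched under assumptions: the `PhoneAssignment` record, its field names, and the sample numbers below are hypothetical and not taken from the PR. Each record ties a phone to a person for a date range, and a `known` flag covers burners whose owner is hidden from the generated data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PhoneAssignment:
    person_id: str
    phone_number: str
    kind: str             # "main" or "burner"
    start: date           # first day the phone is in use
    end: Optional[date]   # first day it is no longer in use; None = active
    known: bool = True    # False: the link to the person is hidden


def phones_on(assignments: List[PhoneAssignment], person_id: str, day: date):
    """Return the phone numbers a person was using on a given day."""
    return [
        a.phone_number
        for a in assignments
        if a.person_id == person_id
        and a.start <= day
        and (a.end is None or day < a.end)
    ]


# Hypothetical timeline: one long-lived main phone, two short-lived burners.
assignments = [
    PhoneAssignment("p1", "555-0001", "main", date(2023, 1, 1), None),
    PhoneAssignment("p1", "555-9001", "burner", date(2024, 3, 1),
                    date(2024, 3, 15), known=False),
    PhoneAssignment("p1", "555-9002", "burner", date(2024, 3, 15),
                    date(2024, 4, 1), known=False),
]

print(phones_on(assignments, "p1", date(2024, 3, 10)))
```

A call log generator could look up the active phones for the call date, so a retired burner never appears in later records.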

Contributor @lmeyerov, Mar 27, 2024

yeah, similar to the social network, i think either:

  • prepick a distribution/network/etc ahead of time, and follow that
  • do a per-person/community timeline 'simulation'

i think we want to guarantee that burners are active only for a period of time before retiring, so i can imagine some sort of simulation:

ACTION_CALL_BURNER=0
ACTION_CALL_MAIN=1
ACTION_DESTROY_BURNER=2
ACTION_NEW_BURNER=3
ACTION_REPLACE_PHONE=4

person_to_burners : Dict[PersonId, List[BurnerId]]
person_to_main_phone : Dict[PersonId, PhoneId]

for tick:
  person = pick(person_to_burners.keys())  
  switch random():
     case ACTION_CALL_BURNER:
       ...
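
A runnable sketch of a simulation along these lines, with several assumptions that are not in the comment above: the action weights are made up, `ACTION_CALL_MAIN` is read as the slip-up of a burner calling someone's main phone, burners are destroyed oldest first, and the `simulate` function and phone ID format are hypothetical.

```python
import itertools
import random

ACTION_CALL_BURNER = 0
ACTION_CALL_MAIN = 1
ACTION_DESTROY_BURNER = 2
ACTION_NEW_BURNER = 3
ACTION_REPLACE_PHONE = 4

ACTIONS = [ACTION_CALL_BURNER, ACTION_CALL_MAIN, ACTION_DESTROY_BURNER,
           ACTION_NEW_BURNER, ACTION_REPLACE_PHONE]
WEIGHTS = [0.60, 0.05, 0.10, 0.15, 0.10]  # made-up action probabilities


def simulate(people, num_ticks, seed=None):
    """Run the per-tick simulation; needs at least two people.

    Returns (calls, retired): calls is a list of (tick, caller, callee),
    retired maps each retired phone ID to the tick it was retired at.
    """
    rng = random.Random(seed)
    counter = itertools.count(1)

    def new_phone(prefix):
        return f"{prefix}-{next(counter):05d}"  # IDs are never reused

    person_to_main_phone = {p: new_phone("main") for p in people}
    person_to_burners = {p: [new_phone("burner")] for p in people}
    retired, calls = {}, []

    for tick in range(num_ticks):
        person = rng.choice(people)
        other = rng.choice([p for p in people if p != person])
        mine, theirs = person_to_burners[person], person_to_burners[other]
        action = rng.choices(ACTIONS, weights=WEIGHTS)[0]
        if action == ACTION_CALL_BURNER and mine and theirs:
            calls.append((tick, rng.choice(mine), rng.choice(theirs)))
        elif action == ACTION_CALL_MAIN and mine:
            # Slip-up: a burner calls the other person's main phone.
            calls.append((tick, rng.choice(mine), person_to_main_phone[other]))
        elif action == ACTION_DESTROY_BURNER and mine:
            retired[mine.pop(0)] = tick  # oldest burner goes first
        elif action == ACTION_NEW_BURNER:
            mine.append(new_phone("burner"))
        elif action == ACTION_REPLACE_PHONE:
            retired[person_to_main_phone[person]] = tick
            person_to_main_phone[person] = new_phone("main")
    return calls, retired


calls, retired = simulate(["ann", "bob", "cy", "dee"], num_ticks=200, seed=3)
```

Because a phone is dropped from the lookup tables the moment it is retired, no call can reference it afterwards, which gives the "active only for a period of time" guarantee.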

Contributor @lmeyerov left a comment

a bit hard to see without viz etc but:

  • see comment on injecting social networks, vs picking random pairs that'll come out funny looking wrt graph

  • i didn't quite follow the split between people, entities (addresses, phones, ...), and records (call logs, criminal, ....) linking them together, maybe easier to see

Author @webcoderz

Yeah, figuring this part out was the next step.

Separated the profile generation out into a separate module that is now a Faker factory.