nameReweight NA issue #54

EmericA570 · 2021-07-15T13:28:12Z

Hello everyone,

Nice work with the package. It works well for me.

I just have a few questions about reweighting posterior probabilities. After using nameReweight, or fastLink with name reweighting and firstname.field, I only get NA in zeta.name, and I don't understand why. Looking at the function, it seems to come from this line: 'matches.names.A$zeta.j.names[matches.names.A[,ind] != 2] <- NA'. But I don't understand what it does.
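Here is a minimal sketch of what that line does. It uses a made-up stand-in for `matches.names.A`, so the column name `gamma.1`, the values, and `ind <- 1` are all hypothetical. Assuming fastLink's usual agreement coding (2 = the field agrees, 1 = partial agreement, 0 = disagreement), the assignment keeps `zeta.j.names` only for pairs whose name field fully agrees and sets every other entry to NA. If no pair has a 2 in that column, the whole vector therefore ends up NA.

```r
# Hypothetical stand-in for matches.names.A: one row per candidate pair
matches.names.A <- data.frame(
  gamma.1      = c(2, 1, 0, 2),             # assumed agreement codes for the name field
  zeta.j.names = c(0.97, 0.40, 0.01, 0.88)  # made-up reweighted probabilities
)
ind <- 1  # hypothetical index of the name field's agreement column

# Same assignment as in nameReweight: blank out zeta wherever the name does not fully agree
matches.names.A$zeta.j.names[matches.names.A[, ind] != 2] <- NA
print(matches.names.A$zeta.j.names)
```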

Also, I need to reweight using more than one field. I have already made some modifications, but I wanted to know whether there was a reason you didn't implement this.

In fact, I realized that I'm not really sure how to use the nameReweight function. Could you explain it to me?

Best,

Emeric

tedenamorado · 2021-07-31T22:29:38Z

Hi @AuriantEmeric,

I hope all is well. Sorry for the late reply.

The name reweighting function uses the empirical distribution of names to reweight match probabilities according to name frequency. As a result, agreement on a common name is down-weighted, while agreement on an infrequent name increases the matching probability.
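To make the intuition concrete, here is a sketch of classic frequency-based (value-specific) Fellegi-Sunter weights. This is not the formula fastLink uses internally, and the toy name vectors and the assumed m-probability of 0.95 are hypothetical. The chance that two unrelated records agree on a name by coincidence is roughly the product of that name's relative frequencies in the two files, so agreement on a common name carries less evidence than agreement on a rare one.

```r
# Hypothetical first names from two files
names_a <- c("john", "john", "john", "mary", "emeric")
names_b <- c("john", "john", "mary", "mary", "emeric")

# Relative frequency of each name within each file
freq_a <- table(names_a) / length(names_a)
freq_b <- table(names_b) / length(names_b)

# u-probability: chance a random non-matching pair agrees on name f
u_name <- function(f) as.numeric(freq_a[f] * freq_b[f])

m <- 0.95  # assumed P(names agree | true match)

# Log-likelihood-ratio weight for agreeing on name f: rarer names score higher
weight <- function(f) log2(m / u_name(f))

w <- sapply(c("john", "mary", "emeric"), weight)
print(round(w, 2))
```

With these toy files, agreeing on "emeric" earns the largest weight and agreeing on "john" the smallest, which is the down-weighting of common names described above.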

Our code, as it currently stands, can reweight probabilities based on one field. For example:

## Load the package and data
library(fastLink)
data(samplematch)

## The fastLink function only allows you to reweight one field at a time
matches.out <- fastLink(
  dfA = dfA, dfB = dfB, 
  varnames = c("firstname", "middlename", "lastname", "housenum", "streetname", "city", "birthyear"),
  stringdist.match = c("firstname", "middlename", "lastname", "streetname", "city"),
  partial.match = c("firstname", "lastname", "streetname"),
  reweight.names = TRUE,
  firstname.field = "firstname"
)

## You can also reweight by last name
matches.out <- fastLink(
  dfA = dfA, dfB = dfB, 
  varnames = c("firstname", "middlename", "lastname", "housenum", "streetname", "city", "birthyear"),
  stringdist.match = c("firstname", "middlename", "lastname", "streetname", "city"),
  partial.match = c("firstname", "lastname", "streetname"),
  reweight.names = TRUE,
  firstname.field = "lastname"
)

Now, to reweight by two fields, you would need to make further assumptions about the prevalence of first names and last names. For example, if you assume that first and last names are independent, you can simply multiply the matching probabilities adjusted for first-name frequency by their counterparts adjusted for last-name frequency.
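Under that independence assumption, combining the two adjustments is an element-wise product over the same candidate pairs. A minimal sketch follows; the two probability vectors are made up, standing in for a first-name run and a last-name run that have already been aligned on the same pairs.

```r
# Hypothetical reweighted match probabilities for the same three candidate pairs
zeta_first <- c(0.90, 0.60, 0.99)  # adjusted for first-name frequency
zeta_last  <- c(0.80, 0.95, 0.50)  # adjusted for last-name frequency

# Assuming first and last names are independent, multiply pair by pair
zeta_both <- zeta_first * zeta_last
print(zeta_both)
```

The step this sketch glosses over is the alignment: the product is only meaningful once both vectors refer to the same (dfA row, dfB row) pairs.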

If you have any other questions, please do not hesitate to reach out.

All my best,

Ted
