Looking for a way to feed threshold cutoffs to individual variables #66
@ajw5296, if you are using the fastLink wrapper function, it is not possible (those cutpoints are global). If you need anything else, let us know. All my best, Ted
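To illustrate what "global cutpoints" means, here is a minimal sketch in Python, not fastLink code. It shows how one pair of thresholds turns a string-similarity score into a discrete agreement level, with the same thresholds applied to every compared variable. The 0.94/0.88 values are assumed to mirror the defaults of fastLink's global `cut.a` and `cut.p` arguments; check the package documentation. The 2/1/0 coding and all function names are hypothetical.

```python
# Hypothetical sketch: one pair of GLOBAL cutpoints discretizes string
# similarity into agreement levels for every variable alike.
# CUT_A / CUT_P values and the 2/1/0 coding are illustrative assumptions.

CUT_A = 0.94  # similarity at or above this counts as full agreement
CUT_P = 0.88  # at or above this (but below CUT_A) counts as partial agreement


def agreement_level(similarity: float, cut_a: float = CUT_A, cut_p: float = CUT_P) -> int:
    """Map a similarity in [0, 1] to 2 (agree), 1 (partial) or 0 (disagree)."""
    if similarity >= cut_a:
        return 2
    if similarity >= cut_p:
        return 1
    return 0


def agreement_pattern(similarities: dict[str, float]) -> dict[str, int]:
    """Apply the SAME cutpoints to every compared variable."""
    return {var: agreement_level(score) for var, score in similarities.items()}


if __name__ == "__main__":
    pair = {"first_name": 0.95, "last_name": 0.90, "dob": 0.85}
    print(agreement_pattern(pair))  # {'first_name': 2, 'last_name': 1, 'dob': 0}
```

Because `cut_a` and `cut_p` are shared, DOB cannot be held to a stricter threshold than the name fields within this step.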
@ajw5296, can you provide an example of what you have in mind here? Is your question about the cutoffs we use when comparing variables, or about the weight each variable receives when predicting the probability that two records are the same? Looking forward to hearing from you! Ted
Hey @tedenamorado, my question is more about cutoffs and, more precisely, whether they can be set at the variable level.
I suppose this is a question about weights in a way, but I think setting a higher weight for DOB is methodologically different from setting a cutoff for DOB. That said, if setting parameters for weights is easier, I'm interested in looking into it. Just as a note, we looked into the stringSubset method, but since DOB values are shared across many records, it didn't really help us much. Let me know if I can provide more info. Thanks for your help!
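The weight-versus-cutoff distinction can be made concrete with a small sketch. It assumes a Fellegi-Sunter style model with binary agree/disagree comparisons. Each field adds log(m/u) when it agrees and log((1 - m)/(1 - u)) when it disagrees. A heavily weighted DOB that disagrees can therefore still be outweighed by the other fields, whereas a cutoff vetoes the pair outright. The m/u numbers and field names are invented for illustration.

```python
import math

# Hypothetical m/u probabilities: m = P(field agrees | match),
# u = P(field agrees | non-match). Invented numbers, for illustration only.
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "address": (0.90, 0.001),
    "dob": (0.97, 0.003),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log-likelihood-ratio contribution of one binary field comparison."""
    return math.log(m / u) if agrees else math.log((1 - m) / (1 - u))


def total_weight(agreement: dict[str, bool]) -> float:
    """Weights add up: a soft score where fields can offset each other."""
    return sum(field_weight(agrees, *M_U[var]) for var, agrees in agreement.items())


def passes_dob_veto(agreement: dict[str, bool]) -> bool:
    """A hard cutoff: reject the pair whenever DOB disagrees."""
    return agreement["dob"]


if __name__ == "__main__":
    pair = {"first_name": True, "last_name": True, "address": True, "dob": False}
    print(total_weight(pair) > 0)  # True: the DOB disagreement is outweighed
    print(passes_dob_veto(pair))   # False: the cutoff rejects the same pair
```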
I do not think it is possible in … The Python-based … For what it is worth, this seems to me of little use compared with other promised features under development, such as probabilistic blocking and active learning.
Hi @ajw5296, as @aalexandersson mentions, it is not possible to set deterministic rules based on the probability of observing a specific agreement value for field k given that a pair of records is a match. The model learns these probabilities from the data. Our focus is on the probability that a pair of records is a match given the agreement pattern and the parameters of the model; this is a composite measure of the field-specific probabilities of observing an agreement value given that a pair of records is a match. However, an alternative would be to pass your own set of parameters to fastLink. For example, we discuss how to pass parameters estimated on a random sample of observations to a larger dataset here. Please let us know if you feel we can be of further assistance. All my best, Ted
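To show how the match probability combines the field-specific probabilities, here is a minimal Python sketch of the standard Fellegi-Sunter posterior under conditional independence. It ignores missing data. `lam` is the assumed share of matching pairs. `p_match[k][level]` and `p_nonmatch[k][level]` give the probability of each agreement level of field k among matches and non-matches. In fastLink these are estimated by EM; the numbers below are invented.

```python
import math

# Fellegi-Sunter posterior under conditional independence (sketch):
#   P(M | gamma) = lam * prod_k pM_k(gamma_k)
#       / (lam * prod_k pM_k(gamma_k) + (1 - lam) * prod_k pU_k(gamma_k))


def match_probability(pattern, p_match, p_nonmatch, lam):
    """Posterior probability that a pair with this agreement pattern is a match."""
    match_part = lam * math.prod(p_match[k][lvl] for k, lvl in pattern.items())
    nonmatch_part = (1 - lam) * math.prod(p_nonmatch[k][lvl] for k, lvl in pattern.items())
    return match_part / (match_part + nonmatch_part)


# Invented parameters; index = agreement level (0 disagree, 1 partial, 2 agree).
P_MATCH = {"first_name": [0.02, 0.08, 0.90], "dob": [0.03, 0.02, 0.95]}
P_NONMATCH = {"first_name": [0.90, 0.07, 0.03], "dob": [0.99, 0.005, 0.005]}
LAM = 0.001

if __name__ == "__main__":
    print(match_probability({"first_name": 2, "dob": 2}, P_MATCH, P_NONMATCH, LAM))  # about 0.85
    print(match_probability({"first_name": 2, "dob": 0}, P_MATCH, P_NONMATCH, LAM))  # about 0.0009
```

In these terms, passing your own parameters amounts to fixing `lam`, `P_MATCH` and `P_NONMATCH` rather than estimating them from the data being linked.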
Is there a way to set different cutoff values for certain variables? For instance, if the DOB similarity for a potential match isn't above 0.9, the pair would not be considered a match, while all other variables have a cutoff of 0.8.
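Since the wrapper's cutpoints are global, one workaround is a deterministic post-filter applied to candidate pairs after linkage. This is not a fastLink feature. The sketch below is a Python illustration with hypothetical names, using the thresholds from the question (DOB 0.9, everything else 0.8). "Above" is read here as "at or above", which is an assumption.

```python
# Hypothetical post-processing step, NOT part of fastLink: enforce
# variable-specific similarity cutoffs on candidate pairs after linkage.

DEFAULT_CUTOFF = 0.8
CUTOFFS = {"dob": 0.9}  # stricter threshold for date of birth


def passes_cutoffs(similarities: dict[str, float]) -> bool:
    """True only if every variable is at or above its own cutoff."""
    return all(
        score >= CUTOFFS.get(var, DEFAULT_CUTOFF)
        for var, score in similarities.items()
    )


def filter_pairs(pairs: list[dict[str, float]]) -> list[dict[str, float]]:
    """Keep only candidate pairs that satisfy all per-variable cutoffs."""
    return [pair for pair in pairs if passes_cutoffs(pair)]


if __name__ == "__main__":
    candidates = [
        {"name": 0.85, "dob": 0.95},  # kept
        {"name": 0.85, "dob": 0.88},  # dropped: DOB below 0.9
        {"name": 0.75, "dob": 0.95},  # dropped: name below 0.8
    ]
    print(filter_pairs(candidates))  # [{'name': 0.85, 'dob': 0.95}]
```

Because this rule sits on top of the model, it can discard pairs the model rates as likely matches, and the model's parameters are not re-estimated to reflect it.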