-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect RDF normalization #396
Comments
@mphoward Thanks for reporting this issue. We'll investigate. |
@bdice thanks! I think I had a similar issue myself when I wrote this plugin code: https://github.com/mphoward/azplugins/blob/master/azplugins/RDFAnalyzer.cc It was a while ago, but I think there was some problem with how the looping over pairs was done and counting which pairs to remove from the normalization. I think I had some issue looping on Not sure if that is relevant to freud's implementation, but wanted to let you know in case it's helpful. |
I suspect the difference between the two programs is here: Line 125 in b483a83
Since there are only Line 107 in b483a83
via here Line 165 in b483a83
The rescaling factor I fit by eye is pretty much This does make me wonder if this is just a difference in definition for small, finite systems. Technically, g(r) for an N particle ideal gas is 1-1/N if it is defined from the two-body correlation function, so the freud result is correct (the large r limit should be 0.95 not 1 for N = 20). But VMD and also LAMMPS use the normalization convention based on pairs, presumably because you are not counting one of the pairs and are interested in the thermodynamic limit of an infinitely replicated system (it shouldn't matter that you throw out exactly N pairs). Thoughts on how this fits your philosophy? |
I think you are correct that it appears to be a definition issue. In the thermodynamic limit the two results are identical, but for the finite systems we're calculating they differ by that factor of 1/N. To be honest I'm somewhat agnostic to the choice we make, as long as it's properly documented (if we choose not to change the code I definitely think we need to make it clear in the docs that users should expect that scaling factor). I think that it's probably a bit more precise not to change it, but users could certainly be surprised by the behavior for small N. I'd like to get @joaander's take on this before making any final changes since I believe he either wrote or oversaw the creation of this implementation. He's out of town until after labor day unfortunately, but if this is holding you back let us know and we can at least create a fix ASAP and hold off on a final decision for when he returns. |
Now that I've looked at the code and think I understand the discrepancy, I'm happy enough to multiply my results by Either way, it would be great if you could add an equation to the docs showing how freud defines and normalizes the RDF, and possibly add a sentence or two of discussion on potential differences (or similarities) with other tools for doing the calculation. (Something like mdanalysis.analysis.rdf... which makes me wonder which result that equation returns, since it is normed by A solution that could be nice is if this normalization could be an option when |
I like the idea of a normalization flag, what would you name it? I would definitely like to add an equation or two to the documentation, perhaps pointing at Wikipedia's RDF description as well for more explanation. I would probably prefer to leave the default as is, to at least make users think about it before requesting the normalization (since for small N our definition is exactly what you get from the pair correlation function); would you agree that it makes sense? |
I'm not quite sure what I would call it... I guess it should probably be a bool that toggles the modes, something like your I agree that it is cleanest to leave the current behavior as default. |
As part of glotzerlab/freud-examples#19 @jennyfothergill has expressed interest in resolving this issue for the general case of pairs from two arbitrary subsets of the system, i.e. |
Hi @vyasr, thank you for your help! I'm not sure exactly where to start... I'm looking at https://github.com/glotzerlab/freud/blob/master/cpp/density/RDF.cc, is that right? Line 55 in 2ae4a79
We want to add a check that points != query_points and change the normalization to points*query_points if True. (Equation 3 from http://www-s.ks.uiuc.edu/Research/gpu/files/rdfgpu/rdfgpu.pdf)Does this sound about right? |
Yes, that's where the normalization is currently happening. However, I'm not sure if your suggestion is exactly right. IIRC there's a prefactor that's being computed just above this number density (literally called prefactor in the code I think :)) and it encodes a factor of the number of points as well. I'm not sure how familiar you are with the theory of the RDF, so here's some background just in case. Basically, the pair correlation function should rigorously include a V/(N*(N-1)) prefactor, where the V/N is the ideal gas density. Most people are accustomed to g(r)->1 in the r->infinity limit. In fact, this only holds for large N (the thermodynamic limit), where V/(N*(N-1)) is basically equal to V/N^2. For very small N, in fact g(r) does not approach 1; instead it approaches (N-1)/N. However, since this is not what people are typically accustomed to, the existing normalization flag basically shifts it back to 1. I'm actually not quite sure what the most general case that we should handle is. @joaander I don't think there's any meaningful way to do this in the context of a custom NeighborList, is there? It's one thing if you pass two distinct sets of points ( |
VMD can solve this in a general way because it computes the RDF between two arbitrary subsets of particles and thus can compute the proper normalization based on the intersection of those two sets. In freud's case (by design) we don't know what the properties of those sets are or even if the two sets of points came frame the same set of particles (maybe they are reference positions on a grid). The user is going to need to supply that information in some way. We could either provide unnormalized output (or output with a given well defined normalization) and ask the user to normalize it. OR we could allow the user to set the prefactor as a property and document how to compute that prefactor for common cases such as those outlined in the VMD paper. |
I don't think there's a need to set or apply a prefactor if the code is not able to do the exclusion between arbitrary sets of particles anyway. Right now, I think you only are able to exclude For the more general case, you would need to be able to give a set of labels to both |
I agree that there's an important distinction here. The concept of exclusion is simply implemented differently in freud. In VMD you have a complete system, then you subset this system to generate the two sets, whereas in freud you would generate the two subsets by subsetting your arrays appropriately before passing them into freud. If the user passes in two sets of particles that are not mutually exclusive (the use case @mphoward is mentioning), freud has no way of knowing that, and the user has to rescale themselves to get the RDF normalization to look as expected. The case I was considering was where you pass a custom I think the only cases that freud can reasonably handle within its data model are the two simplest cases: two disjoint sets of points, and two identical sets of points (with or without |
I think this is very reasonable. One thing that could perhaps be done as a convenience (and not sure if this is any easier for the user) is to optionally return the count of pairs that have distances that are identically zero (or close to zero within some tolerance). The number of such counts can somehow be related to the factor the user will need to use to correct the normalization. Or, you could do as Josh suggested, and allow the user to provide the number of overlaps themselves, and then apply the normalization. I think all of this sort of falls into an "advanced" usage category, so I'm on board with any solution that's well documented. |
I think my preference would be to provide recipes for how a user could accomplish this rather than incorporating it into the API. There are enough ways that we could do this that all seem to handle special cases that I think it would be more maintainable in the long term to just provide unnormalized output (except perhaps in the two special cases discussed above) and maintain a list of examples of how you could, for instance construct a @bdice do you have any thoughts here? |
@vyasr hm, I think I'm following, but let's make sure: It sounds like for the example I'm working on (glotzerlab/freud-examples#19) I should think about showing an example of how to normalize an RDF which uses a custom neighborlist after its creation, as opposed changing anything in the C++ code. This could be done using the |
Yes, that's correct regarding your example. However, I think whatever you do there needs to be documented in the RDF docstring as well, so to close this issue I think we need a PR that updates the docstring regarding how normalize works, in what cases users need to manually normalize, and what the normalization factor needs to be. Also that PR should update the existing normalization to work in the case that |
We should aim to resolve this along with #635 for freud 3.0 |
Closing this issue, as we now extensively document the normalization we use, and the conditions under which that normalization is appropriate. Feel free to re-open if someone wants to add further documentation on the proper normalization when there are overlapping points in the set of |
Describe the bug
The radial distribution function g(r) that I accumulated for a trajectory does not have the correct limiting value. The system is dilute and so g -> 1 for large r, but freud's result is consistently smaller. I recomputed g using VMD and obtained the behavior I physically expected. I can rescale the freud result onto VMD's by multiplying by a factor of ~1.05, suggesting a possible normalization issue. (This factor could be specific to my system though: 20 particles, cubic box with L = 50, 1000 frames.)
To Reproduce
Minimum reproducer script (
gr.py
) and a sample trajectory (cm.gsd
) are attached below. The script reads the trajectory and accumulates g(r) withrmax = 20
anddr = 0.2
. The freud output (freud_gr.dat
) should match the VMD output (vmd_gr.dat
), and both should approach 1.gr_repro.tar.gz
Error output
![compare](https://user-images.githubusercontent.com/6749016/63523347-9a6ad100-c4bf-11e9-83f3-5f41942ade7a.png)
Comparison of freud and VMD output
System configuration
(I confirmed this on other system configurations too.)
The text was updated successfully, but these errors were encountered: