Average shared span for multiple matches #12
base: main
I added extra tests that are for the |
Gee, that's quite a clean implementation. Nice. My first inclination is not to introduce the "ties" argument? I think we don't have a use case for |
That's a fair point. I think the idea was to put it in place so that, if we want to add a new matching scheme, we already have the scaffolding for it. However, if you want to remove it, I can. |
Good thought, but there's no need to put the argument in place, since if we add it in the future but with the default |
Alright, I removed the |
Hm, okay - sorry I didn't pick up on this earlier - but, this is also changing dissimilarity (since both dissimilarity and tpr depend on |
I actually like this change. When I was attempting to redefine terms in the paper, the notation got bogged down. ARF in my mind should be defined as it was with To comment about points 3, 4, and 6: If we want to remove the 'averaged node span' from TPR we could alternatively define it as Below is an example between avg TPR and max TPR: |
Hey, this is a great point. |
Option 2 for issue #11
For each node $n_2\in T_2$ with multiple matches in $T_1$, let $\beta(n_2)=[n_1\in T_1 \colon \alpha(n_1)=n_2]$. Then we compute the similarity between two trees $T_1$ and $T_2$ as
$$sim(T_1,T_2)=\sum_{n_2\in T_2}\frac{1}{|\beta(n_2)|}\sum_{n_1\in \alpha^{-1}(n_2)} m(n_1,n_2),$$
which is the average over shared spans between nodes in $T_2$ and their multiple matches $\beta(n_2)$.
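
A minimal sketch of the averaged similarity above, not the code in this PR. It assumes the matching $\alpha$ is given as a plain dict from nodes of $T_1$ to nodes of $T_2$ and the shared span $m$ as a dict keyed by node pairs. The names `averaged_similarity`, `alpha`, and `shared_span` are hypothetical, and nodes of $T_2$ with no match are assumed to contribute zero.

```python
from collections import defaultdict


def averaged_similarity(alpha, shared_span):
    """Sum, over matched nodes n2, the mean shared span of the nodes matched to n2.

    alpha: dict mapping each node n1 in T1 to its match n2 in T2 (hypothetical input).
    shared_span: dict mapping (n1, n2) pairs to the shared span m(n1, n2).
    """
    # Group the T1 nodes by the T2 node they match: beta(n2).
    beta = defaultdict(list)
    for n1, n2 in alpha.items():
        beta[n2].append(n1)

    total = 0.0
    for n2, matches in beta.items():
        # Average the shared span over all matches of n2.
        # Assumption: unmatched n2 never appear in beta, so they add nothing.
        total += sum(shared_span[(n1, n2)] for n1 in matches) / len(matches)
    return total


# Toy example: "a" and "b" both match "x", "c" matches "y".
alpha = {"a": "x", "b": "x", "c": "y"}
shared_span = {("a", "x"): 4.0, ("b", "x"): 2.0, ("c", "y"): 5.0}
result = averaged_similarity(alpha, shared_span)  # (4 + 2) / 2 + 5
```

In the toy example, node `"x"` contributes the mean of its two matches rather than their sum or maximum, which is the difference between this option and a max-based definition.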