Ideas for visualising key phrases together with text, as a modelling aid #260
Replies: 6 comments
-
This seems like a wonderful way to show, quickly and clearly, both what PyTextRank does and how it can be used. The produced image could work well as a graphical elevator pitch, and another example of how PTR can be used is always welcome. I'd be in favor of:
Perhaps the README image can link directly to the Jupyter notebook. That way, developers can play around with PTR within minutes.
-
On top of example/README, which you know better, I was wondering whether it could be useful similarly to

    tr = doc._.textrank
    tr.display_keyphrases()

but it would be similarly exploratory, so I'm not sure whether you think it's worth adding to the codebase. And we would have to do it in a way that it doesn't actually affect

Also, I think the colouring might be better on a scale of, say, white to green, so that one immediately picks up visually on which phrases are more key, though it would possibly make the phrases less distinguishable.
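A minimal sketch of the white-to-green idea, using only the standard library. The function name `white_to_green`, the end colour `#008000`, and the example scores are all made up for illustration; none of this is part of PyTextRank.

```python
WHITE = (255, 255, 255)
GREEN = (0, 128, 0)  # assumed end of the scale; any green would do


def white_to_green(scores):
    """Map each phrase to a hex colour: lowest score -> white, highest -> green."""
    low, high = min(scores.values()), max(scores.values())
    spread = (high - low) or 1.0  # avoid dividing by zero when all scores are equal
    colours = {}
    for phrase, score in scores.items():
        t = (score - low) / spread  # 0.0 at the lowest score, 1.0 at the highest
        # Linear interpolation of each RGB channel between white and green.
        rgb = [round(w + t * (g - w)) for w, g in zip(WHITE, GREEN)]
        colours[phrase] = "#{:02x}{:02x}{:02x}".format(*rgb)
    return colours


# Illustrative scores only, not real PyTextRank output.
example_scores = {"keyphrase extraction": 0.12, "spaCy": 0.07, "text": 0.02}
example_colours = white_to_green(example_scores)
```

A linear scale like this makes the ranking visible at a glance, at the cost mentioned above: neighbouring phrases with similar scores get nearly the same colour.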
-
Regarding this, I agree. The current system seems to pick colors at random, so a borderline unreadable one can come up, e.g. a really dark purple or something close to black.
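One way to guard against unreadable picks, sketched here under the assumption that the highlighted text is rendered in black: compute the WCAG relative luminance of each candidate background and reject those whose contrast ratio against black falls below 4.5:1. The helper names are hypothetical and not part of PyTextRank or spaCy.

```python
def relative_luminance(hex_colour):
    """WCAG relative luminance of an sRGB colour given as '#rrggbb'."""
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma curve before weighting the channels.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def readable_with_black_text(hex_colour, min_contrast=4.5):
    """True if black text on this background meets the given contrast ratio."""
    # Black has luminance 0, so the WCAG ratio reduces to (L + 0.05) / 0.05.
    contrast = (relative_luminance(hex_colour) + 0.05) / 0.05
    return contrast >= min_contrast


# Filter a list of candidate colours (example values only).
candidates = ["#ffffff", "#2d004b", "#000000", "#fdae6b"]
usable = [c for c in candidates if readable_with_black_text(c)]
```

Randomly generated colours could be redrawn until they pass this check, which keeps the variety while dropping the very dark ones.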
-
Good point. I experimented with a colour scale based on score and, even being lazy and quickly putting something together with matplotlib, I get this, which might be better but is still a bit on the dark side at the top of the scale. Note: New

    import matplotlib
    from matplotlib.colors import rgb2hex

    def generate_colours(labels):
        # matplotlib.colormaps (matplotlib >= 3.5) replaces cm.get_cmap, which was removed in 3.9
        oranges = matplotlib.colormaps['Oranges']
        labels = [float(label) for label in labels]
        low, high = min(labels), max(labels)
        spread = (high - low) or 1.0  # guard against division by zero when all scores are equal
        ## better to normalise to 1 / len(doc) -> 0, and then use a red-yellow-green (RdYlGn) scale, given TextRank starts with uniform distribution of score?
        colours = {str(label): (label - low) / spread for label in labels}
        colours = {label: oranges(colour) for label, colour in colours.items()}
        colours = {label: rgb2hex(colour) for label, colour in colours.items()}
        return colours

and the rest is mostly the same.
-
This looks great! @DayalStrub thank you for all the work and suggestions on
-
I am about to experiment with integrating this package into my webapp, https://huggingface.co/spaces/Hellisotherpeople/Unsupervised_Extractive_Summarization, which is a (somewhat incomplete) port of my package CX_DB8, https://github.com/Hellisotherpeople/CX_DB8.
I feel compelled to link it here because I independently tackled this problem (visualizations of extractive summaries), and at the time I was unaware of this package (and I don't think it existed in quite its current form either!). It may help, or at least give inspiration.
@DayalStrub @ceteri thank you both for the hard work on this project and for making my life a LOT easier in the next few weeks. My goal is to have a webapp that hosts basically every technique we can think of for extractive and query-focused extractive summarization. This will eventually also need to include MMR and related methods (which are implemented in KeyBERT).
-
Just wanted to see what people thought about this...
I've been playing about with keyphrase extraction and, as well as looking at the altair plot PyTextRank produces, found it helpful to display the text with the key phrases. I ended up "hacking" doc.ents and using spaCy's displacy, so it's not necessarily clean and I'm not sure how it could be added as is, but I thought I would share, as I do think it would make a nice exploratory/modelling feature, similar to the extra viz functionality. On the other hand, it might be a common hack that people already know, but I haven't seen it elsewhere.
Here is an example output:
NOTE: It is only displaying the top 10 key phrases as the colours get quite busy, but one can easily drop the colouring.
And here is the code to reproduce and play with it:
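The original snippet did not survive in this copy of the thread. As a stand-in, and not the author's code, here is a rough sketch of the same idea that avoids touching doc.ents altogether: it builds the plain dict that displaCy's manual "ent" mode accepts, using each phrase's score as its label. The function name `build_displacy_payload`, the exact-match lookup via `str.find`, and the example phrases and scores are assumptions made for illustration.

```python
def build_displacy_payload(text, ranked_phrases, top_n=10):
    """Build a dict for displaCy's manual 'ent' mode from (phrase, score) pairs."""
    ents = []
    taken = []  # character ranges already claimed by a higher-ranked phrase
    best_first = sorted(ranked_phrases, key=lambda pair: pair[1], reverse=True)
    for phrase, score in best_first[:top_n]:
        label = f"{score:.3f}"  # the score doubles as the label
        start = text.find(phrase)  # exact, case-sensitive match (an assumption)
        while start != -1:
            end = start + len(phrase)
            # Entity spans must not overlap, so skip anything that collides.
            if all(end <= s or start >= e for s, e in taken):
                taken.append((start, end))
                ents.append({"start": start, "end": end, "label": label})
            start = text.find(phrase, end)
    ents.sort(key=lambda ent: ent["start"])  # the renderer walks spans left to right
    return {"text": text, "ents": ents, "title": None}


# Illustrative input only, not real PyTextRank output.
sample_text = "Graph algorithms rank phrases. Graph algorithms are fun."
sample_phrases = [("Graph algorithms", 0.2), ("phrases", 0.1), ("Graph", 0.05)]
payload = build_displacy_payload(sample_text, sample_phrases)

# With spaCy installed, something like this should render it (not run here):
# from spacy import displacy
# html = displacy.render(payload, style="ent", manual=True,
#                        options={"colors": {"0.200": "#fdae6b"}})
```

In practice the (phrase, score) pairs would come from `doc._.phrases` (each phrase's `text` and `rank`), and the colour mapping from a function such as `generate_colours`. Working from a separate dict leaves the document's real entities untouched.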