word clouds and existing charts #3

Open
Gilles-Narcy opened this issue May 2, 2023 · 2 comments

Comments

@Gilles-Narcy
Collaborator

@njr2128 @tcatapano

I've uploaded word clouds for every language in "docs". I also realized that Roni Kaufman and Clément Godbarge have already made the charts I had in mind: language/category and language/parent tag correlations. If you agree, I think I should simply use theirs and add my own comments and observations on what this data can tell us. Thank you!

@njr2128
Member

njr2128 commented May 3, 2023

We just took a look:

  • Can you describe how you created these word clouds?
  • How are you populating the data? We noticed that not all terms/phrases were represented, and some seemed cut off.
  • What app are you using to generate the clouds?
  • Do the colors of the words have any significance or meaning?
  • Perhaps it would be better to generate a vocabulary diversity score for each language and plot it as a bar chart or scattergram?
    --> The answers to these methodology questions (i.e. how you made these charts) should be included in this repo and also in your final paper.
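
To make the diversity-score suggestion concrete, here is a minimal sketch, assuming the score is a simple type-token ratio (distinct words divided by total words). That choice of measure, the function names, and the sample texts are all hypothetical and do not come from the tc dataset. It prints a rough text bar per language; the same scores could be passed to any charting tool.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text):
    """Return distinct words / total words, or 0.0 for empty text.

    Assumption: type-token ratio stands in for "vocabulary diversity";
    other measures could be swapped in here.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def diversity_by_language(texts_by_language):
    """Map each language name to the diversity score of its text."""
    return {lang: type_token_ratio(text) for lang, text in texts_by_language.items()}


# Hypothetical sample data, for illustration only.
sample = {
    "French": "le chat et le chien et le cheval",
    "Latin": "rosa rosae rosam",
}

scores = diversity_by_language(sample)

# Print a rough text bar chart, highest score first.
for lang, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    bar = "#" * round(score * 20)
    print(f"{lang:<8} {score:.2f} {bar}")
```

One caveat worth noting in the methodology: type-token ratio tends to fall as a text gets longer, so languages with very different amounts of text are not directly comparable unless the samples are cut to a similar length.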

It would be great to use what Roni and Clément have already generated if their charts are useful to you and your argument. Just remember to cite them.

@Gilles-Narcy
Collaborator Author

I created the word clouds using an online word cloud generator (https://www.freewordcloudgenerator.com/), following Terry's advice. I extracted the texts from the tc dataset after cleaning the data from GitHub in Excel. Some text may have been lost in the process; I'll double-check. I'll take your suggestion about diversity scores plotted as bar charts, maybe with a different color for each tag in the manuscript, in order to provide another visualization of the language-tag correlation.

I'll cite Roni and Clément, of course, and make sure to discuss my methodology in my paper. Thank you again for your valuable help!
