Analysis on file-level Metadata #50

Open
3 tasks done
SahithiKasim opened this issue Oct 11, 2023 · 5 comments

@SahithiKasim
Collaborator

SahithiKasim commented Oct 11, 2023

Try to match the results reported in the Debsources paper.

  • Get the matched repositories and versions from the tmp directory (from abhi)
  • Run the analysis with the command file --mime-type
  • Run wc -l

Reference: Debsources paper, Table 2: Tools used to extract file-level metadata
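
For illustration, the two commands in the task list could be combined into one pass over a checked-out repository. This is only a rough sketch, assuming the commands are applied to each regular file inside a repo (the thread does not settle this) and that the `file` utility is installed. The function names `mime_type`, `count_lines`, and `analyze_tree` are hypothetical and not taken from the project's scripts.

```python
import os
import subprocess
from collections import Counter


def mime_type(path):
    """Return the MIME type reported by `file --mime-type -b` for one file."""
    result = subprocess.run(
        ["file", "--mime-type", "-b", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def count_lines(path):
    """Count newline bytes in a file, which is what `wc -l` reports."""
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            total += chunk.count(b"\n")
    return total


def analyze_tree(root, detect=mime_type):
    """Walk `root` and tally file counts and line counts per MIME type."""
    files_per_type = Counter()
    lines_per_type = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: git internals are not part of the measured sources.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # ignore symlinks and special files
            kind = detect(path)
            files_per_type[kind] += 1
            lines_per_type[kind] += count_lines(path)
    return {"files": dict(files_per_type), "lines": dict(lines_per_type)}
```

Calling `analyze_tree(repo_path)` on one clone would return two dictionaries keyed by MIME type. The `detect` parameter exists only so the MIME lookup can be swapped out, for example in tests.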

@VinhPham2106
Collaborator

@SahithiKasim We don't have permissions on most of the files and directories in the /tmp folder. Can you help us out with that?

@JorgeH309
Collaborator

@SahithiKasim We ran the cloning script to have a sample of repos that we can work on. Our new question is this: are we running these commands on the repos themselves or on the individual files inside the repos?

@SahithiKasim
Collaborator Author

@JorgeH309 Do not clone the repos again. Work on the already cloned repos. We need the analysis on the repos at specific tags (the ones that match the versions from Publish_Packages).

@VinhPham2106 I already gave everyone permissions to work on the clone_repos folder. The path is /data/cyan/guacalytics/raw_data/upstream_clones/clone_repos. Can you tell me which files or directories you need in the /tmp folders, and for what, so that I can change the permissions accordingly?
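
As a rough sketch of working on an existing clone at a specific tag, the snippet below lists the tags of a local repository, picks the one corresponding to a package version, and checks it out without cloning again. The tag spellings tried in `match_tag` (the bare version, a `v` prefix, a `release-` prefix) are assumptions for illustration, since the thread does not say how tags map to the versions in Publish_Packages. All function names are hypothetical.

```python
import subprocess


def list_tags(repo_path):
    """Return all tag names of an existing local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "tag", "--list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def match_tag(version, tags):
    """Pick the tag for `version`, trying a few common naming schemes.

    The candidate spellings are assumptions, not project rules.
    """
    candidates = [version, "v" + version, "release-" + version]
    for candidate in candidates:
        if candidate in tags:
            return candidate
    return None


def checkout_tag(repo_path, tag):
    """Switch the existing clone to `tag` (detached HEAD), without re-cloning."""
    subprocess.run(
        ["git", "-C", repo_path, "checkout", "--quiet", "--detach", tag],
        check=True,
    )
```

A typical use would be `tag = match_tag(version, list_tags(repo))` followed by `checkout_tag(repo, tag)` when `tag` is not `None`. Repos with no matching tag can be logged and skipped.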

@VinhPham2106
Collaborator

@SahithiKasim
Work done:

  • Got the script mostly working (apart from the error we're discussing).
  • The JSON format for the data is set.
  • @JorgeH309 used the generated data to make some visualizations.
  • I added scripts that parse the data to gather metrics such as mean, median, std_dev, and top mime_type, so that people with data skills can use them to generate further analytics or visualizations.
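
The metrics named above can be computed with Python's standard library alone. The sketch below is not the script from this issue. It assumes a hypothetical record layout of one JSON object per file with `mime_type` and `lines` keys, which may differ from the JSON format the team settled on, and the sample values are made up purely to show the call.

```python
import json
import statistics
from collections import Counter


def summarize(records):
    """Summarize per-file records of the form {"mime_type": str, "lines": int}.

    Expects at least one record.
    """
    line_counts = [r["lines"] for r in records]
    by_type = Counter(r["mime_type"] for r in records)
    return {
        "mean": statistics.mean(line_counts),
        "median": statistics.median(line_counts),
        # Sample standard deviation needs at least two values.
        "std_dev": statistics.stdev(line_counts) if len(line_counts) > 1 else 0.0,
        "top_mime_type": by_type.most_common(1)[0][0],
    }


# Made-up sample in the assumed layout, for illustration only.
sample = json.loads("""
[
  {"mime_type": "text/x-c", "lines": 10},
  {"mime_type": "text/x-c", "lines": 30},
  {"mime_type": "text/plain", "lines": 20}
]
""")
print(summarize(sample))
```

`statistics.stdev` is the sample standard deviation. `statistics.pstdev` would be the choice if the files are treated as the whole population.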

@SahithiKasim
Collaborator Author

@VinhPham2106 and @JorgeH309, are your plots from this analysis ready?
