Preliminary findings from 500 cloned GitHub repositories cover the following metrics: lines of code (LOC), code churn, dependency counts, and the languages used.
Upstream code analysis:
1a. What trends can be observed in code complexity over time, in relation to the following metrics?
Lines of Code (LOC): Count the total lines of code in a codebase. Higher LOC generally indicates greater complexity.
Dependency Counts: The number of dependencies each package declares. More dependencies imply greater complexity.
Code Churn: The number of lines of code added/deleted over time. High churn suggests complex code that requires more frequent changes.
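As a rough sketch of how churn could be tallied per year, the snippet below parses text in the format produced by `git log --numstat --date=short --pretty=format:'@%ad'`. The `@` date marker, the function name `churn_by_year`, and the sample input are assumptions made for illustration. Binary files, which numstat reports as `-`, are skipped.

```python
from collections import defaultdict

def churn_by_year(numstat_text):
    """Sum added and deleted lines per year from `git log --numstat` output."""
    churn = defaultdict(lambda: {"added": 0, "deleted": 0})
    year = None
    for line in numstat_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            # Commit header written by --pretty=format:'@%ad', e.g. "@2021-03-04".
            year = int(line[1:5])
            continue
        parts = line.split("\t")
        if year is None or len(parts) != 3:
            continue
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            # numstat prints "-" for binary files; they carry no line counts.
            continue
        churn[year]["added"] += int(added)
        churn[year]["deleted"] += int(deleted)
    return {y: dict(c) for y, c in churn.items()}

# Hypothetical sample input, not taken from any real repository.
sample = (
    "@2021-03-04\n10\t2\tsrc/main.c\n-\t-\tlogo.png\n\n"
    "@2022-07-19\n5\t7\tsrc/main.c\n3\t0\tREADME\n"
)
result = churn_by_year(sample)
```

Grouping by year is only one option; the same loop could key on month or release tag if finer resolution is needed.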
1b. How has the cyclomatic complexity of core Debian packages changed across different time periods?
Get the source code history for each package from its git repository.
Calculate the cyclomatic complexity of each package version using static analysis tools.
Apply statistical analysis (such as regression) to test for a significant trend in mean cyclomatic complexity.
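The trend test in the last step could be sketched as follows, assuming the mean cyclomatic complexity per year has already been computed by a static analysis tool. The snippet fits an ordinary least squares slope and estimates a p-value with a permutation test, which is one possible choice among several. The function names and the numbers are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no trend."""
    rng = random.Random(seed)
    observed = abs(ols_slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(ols_slope(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical yearly means, for illustration only.
years = [2015, 2016, 2017, 2018, 2019, 2020]
mean_complexity = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]

slope = ols_slope(years, mean_complexity)
p_value = permutation_p_value(years, mean_complexity)
```

A permutation test avoids distributional assumptions, but with only a handful of time points its resolution is limited; a standard regression t-test from a statistics package would be a reasonable alternative.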
How has the diversity of licenses used in the codebase evolved?
Extract license information from upstream copyright files.
Map the license identifiers to general categories such as BSD, GNU, Apache, OpenSSL, etc. (a reference list is available at https://www.debian.org/legal/licenses/).
For each year, tally the number of packages under each license category. Calculate category proportions.
Analyze the trends in license category proportions over time using descriptive statistics.
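The mapping and tallying steps above might look like the sketch below. The `CATEGORY_BY_LICENSE` table is a small hypothetical mapping, not a complete or authoritative one, and the records are made-up examples. Identifiers that are not in the table fall into an `Other` category.

```python
from collections import Counter, defaultdict

# Hypothetical, incomplete mapping from license identifier to category.
CATEGORY_BY_LICENSE = {
    "GPL-2+": "GNU",
    "GPL-3+": "GNU",
    "LGPL-2.1+": "GNU",
    "BSD-2-clause": "BSD",
    "BSD-3-clause": "BSD",
    "Apache-2.0": "Apache",
    "OpenSSL": "OpenSSL",
}

def category_proportions(records):
    """Return {year: {category: share}} from (year, package, license_id) tuples."""
    counts = defaultdict(Counter)
    for year, _package, license_id in records:
        category = CATEGORY_BY_LICENSE.get(license_id, "Other")
        counts[year][category] += 1
    proportions = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        proportions[year] = {cat: n / total for cat, n in counter.items()}
    return proportions

# Made-up records for illustration.
records = [
    (2020, "pkg-a", "GPL-2+"),
    (2020, "pkg-b", "BSD-3-clause"),
    (2021, "pkg-a", "GPL-3+"),
    (2021, "pkg-b", "BSD-3-clause"),
    (2021, "pkg-c", "Apache-2.0"),
    (2021, "pkg-d", "Custom-license"),
]
proportions = category_proportions(records)
```

Packages that carry several licenses would need a rule for how to count them (for example, one tally per license), which this sketch does not address.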
What changes have occurred in the usage of programming languages?
Examine source code files and identify programming languages using heuristics, file extensions, and tools like cloc.
Categorize and tally usage for languages like Python, Java, C/C++, Rust, etc.
Track the proportions of each language over time as packages evolve.
Apply statistical analysis to determine if language proportions have significantly changed.
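A minimal sketch of the extension-based part of this approach is shown below. It counts files rather than lines, so it is cruder than what a tool like cloc reports. The extension table and the yearly file listings are assumptions for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, incomplete extension table.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".java": "Java",
    ".c": "C/C++",
    ".h": "C/C++",
    ".cc": "C/C++",
    ".cpp": "C/C++",
    ".hpp": "C/C++",
    ".rs": "Rust",
}

def language_proportions(paths):
    """Share of recognised source files per language, by file count."""
    counts = Counter()
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        language = LANGUAGE_BY_EXTENSION.get(ext)
        if language is not None:
            counts[language] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}

# Made-up file listings for two snapshots of one package.
snapshots = {
    2019: ["src/main.c", "src/util.h", "tools/build.py", "README"],
    2023: ["src/main.c", "src/lib.rs", "src/util.rs", "tools/build.py"],
}
by_year = {year: language_proportions(paths) for year, paths in snapshots.items()}
```

Extensions alone are ambiguous in some cases (a `.h` file may be C or C++), which is why the plan also mentions heuristics and dedicated tools.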
What is the correlation between authors and licenses, and how have contributions varied over time?
For each package, analyze git commit history.
Extract the author name and email address for each commit and match the name against the maintainers table. Resolve aliases to unique contributors.
Aggregate commits by contributor to determine top contributors and their activity levels.
Analyze email domains to categorize contributors by organization (e.g. @debian.org, @redhat.com). Calculate organization proportions over time.
Categorize licenses.
Use correlation analysis to identify relationships between authors/organizations and license categories.
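Because organization and license category are both categorical, one way to approach the final step is a contingency-table measure such as Cramér's V. The sketch below is an assumption about how "correlation analysis" could be carried out here, not a prescribed method. The helper names and the sample pairs are hypothetical.

```python
import math
from collections import Counter

def organization(email):
    """Use the lowercased email domain as a stand-in for the organization."""
    return email.rsplit("@", 1)[-1].lower()

def cramers_v(pairs):
    """Cramér's V for a list of (organization, license_category) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    row_totals = Counter(org for org, _ in pairs)
    col_totals = Counter(cat for _, cat in pairs)
    observed = Counter(pairs)
    chi2 = 0.0
    for org, row_total in row_totals.items():
        for cat, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(org, cat)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k <= 0:
        return 0.0  # Undefined with a single row or column; report no association.
    return math.sqrt(chi2 / (n * k))

# Made-up commits: (author email, license category of the package).
commits = [
    ("alice@debian.org", "GNU"),
    ("bob@debian.org", "GNU"),
    ("carol@redhat.com", "Apache"),
    ("dave@redhat.com", "Apache"),
]
pairs = [(organization(email), category) for email, category in commits]
association = cramers_v(pairs)
```

Values near 0 suggest little association and values near 1 a strong one. Email domains are a noisy proxy, since many contributors commit from personal addresses.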