-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path9-reporting-other.Rmd
146 lines (95 loc) · 3.62 KB
/
9-reporting-other.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
title: "9 - Language specific reporting"
output: html_notebook
---
```{r setup, echo=F, results='hide'}
library(ggplot2)
source("config.R")
source("helpers.R")
```
> This R notebook can be executed from within R, you can update the database connection properties and the dataset settings in the `config.R` file, or run the commands specified here interactively if you want to experiment. **Note that this notebook is only valid when javascript pipeline datasets are used.**
## Fig 6 - JavaScript files over time, with and without NPM files.
```{r}
js_aggregate = read.table(paste.path(DATASET_PATH,"aggregated_files.csv"), header = F, col.names = c("time", 1:32), colClasses = rep("integer", 33))
```
```{r}
filesOverTime(DATASET_NAME, js_aggregate, paste("time_",DATASET_NAME,"_files_bw.pdf", sep = ""))
nonNpmFilesOverTime(DATASET_NAME, js_aggregate, paste("time_",DATASET_NAME,"_files_nonpm_bw.pdf", sep = ""))
```
## Fig 7 - Percentage of clones over time
```{r}
nonUniqueFilesOverTime(DATASET_NAME, js_aggregate, paste("time_", DATASET_NAME, "_dup_bw.pdf", sep = ""), bounds = c(0, 100))
```
## Fig 8 - % of NPM files in projects and directly imported NPM packages
```{r}
npmFilesInProjects(DATASET_NAME)
npmDirectImports(DATASET_NAME)
```
## Fig 9 - Popularity of NPM modules.
```{r}
npmPopularity(DATASET_NAME)
```
# NPM and non-NPM Versions of Common Graphs
## Figuree 3
```{r}
js_npm_cloneGroupSize(DATASET_NAME)
```
## Files per project
```{r}
js_npm_filesPerProject(DATASET_NAME)
```
## SLOC per file
```{r}
js_npm_slocPerFile(DATASET_NAME)
```
## Stars per Project
```{r}
js_npm_starsPerProject(DATASET_NAME)
```
## Commits per project
```{r}
js_npm_commitsPerProject(DATASET_NAME)
```
## Qualitative Analysis
project with the most only npm files:
```{r}
sql.connect(dbname = "js")
```
```
SELECT js.projects.projectId, js.projects.files, js.projects.projectUrl FROM ( SELECT * FROM projects WHERE files = 0) AS X JOIN js.projects ON X.projectId = js.projects.projectId ORDER BY js.projects.files;
```
How many modules a project imports:
```
CREATE TABLE foo (SELECT MAX(npmDepth) AS maxDepth, COUNT(DISTINCT moduleName) AS modules, COUNT(DISTINCT blameModule) AS directModules, projectId FROM files_nm JOIN files ON files_nm.fileId = files.fileId GROUP BY projectId);
```
```{r}
summary(sql.query("SELECT maxDepth FROM foo WHERE maxDepth > 0")[[1]])
```
```{r}
summary(sql.query("SELECT modules FROM foo WHERE maxDepth > 0")[[1]])
```
To determine the most popular modules (blame modules), run the following SQL:
```
SELECT COUNT(DISTINCT projectId) as projects, blameModule FROM files_nm JOIN files ON files_nm.fileId = files.fileId GROUP BY blameModule ORDER BY projects DESC LIMIT 10;
```
To see how many projects have NPM files in them:
```
SELECT COUNT(DISTINCT projectId) FROM files WHERE npm != 0;
```
To determine most cloned projects:
```
java SccPreprocessor cstats PATH_TO_DATASET NUM_CHUNKS
CREATE TABLE fully_cloned_projects(
projectId BIGINT NOT NULL,
clones INT NOT NULL,
PRIMARY KEY(projectId));
LOAD DATA INFILE '/home/peta/datasets/js/fully_cloned_projects2.csv' INTO TABLE fully_cloned_projects FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
SELECT projectUrl, clones FROM fully_cloned_projects JOIN projects ON fully_cloned_projects.projectId = projects.projectId ORDER BY clones DESC LIMIT 40;
```
To determine which files are cloned the most:
```
SELECT COUNT(*) AS cnt FROM files GROUP BY fileHash ORDER BY cnt DESC LIMIT 20;
```
```
SELECT COUNT(DISTINCT fileHash) AS distinct, COUNT(*) AS count FROM files WHERE projectId IN SELECT projectId FROM projects ORDER BY commits DESC LIMIT 10000 AS x;
```