forked from shreyagoel/EnrollmentAnalysis_LA_With_R
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathMain.Rmd
156 lines (108 loc) · 4.12 KB
/
Main.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
title: "Data Exploration Assignment Due - 4th week of October"
author: "Shreya and Francis"
date: "October 14, 2017"
output: html_document
---
## Read_Files
```{r}
install.packages("readxl")
```
```{r}
library(readxl)
appbase <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/ApplicationBase.xlsx")
```
```{r}
library(readxl)
appdecision <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/ApplicationDecisions.xlsx")
```
```{r}
library(readxl)
eventattendance <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/PersonEventAttendance.xlsx")
```
```{r}
library(readxl)
gre <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/PersonGRE.xlsx")
```
```{r}
library(readxl)
login <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/PersonLogins.xlsx")
```
```{r}
library(readxl)
message <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/PersonMessages.xlsx")
```
```{r}
library(readxl)
school <- read_excel("C:/Users/Shreya/Documents/GITHUB/Capstone/PersonSchools.xlsx")
```
## Gre
GRE: Gre Scores of students with 3 main categories in these scores: Verbal (Range 0-170), Quantitative (Range 0-170) and Writing (Range 0-6)
Remove Student with identity code
"90584"
Reason: 0 GRE score information on quantitative aptitude
Remove the identity variable and get a new dataset named gre1 to see for trends in the scores for the admitted students
```{r}
gre1 <- gre[-c(1)]
```
```{r}
plot(gre1)
```
After viewing these plots it is well clear that #the #Institue offers admissions to a diverse population and does not count GRE score to be one of the compulsary requirements in the process of admissions.
```{r}
install.packages("psych")
```
```{r}
library(psych)
pairs.panels(gre1[c(1:3)], gap = 0)
```
The pair panels clearly state that th writing scores are not corelated at all with the quantitative scores for the given students at a positive 0.01 corelation (0 being the least and 1 or -1 being perfect corelation)
Also, what is interesting to note is that verbal scores and writing dont show a significant coreltion in the pair plots (+0.06 only)
But, Verbal scores a positively corelated with quantitative scores with corelation of 0.96
```{r}
hist(gre$Verbal)
```
This simple histogram of verbal score shows that there are 2 scales of gre cale that are currently being used in the data
one is current scale of 0-170 (increments of 1 point) (maximum student population has this score range)
other is prior / traditional scale of 0 - 800 (increments of 10 points) (changed on August 2011)
```{r}
hist(gre$Quantitative)
```
```{r}
hist(gre$Writing)
```
Maximum admitted students have 3 or more score level on a scale of 0-6.
To remove the data points with old gre scale
```{r}
gre<-gre[!(gre$Verbal>=171),]
```
There is stil one where the quantitative score is 759 so the next step
```{r}
gre<-gre[!(gre$Quantitative>=171),]
```
```{r}
gre<-gre[!(gre$Quantitative==0),]
```
Originally the data set had 2989 data points and now after removing the data points the new data set has 2936 data points (we lost 53 data points in this process)
Suggested by Charles - I went wrong on the decision of deleting the student data at all, but he suggests that we find an algorithm or may be create on to convert scores from one form to another to not lose data and utilize it well
Also, one thing to may be consider in there is to check that we change before the particular date when this gre score system was used, irrespective of the system, we need to consider that well. - Thoughts on this Francis?
## Event_Attendance
```{r}
summary(eventattendance)
```
```{r}
counts <- table(eventattendance$EventType)
par(las=2)
par(mar=c(5,15,4,2))
barplot(counts, main="Event Attendance Barplot", horiz=TRUE, xlab="Number of Attendees", col="chartreuse3")
```
6 diffent event types, plot shows the frequency people attend the most.
will explore more
interesting if we had some data from those who enroll to comare
## Application_Decision
```{r}
counts <- table(appdecision$decision_name)
par(las=2)
par(mar=c(5,16,4,2))
barplot(counts, main="Decision Name Barplot", horiz=TRUE, xlab="Number of Students in each decision type", col="chartreuse3")
```