---
author: "Oraz Shakirov"
date: "3/24/2022"
runtime: shiny
output:
  html_document:
    css: "assets/style.css"
    toc: true
# output:
#   slidy_presentation:
#     css: "style.css"
# runtime: shiny
---
```{r setup, include=FALSE}
library(tidyverse)
library(data.table)
library(vistime)
library(plotly)
library(rpart.plot)
knitr::opts_chunk$set(echo = TRUE)
glovo <- fread("data/glovo.csv")
glovo_model <- readRDS("data/model.RData")
```
## How are you planning to measure the issue?
Many measures can be used to evaluate the quality of a fraud model. From my perspective, the most important are listed below. Imagine there were 1,000 orders ($25K), of which 50 ($2.5K) were canceled, 35 were recognized as suspicious, 20 were detected as fraud, and 10 were false positives. Therefore:
* **Fraud rate (%)** 5%
* **Detection rate (%)** 16%
* **FP rate (%)** 50%
* **Chargeback rate (%)** 10%
* **Accuracy**
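
As a rough check, the arithmetic behind three of these figures can be sketched in R. The variable names are illustrative, and the denominators are assumptions inferred from the numbers in the example: the FP rate is taken as the share of flagged orders that were legitimate, and the chargeback rate as the share of order value lost. The detection rate and accuracy are left out because the text does not define them.

```r
# Figures taken from the worked example above
orders_total   <- 1000   # all orders
amount_total   <- 25000  # total order value, $
orders_fraud   <- 50     # canceled orders
amount_fraud   <- 2500   # value of canceled orders, $
flagged_fraud  <- 20     # orders detected as fraud
false_positive <- 10     # detected orders that were actually legitimate

# Assumed definitions (not stated explicitly in the text)
fraud_rate      <- orders_fraud / orders_total     # share of orders canceled
fp_rate         <- false_positive / flagged_fraud  # share of detections that were wrong
chargeback_rate <- amount_fraud / amount_total     # share of order value lost

metrics <- c(fraud_rate = fraud_rate,
             fp_rate = fp_rate,
             chargeback_rate = chargeback_rate)
print(round(100 * metrics, 1))
```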
## Key findings {.tabset .tabset-fade .tabset-pills}
### Summary
The goal of EDA is to find the features with the biggest impact on the outcome (fraud/legitimate). There are many different approaches; here I used ANOVA. See the **Rules** section for more.
* Fraud is more likely for the Android/Quiero combination
* Loyal clients are unlikely to be fraudsters
* Most suspicious orders take place at night
* Customers with more than one account are more likely to be fraudsters
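
A minimal sketch of what such an ANOVA screening could look like is shown below. It runs on simulated data, not on the real orders: the `is_fraud` column, the `"partner"` order type, and the fraud probabilities are all hypothetical. ANOVA on a 0/1 outcome is only a rough way to rank features, so the F values should be read as a screening signal, not as a formal test.

```r
set.seed(1)
n <- 400

# Simulated orders; levels and probabilities are made up for illustration
toy <- data.frame(
  device_os  = sample(c("android", "ios"), n, replace = TRUE),
  order_type = sample(c("quiero", "partner"), n, replace = TRUE)
)

# Hypothetical fraud probability, higher for the android/quiero combination
p <- ifelse(toy$device_os == "android" & toy$order_type == "quiero", 0.30, 0.05)
toy$is_fraud <- rbinom(n, 1, p)

# Two-way ANOVA with interaction: which factors explain variance in the outcome?
fit <- aov(is_fraud ~ device_os * order_type, data = toy)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

Terms with larger F values explain more of the variation in the simulated outcome and would be candidates for rules.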
### Chart
```{r, echo=FALSE, warning=F}
inputPanel(
  selectInput("measure", "Select Y", choices = c("fraud_ratio", "total_ratio"), selected = "fraud_ratio"),
  selectInput("dim1", "Select X", choices = c("More than one account" = "multiple_account", "#orders bucket" = "cut_cnt", "Order Type" = "order_type", "OS" = "device_os"), selected = "multiple_account"),
  selectInput("dim2", "Select group", choices = c("More than one account" = "multiple_account", "#orders bucket" = "cut_cnt", "Order Type" = "order_type", "OS" = "device_os"), selected = "order_type")
)
```
```{r cars, fig.align="center", out.width="100%", echo=FALSE}
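# Aggregate the fraud ratio and the order share for the two selected dimensions
glovo_df <- eventReactive(list(input$dim1, input$dim2), {
  glovo %>%
    # group_by_() is deprecated; select the grouping columns by name instead
    group_by(across(all_of(unique(c(input$dim1, input$dim2))))) %>%
    summarise(
      fraud_ratio = mean(final_order_status == "CanceledStatus"),
      N = n(),
      .groups = "drop"
    ) %>%
    mutate(total_ratio = N / sum(N))
})

renderPlotly({
  p <- glovo_df() %>%
    # aes_string() is deprecated; the .data pronoun looks columns up by name
    ggplot(aes(.data[[input$dim1]], .data[[input$measure]], fill = .data[[input$dim2]])) +
    geom_bar(stat = "identity", position = "dodge") +
    scale_y_continuous(labels = scales::percent) +
    scale_fill_brewer(palette = "Set1") +
    # theme_minimal() goes first, otherwise it resets the legend settings
    theme_minimal() +
    theme(legend.position = "bottom", legend.title = element_blank())
  p
})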
```
```{r, echo=F, warning=F,fig.align="center", out.width="100%"}
renderPlot({
  p <- glovo %>%
    group_by(order_hour) %>%
    summarise(
      fraud_ratio = mean(final_order_status == "CanceledStatus"),
      total = sum(eur_amount, na.rm = TRUE),
      total_row = n()
    ) %>%
    mutate(ratio = total / sum(total), ratio2 = total_row / sum(total_row)) %>%
    ggplot(aes(order_hour, fraud_ratio)) +
    geom_line(color = "red") +
    # The second series is divided by 0.2 so that sec_axis(~ . * 0.2) maps it back
    geom_line(aes(y = ratio2 / 0.2), color = "#00a082") +
    scale_y_continuous(
      labels = scales::percent,
      name = "Fraud Ratio (%)",
      sec.axis = sec_axis(~ . * 0.2, name = "Profit Ratio (%)", labels = scales::percent)
    ) +
    # theme_minimal() goes first, otherwise it resets the axis text colours
    theme_minimal() +
    theme(
      axis.text.y = element_text(colour = "red"),
      axis.text.y.right = element_text(color = "#00a082")
    )
  p
})
```
## Fraud prevention {.tabset .tabset-fade .tabset-pills}
### How do you plan to measure the success
* Using historical data and orders known to be fraudulent, we can evaluate the efficiency of the rules/model
* Testing on random orders known in advance to be fraudulent
* Market benchmarks
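
One way to run the historical evaluation from the first bullet is to compare the flags raised by the rules/model with the known outcomes. The sketch below uses two made-up logical vectors (`actual` and `flagged` are hypothetical names and values, not columns from the data set) and reports precision and recall.

```r
# Hypothetical historical labels: TRUE = the order turned out to be fraud
actual  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
# Hypothetical rule/model output: TRUE = the order was flagged
flagged <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Confusion matrix of known outcome vs. flag
confusion <- table(actual = actual, flagged = flagged)
print(confusion)

tp <- sum(actual & flagged)    # fraud that was flagged
fp <- sum(!actual & flagged)   # legitimate orders that were flagged
fn <- sum(actual & !flagged)   # fraud that was missed

precision <- tp / (tp + fp)    # share of flags that were real fraud
recall    <- tp / (tp + fn)    # share of fraud that was caught
print(c(precision = precision, recall = recall))
```

Tracking both numbers matters because a rule set can raise recall simply by flagging more orders, at the cost of more false positives.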
### Rules
```{r, echo=F, warning=F,fig.align="center", out.width="100%"}
rpart.plot(glovo_model, box.palette="GnBu", shadow.col="gray", nn=TRUE)
```
Here is a list of rules that could serve as a starting point:
```{r echo=F, warning=F}
rpart.rules(glovo_model, cover = T,style="tallw")
```
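
The tree above is loaded from a saved model file, so how it was trained is not shown here. As an illustration only, a classification tree of this kind could be fitted with `rpart` roughly as follows. The data is simulated, the `is_fraud` outcome and the risk formula are hypothetical, and only the predictor names `multiple_account` and `order_hour` are borrowed from this document.

```r
library(rpart)

set.seed(42)
n <- 500

# Simulated orders; values are made up for illustration
toy <- data.frame(
  multiple_account = factor(sample(c("yes", "no"), n, replace = TRUE, prob = c(0.2, 0.8))),
  order_hour       = sample(0:23, n, replace = TRUE)
)

# Hypothetical risk: higher for multi-account customers and for night orders
risk <- 0.03 + 0.4 * (toy$multiple_account == "yes") + 0.3 * (toy$order_hour < 6)
toy$is_fraud <- factor(ifelse(runif(n) < risk, "fraud", "legit"))

# Fit a classification tree on the simulated data
toy_model <- rpart(is_fraud ~ multiple_account + order_hour, data = toy, method = "class")
print(toy_model)

# Predicted class for every simulated order
pred <- predict(toy_model, toy, type = "class")
```

A model fitted this way can be passed to `rpart.plot()` and `rpart.rules()` in the same way as `glovo_model` above.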
## Other fraud types
* Using stolen credit/debit cards for paying orders
* Manipulating the delivery distance, if the courier's pay depends on how far away the customer is
* Stolen discount cards/codes
## High-level plan for first 6 months
```{r, echo=FALSE, warning=FALSE,fig.align="center", out.width="100%"}
data <- read.csv(text="event|group|start|end|color|tooltip
Getting familiar with the team and tools|Intro|2022-04-01|2022-05-01|#c8e6c9|1-1 meetings, standards, docs, etc.
Detecting problems and pain points|Intro|2022-04-01|2022-05-01|#a5d6a7|
Defining expectations|Intro|2022-04-01|2022-05-01|#fb8c00|
Initiate quick win project||2022-05-01|2022-05-17|#DD4B39|e.g. increase detection rate by 2%
Develop quick win project||2022-05-17|2022-06-15|#DEEBF7|
Implement quick win project||2022-06-15|2022-07-01|#C6DBEF|
Initiate Project||2022-07-01|2022-07-31|#9ECAE1|e.g. switching from manual order review to automated review
Develop Project||2022-08-01|2022-09-01|#E5F5E0|
Implement Project||2022-09-01|2022-10-01|#C7E9C0|",sep="|")
vistime(data)
```