This file contains a broad list of industrial and academic articles related to detecting and mitigating risks for information systems & platforms. Currently, these articles can be grouped into several interrelated sub-topics:
- Anomaly Detection. Anomalies are often an indicator of risk, so detecting them can be the first step of risk management.
- Malicious Item Detection. Sometimes we more or less know what bad items (e.g., fraudulent transactions, fake accounts, bot accounts) look like. So in addition to anomaly signals, we can also add domain-specific signals to detect the bad stuff.
- Risk Mitigation and Prevention. After detection, the defender needs to act in the right way to mitigate the risk. This is not easy, because the defender is constrained by other goals such as growth and usability.
- Risk Management System / Infrastructure. Risk management systems are challenging to build because they need to process huge amounts of data to find a few bad things with high accuracy and fast response time.
Hope you find this reference list helpful!
Language: Chinese / English
Building Trust and Combating Abuse On Our Platform
Digital Trust & Safety Partnership Best Practices Framework
Linkedin spam: a case study of robust feature engineering
Amazon Fraud Detector launches Account Takeover Insights (ATI)
Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop
- Uber
- A comprehensive overview of Alipay's anti-fraud architecture.
Deep Entity Classification: Abusive Account Detection for Online Social Networks
Detecting New Account Fraud and Transaction Fraud with Amazon Fraud Detector
- Practical insights on using ML for fraud detection
Prevent fake account sign-ups in real time with AI using Amazon Fraud Detector
Fighting spam with Guardian, a real-time analytics and rules engine
Trust & Safety Engineering @ GitHub - Lexi Galantino
- A lot of practical insights on payment fraud.
- Very good feature engineering.
Automated Fake Account Detection at LinkedIn
- Score everything.
- Defense in depth and in redundancy.
Breaking fraud & bot detection solutions - Mayank Dhiman - AppSecUSA 2018
- A good overview of Web-based bot defense.
Marketing "Dirty Tinder" on Twitter
- Network analysis
Facebook Crackdown on Fake Accounts Isn’t Solving the Problem for Everyone
- High-quality article on the fake and impersonating account problem on Facebook. Facebook uses facial recognition, which does not seem to be effective.
LOBO – Evaluation of Generalization Deficiencies in Twitter Bot Classifiers
- Found that Twitter bot classifiers do not generalize well to unseen bots.
- But the features used by external researchers and internal spam fighters are very different.
Implementing Model-Agnosticism in Uber’s Real-Time Anomaly Detection Platform
Anomaly Detection: Algorithms, Explanations, Applications
- An overview of Thomas Dietterich's research on anomaly detection benchmarking, theory, and applications.
Netflix Cloud Security: Detecting Credential Compromise in AWS
- If my understanding is correct, they use the trust-on-first-use principle: if the IP of an AssumedRole call differs from the first known IP of that role, the call likely comes from an outside IP and is therefore likely related to a compromised credential.
- They have a follow up article: Netflix Information Security: Preventing Credential Compromise in AWS.
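The trust-on-first-use idea described above can be sketched as follows. This is a minimal illustration based only on the summary in this list, not Netflix's implementation; the `(role, ip)` event format and the function name `flag_suspicious_calls` are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def flag_suspicious_calls(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (role, ip) events whose IP differs from the first IP seen for that role."""
    first_ip: Dict[str, str] = {}
    flagged: List[Tuple[str, str]] = []
    for role, ip in events:
        # The first sighting of a role records its trusted IP (hypothetical policy).
        trusted = first_ip.setdefault(role, ip)
        if ip != trusted:
            flagged.append((role, ip))
    return flagged


# Hypothetical event stream, in arrival order.
events = [
    ("role-a", "10.0.0.5"),
    ("role-b", "10.0.0.9"),
    ("role-a", "10.0.0.5"),
    ("role-a", "203.0.113.7"),
]
suspicious = flag_suspicious_calls(events)
```

A real system would also have to handle roles that legitimately change IPs, which this sketch ignores.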
Stripe Machine Learning with Michael Manapat
- Great stuff about applying ML in fraud detection.
- Also describes what Stripe's anti-fraud ML infrastructure looks like.
- The main decision-making model is linear, and they do NOT recommend directly using more complex machine learning models as the decision-making model.
- They propose CNN as a way of enhancing feature engineering. In particular, the CNN explores many possible feature combinations and picks promising ones. But ultimately humans need to analyze these candidates and find the ones that make sense.
*** Exploring New Machine Learning Models for Account Security, Uber
- Semi-supervised model: use PCA + clustering to get labels for IPs.
- Unsupervised model: use LSTM and word2vec to discover common travel sequences between cities.
*** Mastermind: Using Uber Engineering to Combat Fraud in Real Time, Uber
- Very good insights on building a global rule-based fraud prevention engine.
Automation Attacks at Scale - Credential Exploitation
- A great empirical study on credential stuffing and account takeover in general.
- Two-stage architecture: predictor and anomaly-detector / comparator.
- For the predictor they used Deep Learning models like DNN, RNN, LSTM.
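The two-stage split above (a predictor followed by a comparator) can be illustrated with a deliberately simple stand-in. The sketch below swaps the deep learning predictor for a plain moving average, and the three-standard-deviation threshold is an assumption chosen for the example, so it only shows the shape of the architecture.

```python
from statistics import mean, stdev
from typing import List


def predict_next(history: List[float]) -> float:
    """Stage 1 (predictor): a moving-average stand-in for a learned model."""
    return mean(history)


def is_anomalous(history: List[float], observed: float, threshold: float = 3.0) -> bool:
    """Stage 2 (comparator): flag the observation if it is far from the prediction."""
    predicted = predict_next(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation is unexpected.
        return observed != predicted
    return abs(observed - predicted) > threshold * spread


# Hypothetical metric history (e.g. logins per minute).
history = [100.0, 102.0, 98.0, 101.0, 99.0]
spike_flagged = is_anomalous(history, 150.0)
normal_flagged = is_anomalous(history, 101.0)
```

Keeping the two stages separate means the predictor can be replaced by a stronger model without touching the comparison logic.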
Twilio Verify: The best phone verification solution
- Twilio argues that phone verification is a good way to prevent fake account creation.
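Scorecard Model Development: Implementing a Standard Scorecard Based on Logistic Regression (评分卡模型开发-基于逻辑回归的标准评分卡实现) - Erin's Blog - CSDN Blog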
Suspicious behavior detection: Current trends and future directions
- A very good survey of related work.
Debot: Real-Time Bot Detection via Activity Correlation
- Based on account activity time series, particularly correlation.
- Several papers published.
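As a rough illustration of correlating account activity time series, the sketch below computes plain Pearson correlation between per-account activity counts in fixed time bins and reports highly correlated pairs. This is a simplification chosen for the example and should not be read as Debot's actual correlation measure; the bin counts, the `min_corr` threshold, and all names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlated_pairs(activity: Dict[str, List[float]], min_corr: float = 0.95) -> List[Tuple[str, str]]:
    """Return account pairs whose activity series are strongly correlated."""
    pairs = []
    for a, b in combinations(sorted(activity), 2):
        if pearson(activity[a], activity[b]) >= min_corr:
            pairs.append((a, b))
    return pairs


# Hypothetical posts-per-time-bin counts for three accounts.
activity = {
    "bot_1": [0, 5, 0, 7, 0, 6],
    "bot_2": [0, 5, 0, 7, 0, 6],
    "human": [2, 1, 3, 0, 1, 2],
}
pairs = correlated_pairs(activity)
```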
Detecting outliers and anomalies in realtime at Datadog (Blog)
- Time-series; Outlier detection: MAD, DBSCAN; Anomaly detection: agnostic online learning;
- Some discussions on signal processing based techniques: FFT, Wavelet Transform, Time–frequency analysis.
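For readers unfamiliar with MAD (median absolute deviation), here is a minimal sketch of MAD-based outlier detection using the common modified z-score. The 0.6745 scaling constant and the 3.5 cutoff are conventional defaults, not values taken from the Datadog post, and the function name is made up for the example.

```python
from statistics import median
from typing import List


def mad_outliers(values: List[float], threshold: float = 3.5) -> List[float]:
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half of the points equal the median; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical latency samples with one obvious spike.
samples = [10, 11, 9, 10, 12, 10, 50]
outliers = mad_outliers(samples)
```

Because both the center and the spread are medians, a single extreme value barely shifts them, which is why MAD is more robust than a mean and standard deviation rule.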
Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes
LBSNShield: Malicious Account Detection in Location-Based Social Networks
Mastermind: Using Uber Engineering to Combat Fraud in Real Time
- Rule-based engine
Anomaly Detection for Airbnb’s Payment Platform
EVILCOHORT: Detecting Communities of Malicious Accounts on Online Services
- Detect malicious accounts based on the account-IP mapping.
- Bipartite network projection, clustering / community detection
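The two steps above can be sketched as follows: project the account-IP bipartite graph onto accounts, then group accounts into communities. This is only an illustration. Connected components are used here as a simple stand-in for the clustering step, and the `min_shared` threshold and all names are assumptions, not details from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def project_accounts(logins: Iterable[Tuple[str, str]], min_shared: int = 2) -> Set[Tuple[str, str]]:
    """Link two accounts if they share at least `min_shared` distinct IPs."""
    ip_to_accounts = defaultdict(set)
    for account, ip in logins:
        ip_to_accounts[ip].add(account)
    shared = defaultdict(int)
    for accounts in ip_to_accounts.values():
        for a, b in combinations(sorted(accounts), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


def communities(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the projected account graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups


# Hypothetical (account, ip) login records.
logins = [
    ("a1", "ip1"), ("a1", "ip2"),
    ("a2", "ip1"), ("a2", "ip2"),
    ("a3", "ip2"), ("a3", "ip3"),
    ("a4", "ip3"),
]
edges = project_accounts(logins)
groups = communities(edges)
```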
Detecting Clusters of Fake Accounts in Online Social Networks
- The main technique is a supervised machine learning pipeline for classifying an entire cluster of accounts as malicious or legitimate
- The key features used in the model are statistics on fields of user-generated text.
RAD — Outlier Detection on Big Data
- Robust PCA; project source code;
Tracking down the Villains: Outlier Detection at Netflix
- DBSCAN clustering.
luminol: a light weight python library for anomaly detection and correlation of time series
- The default anomaly detection method is based on the 2005 "Assumption-Free" paper.
- One limitation, if I understand correctly, is that detection_delay = future_window_size, which can be rather big if one wants to account for periodicity.
- Rule-based; low-latency so that it can run on write path.
Online Social Spammer Detection
- One key contribution is combining content and network information to detect fast-evolving spammers. The basic idea is to represent the content, network, and labels as matrices, then find a low-rank user representation via a matrix factorization that minimizes a loss function. This low-rank representation provides the feature vectors for a classifier.
- Also the model can be updated online.
Uncovering Large Groups of Active Malicious Accounts in Online Social Networks
- Detect malicious accounts based on synchronized actions.
- Single-linkage hierarchical clustering.
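A minimal sketch of clustering accounts by synchronized actions is shown below. It represents each account as a set of `(target, time_bucket)` actions, measures pairwise Jaccard similarity, and merges any pair above a threshold with union-find, which is equivalent to cutting a single-linkage dendrogram at that similarity. The action format, the threshold, and the names are assumptions for the example rather than the paper's exact formulation.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity of two action sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_linkage_clusters(actions: Dict[str, Set[Hashable]], min_similarity: float = 0.5) -> List[Set[str]]:
    """Group accounts linked by any chain of pairs with similarity >= min_similarity."""
    parent = {u: u for u in actions}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(actions, 2):
        if jaccard(actions[u], actions[v]) >= min_similarity:
            parent[find(u)] = find(v)

    clusters: Dict[str, Set[str]] = {}
    for u in actions:
        clusters.setdefault(find(u), set()).add(u)
    # Singletons are not interesting for group detection.
    return [c for c in clusters.values() if len(c) > 1]


# Hypothetical actions: (target page, hour bucket) per account.
actions = {
    "u1": {("page_x", 10), ("page_y", 11), ("page_z", 12)},
    "u2": {("page_x", 10), ("page_y", 11), ("page_z", 12)},
    "u3": {("page_y", 11), ("page_z", 12), ("page_w", 13)},
    "u4": {("page_q", 40)},
}
clusters = single_linkage_clusters(actions)
```

Here `u3` joins the cluster through a similarity of 0.5 with `u1` and `u2`, which shows the chaining behavior typical of single linkage.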
Nikunj Oza: "Data-driven Anomaly Detection" | Talks at Google
- Air traffic; One-class SVM; Multiple Kernel Learning based Heterogeneous Algorithm (MKAD);
Systems and methods for troubleshooting errors within computing tasks using models of log files
- Model a normally behaving machine as a finite-state machine (FSM), and compare logs against the FSM.
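One simple way to picture this idea: learn the set of allowed event-to-event transitions from logs of normal runs, then report any transition in a new log that the model has never seen. The sketch below is an assumption-laden illustration (events as plain strings, transitions between consecutive events only) and does not reproduce the patent's method.

```python
from typing import List, Sequence, Set, Tuple

Transition = Tuple[str, str]


def learn_transitions(normal_runs: Sequence[Sequence[str]]) -> Set[Transition]:
    """Collect every consecutive event pair observed in normal runs."""
    transitions: Set[Transition] = set()
    for run in normal_runs:
        transitions.update(zip(run, run[1:]))
    return transitions


def unexpected_transitions(run: Sequence[str], transitions: Set[Transition]) -> List[Transition]:
    """Return the transitions in `run` that the learned model does not allow."""
    return [step for step in zip(run, run[1:]) if step not in transitions]


# Hypothetical event sequences extracted from log files.
normal_runs = [
    ["start", "load", "process", "done"],
    ["start", "load", "retry", "load", "process", "done"],
]
model = learn_transitions(normal_runs)
problems = unexpected_transitions(["start", "load", "process", "crash"], model)
```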
Spotting Opinion Spammers using Behavioral Footprints
- Formulate the review spam problem in a Bayesian framework / graphical model. The model is generative and can be viewed as a soft clustering system.
- Two key latent variables are author spamicity and review spam cluster. Author spamicity is in [0,1] and is used as the parameter of a Bernoulli distribution for the review spam cluster (\pi). \pi, together with a few other latent variables, then generates the observable variables.
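The relationship between the two latent variables described above can be written compactly. The symbols $s_a$ (spamicity of author $a$) and $\pi_r$ (spam cluster of review $r$ written by $a$) are notation assumed for this note, as is the convention that $\pi_r = 1$ denotes the spam cluster.

```latex
s_a \in [0, 1], \qquad
\pi_r \mid s_a \sim \mathrm{Bernoulli}(s_a), \qquad
P(\pi_r = 1 \mid s_a) = s_a
```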
[Scammers and VoIP: What you need to know about illegal phone scams](https://www.voipreview.org/blog/scammers-and-voip-what-you-need-know-about-illegal-phone-scams)
Sms spam detection using noncontent features
Detecting and characterizing social spam campaigns
- Group wall posts by textual similarity and shared URLs. They build a similarity graph to find clusters.
- Then use two assumptions to separate malicious clusters from benign ones.
- Interesting empirical results. For example, 97% of the accounts used for spam campaigns are compromised.
Determining Whether a Response from a Participant is Contradictory in an Objective Manner
- The basic idea is to compare a user's response with the authoritative response or the community response via a contingency matrix. If the difference is large, the user's response is considered fraudulent.
Assumption-Free Anomaly Detection in Time Series
- A quite interesting 2D representation of time series. But it is hard to see why we need this representation and the proposed anomaly detection method.
Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters
- Grid-based clustering: CLIQUE, pMAFIA