A System for Latent User Similarity Comparison on Texting Data

moxplayer/affinity

Deployment Instructions for affinity

The affinity system is a web application. The web server nginx works as a reverse proxy in front of Gunicorn, which serves the Flask application.

Whenever you use affinity, please cite the following paper (preprint) to be published by IEEE:

@unpublished{Eichinger2019,
	author	= {Eichinger, Tobias and Beierle, Felix and Khan, Sumsam Ullah and Middelanis, Robin and Sekar, Veeraraghavan and Tabibzadeh, Sam},
	title	= {affinity: A System for Latent User Similarity Comparison on Texting Data},
	booktitle = {IEEE-ICC},
	journal={arXiv preprint},
	year 	= {2019},
}

The data sets used in the paper can be downloaded here.

We have tested the soundness of the affinity prototype on an Ubuntu 16.04 virtual machine using Python 2 (simply python in the commands below). Python 3 is not supported; please use Python 2.

  1. Preliminaries
sudo apt update
sudo apt upgrade
sudo apt-get install python-dev
sudo apt install vim
sudo apt install curl
sudo apt install git
sudo apt install python python-pip
  1. Clone the repository
git clone https://github.com/moxplayer/affinity.git
cd affinity
  1. Initialize Submodules
git submodule init
  1. Update Submodules
git submodule update
  1. Make WMD
sudo apt install python-tk
cd ServerFeatures/Processing/wmd/python-emd-master
make
cd ../../../..
  1. Make fastText
cd ServerFeatures/WordEmbedding/fastText
make
python -m pip install .
cd ../../..
  1. Install virtualenv
python -m pip install virtualenv
  1. Create a virtual environment
virtualenv --python=python2 affinity_venv
  1. Activate the virtual environment
source affinity_venv/bin/activate
  1. Install some python modules into the virtual environment
python -m pip install fbchat
python -m pip install scikit-learn
python -m pip install Flask-MySQL
python -m pip install numpy
python -m pip install scipy
python -m pip install gensim
python -m pip install pybind11
python -m pip install gunicorn
python -m pip install Flask
  1. Install the nginx WebServer software
sudo apt install nginx
  1. Start the nginx WebServer
sudo service nginx start
  1. Configure nginx to act as a Reverse-Proxy

..1. Open the nginx config file in vim

sudo vim /etc/nginx/nginx.conf

..2. Press 'i' to enter insert mode

..3. Replace its content with:

worker_processes 1;

events{

	worker_connections 1024;

}

http{

	# this server delegates incoming (HTTP) requests to the gunicorn application server on http://127.0.0.1:5000
	server {
		location / {
			proxy_pass http://127.0.0.1:5000;
			# set the (request) header info 'Host' to the address of the request host
			proxy_set_header Host $host;
			# set the X-Real-IP to the client IP (sender)
			proxy_set_header X-Real-IP $remote_addr;

			# raise the timeout limits (values are in seconds; you might want to
			# adjust these numbers for testing)
			proxy_connect_timeout 3600;
			proxy_send_timeout 3600;
			proxy_read_timeout 3600;
			send_timeout 3600;
		}
	}
}
  1. Restart the nginx service for the changes to take effect
sudo service nginx restart
  1. Start the application with Gunicorn, bound (-b) to 127.0.0.1 on port 5000, with a timeout (-t) of 3600 seconds ['wsgi.py' is the server file that contains the Flask application, and 'app' is the name of the application object in that file]
gunicorn -b 127.0.0.1:5000 -t 3600 wsgi:app
  1. If everything started correctly, you will see output similar to:
[2018-09-24 12:58:32 +0000] [3607] [INFO] Starting gunicorn 19.9.0
[2018-09-24 12:58:32 +0000] [3607] [INFO] Listening at: http://127.0.0.1:5000 (3607)
[2018-09-24 12:58:32 +0000] [3607] [INFO] Using worker: sync
[2018-09-24 12:58:32 +0000] [3607] [INFO] Booting worker with pid: 3611

followed by some warning messages generated by Cython (a Python package):

[...]
/home/affinity/affinity_venv/local/lib/python2.7/site-packages/scipy/io/matlab/mio5.py:98: RuntimeWarning: numpy.dtype size changed, may
indicate binary incompatibility. Expected 96, got 88
from .mio5_utils import VarReader5

The warning messages can be safely ignored according to this GitHub issue.

You may interrupt the server at any time via [CTRL] + [C].

  1. Open Port 5000 on Ubuntu
sudo ufw allow 5000
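
The `wsgi:app` argument in the Gunicorn step above follows Gunicorn's `module:callable` convention: Gunicorn imports the module `wsgi` and serves the WSGI callable named `app`. The sketch below is not the repository's `wsgi.py` (which exposes a Flask application object). It is a hypothetical, standard-library-only stand-in that shows the interface Gunicorn expects, which can help to check the nginx and Gunicorn wiring before the real application is configured.

```python
# wsgi_sketch.py -- hypothetical stand-in, NOT the repository's wsgi.py.
# A Flask application object is itself a WSGI callable with this signature.

def app(environ, start_response):
    """Answer every request with a short plain-text body."""
    body = b"affinity placeholder\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Assuming the file is saved as `wsgi_sketch.py`, it can be served with `gunicorn -b 127.0.0.1:5000 wsgi_sketch:app`.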

Set up the MySQL database

  1. Download packages
sudo apt install mysql-server mysql-common mysql-client
  1. Log into the MySQL server as the root user [at the first login you create the credentials; this guide uses username: root, password: toor]
mysql -u root -p
  1. Enter the password [toor]. MySQL prints a welcome message.
  1. Create a database
CREATE DATABASE FBUserData;
  1. Select the created database
USE FBUserData;
  1. Enable support for 4-byte Unicode symbols and set their standard collation (the collation can be adjusted later within the application)
ALTER DATABASE CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  1. Create three tables
CREATE TABLE userinfo ( id int NOT NULL AUTO_INCREMENT, usertype varchar(20), useremailid varchar(100), username varchar(100), password varchar(100), status varchar(2),
 maxtimestamp varchar(300), fbemailid varchar(100), PRIMARY KEY (id));
CREATE TABLE userBoW ( id int NOT NULL, feature_names varchar(1000), occurrences int, tfidf float);
CREATE TABLE BoWHistory ( feature_names varchar(1000), no_occurrences int);
  1. Check if the collation is correctly set
SHOW TABLE STATUS;
  1. Quit MySQL
\q
  1. Decompress dataset
tar -zxvf US115thcongress.tar.gz
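
The `utf8mb4` character set chosen in the database setup above matters for texting data because emoji and other characters outside the Basic Multilingual Plane take four bytes in UTF-8, which MySQL's legacy 3-byte `utf8` character set cannot store. A minimal sketch (the helper name `needs_utf8mb4` is hypothetical and not part of affinity) that flags such text:

```python
def needs_utf8mb4(text):
    """Return True if text contains a character that takes 4 bytes in UTF-8."""
    # In UTF-8, only 4-byte sequences start with a lead byte >= 0xF0.
    return any(byte >= 0xF0 for byte in bytearray(text.encode("utf-8")))


print(needs_utf8mb4(u"see you at 8"))             # False
print(needs_utf8mb4(u"see you at 8 \U0001F600"))  # True
```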

Download NUS SMS Corpus

wget https://github.com/kite1988/nus-sms-corpus/raw/master/smsCorpus_en_xml_2015.03.09_all.zip
unzip smsCorpus_en_xml_2015.03.09_all.zip
mv smsCorpus_en_2015.03.09_all.xml ServerFeatures/Userdata/Reference_User_Histories

Register reference users from the NUS SMS Corpus

Note that if no reference model has been trained yet, the NUS SMS Corpus is used to build a temporary corpus by concatenating the .txt files of those individual users whose files contain more than 1000 bytes. This threshold is more or less arbitrary and serves to disregard user histories with very little textual information. After the registration process, we have [] reference users registered with their individual word usage statistics saved in a MySQL database. Register the reference users with:

curl -X POST -H "Content-type:application/json" -d '{"path":"./ServerFeatures/Userdata/Reference_User_Histories/smsCorpus_en_2015.03.09_all.xml", "useforretrain":"False"}' http://127.0.0.1:5000/createreferenceuser
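
The size filter described in the note above can be sketched as follows. This is an illustration under assumptions, not the project's code: the function name `build_temporary_corpus`, the directory layout (one `.txt` file per user), and the newline used as a separator are all assumed. Only the 1000-byte threshold comes from the text.

```python
import os

MIN_BYTES = 1000  # threshold from the README; described there as arbitrary


def build_temporary_corpus(history_dir, min_bytes=MIN_BYTES):
    """Concatenate all per-user .txt files larger than min_bytes."""
    parts = []
    for name in sorted(os.listdir(history_dir)):
        path = os.path.join(history_dir, name)
        if not name.endswith(".txt") or not os.path.isfile(path):
            continue  # skip anything that is not a per-user text file
        if os.path.getsize(path) > min_bytes:
            with open(path, "rb") as handle:
                parts.append(handle.read())
    return b"\n".join(parts)
```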

Register the senators of the 115th US Congress

python register_all_senators.py

Calculate similarity between two registered users

curl -X POST -H "Content-type:application/json" -d '{"userid1":"<some_registered_id1>", "userid2":"<some_registered_id2>"}' http://127.0.0.1:5000/comparedetails
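
The same request can also be issued from Python instead of `curl`. The sketch below only builds the request for `/comparedetails`, so it can be inspected without a running server. The helper name `build_compare_request` is hypothetical, and the shape of the server's response is not documented in this README, so the response is not parsed.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

BASE_URL = "http://127.0.0.1:5000"  # address Gunicorn is bound to


def build_compare_request(userid1, userid2, base_url=BASE_URL):
    """Build the POST request that the curl command sends to /comparedetails."""
    payload = {"userid1": str(userid1), "userid2": str(userid2)}
    return Request(
        base_url + "/comparedetails",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
    )


request = build_compare_request("<some_registered_id1>", "<some_registered_id2>")
# With the server running and real ids filled in:
# print(urlopen(request).read())
```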

Calculate pairwise similarities between registered users and save the results as a Gephi edge graph (.csv)

curl -X POST -H "Content-type:application/json" -d '{"userids":"<comma_separated_userids_without_blanks>"}' http://127.0.0.1:5000/pairwisedist
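
For the pairwise call, the `userids` field is a single comma-separated string without blanks. The small sketch below, with hypothetical helper names, builds that string and lists the unordered pairs that a pairwise comparison over n users covers (n(n-1)/2 of them). How the server orders or stores the pairs is not specified in this README.

```python
from itertools import combinations


def userids_field(userids):
    """Join ids into the comma-separated, blank-free string for 'userids'."""
    return ",".join(str(uid).strip() for uid in userids)


def unordered_pairs(userids):
    """All unordered pairs of distinct users: n * (n - 1) / 2 entries."""
    return list(combinations(userids, 2))


ids = [1, 2, 3, 4]
print(userids_field(ids))         # 1,2,3,4
print(len(unordered_pairs(ids)))  # 6
```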
