Please use an extension like MathJax Plugin for Github so that equations are rendered properly on README pages.
The aim of this project is to propose a novel idea for bias-free hate speech detection. Among several candidate methods, such as transfer learning, adversarial approaches for gender-free hate speech detection, and bias-free deep learning architectures for classification, we have implemented an adversary-based approach to achieve this.
The approach combines two components:

- An adversarial training procedure that removes information about the sensitive attribute from the representation learnt by the neural network.
- A supervised deep learning task in which the network is required to predict the output variable (hate/non-hate).
Combining competing tasks has been found to be a useful tool in deep learning. This technique has been applied to make models fair by preventing biased representations: an adversarial training procedure removes information about the sensitive attribute from the representation learned by the network. In particular, we study how the choice of data for the adversarial training affects the resulting fairness properties.
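As a concrete illustration, here is a minimal sketch of such an adversarial setup, assuming PyTorch and a gradient-reversal layer; the encoder, layer sizes, input features, and the weight `lambd` are illustrative placeholders, not the project's actual architecture.

```python
import torch
import torch.nn as nn

# Gradient reversal: identity on the forward pass, negated (scaled)
# gradient on the backward pass, so the encoder is pushed to learn
# representations the adversary cannot use to recover the attribute.
class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    """Shared encoder with two heads: a hate-speech predictor (y)
    and an adversary predicting the sensitive attribute (z)."""
    def __init__(self, input_dim=300, hidden_dim=128, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.predictor = nn.Linear(hidden_dim, 1)   # hate / non-hate
        self.adversary = nn.Linear(hidden_dim, 1)   # sensitive attribute

    def forward(self, x):
        h = self.encoder(x)
        y_logit = self.predictor(h)
        z_logit = self.adversary(GradientReversal.apply(h, self.lambd))
        return y_logit, z_logit

# One training step: both losses are minimised, but the reversed
# gradient makes the encoder *hurt* the adversary, stripping the
# sensitive attribute from the shared representation.
model = DebiasedClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(32, 300)                   # e.g. averaged word embeddings
y = torch.randint(0, 2, (32, 1)).float()   # hate label
z = torch.randint(0, 2, (32, 1)).float()   # sensitive attribute (e.g. gender)

y_logit, z_logit = model(x)
loss = bce(y_logit, y) + bce(z_logit, z)
opt.zero_grad()
loss.backward()
opt.step()
```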
Labelling of hate tweets: we analyse the content of each tweet and, depending on the presence of certain words, label it as hate or non-hate.
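A minimal sketch of this keyword-based labelling, assuming simple whitespace tokenisation; `HATE_TERMS` is a stand-in placeholder, not the project's actual word list.

```python
# Hypothetical lexicon of hate terms; replace with the real word list.
HATE_TERMS = {"slur1", "slur2"}

def label_tweet(tweet: str) -> int:
    """Return 1 (hate) if any lexicon term appears in the tweet, else 0."""
    tokens = tweet.lower().split()
    return int(any(tok in HATE_TERMS for tok in tokens))

print(label_tweet("an example tweet containing slur1"))  # 1
print(label_tweet("a harmless tweet"))                   # 0
```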
The dataset used for this project contains $\sim 24$k tweets with four labels: Sexist, Racist, Neither, and Both. Since we are dealing with a binary classification problem, all racist tweets are labelled 1 and the rest are labelled 0.
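A sketch of this binary relabelling, assuming a pandas DataFrame; the file name and the `label` column are hypothetical, so adjust to the dataset's actual schema (including whether "Both" should also map to 1).

```python
import pandas as pd

# Hypothetical file and column names; the real dataset schema may differ.
df = pd.read_csv("tweets.csv")

# Collapse the four-way annotation (Sexist, Racist, Neither, Both) into
# the binary target: Racist -> 1, everything else -> 0. If "Both" tweets
# should also count as racist, add "Both" to the list below.
df["target"] = df["label"].isin(["Racist"]).astype(int)
```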
There are two possibilities, where $F_y$ and $F_z$ denote the sets of features used by the predictor and the adversary, respectively:

- $F_y \cap F_z = \emptyset$: the predictor and the adversary do not share any common features, so removing the adversarial features will not affect the predictor's ability.
- $F_y \cap F_z \neq \emptyset$: there is an intersecting set of features between the predictor and the adversary, so removing the adversarial features might affect the performance of the predictor network.
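The two cases can be pictured with plain set operations; the feature-index sets below are made up purely for illustration (in practice $F_y$ and $F_z$ would come from, e.g., a feature-importance analysis of the two heads).

```python
# Made-up feature-index sets for the predictor (F_y) and adversary (F_z).
F_y = {0, 1, 2, 5}
F_z = {5, 7}

shared = F_y & F_z
if not shared:
    print("F_y and F_z are disjoint: removing F_z cannot hurt the predictor.")
else:
    print(f"Shared features {shared}: removing F_z may hurt the predictor.")
```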