Merge pull request #16 from novoic/dev
Merging all changes for BlaBla V0.2 Release
abhisheknovoic authored Jul 22, 2020
2 parents 88bce71 + c9372cf commit 2315b7e
Showing 6 changed files with 701 additions and 135 deletions.
18 changes: 16 additions & 2 deletions FEATURES.md
@@ -21,20 +21,34 @@ speech rate| The number of words per minute | ***speech_rate*** | Stanza(66) | N
maximum speech rate| The average number of words per minute across the top N (set to 10 by default) rapid sentences| ***maximum_speech_rate*** | Stanza(66) | No | No | JSON | ***num_rapid_sentences*** | 10
total phonation time| Total time duration of all words across all sentences | ***total_phonation_time*** | Stanza(66) | No | No | JSON | - | -
standardized phonation time| The total number of words divided by the total phonation time | ***standardized_phonation_time*** | Stanza(66) | No | No | JSON | - | -
total locution time| The total amount of time in speech that contains both speech and pauses | ***total_locution_time*** | Stanza(66) | No | No | JSON | - | -
total locution time| The total amount of time in speech that contains both speech and pauses | ***total_locution_time*** | Stanza(66) | No | No | JSON | - | -
noun rate|The rate of nouns across sentences |***noun_rate***|Stanza(66)|Yes|No|String or JSON | - | -
verb rate|The rate of verbs across sentences|***verb_rate***|Stanza(66)|Yes|No|String or JSON | - | -
demonstrative rate|The rate of demonstratives across sentences|***demonstrative_rate***|Stanza(66)|Yes|No|String or JSON | - | -
adjective rate|The rate of adjectives across sentences|***adjective_rate***|Stanza(66)|Yes|No|String or JSON | - | -
pronoun rate|The rate of pronouns across sentences|***pronoun_rate***|Stanza(66)|Yes|No|String or JSON | - | -
adposition rate|The rate of adpositions across sentences|***adposition_rate***|Stanza(66)|Yes|No|String or JSON | - | -
adverb rate|The rate of adverbs across sentences|***adverb_rate***|Stanza(66)|Yes|No|String or JSON | - | -
auxiliary rate|The rate of auxiliaries across sentences|***auxiliary_rate***|Stanza(66)|Yes|No|String or JSON | - | -
conjunction rate|The rate of conjunctions across sentences|***conjunction_rate***|Stanza(66)|Yes|No|String or JSON | - | -
determiner rate|The rate of determiners across sentences|***determiner_rate***|Stanza(66)|Yes|No|String or JSON | - | -
interjection rate|The rate of interjections across sentences|***interjection_rate***|Stanza(66)|Yes|No|String or JSON | - | -
numeral rate|The rate of numerals across sentences|***numeral_rate***|Stanza(66)|Yes|No|String or JSON | - | -
particle rate|The rate of particles across sentences|***particle_rate***|Stanza(66)|Yes|No|String or JSON | - | -
pronoun rate|The rate of pronouns across sentences|***pronoun_rate***|Stanza(66)|Yes|No|String or JSON | - | -
proper noun rate|The rate of proper nouns across sentences|***proper_noun_rate***|Stanza(66)|Yes|No|String or JSON | - | -
punctuation rate|The rate of punctuation marks across sentences|***punctuation_rate***|Stanza(66)|Yes|No|String or JSON | - | -
subordinating conjunction rate|The rate of subordinating conjunctions across sentences|***subordinating_conjunction_rate***|Stanza(66)|Yes|No|String or JSON | - | -
symbol rate|The rate of symbols across sentences|***symbol_rate***|Stanza(66)|Yes|No|String or JSON | - | -
possessive rate|The rate of possessive words across sentences|***possessive_rate***|Stanza(66)|Yes|No|String or JSON | - | -
noun verb ratio|The ratio of nouns to verbs across sentences|***noun_verb_ratio***|Stanza(66)|Yes|No|String or JSON | - | -
noun ratio|The ratio of nouns to the sum of nouns and verbs across sentences|***noun_ratio***|Stanza(66)|Yes|No|String or JSON | - | -
pronoun noun ratio|The ratio of pronouns to nouns across sentences|***pronoun_noun_ratio***|Stanza(66)|Yes|No|String or JSON | - | -
closed-class word rate |The proportions of determiners, pronouns, conjunctions and prepositions to all words across sentences|***closed_class_word_rate***|Stanza(66)|Yes|No|String or JSON | - | -
open-class word rate |The proportions of nouns, verbs, adjectives and adverbs to all words across sentences|***open_class_word_rate***|Stanza(66)|Yes|No|String or JSON | - | -
total dependency distance|The total distance of all dependencies across sentences|***total_dependency_distance***|Stanza(66)|Yes|No|String or JSON | - | -
average dependency distance|The average distance of all dependencies across sentences|***average_dependency_distance***|Stanza(66)|Yes|No|String or JSON | - | -
total dependencies|The total number of unique dependencies across sentences|***total_dependencies***|Stanza(66)|Yes|No|String or JSON | - | -
average dependencies|The average number of unique dependencies across sentences|***average_dependencies***|Stanza(66)|Yes|No|String or JSON | - | -
content density |The ratio of the number of open-class words to the number of closed-class words |***content_density***|Stanza(66)|Yes|No|String or JSON | - | -
idea density |The proportion of verbs, adjectives, adverbs, prepositions and conjunctions to all words across sentences |***idea_density***|Stanza(66)|Yes|No|String or JSON | - | -
honore's statistic |Calculated as R = (100*log(N))/(1-(V1)/(V)), where V is the number of unique words, V1 is the number of words in the vocabulary spoken only once, and N is the overall text length (number of words). |***honore_statistic***|Stanza(66)|Yes|No|String or JSON | - | -
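
As a rough illustration of the Honoré's statistic formula in the last row, the sketch below computes R from a list of word tokens. It is not BlaBla's implementation: the function name `honore_statistic` and the plain token-list input are assumptions made for this example, the natural logarithm is assumed because the table does not state the base, and returning infinity when every word occurs exactly once (V1 = V, a zero denominator) is an arbitrary choice.

```python
import math
from collections import Counter


def honore_statistic(tokens):
    """Hypothetical helper: R = 100 * log(N) / (1 - V1 / V) for a list of word tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    n = len(tokens)                                  # N: total number of words
    v = len(counts)                                  # V: number of unique words
    v1 = sum(1 for c in counts.values() if c == 1)   # V1: words spoken only once
    if v1 == v:
        # Assumption: treat the zero-denominator case as an unbounded score.
        return math.inf
    return 100 * math.log(n) / (1 - v1 / v)


tokens = "the cat saw the dog and the dog saw the cat".split()
print(honore_statistic(tokens))  # N = 11, V = 5, V1 = 1
```

Lowercasing and punctuation handling are left to the caller here; the actual library may tokenize and normalize differently.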
2 changes: 2 additions & 0 deletions README.md
@@ -46,6 +46,8 @@ To set up CoreNLP version 4.0.0, do `./setup_corenlp.sh` after changing `corenlp

After installation, or if you already have CoreNLP installed, let BlaBla know where to find it using `export CORENLP_HOME=/path/to/corenlp`.

CoreNLP also requires the [Java Development Kit (JDK)](https://www.oracle.com/java/technologies/javase-downloads.html) to be installed. To check whether it is already installed locally, run `javac -version`.
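
The two prerequisites above (the `CORENLP_HOME` variable and a JDK) can also be checked from Python before running BlaBla. This is only a convenience sketch: `check_corenlp_prerequisites` is a hypothetical helper, not part of BlaBla, and finding `javac` on the `PATH` only suggests that a JDK is installed.

```python
import os
import shutil


def check_corenlp_prerequisites(env=None):
    """Hypothetical helper: report whether CORENLP_HOME and javac look usable."""
    env = os.environ if env is None else env
    corenlp_home = env.get("CORENLP_HOME")
    return {
        # True when the variable is set to a non-empty value
        "corenlp_home_set": bool(corenlp_home),
        # True when the variable points at an existing directory
        "corenlp_home_exists": bool(corenlp_home) and os.path.isdir(corenlp_home),
        # True when a javac executable is found on the current PATH
        "javac_on_path": shutil.which("javac") is not None,
    }


for name, ok in check_corenlp_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```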

## Quickstart
Print the noun rate for some example text using Python (find the YAML configs inside the BlaBla repo):
```python
249 changes: 217 additions & 32 deletions blabla/document_engine.py
@@ -1,3 +1,5 @@
import traceback

from blabla.sentence_aggregators.phonetic_and_phonological_feature_aggregator import (
phonetic_and_phonological_feature_processor,
)
@@ -15,7 +17,7 @@
)
import blabla.utils.settings as settings
from blabla.utils.exceptions import *
import traceback



class Document(object):
@@ -47,11 +49,11 @@ def validate_features_list(self, feature_list):
)
)

def compute_features(self, *feature_list, **kwargs):
def compute_features(self, feature_list, **kwargs):
"""Compute features
Args:
feature_list (str): A list of features to be extracted
feature_list (list of str): A list of features to be extracted
Returns:
dict: A dictionary of features and their values
@@ -145,7 +147,7 @@ def _extract_syntactic_features(self, *features, **kwargs):
return features_dict

def _extract_discourse_and_pragmatic_feature_processor(self, *features, **kwargs):
"""Extract discourse and pragmatic features across all sentence objects
"""Extract discourse and pragmatic features across all sentence objects
Args:
features (list): The list of features to be extracted
@@ -291,6 +293,90 @@ def total_locution_time(self, **kwargs):
'total_locution_time', **kwargs
)['total_locution_time']

def adjective_rate(self, **kwargs):
"""Extract the adjective rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The adjective rate across all sentence objects
"""
return self._extract_lexico_semantic_features('adjective_rate', **kwargs)['adjective_rate']

def adposition_rate(self, **kwargs):
"""Extract the adposition rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The adposition rate across all sentence objects
"""
return self._extract_lexico_semantic_features('adposition_rate', **kwargs)['adposition_rate']

def adverb_rate(self, **kwargs):
"""Extract the adverb rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The adverb rate across all sentence objects
"""
return self._extract_lexico_semantic_features('adverb_rate', **kwargs)['adverb_rate']

def auxiliary_rate(self, **kwargs):
"""Extract the auxiliary rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The auxiliary rate across all sentence objects
"""
return self._extract_lexico_semantic_features('auxiliary_rate', **kwargs)['auxiliary_rate']

def conjuction_rate(self, **kwargs):
"""Extract the conjunction rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The conjunction rate across all sentence objects
"""
return self._extract_lexico_semantic_features('conjuction_rate', **kwargs)['conjuction_rate']

def determiner_rate(self, **kwargs):
"""Extract the determiner rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The determiner rate across all sentence objects
"""
return self._extract_lexico_semantic_features('determiner_rate', **kwargs)['determiner_rate']

def interjection_rate(self, **kwargs):
"""Extract the interjection rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The interjection rate across all sentence objects
"""
return self._extract_lexico_semantic_features('interjection_rate', **kwargs)['interjection_rate']

def noun_rate(self, **kwargs):
"""Extract the noun rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
@@ -303,70 +389,113 @@ def noun_rate(self, **kwargs):
"""
return self._extract_lexico_semantic_features('noun_rate', **kwargs)['noun_rate']

def verb_rate(self, **kwargs):
"""Extract the verb rate.
def numeral_rate(self, **kwargs):
"""Extract the numeral rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The verb rate across all sentence objects
The numeral rate across all sentence objects
"""
return self._extract_lexico_semantic_features('verb_rate', **kwargs)['verb_rate']
return self._extract_lexico_semantic_features('numeral_rate', **kwargs)['numeral_rate']

def demonstrative_rate(self, **kwargs):
"""Extract the demonstrative rate
def particle_rate(self, **kwargs):
"""Extract the particle rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The demonstrative rate across all sentence objects
The particle rate across all sentence objects
"""
return self._extract_lexico_semantic_features('demonstrative_rate', **kwargs)[
'demonstrative_rate'
]
return self._extract_lexico_semantic_features('particle_rate', **kwargs)['particle_rate']

def adjective_rate(self, **kwargs):
"""Extract the adjective rate
def pronoun_rate(self, **kwargs):
"""Extract the pronoun rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The adjective rate across all sentence objects
The pronoun rate across all sentence objects
"""
return self._extract_lexico_semantic_features('adjective_rate', **kwargs)[
'adjective_rate'
]
return self._extract_lexico_semantic_features('pronoun_rate', **kwargs)['pronoun_rate']

def pronoun_rate(self, **kwargs):
"""Extract the pronoun rate.
def proper_noun_rate(self, **kwargs):
"""Extract the proper noun rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The pronoun rate across all sentence objects
The proper noun rate across all sentence objects
"""
return self._extract_lexico_semantic_features('pronoun_rate', **kwargs)[
'pronoun_rate'
]
return self._extract_lexico_semantic_features('proper_noun_rate', **kwargs)['proper_noun_rate']

def adverb_rate(self, **kwargs):
"""Extract the adverb rate.
Ref: https://www.cs.toronto.edu/~kfraser/Fraser15-JAD.pdf
def punctuation_rate(self, **kwargs):
"""Extract the punctuation rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The punctuation rate across all sentence objects
"""
return self._extract_lexico_semantic_features('punctuation_rate', **kwargs)['punctuation_rate']

def subordinating_conjunction_rate(self, **kwargs):
"""Extract the subordinating conjunction rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
The subordinating conjunction rate across all sentence objects
"""
return self._extract_lexico_semantic_features('subordinating_conjunction_rate', **kwargs)['subordinating_conjunction_rate']

def symbol_rate(self, **kwargs):
"""Extract the symbol rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The adverb rate across all sentence objects
The symbol rate across all sentence objects
"""
return self._extract_lexico_semantic_features('adverb_rate', **kwargs)[
'adverb_rate'
return self._extract_lexico_semantic_features('symbol_rate', **kwargs)['symbol_rate']

def verb_rate(self, **kwargs):
"""Extract the verb rate.
Ref: https://pubmed.ncbi.nlm.nih.gov/28321196/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The verb rate across all sentence objects
"""
return self._extract_lexico_semantic_features('verb_rate', **kwargs)['verb_rate']

def demonstrative_rate(self, **kwargs):
"""Extract the demonstrative rate
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The demonstrative rate across all sentence objects
"""
return self._extract_lexico_semantic_features('demonstrative_rate', **kwargs)[
'demonstrative_rate'
]

def conjunction_rate(self, **kwargs):
@@ -439,6 +568,62 @@ def pronoun_noun_ratio(self, **kwargs):
'pronoun_noun_ratio'
]

def total_dependency_distance(self, **kwargs):
"""Extract the total dependency distance.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337522/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The total dependency distance across all sentence objects
"""
return self._extract_lexico_semantic_features('total_dependency_distance', **kwargs)[
'total_dependency_distance'
]

def average_dependency_distance(self, **kwargs):
"""Extract the average dependency distance.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337522/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The average dependency distance across all sentence objects
"""
return self._extract_lexico_semantic_features('average_dependency_distance', **kwargs)[
'average_dependency_distance'
]

def total_dependencies(self, **kwargs):
"""Extract the number of unique dependency relations.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337522/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The total number of unique dependencies across all sentence objects
"""
return self._extract_lexico_semantic_features('total_dependencies', **kwargs)[
'total_dependencies'
]

def average_dependencies(self, **kwargs):
"""Extract the average number of unique dependency relations.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337522/
Args:
kwargs (list): Optional arguments for threshold values
Returns:
float: The average number of unique dependencies across all sentence objects
"""
return self._extract_lexico_semantic_features('average_dependencies', **kwargs)[
'average_dependencies'
]
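
The diff does not show how the dependency distance features added above are computed, so the following is only a sketch of one common definition: the distance of a dependency is taken as the absolute difference between a word's 1-based position and the position of its head, with the root (head 0) skipped. The `(index, head)` tuple layout, both function names, and the choice to average per-sentence totals are assumptions for illustration and are not taken from BlaBla.

```python
def total_dependency_distance(sentences):
    """Sum |index - head| over all non-root words in all sentences."""
    return sum(
        abs(index - head)
        for sentence in sentences
        for index, head in sentence
        if head != 0  # head 0 marks the root, which has no governing word
    )


def average_dependency_distance(sentences):
    """Mean of the per-sentence total distances (assumed aggregation)."""
    if not sentences:
        return 0.0
    per_sentence = [total_dependency_distance([s]) for s in sentences]
    return sum(per_sentence) / len(per_sentence)


# Each sentence is a list of (word_index, head_index) pairs, 1-based.
parsed = [
    [(1, 2), (2, 3), (3, 0)],          # "The cat sat"
    [(1, 3), (2, 3), (3, 0), (4, 3)],  # "She quickly left home"
]
print(total_dependency_distance(parsed))
print(average_dependency_distance(parsed))
```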

def closed_class_word_rate(self, **kwargs):
"""Extract the proportion of closed class words.
Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337522/