Merge in from ETS Updates. (#129)
Add in new ETS Updates.
DrLynch authored Oct 25, 2024
2 parents 6f12321 + e42b451 commit 1b6de21
Showing 140 changed files with 4,297 additions and 1,545 deletions.
6 changes: 3 additions & 3 deletions .readthedocs.yaml
Original file line number Diff line number Diff line change
@@ -7,9 +7,9 @@ version: 2

# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
os: ubuntu-24.04
tools:
python: "3.10"
python: "3.11"
# You can also specify other tool versions:
# nodejs: "19"
# rust: "1.64"
@@ -26,4 +26,4 @@ sphinx:
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: requirements.txt
- requirements: autodocs/requirements.txt
1 change: 1 addition & 0 deletions CONTRIBUTORS.TXT
@@ -1,3 +1,4 @@
Piotr Mitros
Oren Livne
Paul Deane
Bradley Erickson
14 changes: 12 additions & 2 deletions Makefile
@@ -3,7 +3,7 @@ PACKAGES ?= wo,awe
run:
# If you haven't done so yet, run: make install
# we need to make sure we are on the virtual env when we do this
cd learning_observer && python learning_observer --watchdog=restart
cd learning_observer && python learning_observer

venv:
# This is unnecessary since LO installs requirements on install.
@@ -34,6 +34,7 @@ install-packages: venv
pip install -e learning_observer/[${PACKAGES}]

# Just a little bit of dependency hell...

# The AWE Components are built using a specific version of
# `spacy`. This requires an out-of-date `typing-extensions`
# package. There are a few other dependencies that require a
@@ -42,7 +43,16 @@ install-packages: venv
# components.
# TODO remove this extra step after AWE Component's `spacy`
# is no longer version locked.
pip install -U typing-extensions
# This is no longer an issue, but we will leave this until all
# dependencies can be resolved in the appropriate locations.
# pip install -U typing-extensions

# On Python3.11 with tensorflow, we get some odd errors
# regarding compatibility with `protobuf`. Some installation
# files are missing from the protobuf binary on pip.
# Using the `--no-binary` option includes all files.
pip uninstall -y protobuf
pip install --no-binary=protobuf protobuf==4.25

# testing commands
test:
3 changes: 2 additions & 1 deletion autodocs/.gitignore
@@ -1,2 +1,3 @@
_build/
generated/
generated/
apidocs/
9 changes: 2 additions & 7 deletions autodocs/api.rst
@@ -1,11 +1,6 @@
API
===

.. autosummary::
:recursive:
:toctree: generated/

learning_observer
writing_observer
lo_dash_react_components
.. toctree::

apidocs/index
9 changes: 6 additions & 3 deletions autodocs/conf.py
@@ -17,12 +17,15 @@
sys.path.insert(0, os.path.abspath('../'))

extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.viewcode',
'autodoc2',
'myst_parser',
]

autodoc2_packages = [
'../learning_observer/learning_observer',
'../modules/writing_observer/writing_observer'
]

source_suffix = {
'.rst': 'restructuredtext',
'.md': 'markdown',
3 changes: 3 additions & 0 deletions autodocs/requirements.txt
@@ -0,0 +1,3 @@
myst_parser
sphinx
sphinx-autodoc2
3 changes: 3 additions & 0 deletions awe_requirements.txt
@@ -1,3 +1,6 @@
spacy==3.4.4
pydantic==1.10
spacytextblob==3.0.1
AWE_SpellCorrect @ git+https://github.com/ETS-Next-Gen/AWE_SpellCorrect.git
AWE_Components @ git+https://github.com/ETS-Next-Gen/AWE_Components.git
AWE_Lexica @ git+https://github.com/ETS-Next-Gen/AWE_Lexica.git
3 changes: 2 additions & 1 deletion devops/tasks/config/postuploads
@@ -1,7 +1,8 @@
sudo hostnamectl set-hostname {hostname}
sudo rm -f /etc/nginx/sites-available/default
sudo rm -f /etc/nginx/sites-enabled/default
sudo ln -f /etc/nginx/sites-available/{hostname} /etc/nginx/sites-enabled/{hostname}
if [ -f /etc/nginx/sites-available/{hostname} ]; then sudo ln -f /etc/nginx/sites-available/{hostname} /etc/nginx/sites-enabled/{hostname}; else echo "WARNING: Failed to make symlink in /etc/nginx/sites-available (config/postupload)"; fi

sudo chown -R ubuntu:ubuntu /home/ubuntu/writing_observer
sudo systemctl daemon-reload
sudo service learning_observer stop
116 changes: 116 additions & 0 deletions docs/scaling.md
@@ -0,0 +1,116 @@
# Scaling Architecture

The goal is for the Learning Observer to be:

* Fully horizontally scalable in large-scale settings
* Simple to run in small-scale settings

It is worth noting that some uses of Learning Observer require
long-running processes (e.g. NLP), but the vast majority are small,
simple reducers of the type which would work fine on an 80386
(e.g. event count, time-on-task, or logging scores / submission).

## Basic use case

In the basic use case, there is a single Learning Observer process
running. It is either using redis or, if unavailable, disk/memory as a
storage back-end.
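As a sketch of that fallback (the class and function names here are illustrative, not the real KVS layer), the small-scale path can be as simple as an in-memory dict behind the same async interface a redis back-end would present:

```python
import asyncio


class InMemoryKVS:
    """Minimal in-memory stand-in for the redis-backed KVS.

    Suitable for a single Learning Observer process in small-scale
    settings, where no cross-process sharing is needed.
    """
    def __init__(self):
        self._data = {}

    async def get(self, key):
        return self._data.get(key)

    async def set(self, key, value):
        self._data[key] = value


def choose_backend(redis_url=None):
    """Pick a storage back-end: redis when configured, memory otherwise.

    The redis branch is omitted; this only illustrates the fallback.
    """
    if redis_url is None:
        return InMemoryKVS()
    raise NotImplementedError('redis-backed KVS not sketched here')
```

Because both back-ends share the same async `get`/`set` surface, reducers do not need to know which one is in use.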

## Horizontally-scalable use-case

LO needs to handle a high volume of incoming data. Fortunately,
reducers are sharded on a key. In the present system, the key is
always a student. However, in the future, we may have per-resource,
per-class, etc. reducers.

A network round trip is typically around 30 ms, which we would like to
avoid. We would therefore like reducers to keep their state in memory,
simply writing it out to our KVS either with each event or
periodically (e.g. every second). This requires a fixed process per
key, so that reducers can run without reads.
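A minimal sketch of such a reducer (illustrative names, assuming a mapping-like KVS; the real reducers are defined per module): state lives in a per-key dict and is written out on an interval, never read back on the event path:

```python
import time


class KeyedReducer:
    """Holds per-key reducer state in memory and periodically flushes
    it to a KVS (here, any mapping), so event handling needs no reads."""
    def __init__(self, kvs, flush_interval=1.0):
        self.kvs = kvs
        self.flush_interval = flush_interval  # seconds between write-outs
        self.state = {}        # key (e.g. student id) -> reducer state
        self._last_flush = {}  # key -> monotonic time of last write-out

    def on_event(self, key, event):
        s = self.state.setdefault(key, {'event_count': 0})
        s['event_count'] += 1  # a trivial reducer: count events
        now = time.monotonic()
        if now - self._last_flush.get(key, 0.0) >= self.flush_interval:
            self.kvs[key] = dict(s)  # periodic write-out to the KVS
            self._last_flush[key] = now
        return s
```

Setting `flush_interval=0` recovers the write-on-every-event behavior; a larger interval trades durability for fewer KVS writes.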

Our eventual architecture here is:

```
incoming event --> load balancer routing based on key --> process pool
```

Events for the same key (typically, the same student) should always
land on the same process.

Eventually, we will likely want a custom load balancer / router, but
this can likely be accomplished off-the-shelf, for example by
including the key in an HTTP header or in the URL.
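For instance, a stable hash of the key is enough to pin every event for a given student to one process (a sketch of the routing idea, not the production router):

```python
import hashlib


def route(key, n_processes):
    """Map a routing key (e.g. a student id) to a process index.

    The hash is stable across runs and machines, so events for the
    same key always land on the same process."""
    digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % n_processes
```

A load balancer applying the same hash to a header or URL component achieves the equivalent routing without custom code.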

**HACK**: At present, if several web sockets hit the same server, even
within a common process, they may not share the same in-memory
storage. We should fix this.

## Task-scalable use-case

A second issue is that we would like to be able to split work by
reducer, module, or similar (e.g. incoming data versus dashboards).

Our eventual architecture here is:

```
incoming event --> load balancer routing based on module / reducer --> process pool
```

The key reason for this is robustness. We expect to have many modules
at different levels of performance and maturity. If one module is
unstable, uses excessive resources, etc., we would like it not to be
able to take down the rest of the system.

This is also true for different views. For example, we might want to
have servers dedicated to:

* Archiving events into the Merkle tree (must be 100% reliable)
* Other reducers
* Dashboards

## Rerouting

In the future, we expect modules to be able to send messages to each
other.

## Implementation path

At some point, we will likely need to implement our own router. For
now, however, we hope to use the sticky routing and content-based
routing in existing load balancers. This may involve communication
protocol changes, such as:

- Moving auth information from the websocket stream to the header
- Moving information into the URL (e.g. `http://server/in#uid=1234`)

Note that these are short-term solutions, as in the long-term, only
the server will know which modules handle a particular event. Once we
route on modules, an event might need to go to several servers. At
that point, we will likely need our own custom router / load balancer.
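A sketch of how a server-side router might extract the key under those protocol changes (the header and parameter names are illustrative, not the real protocol; the sketch reads a query parameter rather than a `#fragment`, since browsers do not transmit fragments to servers):

```python
from urllib.parse import parse_qs, urlparse


def routing_key(headers, url):
    """Prefer an explicit routing header; fall back to a `uid`
    query parameter in the URL; return None if neither is present."""
    if 'X-LO-Route-Key' in headers:
        return headers['X-LO-Route-Key']
    params = parse_qs(urlparse(url).query)
    if 'uid' in params:
        return params['uid'][0]
    return None
```

Either location is visible to off-the-shelf load balancers, which is what makes content-based routing work before a custom router exists.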

In the short-term:

* [Amazon](https://aws.amazon.com/elasticloadbalancing/application-load-balancer/?nc=sn&loc=2&dn=2)
supports sticky sessions and content-based routing. This can work on data in the headers.
* nginx can be configured to route to different servers based on headers and URLs. This is slightly manual, but would work as well.

## Homogeneous servers

Our goal is to continue to maintain homogeneous servers as much as
possible. The same process can handle incoming sockets of data, render
dashboards, etc. The division is handled in devops and in the load
balancer, e.g. by:

- Installing LO modules only on specific servers
- Routing events to specific servers

The goal is to continue to support the single server use-case.

## To do

We need to further think through:

- Long-running processes (e.g. NLP)
- Batched tasks (e.g. nightly processes)
1 change: 0 additions & 1 deletion docs/workshop.md
@@ -59,7 +59,6 @@ git clone [email protected]:ETS-Next-Gen/writing_observer.git lo_workshop

```bash
cd lo_workshop/
git checkout berickson/workshop # This is a branch we set up with some extra things for this workshop!
```

NOTE: All future commands should be run starting from the repository's root directory. The command will specify if changing directories is needed.
10 changes: 5 additions & 5 deletions extension/writing-process/src/background.js
@@ -9,7 +9,7 @@ var RAW_DEBUG = false;
/* This variable must be manually updated to specify the server that
* the data will be sent to.
*/
var WEBSOCKET_SERVER_URL = "wss://learning-observer.org/wsapi/in/"
var WEBSOCKET_SERVER_URL = "wss://learning-observer.org/wsapi/in/";

import { googledocs_id_from_url } from './writing_common';

@@ -35,7 +35,7 @@ const loggers = [

loEvent.init('org.mitros.writing_analytics', '0.01', loggers, loEventDebug.LEVEL.SIMPLE);
loEvent.setFieldSet([loEventUtils.getBrowserInfo(), loEventUtils.fetchDebuggingIdentifier()]);
loEvent.go()
loEvent.go();

// Function to serve as replacement for
// chrome.extension.getBackgroundPage().console.log(event); because it is not allowed in V3
@@ -157,7 +157,7 @@ chrome.webRequest.onBeforeRequest.addListener(
'bundles': JSON.parse(formdata.bundles),
'rev': formdata.rev,
'timestamp': parseInt(request.timeStamp, 10)
}
};
logFromServiceWorker(event);
loEvent.logEvent('google_docs_save', event);
} catch(err) {
@@ -170,7 +170,7 @@
'formdata': formdata,
'rev': formdata.rev,
'timestamp': parseInt(request.timeStamp, 10)
}
};
loEvent.logEvent('google_docs_save_extra', event);
}
} else if(this_a_google_docs_bind(request)) {
@@ -181,7 +181,7 @@
},
{ urls: ["*://docs.google.com/*"] },
['requestBody']
)
);

// re-injected scripts when chrome extension is reloaded, upgraded or re-installed
// https://stackoverflow.com/questions/10994324/chrome-extension-content-script-re-injection-after-upgrade-or-install
14 changes: 7 additions & 7 deletions extension/writing-process/src/writing.js
@@ -192,7 +192,7 @@ function google_docs_version_history(token) {
}
*/

const metainfo_url = "https://docs.google.com/document/d/"+doc_id()+"/revisions/tiles?id="+doc_id()+"&start=1&showDetailedRevisions=false&filterNamed=false&token="+token+"&includes_info_params=true"
const metainfo_url = "https://docs.google.com/document/d/"+doc_id()+"/revisions/tiles?id="+doc_id()+"&start=1&showDetailedRevisions=false&filterNamed=false&token="+token+"&includes_info_params=true";

fetch(metainfo_url).then(function(response) {
response.text().then(function(text) {
@@ -354,7 +354,7 @@ function generic_eventlistener(event_type, frameindex) {
if (event_type=='attention') {
refresh_stream_view_listeners();
}
}
};
}

function refresh_stream_view_listeners() {
@@ -393,13 +393,13 @@ var editor = document.querySelector('.kix-appview-editor');
var frames = Array.from(document.getElementsByTagName("iframe"));

// TODO: We should really make a list of documents instead of a fake iframe....
frames.push({'contentDocument': document})
frames.push({'contentDocument': document});

// Add an event listener to each iframe in the iframes under frames.
for(var event_type in EVENT_LIST) {
for(var event_idx in EVENT_LIST[event_type]['events']) {
const js_event = EVENT_LIST[event_type]['events'][event_idx];
const target = EVENT_LIST[event_type]['target']
const target = EVENT_LIST[event_type]['target'];
if(target === 'document') {
for(var iframe in frames) {
if(frames[iframe].contentDocument) {
@@ -608,7 +608,7 @@ function prepare_mutation_observer() {
*/
var observer = new MutationObserver(function (mutations) {
mutations.forEach(function (mutation) {
const event = {}
const event = {};

// This list guarantees that we'll have the information we need
// to understand what happened in a change event.
@@ -718,8 +718,8 @@ function writing_onload() {
if(this_is_a_google_doc()) {
log_event("document_loaded", {
"partial_text": google_docs_partial_text()
})
execute_on_page_space("_docs_flag_initialData.info_params.token")
});
execute_on_page_space("_docs_flag_initialData.info_params.token");
const handleFromWeb = async (event) => {
if (event.data.from && event.data.from === "inject.js") {
const data = event.data.data;
11 changes: 10 additions & 1 deletion learning_observer/learning_observer/adapters/adapter.py
@@ -51,11 +51,20 @@ def dash_to_underscore(event):

return event


common_transformers = [
dash_to_underscore
]

def add_common_migrator(migrator, file):
'''Add a migrator to the common transformers list.
TODO
We ought to check each module on startup for migrators
and import them instead of using this function to
add them to the transformations.
'''
print('Adding migrator', migrator, 'from', file)
common_transformers.append(migrator)


class EventAdapter:
def __init__(self, metadata=None):
18 changes: 5 additions & 13 deletions learning_observer/learning_observer/auth/events.py
@@ -289,14 +289,6 @@ async def test_case_identify(request, headers, first_event, source):
}


@register_event_auth("http_auth")
async def http_auth_identify(request, headers, first_event, source):
'''
TODO: Allow events to be authorized by HTTP basic authentication
'''
raise NotImplementedError("Not yet built; sorry")


async def authenticate(request, headers, first_event, source):
'''
Authenticate an event stream.
@@ -311,12 +303,12 @@ async def authenticate(request, headers, first_event, source):
type (e.g. require auth for writing, but not for dynamic assessment)
Our thoughts are that the auth metadata ought to contain:
1. Whether the user was authenticated (`sec` field):
* `authenticated` -- we trust who they are
* `unauthenticated` -- we think we know who they are, without security
* `guest` -- we don't know who they are
2. Provenance: How they were authenticated (if at all), or how we believe
they are who they are.
* `authenticated` -- we trust who they are
* `unauthenticated` -- we think we know who they are, without security
* `guest` -- we don't know who they are
2. Provenance: How they were authenticated (if at all), or how we believe they are who they are.
3. `user_id` -- a unique user identifier
'''
for auth_method in learning_observer.settings.settings['event_auth']: