Streamlit app, Emap portal, and other fixes #72

Closed
wants to merge 34 commits into from
34 commits
557e7f1
Validation script to check for gaps
jeremyestein Nov 1, 2024
cf4963a
Allow disabling of periodic deletion (infinite retention period) and
jeremyestein Nov 7, 2024
576e812
Improve logging esp. in case of overlap, and hopefully at least isola…
jeremyestein Nov 8, 2024
2837812
More logging tweaks
jeremyestein Nov 11, 2024
7423371
First attempt at data visualisation/validation. Fix generator bug that
jeremyestein Nov 11, 2024
2250acb
convert validation scripts to jupytext
jeremyestein Nov 11, 2024
4497a36
Rename component to reflect more accurately its function
jeremyestein Nov 12, 2024
7b1fc8f
Make waveforms a somewhat realistic frequency (but not shape). Fix
jeremyestein Nov 15, 2024
2514238
Use jupyter lab plus first attempt at streamlit app
jeremyestein Nov 15, 2024
2718c3f
Optimise DB calls for caching, cache a DB method I'd missed, and remo…
jeremyestein Nov 15, 2024
ef826b3
Do array unpacking and the related timestamp arithmetic in the database
jeremyestein Nov 15, 2024
dd63efd
Put streamlit (and experimentally, glowroot) behind an nginx reverse
jeremyestein Nov 19, 2024
dcad7e5
Add HTTP basic auth
jeremyestein Nov 20, 2024
57b809a
Add various graph metadata; use streamlit caching with TTL to allow for
jeremyestein Nov 20, 2024
edbe0dc
Need network access during build
jeremyestein Nov 20, 2024
89a80fc
A few settings tweaks to see if they solve the websocket proxying
jeremyestein Nov 21, 2024
656c206
Configure streamlit to run on different base URL rather than rewrite
jeremyestein Nov 21, 2024
d43076c
Streamlit config tweaks
jeremyestein Nov 21, 2024
32a31e8
Get external hostname and external port into the streamlit app, as
jeremyestein Nov 21, 2024
2c6becb
Bring variables (mostly waveform related) into the global config script.
jeremyestein Nov 21, 2024
b119d40
Make various variables configurable
jeremyestein Nov 26, 2024
9b09d3c
Convert null values in global yaml to empty string in envs, not "None"
jeremyestein Nov 27, 2024
740e561
Fix annoying off by one bug that could cause collation to not happen in
jeremyestein Nov 27, 2024
824d549
Clear up db connections and tweak cache timeout
jeremyestein Nov 27, 2024
70fd9a7
Use points instead of line and give tidier missing data error
jeremyestein Nov 27, 2024
30235e6
Slider memory, improve errors, call methods on containers rather than
jeremyestein Nov 29, 2024
2630b32
Lower cache TTLs to allow for live data to be displayed sooner
jeremyestein Nov 29, 2024
447e586
Go back to line chart and make slider a bit more controllable hopefully
jeremyestein Nov 29, 2024
ec79b58
Put cache TTL back up but add button to manually invalidate it, move
jeremyestein Nov 29, 2024
5454d13
Check none
jeremyestein Nov 29, 2024
a0b2f31
Try to reproduce bug by not enabling all streams for all patients. Don't
jeremyestein Dec 2, 2024
5b16aa9
Only show streams in dropdown for which data exists. Fix upper bound
jeremyestein Dec 2, 2024
107c9bc
Make into multi-page app
jeremyestein Dec 2, 2024
fe043a0
Point to right page
jeremyestein Dec 2, 2024
7 changes: 7 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -48,3 +48,10 @@ hs_err_pid*
# IntelliJ files
.idea/
*.iml

# python
__pycache__

# prevent notebooks from being checked in
*.ipynb
.ipynb_checkpoints
2 changes: 1 addition & 1 deletion cassandra-config-envs.EXAMPLE
@@ -1,3 +1,3 @@
# these settings work on the GAE but if the process is getting OOM killed you can reduce them
# defaults work on the GAE but if the process is getting OOM killed you can reduce them
MAX_HEAP_SIZE=4G
HEAP_NEWSIZE=800M
10 changes: 10 additions & 0 deletions core/Dockerfile.emap-portal
@@ -0,0 +1,10 @@
FROM nginx:otel
RUN apt update && \
apt install -y apache2-utils && \
apt clean
COPY core/emap-portal/nginx.conf /etc/nginx/
COPY core/emap-portal/conf.d/ /etc/nginx/conf.d/
COPY core/emap-portal/www/* /usr/share/nginx/html/
RUN --mount=type=secret,id=portal-build-secrets \
. /run/secrets/portal-build-secrets && \
htpasswd -b -B -c /etc/nginx/conf.d/htpasswd "$PORTAL_USERNAME" "$PORTAL_PASSWORD"
1 change: 0 additions & 1 deletion core/core-config-envs.EXAMPLE
@@ -15,5 +15,4 @@ SPRING_RABBITMQ_USERNAME=emap
SPRING_RABBITMQ_PASSWORD=yourstrongpassword
LOGGING_LEVEL_UK_AC_UCL_RITS_INFORM=INFO
CORE_WAVEFORM_RETENTION_HOURS=24
CORE_WAVEFORM_IS_NON_CURRENT_TEST_DATA=true
TZ=Europe/London
21 changes: 21 additions & 0 deletions core/docker-compose.yml
@@ -54,4 +54,25 @@ services:
restart: on-failure
depends_on:
- cassandra
emap-portal:
build:
context: ..
dockerfile: core/Dockerfile.emap-portal
args:
HTTP_PROXY: ${HTTP_PROXY}
http_proxy: ${http_proxy}
HTTPS_PROXY: ${HTTPS_PROXY}
https_proxy: ${https_proxy}
secrets:
- portal-build-secrets
env_file:
- ../../config/portal-config-envs
ports:
- "${PORTAL_PORT}:80"



secrets:
portal-build-secrets:
file: ../../config/portal-config-envs

43 changes: 43 additions & 0 deletions core/emap-portal/conf.d/default.conf
@@ -0,0 +1,43 @@
server {
listen 80;
listen [::]:80;
server_name localhost;
# nginx is behind docker, so the browser is using a different port number which nginx doesn't know about.
# Use relative redirects to avoid redirecting to port 80. (301s are used when trailing slashes are omitted)
absolute_redirect off;
auth_basic "Administrator’s Area";
auth_basic_user_file conf.d/htpasswd;


access_log /var/log/nginx/host.access.log main;

#error_page 404 /404.html;

# redirect server error pages to the static page /50x.html
#
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}

location / {
root /usr/share/nginx/html;
include conf.d/shared/shared_location_config.conf;
}

location /glowroot/ {
include conf.d/shared/shared_location_config.conf;
proxy_pass http://glowroot-central:4000/;
sub_filter 'href="/' 'href="/glowroot/';
sub_filter 'src="/' 'src="/glowroot/';
}

location /streamlit/ {
include conf.d/shared/shared_location_config.conf;
proxy_pass http://streamlit:8501/streamlit/;
proxy_buffering off;

proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
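Reviewer note: because `auth_basic` sits at the server level, every location (including the proxied `/streamlit/` and `/glowroot/` paths) should require credentials. Below is a minimal sketch for preparing an authenticated request when checking this from outside the container. It assumes the example credentials and port from `global-configuration-EXAMPLE.yaml` (`emap` / `portal_password`, port 7100). The helper names `basic_auth_header` and `build_portal_request` are hypothetical and not part of this PR.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_portal_request(host: str, port: int, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET for the portal index page.

    host/port are assumptions: use whatever PORTAL_PORT maps to on your machine.
    """
    request = urllib.request.Request(f"http://{host}:{port}/")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

Passing the returned request to `urllib.request.urlopen` would perform the actual call. Without the header, nginx is expected to answer 401.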
11 changes: 11 additions & 0 deletions core/emap-portal/conf.d/shared/shared_location_config.conf
@@ -0,0 +1,11 @@
sub_filter_once off;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Accept-Encoding ""; # turn off gzip for upstream so rewriting can work
# needed for websockets
proxy_http_version 1.1;
proxy_read_timeout 86400;
proxy_send_timeout 3600;
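Reviewer note: the effect of `sub_filter_once off` combined with the two `sub_filter` rules in the `/glowroot/` location can be modelled as plain string replacement on the response body. This is only an illustration of the intended result, not how nginx implements it: nginx matches all patterns in a single pass over the response stream, whereas this sketch applies them one after another. The function name `apply_sub_filters` is hypothetical.

```python
# Replacement pairs copied from the /glowroot/ location block.
GLOWROOT_SUB_FILTERS = [
    ('href="/', 'href="/glowroot/'),
    ('src="/', 'src="/glowroot/'),
]


def apply_sub_filters(body: str, filters=GLOWROOT_SUB_FILTERS, once: bool = False) -> str:
    """Model nginx sub_filter by replacing each search string in the body.

    once=False mirrors `sub_filter_once off` (replace every occurrence);
    once=True mirrors the nginx default (replace only the first occurrence
    of each search string).
    """
    for search, replacement in filters:
        # A count of -1 means "replace all" for str.replace.
        body = body.replace(search, replacement, 1 if once else -1)
    return body
```

The rewriting can only work on uncompressed bodies, which is why the shared config blanks out `Accept-Encoding` for the upstream.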
50 changes: 50 additions & 0 deletions core/emap-portal/nginx.conf
@@ -0,0 +1,50 @@

user nginx;
worker_processes auto;

error_log /var/log/nginx/error.log debug;
pid /var/run/nginx.pid;


events {
worker_connections 1024;
}


http {
include /etc/nginx/mime.types;
default_type application/octet-stream;

log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';

access_log /var/log/nginx/access.log main;

sendfile on;
#tcp_nopush on;

keepalive_timeout 65;

#gzip on;

include /etc/nginx/conf.d/*.conf;
}

# pure TCP proxy?
#
# stream {
# upstream backend {
# server backend-server:12345;
# }
#
# server {
# listen 12345;
# proxy_pass backend;
# Allow specific IP addresses
# allow 192.168.1.1; # Replace with the allowed IP address
# allow 192.168.1.2; # Add more allowed IP addresses as needed
# deny all; # Deny all other IP addresses

# }
# }
17 changes: 17 additions & 0 deletions core/emap-portal/www/index.html
@@ -0,0 +1,17 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Emap admin page</title>
</head>
<body>
You can access various Emap admin/visualisation/monitoring services:

<ul>
<li><a href="/streamlit">streamlit</a>
<!-- experimental - still accessible directly so don't forget to keep auth enabled in the apps themselves! -->
<li><a href="/glowroot">Glowroot</a> (experimental)
</ul>

</body>
</html>
@@ -27,8 +27,6 @@ public class WaveformProcessor {

@Value("${core.waveform.retention_hours}")
private int retentionTimeHours;
@Value("${core.waveform.is_non_current_test_data}")
private boolean isNonCurrentTestData;

/**
* @param visitObservationController visit observation controller
@@ -59,25 +57,37 @@ public void processMessage(final WaveformMessage msg, final Instant storedFrom)
*/
@Scheduled(fixedRate = 60 * 60 * 1000)
public void deleteOldWaveformData() {
logger.info("deleteOldWaveformData: Checking for old waveform data for deletion");
Instant baselineDatetime;
if (isNonCurrentTestData) {
// while testing, use the current data (which may be for a
// date far from the present) as a reference for when to apply retention cutoff date from.
// ie. assume the time of the most recent data is "now"
baselineDatetime = waveformController.mostRecentObservationDatatime();
if (baselineDatetime == null) {
logger.info("deleteOldWaveformData: nothing in DB, do nothing");
return;
}
/* When calculating the retention cutoff datetime, instead of working back from the current datetime,
* start at the datetime of the most recent piece of waveform data.
* The main purpose of this is that when testing (eg. using a dump file that might be quite old),
* you don't want to immediately delete all the data due to its timestamps being way in the past.
* And in production the most recent piece of data will be very close to the present time anyway,
* so keep things simple and use the same logic in both cases.
*/
Instant baselineDatetime = waveformController.mostRecentObservationDatatime();
if (baselineDatetime == null) {
logger.info("deleteOldWaveformData: nothing in DB, do nothing");
return;
}

Instant now = Instant.now();
if (baselineDatetime.isAfter(now)) {
// In the hopefully unlikely case that the incoming data is in the future, don't
// go and delete all our data!
logger.warn("deleteOldWaveformData: most recent data is in the future ({}), using current time instead",
baselineDatetime);
baselineDatetime = now;
}

} else {
baselineDatetime = Instant.now();
if (retentionTimeHours <= 0) {
logger.info("deleteOldWaveformData: retention time is infinite, do nothing (baseline date = {})",
baselineDatetime);
return;
}
Instant cutoff = baselineDatetime.minus(retentionTimeHours, ChronoUnit.HOURS);
logger.info("deleteOldWaveformData: baseline = {}, cutoff = {}", baselineDatetime, cutoff);
logger.info("deleteOldWaveformData: deleting, baseline date = {}, cutoff = {}", baselineDatetime, cutoff);
int numDeleted = waveformController.deleteOldWaveformData(cutoff);
logger.info("deleteOldWaveformData: Old waveform data deletion: {} rows older than {}", numDeleted, cutoff);
logger.info("deleteOldWaveformData: deleted {} rows older than {}", numDeleted, cutoff);
}

}
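Reviewer note: the new cutoff rule (baseline is the newest observation, clamped to the current time, with a non-positive retention meaning "keep forever") can be summarised in a few lines. This is a Python paraphrase of the Java above for discussion purposes, not code from the PR. The function name `retention_cutoff` and its signature are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_cutoff(
    most_recent: Optional[datetime], now: datetime, retention_hours: int
) -> Optional[datetime]:
    """Return the deletion cutoff, or None if nothing should be deleted."""
    if most_recent is None:
        return None  # nothing in the DB, do nothing
    # Data timestamped in the future must not drag the cutoff forward.
    baseline = min(most_recent, now)
    if retention_hours <= 0:
        return None  # infinite retention, periodic deletion disabled
    return baseline - timedelta(hours=retention_hours)
```

For example, with an old test dump whose newest row is 2024-01-03T12:00Z and a 24 hour retention, the cutoff is 2024-01-02T12:00Z regardless of today's date, so the dump is not wiped on the first scheduled run.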
1 change: 0 additions & 1 deletion core/src/main/resources/application.properties
@@ -21,6 +21,5 @@ core.rabbitmq.listen_queues = hl7Queue,databaseExtracts,extensionProjects,wavefo
# Data older than this is liable to be deleted to keep overall disk usage small.
# In production we will want to have this longer (more like 7 days)
core.waveform.retention_hours = 1
core.waveform.is_non_current_test_data = 0

spring.rabbitmq.listener.simple.acknowledge-mode=manual
11 changes: 4 additions & 7 deletions docs/dev/features/waveform_hf_data.md
@@ -28,11 +28,8 @@ but in practice it's typically 20 seconds. We have aimed for similar.
## Config options added

Core:
- `core.waveform.retention_hours` periodically delete data more than this many hours old
- `core.waveform.is_non_current_test_data` for testing only - when deciding which data to delete/retain, if set to true,
then treat the "now" point as the most recent observation date in the waveform table, rather than the actual
current time. Purpose is to avoid test data getting immediately deleted because it's too old, which could happen
if we have a fixed set of test data with observation dates way in the past.
- `core.waveform.retention_hours` periodically delete data that is more than this many hours older than
the newest piece of waveform data.

Waveform Generator:
- `waveform.hl7.send_host`, `waveform.hl7.send_port` - the host and port to send the generated data to
@@ -53,8 +50,8 @@ Waveform Reader:

## Container housekeeping (setup script)
The waveform processing feature is enabled or disabled in the global configuration file. I've added
a "features" section for this, and taken the opportunity to also add the `fakeuds` container to make that easier
to turn on and off.
a "waveform" section for these and related settings.
I also added the `fake_uds` section for turning that on and off.

Because the waveform feature flag will include/exclude the relevant docker compose files from
the docker commands it generates, you can continue to
8 changes: 5 additions & 3 deletions emap-setup/emap_runner/docker/docker_runner.py
@@ -34,9 +34,9 @@ def __init__(self,
self.project_dir = project_dir
self.emap_dir = project_dir / "emap"
self.config = config
self.enable_waveform = first_not_none(enable_waveform, self.config.get("features", "waveform"))
self.use_fake_waveform = first_not_none(use_fake_waveform, self.config.get("features", "waveform_generator"))
self.use_fake_uds = first_not_none(use_fake_uds, self.config.get("features", "fake_uds"))
self.enable_waveform = first_not_none(enable_waveform, self.config.get("waveform", "enable_waveform"))
self.use_fake_waveform = first_not_none(use_fake_waveform, self.config.get("waveform", "enable_waveform_generator"))
self.use_fake_uds = first_not_none(use_fake_uds, self.config.get("fake_uds", "enable_fake_uds"))

def run(
self,
@@ -108,6 +108,8 @@ def docker_compose_paths(self) -> List[Path]:
paths.append(Path(self.emap_dir, "waveform-reader", "docker-compose.yml"))
if self.use_fake_waveform:
paths.append(Path(self.emap_dir, "waveform-generator", "docker-compose.yml"))
if self.config.get("monitoring", "use_streamlit"):
paths.append(Path(self.emap_dir, "monitoring", "docker-compose.yml"))

# allow hoover to be an optional compose path
if "hoover" in self.config["repositories"]:
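Reviewer note: a minimal sketch of how the renamed flags select optional compose files. Several things here are assumptions: the body of `first_not_none` is not shown in this diff, so the version below is a guess at its behaviour. The config is modelled as a plain nested dict rather than `GlobalConfiguration` with its two-argument `get`. The two waveform checks are treated as independent because the diff view has lost indentation. `optional_compose_paths` is a hypothetical name.

```python
from pathlib import Path
from typing import Any, List, Optional


def first_not_none(*values: Any) -> Any:
    """Return the first argument that is not None, or None if all are (assumed behaviour)."""
    return next((v for v in values if v is not None), None)


def optional_compose_paths(
    emap_dir: Path,
    config: dict,
    enable_waveform: Optional[bool] = None,
    use_fake_waveform: Optional[bool] = None,
) -> List[Path]:
    """Pick optional compose files; explicit arguments override the config file."""
    waveform = first_not_none(enable_waveform, config["waveform"]["enable_waveform"])
    generator = first_not_none(
        use_fake_waveform, config["waveform"]["enable_waveform_generator"]
    )
    paths: List[Path] = []
    if waveform:
        paths.append(emap_dir / "waveform-reader" / "docker-compose.yml")
    if generator:
        paths.append(emap_dir / "waveform-generator" / "docker-compose.yml")
    if config["monitoring"]["use_streamlit"]:
        paths.append(emap_dir / "monitoring" / "docker-compose.yml")
    return paths
```

Note that an explicit `False` override wins over a `True` in the config, since only `None` falls through to the config value.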
8 changes: 7 additions & 1 deletion emap-setup/emap_runner/global_config.py
@@ -21,7 +21,9 @@ class GlobalConfiguration(dict):
"global",
"glowroot",
"common",
"features",
"fake_uds",
"monitoring",
"waveform"
)

def __init__(self, filepath: Path):
@@ -136,6 +138,10 @@ def _substitute_vars(self, env_file: "EnvironmentFile") -> None:

try:
value = self.get_first(key, env_file.basename)
if value is None:
# Don't stringify None, Spring won't understand.
# Empty string is the closest alternative.
value = ""
env_file.set_new_line_at(f"{key}={value}\n", idx=i)

except KeyError:
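Reviewer note: the reason for the `None` check is that formatting `None` into an f-string yields the literal text `None`, which then ends up in the env file as a real value. A key left blank in the global YAML loads as `None`, so this case is easy to hit. Below is a small standalone sketch of the same rule. `render_env_line` is a hypothetical helper, not a function in `global_config.py`.

```python
from typing import Optional


def render_env_line(key: str, value: Optional[object]) -> str:
    """Format one KEY=value line for an env file, mapping None to empty."""
    if value is None:
        # Without this, the line would read KEY=None.
        value = ""
    return f"{key}={value}\n"
```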
40 changes: 35 additions & 5 deletions emap-setup/global-configuration-EXAMPLE.yaml
@@ -30,11 +30,18 @@ repositories:
# hoover:
# branch: develop

# Feature flags for not quite ready features, or for turning fake services on and off
features:
waveform: false
waveform_generator: false
fake_uds: false
# Each section below could represent either a service defined by Emap (eg. rabbitmq),
# an external service (eg. IDS/UDS), or really anything at all. The meaning is ultimately
# defined by the setup script.
# To pull a variable into the environment for a container, create/edit a file `*-config-envs.EXAMPLE`
# containing the (empty) variables you wish to have populated. The file (minus ".EXAMPLE" suffix)
# will be copied into the config dir with real values when you run `emap setup -g`. This config file
# should be referenced from the relevant docker-compose service definition to bring the envs into the
# container.
# Note that the sections are not namespaces, and thus variable names should be unique
# even if in a different section.
# By convention, variables intended to be passed into containers directly are in
# upper case. Variables to control the setup script itself are lower case.

# Configuration data for the rabbitmq instance used by Spring in the pipeline
rabbitmq:
@@ -87,3 +94,26 @@ glowroot:
GLOWROOT_PASSWORD: glowrootpw
GLOWROOT_ADMIN_PORT: 4000

# For testing outside the GAE, you can enable a fake UDS
fake_uds:
enable_fake_uds: false

# The nginx portal and other monitoring/validation/visualisation services
monitoring:
SERVER_EXTERNAL_HOSTNAME: server.fqdn.example
PORTAL_PORT: 7100
PORTAL_USERNAME: emap
PORTAL_PASSWORD: portal_password
use_streamlit: false

# config related to waveform data ingress
waveform:
enable_waveform: false
enable_waveform_generator: false
CORE_WAVEFORM_RETENTION_HOURS: 24
WAVEFORM_HL7_SOURCE_ADDRESS_ALLOW_LIST: 127.0.0.1
WAVEFORM_HL7_TEST_DUMP_FILE: ""
WAVEFORM_SYNTHETIC_NUM_PATIENTS: 30
WAVEFORM_SYNTHETIC_WARP_FACTOR: 6
WAVEFORM_SYNTHETIC_START_DATETIME: "2024-01-02T12:00:00Z"
WAVEFORM_SYNTHETIC_END_DATETIME: "2024-01-03T12:00:00Z"
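Reviewer note: the naming convention described in the comment block (upper case for variables passed into containers, lower case for setup-script switches) can be illustrated by partitioning one section by key case. This only demonstrates the convention. As far as this diff shows, the setup script matches keys against the `*-config-envs` templates rather than splitting by case, and `split_section` is a hypothetical helper.

```python
from typing import Dict, Tuple


def split_section(section: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split a config section into (container envs, setup-script flags) by key case."""
    container_envs = {k: v for k, v in section.items() if k.isupper()}
    setup_flags = {k: v for k, v in section.items() if not k.isupper()}
    return container_envs, setup_flags


# Values copied from the `monitoring` section of the example file.
monitoring = {
    "SERVER_EXTERNAL_HOSTNAME": "server.fqdn.example",
    "PORTAL_PORT": 7100,
    "PORTAL_USERNAME": "emap",
    "PORTAL_PASSWORD": "portal_password",
    "use_streamlit": False,
}
envs, flags = split_section(monitoring)
```

Here `flags` holds only `use_streamlit`, and the four upper-case keys land in `envs`.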
@@ -8,3 +8,12 @@ EMAP_PROJECT_NAME:
repositories:
emap_documentation:
branch: main
# For testing outside the GAE, you can enable a fake UDS
fake_uds:
enable_fake_uds: false
# config related to waveform data ingress
waveform:
enable_waveform: false
enable_waveform_generator: false
WAVEFORM_HL7_SOURCE_ADDRESS_ALLOW_LIST: 127.0.0.1
WAVEFORM_HL7_TEST_DUMP_FILE: ""