Merge branch 'main' of https://github.com/seedcase-project/seedcase-s…

…prout into refactor/edit-package-properties
seedcase-project · Jan 27, 2025 · 54bea5f
2 parents 82c753d + ffd83aa
commit 54bea5f
Showing 24 changed files with 641 additions and 604 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,15 @@
+## 0.13.0 (2025-01-24)
+
+### Feat
+
+- ✨ output everything from code cells in Reference docs (#997)
+
+## 0.12.2 (2025-01-24)
+
+### Refactor
+
+- :recycle: `path_package_properties()` -> `path_properties()` (#996)
+
 ## 0.12.1 (2025-01-23)
 
 ### Refactor

diff --git a/_quarto.yml b/_quarto.yml
@@ -52,9 +52,8 @@ website:
         - section: "Architecture"
           href: docs/design/architecture/index.qmd
           contents:
-            - docs/design/architecture/requirements.qmd
             - docs/design/architecture/naming.qmd
-            - docs/design/architecture/modular-design.qmd
+            - docs/design/architecture/input-data.qmd
         - section: "Interface"
           href: docs/design/interface/index.qmd
           contents:
@@ -94,9 +93,9 @@ quartodoc:
       desc: "Functions to work with and manage data resources found within a data package."
       package: "seedcase_sprout.core"
       contents:
-        - write_resource_properties
         - create_resource_properties
         - create_resource_structure
+        - write_resource_properties
 
     - subtitle: "Property dataclasses"
       desc: "Dataclasses that support creating correct data package properties."

diff --git a/_renderer.py b/_renderer.py
@@ -12,7 +12,20 @@
 
 
 class Renderer(MdRenderer):
-    style = "fix-output-of-returns-and-raises-without-names"
+    style = "seedcase"
+
+    @dispatch
+    def render_header(self, el: layout.Doc) -> str:
+        """Render the header of a docstring, including any anchors."""
+        _str_dispname = el.name
+
+        _anchor = f"{{ #{el.obj.path} }}"
+
+        # For lvl 1 headers, add a yml header with the ipynb-shell-interactivity setting
+        # to get all output from the cell
+        if self.crnt_header_level == 1:
+            return f"---\nipynb-shell-interactivity: all\ntitle: {_str_dispname}\n---"
+        return f"{'#' * self.crnt_header_level} {_str_dispname} {_anchor}"
 
     # returns ----
 

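As a reading aid for the `render_header` change in `_renderer.py`, here is a minimal standalone sketch of the string it builds. It makes no claims about quartodoc itself: `sketch_header` and its plain arguments are hypothetical stand-ins for `el.name`, `el.obj.path`, and `self.crnt_header_level`, and `pkg.func` is an illustrative object path.

```python
def sketch_header(name: str, path: str, level: int) -> str:
    """Mimic the string logic of `render_header` using plain arguments (hypothetical helper)."""
    # Anchor attribute in the same shape the diff builds, e.g. "{ #pkg.func }".
    anchor = f"{{ #{path} }}"
    # Level 1: return YAML front matter carrying the ipynb-shell-interactivity setting.
    if level == 1:
        return f"---\nipynb-shell-interactivity: all\ntitle: {name}\n---"
    # Deeper levels: a regular Markdown heading followed by the anchor.
    return f"{'#' * level} {name} {anchor}"


# Illustrative calls with a made-up object path.
print(sketch_header("func", "pkg.func", 1))
print(sketch_header("func", "pkg.func", 2))
```

In this sketch, as in the diff, the level 1 branch returns only the front matter, so the anchor is not part of the level 1 output.
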
diff --git a/docs/design/architecture/input-data.qmd b/docs/design/architecture/input-data.qmd
@@ -0,0 +1,96 @@
+---
+title: "Input data"
+---
+
+This section describes the types of data we expect to be input into
+Sprout. We design Sprout with these types of data and
+formats in mind.
+
+## Domain-specific types of data
+
+Currently, we only have experience with health data, so we have a bias
+towards that type of data.
+
+### Health research
+
+Health research data tends to consist of these types of data:
+
+-   **Clinical**: This data is typically collected during patient visits
+    to doctors. Depending on the country or administrative region, there
+    will likely already be well-established data processing and storage
+    pipelines in place.
+-   **Register**: This type of data is highly dependent on the country
+    or region. Generally, this data is collected for national or
+    regional administrative purposes, such as recording employment
+    status, income, address, medication purchases, and diagnoses. As
+    with routine clinical data, the pipelines in place for processing
+    and storing this data are usually very extensive and well
+    established.
+-   **Biological sample data**: This type of data is generated from
+    biological samples, like blood, saliva, semen, hair, or urine.
+    Sample analytic techniques often produce large volumes of data per
+    person. The data may be generated in larger established
+    laboratories or in smaller research groups, depending on
+    what analytic technology is used and how new it is. The structure
+    and format of the generated data also tend to be highly variable
+    and depend heavily on the technology used, sometimes requiring
+    specialized software to process and output.
+-   **Survey or questionnaire**: This type of data is often collected based
+    on a given study's aims and research questions. There are hundreds
+    of different questionnaires that can have highly specific purposes
+    and uses for their data. The volume of data collected and the
+    format of that data also vary widely from one survey to the
+    next.
+
+## File and data formats
+
+While we aim to handle a wide variety of data types, we will start with
+the most common formats. We also restrict ourselves to data formats
+that are open rather than proprietary, since we cannot process a
+file if we don't have the software to
+read it.
+
+The file formats we expect to work with are text (`.txt`) files, various
+forms of comma-separated value (`.csv`) files, Excel (`.xls` or `.xlsx`)
+files (technically closed source but practically easy to read), images,
+audio, XML, JSON, and potentially some SQL databases.
+
+## Flow or frequency of data collection
+
+In research (and even in most industry settings), we rarely encounter
+truly real-time data collection. Most data collection is done in
+"batches", with data being collected at irregular and inconsistent
+intervals and then stored to be processed later. This batch
+collection can be broken down into two categories based on its
+frequency:
+
+-   *Routine or continuous collection*, where data is collected on a
+    more regular interval and in smaller batches of "observational
+    units"[^1]. Ingestion or processing of this type of data may happen
+    on a more regular basis. Clinical data as well as survey or
+    questionnaire data will likely fall under this category, such as
+    data collected on a few patients seen during the day at a clinic.
+-   *Grouped collection*, where data is collected from many observational
+    units during a short period of time at very irregular intervals or
+    potentially only once. Data ingestion or processing occurs some time
+    after all the data has been collected. Biological sample data
+    would fall under this category, since laboratories usually run
+    several samples at once and input data after internal quality
+    control checks and machine-specific data processing. While
+    register-based and clinical data usually get collected
+    continuously, direct access to them is only given on a batch and
+    infrequent basis, so they may also fall under this category. Survey
+    data may also come in batches, depending on the questionnaire and
+    software used for its collection.
+
+[^1]: An observational unit is the "entity" that the data was collected
+    from at a given point in time, such as a human participant in a
+    cohort study or a rat in an animal study at a specific time point.
+
+Regardless of the flow or frequency of data generation and collection,
+the ability to automatically ingest the data into Sprout will vary widely
+based on the data source, the organization that generates the data, and
+their technical expertise. Some data sources may have well-established,
+but not always programmatic or automatic, workflows and processes.
+Others may not have any workflow at all, making ingestion an extremely
+manual process.
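The file formats named in `input-data.qmd` could be captured as a small lookup, sketched below. This is an illustration only and not part of Sprout's API: `EXPECTED_EXTENSIONS` and `is_expected_format` are hypothetical names, and images, audio, and SQL databases are left out because the text names no specific extensions for them.

```python
from pathlib import Path

# Hypothetical set: only the formats the text names with a clear extension.
EXPECTED_EXTENSIONS = {".txt", ".csv", ".xls", ".xlsx", ".xml", ".json"}


def is_expected_format(path: str) -> bool:
    """Return True if the file's extension is one of the formats listed in the text."""
    # Compare case-insensitively so "DATA.CSV" and "data.csv" are treated alike.
    return Path(path).suffix.lower() in EXPECTED_EXTENSIONS


# Illustrative file names, not real data.
for name in ["visits.csv", "samples.XLSX", "survey.sav"]:
    print(name, is_expected_format(name))
```

A check like this only looks at the file name; it says nothing about whether the contents are actually readable.
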
diff --git a/docs/design/architecture/naming.qmd b/docs/design/architecture/naming.qmd
@@ -63,7 +63,7 @@ We may also occasionally use "properties" to refer to the file itself.
 | Action | Description |
 |----------------------------|--------------------------------------------|
 | create | Create a new object. |
-| construct | Construct or reconstruct a data file (like the Parquet file). |
+| build | Build implies either creating a new object or recreating an existing one, e.g. (re-)build a file like the README or Parquet file. |
 | view | View details about an object. |
 | list | List basic details about many objects. |
 | edit | Edit an object, specifically the properties object. |

diff --git a/docs/design/architecture/runtime-view.qmd b/docs/design/architecture/runtime-view.qmd