Merge branch 'main' into refactor/use-properties-classes-in-userfacing-functions
lwjohnst86 authored Jan 22, 2025
2 parents b55e965 + dc6e4be commit 3dbf426
Showing 43 changed files with 2,403 additions and 692 deletions.
5 changes: 5 additions & 0 deletions .vscode/google-notypes.mustache
@@ -34,3 +34,8 @@ Yields:
{{descriptionPlaceholder}}.
{{/yields}}
{{/yieldsExist}}

Examples:
```{python}
{{descriptionPlaceholder}}
```
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,39 @@
## 0.11.0 (2025-01-21)

### Feat

- :sparkles: add `check_properties()` (#967)

## 0.10.0 (2025-01-21)

### Feat

- :sparkles: add `check_package_properties()` (#966)

## 0.9.0 (2025-01-21)

### Feat

- :sparkles: add `check_resource_properties()` (#965)

## 0.8.0 (2025-01-21)

### Feat

- :sparkles: collect all Sprout-specific property errors (#964)

## 0.7.0 (2025-01-21)

### Feat

- :sparkles: add functions to check required and blank fields (#963)

## 0.6.0 (2025-01-21)

### Feat

- :sparkles: add simple helper functions (#962)

## 0.5.0 (2025-01-20)

### Feat
1 change: 0 additions & 1 deletion _quarto.yml
@@ -120,7 +120,6 @@ quartodoc:
package: "seedcase_sprout.core"
contents:
- path_package
- path_package_database
- path_package_properties
- path_packages
- path_resource
3 changes: 1 addition & 2 deletions docs/design/architecture/naming.qmd
@@ -42,7 +42,7 @@ We may also occasionally use "properties" to refer to the file itself.
| properties or property | Metadata about a package or resource. |
| file structure or structure | The file and folder structure of a package. |
| path | A file path listing the location of folders or files that Sprout interacts with. |
| data | The data file within a resource. Within Sprout, this includes the table in the database, the Parquet file, as well as raw files. |
| data | The data file within a resource. Within Sprout, this includes the Parquet file and raw files. |
| identifier | A unique identifier for a package or resource, denoted as `id`. |

: General objects used throughout Sprout.
@@ -55,7 +55,6 @@ We may also occasionally use "properties" to refer to the file itself.
| raw data file | A compressed raw data file format, such as a `.zip`, `.gzip`, or `.tar.gz` file. |
| JSON | The main way we save and pass information around on the package's or resource's properties. |
| Parquet | A columnar, efficient storage file format that we use to save data within a resource. |
| SQLite | A relational database file format that we use to contain all data resources in a package. |

: Specific files and folders used within or output by Sprout.

36 changes: 5 additions & 31 deletions docs/design/interface/outputs.qmd
@@ -26,10 +26,10 @@ When a user starts using Sprout to structure their data, that data will
form a [Data Package](https://datapackage.org) (from the Frictionless
Data standard). A data package is called many things in other fields.
For instance it could be a "cohort study data", a "data resource", a
database, a clinical trial dataset, a dataset, a study dataset, simply
"data", or any other combination of words that include the word data. To
keep it consistent with the Frictionless Data standard, we name it a
data package.
clinical trial dataset, a dataset, a study dataset, simply "data", or
any other combination of words that include the word data. To keep it
consistent with the Frictionless Data standard, we name it a data
package.

A data package consists of one or more files (called data resources)
that contain data specific to a given set of conditions. For instance, a
@@ -116,23 +116,6 @@ Sprout creates and uses these files:
serve potentially as the basis for displaying information for a data
package's landing page in the Sprout web app.

`./database.sqlite`

: This file contains all the data in the data package as a
[SQLite](https://decisions.seedcase-project.org/why-sqlite/)
database. Each relational table within represents one data resource.
The SQLite database is used because it is a lightweight, serverless,
and self-contained database that is easy to use and manage. It is
also a common database format that is used in many applications and
is easy to export to other database formats. The file is only
described within the `datapackage.json` file and is not classified
as a data resource, so the database file cannot be used by software
that follows the Frictionless Data standard. We use and provide it
because of the features that formal databases provide, like
indexing, querying, and data integrity. The `data.parquet` file
described below will be the data resource that is linked within the
`datapackage.json` file.

`./resources/<id>/raw/<timestamp>-<uuid>.<extension>.gzip`

: These are compressed raw data files associated with a specific
@@ -150,16 +133,7 @@ Sprout creates and uses these files:
: When a user creates a data resource and adds or updates data in the
resource, all resource data is processed and stored in the
[Parquet](https://decisions.seedcase-project.org/why-parquet/) data
format (if it is tabular data). The reason there is both a Parquet
file as well as a table in the SQLite database is the way the
[Frictionless Data Package](https://datapackage.org/) specification
describes and sets data resources. The specification requires either
a path value or a data value to be set for a data resource, but does
not allow for setting a table in a database as a value. So
interoperability of the data package is achieved by providing the
Parquet file, rather than the SQLite database. In this way, we use
both the Parquet data file and the database table for different
reasons, even though they contain the same data.
format (if it is tabular data).
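
For context on this hunk: as we read the Data Package specification, each data resource points at its data through either a `path` value or an inline `data` value, which is why the Parquet file is the piece that gets linked from `datapackage.json`. The following is a minimal sketch of that idea, not Sprout's implementation. The helper names, the package and resource names, and the `resources/1/data.parquet` path are all hypothetical and chosen only for illustration.

```python
import json


def make_resource_descriptor(name: str, path: str) -> dict:
    """Build a minimal, illustrative resource descriptor (hypothetical helper)."""
    return {"name": name, "path": path, "format": "parquet"}


def has_exactly_one_data_location(resource: dict) -> bool:
    """Return True when the resource sets `path` or `data`, but not both."""
    return ("path" in resource) != ("data" in resource)


# Assumed relative location of the Parquet file inside the package folder.
resource = make_resource_descriptor("example-resource", "resources/1/data.parquet")
package = {"name": "example-package", "resources": [resource]}

# Serialise the descriptor the way it might appear in `datapackage.json`.
print(json.dumps(package, indent=2))
```

A descriptor that only named a database table would have neither key set, so a check like `has_exactly_one_data_location()` would reject it.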

## Multi-user server environment

20 changes: 1 addition & 19 deletions docs/design/interface/python-functions.qmd
@@ -249,7 +249,7 @@ accidental deletion. Outputs `true` if the deletion was successful.
::: {.callout-note collapse="true"}
### `delete_resource_data(path, confirm)`

Delete all data (Parquet and in the database) and raw data of a specific
Delete all data (Parquet) and raw data of a specific
resource. Use `path_resource_raw()` to provide the correct path
location. The `confirm` argument defaults to `false` so that the user
needs to explicitly provide `true` to the function argument as
@@ -289,16 +289,6 @@ function.](images/core/write-resource-properties.svg){#fig-write-resource-properties
fig-alt="A Plant UML schematic of the detailed code flow within the `write_resource_properties()` function."}
:::

::: {.callout-note collapse="true"}
### `write_resource_to_database(data, path, properties)`

Writes a data object to the `path` database location. Use
`path_resource_database()` to provide the correct location for the
`path` object. The `properties` argument, using `view_resource()`,
ensures the correct table is created in the database. Outputs the same
`path` object as well as the name of the newly created database table.
:::

::: {.callout-note collapse="true"}
### `extract_resource_properties(path, data_path)`

@@ -457,14 +447,6 @@ print(path_properties(6))
```
:::

::: {.callout-note collapse="true"}
### `path_package_database(package_id)`

Creates the absolute path to the package's database.

<!-- TODO: Not sure if we will keep the database -->
:::

::: {.callout-note collapse="true"}
### `path_packages()`
