docs: 🏗️ move `.puml` into pseudocode #1021
base: main
Conversation
- Can it, at a minimum, be read without problems or warnings?
- Do the columns in the data file match those in the properties?
- Do the data types in the data file match those in the properties?
These were the initial ones I was thinking about, but I guess as we use it in examples and real-world data, we could add more. Could even eventually move this function out into the `checks` package.
Makes sense in general, just added some comments!
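As a rough illustration of the three checks listed in the diff above, here is a minimal sketch. It assumes a CSV data file and Frictionless-style properties with a `schema.fields` list. The name `check_data_basics_sketch` and the `CASTS` mapping are hypothetical and not part of this PR.

```python
import csv
from pathlib import Path

# Hypothetical mapping from property types to Python casts (an assumption).
CASTS = {"integer": int, "number": float, "string": str}


def check_data_basics_sketch(data_path: Path, resource_properties: dict) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    fields = resource_properties["schema"]["fields"]
    expected = [field["name"] for field in fields]

    # 1. Can the file be read without problems?
    try:
        with open(data_path, newline="", encoding="utf-8") as file:
            reader = csv.DictReader(file)
            header = list(reader.fieldnames or [])
            rows = list(reader)
    except (OSError, UnicodeDecodeError, csv.Error) as error:
        return [f"Could not read file: {error}"]

    # 2. Do the columns in the data file match those in the properties?
    if header != expected:
        return [f"Columns {header} do not match properties {expected}"]

    # 3. Do the values in each column cast to the declared type?
    problems = []
    for number, row in enumerate(rows, start=1):
        for field in fields:
            cast = CASTS.get(field["type"], str)
            try:
                cast(row[field["name"]])
            except (TypeError, ValueError):
                problems.append(f"Row {number}: {field['name']} is not {field['type']}")
    return problems
```

Returning a list of problems rather than raising on the first one is only one possible design. It makes it easier to report everything wrong with a file in one pass.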
Copy the file from `data_path` over into the resource location given by
`path`. This will compress the file and use a timestamped, unique file
name to store it as a backup. See the
[design](https://sprout.seedcase-project.org/docs/design/) docs for an
explanation of this file. Use `path_resource_raw()` to provide the
correct `path` location. Copies and compresses the file, and outputs the
path object of the created file.
Maybe include that the data is checked against the metadata?
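To make the "compress and store under a timestamped, unique file name" step concrete, here is a minimal sketch using only the standard library. The gzip format, the file naming scheme, and the name `copy_to_raw_sketch` are all assumptions for illustration; the PR does not specify them.

```python
import gzip
import shutil
from datetime import datetime, timezone
from pathlib import Path
from uuid import uuid4


def copy_to_raw_sketch(data_path: Path, raw_dir: Path) -> Path:
    """Compress `data_path` into `raw_dir` and return the path of the new file."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    # Hypothetical naming scheme: the short random suffix avoids collisions
    # between files written within the same second.
    raw_file = raw_dir / f"{timestamp}-{uuid4().hex[:8]}{data_path.suffix}.gz"
    # Stream the copy so large files are never loaded fully into memory.
    with open(data_path, "rb") as source, gzip.open(raw_file, "wb") as target:
        shutil.copyfileobj(source, target)
    return raw_file
```

In the pseudocode under review, the checks against the properties would run before this step, so only data that passes them is backed up.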
@@ -0,0 +1,116 @@
# ruff: noqa
def write_resource_data_to_raw(data_path, resource_properties) -> Path:
I removed the `path` so we can use the `path` properties instead. One thing we need to consider is where this function will run. We either need to figure out a way to give an absolute path, or restrict this function to only running in a directory that has a `datapackage.json` (so it knows where the root is). Or, we have a function that seeks out the root of the package if this is run from a subfolder.
That's a good point! Allowing it to run in the root folder or any subfolder of that seems okay to me.
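One way to support running from the root folder or any subfolder is to walk up the directory tree until a `datapackage.json` is found. A minimal sketch follows; the function name `find_package_root_sketch` is hypothetical and not part of this PR.

```python
from pathlib import Path
from typing import Optional


def find_package_root_sketch(start: Optional[Path] = None) -> Path:
    """Return the nearest folder at or above `start` containing datapackage.json."""
    current = (start or Path.cwd()).resolve()
    # Check the starting folder first, then each parent up to the filesystem root.
    for folder in (current, *current.parents):
        if (folder / "datapackage.json").is_file():
            return folder
    raise FileNotFoundError(
        f"No datapackage.json found in {current} or any of its parent folders."
    )
```

The resource `path` from the properties could then be joined onto the returned root to get an absolute location, regardless of where the function is called from.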
check_is_supported_format(data_path)
check_data_basics(data_path, resource_properties)
check_data_constraints(data_path, resource_properties)
raw_dir = Path(resource_properties.path / "raw")
I think `path` is `resources/id/data.parquet`, but this is a minor point.
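If `path` does point at the data file rather than the resource folder, the raw folder would be a sibling of that file. A small sketch under that assumption; the function name is hypothetical, and the example path uses `1` as an illustrative resource id.

```python
from pathlib import Path


def raw_dir_from_resource_path_sketch(resource_path: str) -> Path:
    """Return the `raw` folder next to the data file that `resource_path` points at."""
    # Convert to Path before joining: a plain string does not support `/`.
    return Path(resource_path).parent / "raw"


# Assumes the properties' path points at the data file inside the resource folder.
raw_dir = raw_dir_from_resource_path_sketch("resources/1/data.parquet")
```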
Description
This PR moves the PlantUML diagram over into pseudocode and also adds a basic Mermaid diagram of the input and output flow.
This PR needs an in-depth review.
Checklist
just run-all