Add Dataset model #1046

Closed
nozomione opened this issue Jan 11, 2025 · 1 comment · Fixed by #1052
nozomione (Member) commented Jan 11, 2025

Context

To prepare the API for cart functionality, we need to create a Dataset model.

This issue defines the initial model schema by bootstrapping the model structure, which allows us to integrate it into other parts of the process.

Problem or idea

The Dataset model will consist of the following field types:

  • User editable fields
  • On-request fields
  • Dataset status fields
  • Process information fields
  • Dynamic model properties

The initial model should contain the following fields:

class Dataset(TimestampModel):

    # Choices matched with the ComputedFile/Library models
    class FileFormats:
        ANN_DATA = "ANN_DATA"
        SINGLE_CELL_EXPERIMENT = "SINGLE_CELL_EXPERIMENT"

        CHOICES = (
            (ANN_DATA, "AnnData"),
            (SINGLE_CELL_EXPERIMENT, "Single cell experiment"),
        )
    
    id: UUIDField()

    # User editable fields (with validation)
    format: TextField(choices=FileFormats.CHOICES)
    data: JSONField(default=dict) # callable default to avoid a shared mutable dict
    regenerated_from: ForeignKey('self', null=True) # validation required
    email: EmailField(null=True) # validation required
    start: BooleanField(null=True) # validation required (with a serializer)

    # On-request fields - values assigned on requests
    token: OneToOneField(Token) # Used to process the dataset
    download_tokens: ManyToManyField(Token) # Used to create download urls

    # Dataset status fields - values set during the processing workflow
    started_at: DateTimeField(null=True)
    is_started: BooleanField(default=False)
    is_processing: BooleanField(default=False)

    processed_at: DateTimeField(null=True)
    is_processed: BooleanField(default=False)
            
    expires_at: DateTimeField(null=True)
    is_expired: BooleanField(default=False) # If expires_at exists and has passed
    
    errored_at: DateTimeField(null=True)
    is_errored: BooleanField(default=False)
    error_message: TextField(null=True)

    computed_file: OneToOneField(ComputedFile)
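
For reference, here is a trimmed, runnable sketch of a handful of the fields above in concrete Django syntax. The import paths, the on_delete behaviors, and the null handling on the relations are placeholders for illustration, not decisions made in this issue:

import uuid

from django.db import models

# TimestampModel, Token, and ComputedFile already exist in the project;
# the import paths below are placeholders for illustration.
from scpca_portal.models.base import TimestampModel
from scpca_portal.models.token import Token
from scpca_portal.models.computed_file import ComputedFile


class Dataset(TimestampModel):
    """Subset of the fields described above; field options are illustrative only."""

    class FileFormats:
        ANN_DATA = "ANN_DATA"
        SINGLE_CELL_EXPERIMENT = "SINGLE_CELL_EXPERIMENT"

        CHOICES = (
            (ANN_DATA, "AnnData"),
            (SINGLE_CELL_EXPERIMENT, "Single cell experiment"),
        )

    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    # User editable fields
    format = models.TextField(choices=FileFormats.CHOICES)
    data = models.JSONField(default=dict)
    email = models.EmailField(null=True)

    # On-request fields (null until a request assigns them)
    token = models.OneToOneField(Token, null=True, on_delete=models.SET_NULL)

    # Dataset status fields
    started_at = models.DateTimeField(null=True)
    is_started = models.BooleanField(default=False)

    computed_file = models.OneToOneField(ComputedFile, null=True, on_delete=models.SET_NULL)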

Other anticipated fields to be implemented later that require further discussion:

    # Implement later
    # Process information fields        
    api_release: CharField() # A commit hash from Docker
    last_edit_hash: CharField() # A hash for original files when the data attribute was last modified
    input_bucket_sync: DateTimeField() # The last time the original files were synced with the input bucket
    job_id: CharField() # The batch job ID

    # Dynamic model properties
    @property
    stats: Dict # For client UI presentation
    @property
    original_files # The result of getting the original files from the data attribute

Solution or next step

Based on the above requirements:

  • Implement the Dataset model
  • Generate the migration to apply the updates
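
For the second step, the standard Django management commands should be enough (run inside the API environment; the app label is omitted here since it depends on where the model lives):

python manage.py makemigrations
python manage.py migrate
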
@nozomione nozomione self-assigned this Jan 11, 2025
@nozomione nozomione added the API label Jan 11, 2025
davidsmejia (Contributor) commented:
So after the meeting I thought about the regenerated_from field for the model, and I think it should be a foreign key instead of a OneToOneField, since it will be possible to have multiple datasets generated from an initial dataset. Can we update the issue to reflect that?
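
For context, the self-referential foreign key inside the Dataset model would look roughly like this; the on_delete behavior and related_name are illustrative suggestions, not decisions made here:

    regenerated_from = models.ForeignKey(
        "self",
        null=True,
        blank=True,
        on_delete=models.SET_NULL,            # assumption: keep derived datasets if the source is deleted
        related_name="regenerated_datasets",  # assumption: reverse accessor name
    )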
