Add Dataset model #1046

Closed
nozomione opened this issue Jan 11, 2025 · 1 comment · Fixed by #1052
nozomione (Member) commented Jan 11, 2025

Context

To prepare the API for cart functionality, we need to create a Dataset model.

This issue defines the initial model schema by bootstrapping the model structure, which allows us to integrate it into other parts of the process.

Problem or idea

The Dataset model will consist of the following field types:

  • User editable fields
  • On-request fields
  • Dataset status fields
  • Process information fields
  • Dynamic model properties

The initial model should contain the following fields:

class Dataset(TimestampModel):

    # Choices matched with the ComputedFile/Library models
    class FileFormats:
        ANN_DATA = "ANN_DATA"
        SINGLE_CELL_EXPERIMENT = "SINGLE_CELL_EXPERIMENT"

        CHOICES = (
            (ANN_DATA, "AnnData"),
            (SINGLE_CELL_EXPERIMENT, "Single cell experiment"),
        )
    
    id: UUIDField()

    # User editable fields (with validation)
    format: TextField(choices=FileFormats.CHOICES)
    data: JSONField(default=dict) # callable default to avoid a shared mutable dict
    regenerated_from: ForeignKey('self', null=True) # validation required
    email: EmailField(null=True) # validation required
    start: BooleanField(null=True) # validation required (with a serializer)

    # On-request fields - values assigned on requests
    token: OneToOneField(Token) # Used to process the dataset
    download_tokens: ManyToManyField(Token) # Used to create download urls

    # Dataset status fields - values set during the processing workflow
    started_at: DateTimeField(null=True)
    is_started: BooleanField(default=False)
    is_processing: BooleanField(default=False)

    processed_at: DateTimeField(null=True)
    is_processed: BooleanField(default=False)
            
    expires_at: DateTimeField(null=True)
    is_expired: BooleanField(default=False) # If expires_at exists and has passed
    
    errored_at: DateTimeField(null=True)
    is_errored: BooleanField(default=False)
    error_message: TextField(null=True)

    computed_file: OneToOneField(ComputedFile)
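
For reference, here is a trimmed, runnable sketch of a handful of the fields above in concrete Django syntax. The import paths, the on_delete behaviors, and the null handling on the relations are placeholders for illustration, not decisions made in this issue:

import uuid

from django.db import models

# TimestampModel, Token, and ComputedFile already exist in the project;
# the import paths below are placeholders for illustration.
from scpca_portal.models.base import TimestampModel
from scpca_portal.models.token import Token
from scpca_portal.models.computed_file import ComputedFile


class Dataset(TimestampModel):
    """Subset of the fields described above; field options are illustrative only."""

    class FileFormats:
        ANN_DATA = "ANN_DATA"
        SINGLE_CELL_EXPERIMENT = "SINGLE_CELL_EXPERIMENT"

        CHOICES = (
            (ANN_DATA, "AnnData"),
            (SINGLE_CELL_EXPERIMENT, "Single cell experiment"),
        )

    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    # User editable fields
    format = models.TextField(choices=FileFormats.CHOICES)
    data = models.JSONField(default=dict)
    email = models.EmailField(null=True)

    # On-request fields (null until a request assigns them)
    token = models.OneToOneField(Token, null=True, on_delete=models.SET_NULL)

    # Dataset status fields
    started_at = models.DateTimeField(null=True)
    is_started = models.BooleanField(default=False)

    computed_file = models.OneToOneField(ComputedFile, null=True, on_delete=models.SET_NULL)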

Other anticipated fields to be implemented later that require further discussion:

    # Implement later
    # Process information fields        
    api_release: CharField() # A commit hash from Docker
    last_edit_hash: CharField() # A hash for original files when the data attribute was last modified
    input_bucket_sync: DateTimeField() # The last time the original files were synced with the input bucket
    job_id: CharField() # The batch job ID

    # Dynamic model properties
    @property
    stats: Dict # For client UI presentation
    @property
    original_files # The result of getting the original files from the data attribute

Solution or next step

Based on the above requirements:

  • Implement the Dataset model
  • Generate the migration to apply the updates
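
For the second step, the standard Django management commands should be enough (run inside the API environment; the app label is omitted here since it depends on where the model lives):

python manage.py makemigrations
python manage.py migrate
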
@nozomione nozomione self-assigned this Jan 11, 2025
@nozomione nozomione added the API label Jan 11, 2025
davidsmejia (Contributor) commented:
So after the meeting I thought about the regenerated_from field for the model, and I think it should be a foreign key instead of a OneToOneField, since it will be possible to have multiple datasets generated from an initial dataset. Can we update the issue to reflect that?
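
For context, the self-referential foreign key inside the Dataset model would look roughly like this; the on_delete behavior and related_name are illustrative suggestions, not decisions made here:

    regenerated_from = models.ForeignKey(
        "self",
        null=True,
        blank=True,
        on_delete=models.SET_NULL,            # assumption: keep derived datasets if the source is deleted
        related_name="regenerated_datasets",  # assumption: reverse accessor name
    )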
