Batch cost analysis #959

Open
avrohomgottlieb opened this issue Nov 15, 2024 · 3 comments

@avrohomgottlieb
Contributor

avrohomgottlieb commented Nov 15, 2024

Context

In issue #944, we succeeded in generating computed files on Batch and ran jobs for each download_config.

Issues #956 and #957 will address certain discrepancies that arose in the output files and during job runs themselves (platform related).

In this issue we should compare different resource allocation strategies to find the most cost-effective one. Once issues #956 and #957 are resolved, the bottlenecks should become clearer and we'll be able to identify the best cost strategies.

We should collect and report the results of these strategies across metrics such as job duration, file size, memory size, and networking data.
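As a rough sketch of how per-job durations could be summarized, assuming "Batch" here is AWS Batch and that job records resemble the entries returned by `describe_jobs` (which report `startedAt` and `stoppedAt` as epoch milliseconds). The `summarize_durations` helper and the sample records are hypothetical, not real measurements:

```python
from statistics import mean


def summarize_durations(jobs):
    """Summarize run time in seconds for jobs that have both start and stop times."""
    durations = [
        (job["stoppedAt"] - job["startedAt"]) / 1000  # epoch ms -> seconds
        for job in jobs
        if "startedAt" in job and "stoppedAt" in job
    ]
    if not durations:
        return {"count": 0, "total_s": 0.0, "mean_s": 0.0, "max_s": 0.0}
    return {
        "count": len(durations),
        "total_s": sum(durations),
        "mean_s": mean(durations),
        "max_s": max(durations),
    }


# Hypothetical sample records for illustration only.
sample_jobs = [
    {"jobName": "job-a", "startedAt": 1_000_000, "stoppedAt": 1_090_000},
    {"jobName": "job-b", "startedAt": 1_010_000, "stoppedAt": 1_040_000},
    {"jobName": "job-c", "startedAt": 1_020_000},  # not finished, so skipped
]
summary = summarize_durations(sample_jobs)
print(summary)
```

The same shape could be extended with file size or memory fields if those are recorded per job.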

@avrohomgottlieb avrohomgottlieb self-assigned this Nov 19, 2024
@avrohomgottlieb
Contributor Author

Now that we've spent the last few weeks optimizing the Batch implementation and working out the kinks, this issue is being repurposed to measure the baseline time and cost of reloading all projects on staging.

We're going to keep the current queue and compute_environment configurations, with 16 vCPUs and 200 GB of ephemeral storage.

@avrohomgottlieb
Contributor Author

avrohomgottlieb commented Dec 9, 2024

Dev Stack Results

Below are the results from running everything on my dev stack.

The dev stack utilized the following resources:

  • compute environment: 16 vCPU
  • job definition: 1 vCPU, 4 GB Memory, 200 GB of ephemeral storage per job
  • job queue: 1 queue
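
| Location | Process | Total Duration | First Job Received | Last Job Completed |
|----------|---------|----------------|--------------------|--------------------|
| API | Metadata Loading | 00:27:00 | Sunday 14:37 | Sunday 15:04 |
| Batch | Computed File Generation | 02:58:56 | Sunday 15:04:34 | Sunday 18:03:30 |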
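
The durations above can be reproduced from the first-received and last-completed timestamps. A minimal sketch, assuming both timestamps fall on the same day; the `elapsed` helper is hypothetical:

```python
from datetime import datetime, timedelta


def elapsed(first_received: str, last_completed: str) -> timedelta:
    """Wall-clock time between two same-day HH:MM or HH:MM:SS timestamps."""
    def parse(value: str) -> datetime:
        fmt = "%H:%M:%S" if value.count(":") == 2 else "%H:%M"
        return datetime.strptime(value, fmt)

    return parse(last_completed) - parse(first_received)


api = elapsed("14:37", "15:04")           # metadata loading
batch = elapsed("15:04:34", "18:03:30")   # computed file generation
print(api, batch, api + batch)
```

Summing the two phases gives a combined run time of roughly 3 hours 26 minutes for the dev stack.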

@avrohomgottlieb
Copy link
Contributor Author

avrohomgottlieb commented Dec 9, 2024

Staging Results

The following are the results of running the entire portal on Batch with different resource allocations:

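| Compute Environment | Job Definition | Total Duration | First Job Received | Last Job Completed |
|---------------------|----------------|----------------|--------------------|--------------------|
| 16 vCPU | 1.0 vCPU | - | - | - |
| 16 vCPU | 0.5 vCPU | - | - | - |
| 32 vCPU | 1.0 vCPU | - | - | - |
| 32 vCPU | 0.5 vCPU | - | - | - |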
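
For context on what each configuration allows, here is a back-of-the-envelope sketch of the upper bound on concurrent jobs. It assumes vCPU is the only binding constraint; memory, storage, or queue limits may lower the real number. The `max_concurrent_jobs` helper is hypothetical:

```python
def max_concurrent_jobs(env_vcpus: float, job_vcpus: float) -> int:
    """Upper bound on jobs running at once if vCPU is the only limit."""
    return int(env_vcpus // job_vcpus)


# The four configurations listed in the table.
configs = [(16, 1.0), (16, 0.5), (32, 1.0), (32, 0.5)]
bounds = {cfg: max_concurrent_jobs(*cfg) for cfg in configs}
for (env, job), limit in bounds.items():
    print(f"{env} vCPU env, {job} vCPU/job -> up to {limit} concurrent jobs")
```

Under that assumption the 16 vCPU / 0.5 vCPU and 32 vCPU / 1.0 vCPU rows share the same ceiling of 32 jobs, so comparing their measured durations would show whether jobs are actually CPU-bound.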
