Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Umar/4903 external resources false broken #2249

Merged
merged 3 commits into from
Aug 2, 2024

Conversation

umar8hassan
Copy link
Contributor

@umar8hassan umar8hassan commented Jul 22, 2024

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/4903

Description (What does it do?)

Fixes the false is_broken flag in case of Ignored HTTP status codes. And adds the functionality to update is_broken flag on each check rather waiting to be set manually.

How can this be tested?

  1. switch to branch 'umar/4903-external-resources-false-broken'
  2. Restart containers.
  3. From celery logs, verify a repeated task should be added in the celery with name check-broken-external-urls. Task is set to repeat every week. Frequency can be modified by changing CHECK_EXTERNAL_RESOURCE_STATUS_FREQUENCY in main/settings.py
  4. Go to http://localhost:8043/admin/external_resources/externalresourcestate/. This should already be created.
  5. External Resource States should have been populated with existing url checks after first execution of task.

Additional Testing

To manually test the functionality locally with a batch of urls

  1. Spin up containers
  2. Get into Django shell: docker-compose exec web ./manage.py shell
  3. Run following commands to import necessary code:
    • from external_resources.tasks import check_external_resources
    • from websites.models import WebsiteContent
    • from websites.constants import CONTENT_TYPE_EXTERNAL_RESOURCE
  4. Get website content data:
    external_resources = list(WebsiteContent.objects.filter(type=CONTENT_TYPE_EXTERNAL_RESOURCE).values_list("id", flat=True))
  5. (Optional) check length of data and splice if necessary: len(external_resources) and external_resources = external_resources[:1]
  6. run checking task: check_external_resources(external_resources)
  7. to validate resource was updated or created, retrieve resource from.DB:
    • resources = WebsiteContent.objects.filter(id__in=external_resources).select_related("external_resource_state")
    • resource = next(resources.iterator())
    • str(resource.external_resource_state.last_checked)
    • resource.metadata['is_broken']
  8. To test if flag is updated on each try, manually update flag and save.
    • resource.metadata['is_broken'] = not resource.metadata['is_broken']
    • resource.save()
  9. execute step 6
  10. Validate last_checked and flag again of the updated resource.
    • resource = WebsiteContent.objects.get(id=resource.id)
    • print(str(resource.external_resource_state.last_checked), resource.metadata['is_broken'])

To manually execute task in celery

  1. Spin up containers
  2. Get into Django shell: docker-compose exec web ./manage.py shell
  3. Run following commands to import necessary code:
    • from external_resources.tasks import check_external_resources_for_breakages
    • results = check_external_resources_for_breakages.apply()

** This might produce large number of urls to check and may take some time to complete.

@umar8hassan umar8hassan marked this pull request as ready for review July 22, 2024 12:22
@umar8hassan umar8hassan self-assigned this Jul 22, 2024
@pdpinch
Copy link
Member

pdpinch commented Jul 22, 2024

I'm still confused about the business logic around the backup_url. Do we have a process yet for setting backup_url? When do we use the backup_url, if ever?

@umar8hassan
Copy link
Contributor Author

I'm still confused about the business logic around the backup_url. Do we have a process yet for setting backup_url? When do we use the backup_url, if ever?

No, we do not have any process yet.
Backup URL is used when the link checker marks the main url as broken. It was kinda contingency to keep the resources working until someone can check and update main URL.

@pdpinch If there is no plan to add a process for backup URL, I'll drop it. Currently, it's serving no purpose.

@pt2302
Copy link
Contributor

pt2302 commented Jul 22, 2024

I think the intention of backup_url is to provide a permalink to Internet Archive or similar, if the primary link is no longer available online.

@pdpinch
Copy link
Member

pdpinch commented Jul 22, 2024

Thanks for the reminder Paul.

We decided that we weren't ready to do any automated failover to back_url. Without that, I don't think it makes sense to include the backup_url in the calculation of whether or not the link is broken.

We could remove it altogether, but one of our next issues in this epic is to work on "when publishing a course, submit external links to the Internet Archive" at which point it may be convenient to have a field to store the internet archive URL on, even if there is no automated failover.

@pt2302 pt2302 self-requested a review August 1, 2024 11:57
Copy link
Contributor

@pt2302 pt2302 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@pt2302 pt2302 self-assigned this Aug 1, 2024
@umar8hassan umar8hassan merged commit d0a733e into master Aug 2, 2024
7 checks passed
@umar8hassan umar8hassan deleted the umar/4903-external-resources-false-broken branch August 2, 2024 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants