Fix redirects including URL fragments #1424

Merged
merged 3 commits into from
Jan 8, 2025

@deivid-rodriguez
Member

deivid-rodriguez commented Jan 7, 2025

What was the end-user problem that led to this PR?

Some pages link to URLs that include a fragment but no v2.x/ URL segment, and the fragment is lost when the link is clicked. See, for example, the https://bundler.io/man/gemfile.5.html#PLATFORMS link in https://bundler.io/guides/using_bundler_in_applications.html#gemfilelock.

What was your diagnosis of the problem?

My diagnosis was that Middleman implements redirects on the client side, so pages like https://bundler.io/man/gemfile.5.html actually serve the following content, with a 200 status:

curl https://bundler.io/man/gemfile.5.html 
              <html>
                <head>
                  <link rel="canonical" href="/v2.6/man/gemfile.5.html" />
                  <meta http-equiv=refresh content="0; url=/v2.6/man/gemfile.5.html" />
                  <meta name="robots" content="noindex,follow" />
                  <meta http-equiv="cache-control" content="no-cache" />
                </head>
                <body>
                </body>
              </html>

That is, the page asks the browser for a client-side redirect to a URL that does not carry the fragment.
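To make the diagnosis concrete, here is a minimal Ruby sketch that models where a browser lands after following the meta refresh. The helper names `meta_refresh_target` and `landing_url` are hypothetical, the HTML is abbreviated from the curl output above, and the model assumes the browser simply resolves the refresh URL against the requested one, as described in this PR.

```ruby
require "uri"

# Abbreviated copy of the page served at /man/gemfile.5.html (see curl output above).
REDIRECT_PAGE = <<~HTML
  <html>
    <head>
      <meta http-equiv=refresh content="0; url=/v2.6/man/gemfile.5.html" />
    </head>
  </html>
HTML

# Hypothetical helper: extracts the URL from a meta refresh tag, or nil if absent.
def meta_refresh_target(html)
  html[/http-equiv=["']?refresh["']?\s+content="\d+;\s*url=([^"]+)"/i, 1]
end

# Hypothetical helper: resolves the refresh target against the requested URL.
# Resolving a relative reference discards the base URL's fragment.
def landing_url(requested, html)
  target = meta_refresh_target(html)
  return requested if target.nil?

  URI.join(requested, target).to_s
end

requested = "https://bundler.io/man/gemfile.5.html#PLATFORMS"
landed = landing_url(requested, REDIRECT_PAGE)

puts landed                        # https://bundler.io/v2.6/man/gemfile.5.html
puts URI(landed).fragment.inspect  # nil
```

The `#PLATFORMS` fragment never reaches the server and is not part of the refresh target, so nothing in the redirect page can restore it without extra client-side scripting.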

What is your fix for the problem, implemented in this PR?

My fix is to serve the same content as v2.6/man/gemfile.5.html, by making everything in v2.6/man/ a symlink to man/. This serves duplicated content, but our canonical meta tags should tell crawlers which page is canonical.
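As a rough illustration of the symlink idea (not the actual change in this PR, whose diff is not shown here), the sketch below links every file from one directory into another inside a temporary directory. The helper `link_tree` is hypothetical, and which of `man/` and `v2.6/man/` holds the real files is an assumption made only for the demo.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: creates one symlink in link_dir for each entry in source_dir
# and returns the paths of the created links.
def link_tree(source_dir, link_dir)
  FileUtils.mkdir_p(link_dir)
  Dir.children(source_dir).sort.map do |name|
    link = File.join(link_dir, name)
    FileUtils.ln_s(File.expand_path(name, source_dir), link, force: true)
    link
  end
end

Dir.mktmpdir do |root|
  # Assumed layout for the demo: real files live under v2.6/man/.
  versioned = File.join(root, "v2.6", "man")
  FileUtils.mkdir_p(versioned)
  File.write(File.join(versioned, "gemfile.5.html"), "<h2 id=\"PLATFORMS\">PLATFORMS</h2>\n")

  links = link_tree(versioned, File.join(root, "man"))

  puts File.symlink?(links.first) # true
  puts File.read(links.first)     # same bytes as the versioned page
end
```

Because both paths return the full page with a 200 status, the browser never navigates away and keeps the fragment it was given.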

Why did you choose this fix out of the possible options?

I chose this fix because it fixes #1347, and it may also help with SEO issues like #1333.

Middleman redirects create pages with empty body that request a client
side redirect through meta tags. This does not play nice with URL
fragments.

Also, I suspect this is why our SEO is not great at the moment.
The `/v2.6/man/` and `/v2.5/man/` URLs actually serve different content,
so they should not all have the same canonical URL. Only make `/man/` the
canonical page of the latest version, `/v2.6/man/`.
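
One possible reading of the canonical rule in the commit message above, sketched in Ruby: pages of the latest version point at their `/man/` counterpart, and every other page is its own canonical. The helper `canonical_path` and the `LATEST_VERSION` constant are hypothetical and are not taken from the site's code.

```ruby
# Assumed from the PR description: v2.6 is the latest version.
LATEST_VERSION = "v2.6"

# Hypothetical helper: maps a page path to the path its canonical tag would use.
# Only man pages of the latest version are rewritten to the unversioned /man/ path.
def canonical_path(path, latest: LATEST_VERSION)
  path.sub(%r{\A/#{Regexp.escape(latest)}/man/}, "/man/")
end

puts canonical_path("/v2.6/man/gemfile.5.html") # /man/gemfile.5.html
puts canonical_path("/v2.5/man/gemfile.5.html") # /v2.5/man/gemfile.5.html
puts canonical_path("/man/gemfile.5.html")      # /man/gemfile.5.html
```
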
@olleolleolle
Member

This looks nice, and the solutions make more sense.

(I read the commits one by one, so that I could review the changelog changes separately.)

Let's use this!

@deivid-rodriguez
Member Author

Changelog and contributor changes are a bit annoying. I'm considering no longer rebuilding them every time, and instead updating them explicitly through a weekly automatic PR or something like that. Also, maybe we don't need the changelog page at all and can link to the rubygems/rubygems repo changelog page instead.

@deivid-rodriguez
Member Author

Ok to merge @olleolleolle?

@olleolleolle merged commit 6da0c6e into master on Jan 8, 2025
3 checks passed
@olleolleolle deleted the fix-redirects branch on January 8, 2025 12:18
@olleolleolle
Member

@deivid-rodriguez I clicked the button!

@deivid-rodriguez
Member Author

Cool! I'll check Google in a couple of weeks to see if something has changed regarding #1333.

Development

Successfully merging this pull request may close these issues.

Url fragment is lost due to redirection