carlblanchard/SitemapParser

SitemapParser

A .NET library for parsing sitemap files. See the official specification: https://www.sitemaps.org/protocol.html

Originally forked from Louw.SitemapParser. As the original has not been updated for many years, I have decided to maintain this fork and update it to support features such as 301 redirects.

Previous documentation is below.


Support for various sitemap types:

  • Robots.txt (parsed to detect sitemaps)
  • Index sitemaps
  • Normal sitemaps
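
Under the sitemaps.org protocol, the two XML sitemap types are distinguished by their root element: an index sitemap uses `<sitemapindex>` with `<sitemap>` entries, while a normal sitemap uses `<urlset>` with `<url>` entries, and each entry carries a `<loc>` URL. As a rough illustration of that distinction, here is a standalone sketch using `System.Xml.Linq`. It is not this library's implementation; `SitemapXmlSketch` and `SitemapRootKind` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical types for illustration only; not part of SitemapParser's API.
public enum SitemapRootKind { Unknown, Index, UrlSet }

public static class SitemapXmlSketch
{
    // XML namespace defined by the sitemaps.org protocol.
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Classifies a sitemap document by the name of its root element.
    public static SitemapRootKind Detect(string xml)
    {
        var root = XDocument.Parse(xml).Root;
        if (root == null) return SitemapRootKind.Unknown;
        if (root.Name == Ns + "sitemapindex") return SitemapRootKind.Index;
        if (root.Name == Ns + "urlset") return SitemapRootKind.UrlSet;
        return SitemapRootKind.Unknown;
    }

    // Collects every <loc> value in the sitemap namespace, whether it belongs
    // to a <sitemap> entry (index) or a <url> entry (normal sitemap).
    public static List<string> Locations(string xml)
    {
        var locations = new List<string>();
        foreach (XElement loc in XDocument.Parse(xml).Descendants(Ns + "loc"))
            locations.Add(loc.Value.Trim());
        return locations;
    }
}
```

In the library itself, this classification shows up as `SitemapType.Index` or `SitemapType.Items` on a loaded `Sitemap`, as the examples below illustrate.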

FUTURE DEVELOPMENT ROADMAP:

##### NuGet

The package is available on NuGet: https://www.nuget.org/packages/Louw.SitemapParser

Install it from the Package Manager Console:

    Install-Package Louw.SitemapParser

##### Basic Example

    using System;
    using System.Diagnostics;   // Debug.WriteLine
    using System.Linq;          // Count()
    using Louw.SitemapParser;   // namespace assumed to match the package name

    // Inside an async method:
    var sitemapLink = new Sitemap(new Uri("https://www.google.com/sitemap.xml"));
    var loadedSitemap = await sitemapLink.LoadAsync();

    if (loadedSitemap.SitemapType == SitemapType.Index)
        Debug.WriteLine($"Sitemap Index contains {loadedSitemap.Sitemaps.Count()} entries");
    else if (loadedSitemap.SitemapType == SitemapType.Items)
        Debug.WriteLine($"Sitemap contains {loadedSitemap.Items.Count()} content locations");

##### Load Sitemaps From Robots.txt Example

    using System;
    using System.Linq;          // First()
    using Xunit;                // Assert (xUnit test framework)
    using Louw.SitemapParser;   // namespace assumed to match the package name

    // Inside an async xUnit test method:
    var loader = new SitemapLoader();
    Sitemap robotSitemap = await loader.LoadFromRobotsTxtAsync(new Uri("https://www.google.com"));
    Assert.Equal(SitemapType.RobotsTxt, robotSitemap.SitemapType);
    Assert.NotEmpty(robotSitemap.Sitemaps); // We expect at least some sitemaps to be in the list
    Assert.Empty(robotSitemap.Items);       // Robots.txt can only link to sitemaps (not content items)

    Sitemap firstSitemap = robotSitemap.Sitemaps.First();
    Assert.False(firstSitemap.IsLoaded);    // We only have the sitemap location; its contents are not yet loaded or parsed

    var firstLoadedSitemap = await loader.LoadAsync(firstSitemap);
    Assert.True(firstLoadedSitemap.IsLoaded); // Now the items are loaded!

    // We have to check the type, as we can have either links to other sitemaps (i.e. index sitemaps)
    // or links to actual sitemap items (i.e. links to content)
    switch (firstLoadedSitemap.SitemapType)
    {
        case SitemapType.Index: Assert.NotEmpty(firstLoadedSitemap.Sitemaps); break;
        case SitemapType.Items: Assert.NotEmpty(firstLoadedSitemap.Items); break;
        default: throw new NotSupportedException($"SitemapType {firstLoadedSitemap.SitemapType} not expected here");
    }
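
Robots.txt advertises sitemaps through `Sitemap:` lines, each giving an absolute URL (see the sitemaps.org protocol). The discovery step can be sketched in isolation with only the base class library. This is not the library's implementation: `RobotsTxtSketch.FindSitemaps` is a hypothetical helper, and it assumes that comments start with `#` and that field names are matched case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; not part of SitemapParser's API.
public static class RobotsTxtSketch
{
    // Returns the absolute URLs listed on "Sitemap:" lines of a robots.txt body.
    public static List<Uri> FindSitemaps(string robotsTxt)
    {
        var sitemaps = new List<Uri>();
        using (var reader = new StringReader(robotsTxt))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Strip comments, which run from '#' to the end of the line.
                int hash = line.IndexOf('#');
                if (hash >= 0) line = line.Substring(0, hash);

                // Split "Field: value" on the first colon only, so the
                // "https:" inside the value is left intact.
                int colon = line.IndexOf(':');
                if (colon < 0) continue;

                string field = line.Substring(0, colon).Trim();
                if (!field.Equals("Sitemap", StringComparison.OrdinalIgnoreCase)) continue;

                string value = line.Substring(colon + 1).Trim();
                // Keep only values that parse as absolute URIs.
                if (Uri.TryCreate(value, UriKind.Absolute, out Uri uri))
                    sitemaps.Add(uri);
            }
        }
        return sitemaps;
    }
}
```

A complete loader would also need to fetch `/robots.txt` over HTTP first; in the example above, `LoadFromRobotsTxtAsync` is given only the site root and handles that step.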

##### More Examples

More examples can be found here: https://github.com/louislouw/Louw.SitemapParser/blob/master/test/Louw.SitemapParser.Examples/Examples.cs

About

Parser for Sitemap files

Languages

  • C# 100.0%