Implementing my Website in Flask - Part 6

Dec 1, 2021

This is part 6 in a series about implementing a personal blog in Python's Flask framework. In this post, I'd like to cover something that I didn't find many good tutorials on - how to create an RSS feed. While there were a few articles out there which mentioned that RSS feeds were going out of style, they're still a relevant part of the internet and are really easy to hook into a Flask website. Also, keeping track of your favorite sites via RSS feeds is a great way to kick your social media habit.

Each part of this series can be found below:

As always, you can find the source of this blog on GitHub.

RSS Feed Basics

RSS (or Really Simple Syndication) is a specification for distilling a website down to a single web feed. Through tools like news aggregators, it allows users to automatically monitor their favorite websites without needing to periodically check back at each site or be tied to social media to access their favorite content.

Behind the scenes, an RSS feed is just an XML document. If you'd like an in-depth look at the format, the W3C RSS validator provides a good explanation, even if the website seems a bit dated.

I find it's easiest to first show an example of a basic RSS feed and then build from there. As long as you are partially familiar with XML, the following example should be pretty straightforward.

The basic document looks like this:

<?xml version="1.0" encoding="utf-8"?>
 <rss version="2.0">
   <channel>
     <title>My RSS Feed</title>
     <link>http://my.feed.com</link>
     <description>A really basic RSS feed</description>
   </channel>
 </rss>

The first line of the feed is the XML declaration, which specifies that the document is XML. Note that XML parsers are strict here: the declaration has to be the very first thing in the document, with no whitespace, blank lines, or comments ahead of it. That tripped me up for a while, as I typically add comments describing the file and license information at the top of each file.
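
To see this gotcha in isolation, here is a minimal sketch using Python's standard-library XML parser (not part of this blog's actual code; the `is_well_formed` helper and the sample documents are made up for illustration). A comment placed above the declaration makes the document unparseable, while the same comment placed below it is fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, used only for this illustration.
valid = b'<?xml version="1.0" encoding="utf-8"?>\n<rss version="2.0"><channel/></rss>'
comment_first = b'<!-- My feed, see LICENSE -->\n' + valid
comment_after = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<!-- My feed, see LICENSE -->\n'
    b'<rss version="2.0"><channel/></rss>'
)


def is_well_formed(document: bytes) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(valid))          # True
print(is_well_formed(comment_first))  # False: declaration is not at the start
print(is_well_formed(comment_after))  # True
```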

After the XML version information comes the <rss> tag (used to specify the RSS version) and the <channel> tag (used to encapsulate all of the information about your channel). Within the <channel> tag are several pieces of required information about the channel:

  • The title of the channel (within the <title> tags)
  • The URL to the channel's website (within the <link> tags)
  • The description of the channel (within the <description> tags)
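
If you want to sanity-check a feed against these three requirements, something like the following sketch could work. It assumes the feed is available as bytes and uses only the standard library; `missing_channel_elements` is a hypothetical helper name, not something from this blog's source.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def missing_channel_elements(feed_xml: bytes) -> list:
    """List the required <channel> children that the feed does not contain."""
    rss = ET.fromstring(feed_xml)
    channel = rss.find("channel")
    if channel is None:
        return ["channel"]
    return [name for name in REQUIRED_CHANNEL_ELEMENTS if channel.find(name) is None]


# A deliberately incomplete feed: the <description> is missing.
incomplete_feed = b"""<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>My RSS Feed</title>
    <link>http://my.feed.com</link>
  </channel>
</rss>"""

print(missing_channel_elements(incomplete_feed))  # ['description']
```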

There are several other optional tags if you'd like to specify more about your channel. Some common ones are:

  • Language (through the <language> tag)
  • Copyright (through the <copyright> tag)
  • The latest publish date (through the <pubDate> tag)
  • An image (through the <image> tag)

When specifying an image, you also need to include three child elements:

  • The <url> tag - The URL of the image
  • The <title> tag - The title of the image
  • The <link> tag - The URL of the site (so that the image serves as a link to the channel)

In practice, the <title> and <link> tags should match the channel's.
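
As a rough illustration of that last point, the sketch below builds an <image> whose <title> and <link> are copied straight from the channel so they can't drift apart. The `add_channel_image` helper and the logo URL are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def add_channel_image(channel: ET.Element, image_url: str) -> ET.Element:
    """Append an <image> that reuses the channel's own <title> and <link>."""
    image = ET.SubElement(channel, "image")
    ET.SubElement(image, "url").text = image_url
    ET.SubElement(image, "title").text = channel.findtext("title")
    ET.SubElement(image, "link").text = channel.findtext("link")
    return image


channel = ET.Element("channel")
ET.SubElement(channel, "title").text = "My RSS Feed"
ET.SubElement(channel, "link").text = "http://my.feed.com"
ET.SubElement(channel, "description").text = "A really basic RSS feed"

# Hypothetical image location, used only for this example.
image = add_channel_image(channel, "http://my.feed.com/static/logo.png")
print(ET.tostring(channel, encoding="unicode"))
```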

This is all great, but it doesn't make sense to try to specify a channel without content. To do that, you can use the <item> tag. In theory, <item>s are optional, as are all of the children of <item>s. However, these are what represent individual pieces of content to the user, so it makes sense to have as much detail as needed. A channel can have as many <item>s as desired, and the only requirement is that either <title> or <description> is included as a child element of the <item>. Here's a quick overview of tags I found useful for my feed:

  • <author> - Who created the content
  • <description> - A summary of the content
  • <guid> - A string that uniquely identifies the item (I used the URL)
  • <link> - The URL to the content
  • <pubDate> - The date the content was created
  • <title> - The title of the item (news article name, image name, etc)

For the full list, see the spec.
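
To make the item rules concrete, here is a small sketch that turns a dictionary describing a post into an <item>. The `build_item` function and the dictionary keys are assumptions for illustration; the real feed in this series is rendered from a Jinja template instead. One nice side effect of building XML this way is that special characters such as & are escaped automatically.

```python
import xml.etree.ElementTree as ET

ITEM_TAGS = ("title", "link", "description", "author", "pubDate")


def build_item(post: dict) -> ET.Element:
    """Build an RSS <item> from a dict keyed by RSS tag names."""
    if not post.get("title") and not post.get("description"):
        raise ValueError("an <item> needs at least a title or a description")
    item = ET.Element("item")
    for tag in ITEM_TAGS:
        if post.get(tag):
            ET.SubElement(item, tag).text = post[tag]
    if post.get("link"):
        # Reuse the URL as the unique identifier, as described above.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = post["link"]
    return item


item = build_item({
    "title": "Tips & Tricks for RSS",
    "link": "http://my.feed.com/rss-feeds",
    "description": "A description of how to write RSS feeds",
    "pubDate": "Tue, 23 Nov 2021 20:22:50 EST",
})
item_xml = ET.tostring(item, encoding="unicode")
print(item_xml)
```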

For an example of how items could be used in our previous simple RSS example, see below:

<?xml version="1.0" encoding="utf-8"?>
 <rss version="2.0">
   <channel>
     <title>My RSS Feed</title>
     <link>http://my.feed.com</link>
     <description>A really basic RSS feed</description>
     <item>
       <title>My First Blog Post!</title>
       <link>http://my.feed.com/first</link>
       <description>In my first blog post, I say "Hello World!"</description>
       <pubDate>Thu, 1 Jan 1970 00:00:00 EST</pubDate>
       <guid isPermaLink="false">http://my.feed.com/first</guid>
     </item>
     <item>
       <title>How to Write RSS Feeds</title>
       <link>http://my.feed.com/rss-feeds</link>
       <description>A description of how to write RSS feeds</description>
       <pubDate>Tue, 23 Nov 2021 20:22:50 EST</pubDate>
       <guid isPermaLink="false">http://my.feed.com/rss-feeds</guid>
     </item>
   </channel>
 </rss>

With the newly added <item> tags, you now have a fully functioning RSS feed! Any user who points their feed reader at the URL that serves this XML will be able to see the title, description, and publish date of each new piece of content. With a simple click, they'll be taken to your website.

Before continuing further, I just wanted to mention one thing on dates. If you look closely at the two <pubDate>s in the previous example, you'll see both are in what's known as RFC 822 format, which is the date format the RSS spec requires. That means the three-character abbreviation of the day of the week and a comma, followed by the day of the month (one or two digits), the three-character abbreviation of the month, and then the year (the RSS spec prefers four digits over RFC 822's two). Following the date is the 24-hour representation of the hours, minutes, and seconds, and then the time zone, given either as an abbreviation like EST or as a numeric offset like -0500.
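
You don't have to assemble these strings by hand. As a minimal sketch, Python's standard `email.utils` module can produce and parse this style of date; the fixed UTC-5 offset below is an assumption standing in for EST.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumption: a fixed UTC-5 offset stands in for Eastern Standard Time.
eastern_standard = timezone(timedelta(hours=-5))
published = datetime(2021, 11, 23, 20, 22, 50, tzinfo=eastern_standard)

pub_date = format_datetime(published)
print(pub_date)  # Tue, 23 Nov 2021 20:22:50 -0500

# UTC datetimes can be written with the "GMT" abbreviation instead of +0000.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(format_datetime(epoch, usegmt=True))  # Thu, 01 Jan 1970 00:00:00 GMT

# Parsing goes the other way and understands abbreviations such as EST.
parsed = parsedate_to_datetime("Tue, 23 Nov 2021 20:22:50 EST")
print(parsed == published)  # True
```

The numeric offset form is a reasonable default because, unlike abbreviations, it is unambiguous across regions.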

Styling an RSS Feed

While the last example is good enough if you just want a simple feed, it can look pretty boring in a news aggregator. Instead of showing off all the hard work that went into styling the actual website, the RSS feed will have the default font, a white background, etc. If you want your RSS feed to pop, you'll have to add a stylesheet (yes, XML has stylesheets too).

The primary purpose of the XML stylesheet in regard to RSS feeds is to specify how to convert the XML document into an XHTML document. XHTML (like the name suggests) is HTML written according to XML's stricter syntax rules. It has nearly the same structure as HTML, including a <head> section where you can link CSS stylesheets.

The mechanism to convert XML to XHTML is XSLT (XSL Transformations). See here for an overview of the spec. I found it helped to see an example:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>RSS Feed - <xsl:value-of select="/rss/channel/title"/></title>
        <link href="/my/stylesheet" rel="stylesheet" type="text/css" media="screen"/>
      </head>
      <body>
        <xsl:for-each select="/rss/channel/item">
          <h2><xsl:value-of select="title"/></h2>
          <time><xsl:value-of select="pubDate"/></time>
          <p><xsl:value-of select="description"/></p>
          <a>
            <xsl:attribute name="href">
              <xsl:value-of select="link"/>
            </xsl:attribute>
            See more...
          </a>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

In the above example, the first five lines tell the XML parser that you are using the XSLT language. Again, make sure that these lines are the first in your document (line 1 especially needs to be above all else, even comments).

After these lines, the rest of the document is really similar to HTML. The only differences are the tags that start with xsl:. These serve a really similar purpose to the Jinja template specifications, but they're just specified in XML instead. The important ones are:

  • <xsl:value-of>: Substitutes a specified value for the tag (like Jinja's {{ }} syntax).
  • <xsl:for-each>: Repeats the child elements for each XML element matching the selection (like Jinja's {% for %} syntax).
  • <xsl:attribute>: Sets the attribute of the parent node. In this case, it's setting the href attribute of the <a> tag.

You'll notice that the first two tag-types in this list contain a select attribute. This specifies where in the RSS feed to find the associated information.

While these values look like file system paths, they're actually referring to nodes within the RSS feed XML document. For example, /rss/channel/title actually means "retrieve the <title> from each <channel> under the <rss> tag". For my example RSS feed, this would return "My RSS Feed".

For the <xsl:for-each> tag case, /rss/channel/item means "Repeat this section for each <item> in each <channel> in the RSS document". Inside the loop, the relative paths given to select refer to the title, pubDate, description, and link of the current item (instead of the channel itself).
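
If the path syntax still feels abstract, the same lookups can be mimicked in Python with the standard library, as in the sketch below. Note one difference: ElementTree paths are relative to the root element it hands back (<rss>), so the XSLT path /rss/channel/title becomes channel/title here. The feed is a trimmed copy of the earlier example.

```python
import xml.etree.ElementTree as ET

feed = b"""<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>My RSS Feed</title>
    <link>http://my.feed.com</link>
    <description>A really basic RSS feed</description>
    <item>
      <title>My First Blog Post!</title>
      <link>http://my.feed.com/first</link>
    </item>
    <item>
      <title>How to Write RSS Feeds</title>
      <link>http://my.feed.com/rss-feeds</link>
    </item>
  </channel>
</rss>"""

rss = ET.fromstring(feed)

# Equivalent of <xsl:value-of select="/rss/channel/title"/>
channel_title = rss.findtext("channel/title")
print(channel_title)  # My RSS Feed

# Equivalent of <xsl:for-each select="/rss/channel/item">, where the
# inner lookups are relative to the current item, not the channel.
item_titles = [item.findtext("title") for item in rss.findall("channel/item")]
print(item_titles)  # ['My First Blog Post!', 'How to Write RSS Feeds']
```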

Now to link the XML stylesheet to the RSS feed, add the <?xml-stylesheet?> processing instruction to the top of the RSS feed, directly after the XML declaration, like the following:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/path/to/stylesheet.xsl" type="text/xsl"?>
 <rss version="2.0">
   <channel>
...
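
One way to double-check that the stylesheet link ended up in the right place is to parse the feed and look for the instruction ahead of the root element. The sketch below does that with the standard library's minidom; the `stylesheet_href` helper and the completed sample feed are assumptions for illustration.

```python
import re
from xml.dom import minidom


def stylesheet_href(feed_xml: bytes):
    """Return the href of the first xml-stylesheet instruction, or None."""
    document = minidom.parseString(feed_xml)
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                return match.group(1)
    return None


styled_feed = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/path/to/stylesheet.xsl" type="text/xsl"?>
<rss version="2.0">
  <channel>
    <title>My RSS Feed</title>
    <link>http://my.feed.com</link>
    <description>A really basic RSS feed</description>
  </channel>
</rss>"""

print(stylesheet_href(styled_feed))  # /path/to/stylesheet.xsl
```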

Now if you inspect the page with your browser's developer tools after it loads, you'll notice that the browser has transformed the feed into something that looks like HTML. To style it, just use regular-old CSS. You can link a CSS stylesheet with the <link> tag, like I did on line 8 of the XSLT example (inside its <head> section). CSS selectors work exactly the same as they would in regular HTML documents. And just like regular HTML, you can use class and id attributes to refer to elements you wish to style.

One last tip about working with XSLT documents: be sure to close all tags, even those that don't typically need to be closed in HTML. Otherwise, you could run into parser errors. You'll notice in the example above that the <link> tag is closed for this reason.

Serving the XML with Flask

Serving an RSS feed with Flask is just as simple as any other page. Plus, XML can also be rendered with the Jinja templating language, just like HTML content. You'll need two templates - one for the actual RSS feed and another for the stylesheet. The stylesheet contents can mostly be generated using the XSLT format from the last section rather than Jinja templating, but if you find you still need Jinja templating, it's possible to use both.

Once the templates are defined, serving the XML and associated stylesheet is really simple. You just need to define two additional routes:

import flask
from myapplication import app, get_content


@app.route('/path/to/my/feed.xml')
def serve_rss_feed():
    # Values made available to the Jinja template for the feed
    context = {
        'title': 'My RSS Feed',
        'content': get_content(),
    }
    rss_xml = flask.render_template('feed.xml.jinja', **context)

    # Serve the rendered feed as XML instead of Flask's default text/html
    response = flask.make_response(rss_xml)
    response.headers['Content-Type'] = 'application/xml'
    return response


@app.route('/path/to/my/feed.xsl')
def serve_rss_stylesheet():
    # The stylesheet template receives the same context as the feed
    context = {
        'title': 'My RSS Feed',
        'content': get_content(),
    }
    rss_xsl = flask.render_template('feed.xsl.jinja', **context)

    # The XSLT stylesheet is itself XML, so it gets the same content type
    response = flask.make_response(rss_xsl)
    response.headers['Content-Type'] = 'application/xml'
    return response

If you find that the RSS stylesheet can be defined completely with the XSLT language, the second path can be done away with. In that case, you can just put the stylesheet in the static content directory and allow it to be served with your images, icons, and other static content. I elected to use Jinja templating along with the XSLT language so that I could reuse some common code within the <head> and <footer> sections of the document.

The Atom Syndication Format

Another feed format which is used by a lot of news aggregators is called the Atom Syndication Format. Just like RSS, it's XML based. Feed validators, such as the W3C one, recommend that RSS feeds borrow one element from Atom, <atom:link>, so that the feed can state its own URL and stay compatible with the widest range of news aggregators. Luckily, it's really easy to add this to an existing RSS feed.

First, declare the Atom XML namespace on the <rss> tag of the feed document so that the parser knows what the atom: prefix refers to:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/feed.xsl" type="text/xsl"?>
 <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
   <channel>
...

Next, add the <atom:link> tag as one of the children of the <channel> node. With rel="self", its href should be the URL where the feed itself is served, so that aggregators can tell where the document came from. This can be done similar to below:

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://my.feed.com/path/to/my/feed.xml" rel="self" type="application/rss+xml" />
...

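...and now your RSS feed carries the Atom self-link that feed validators look for! Note that this doesn't turn the document into a full Atom feed; it's still RSS 2.0 with one borrowed Atom element. With those two simple changes, though, your feed should work across a wide range of news aggregators.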
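
For anyone generating the feed in code rather than from a template, here is a hedged sketch of the same two changes using the standard library. Registering the atom prefix makes the serializer emit the xmlns:atom declaration on <rss> on its own. The `FEED_URL` value is a made-up placeholder for wherever the feed is served.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical URL where the feed itself is served.
FEED_URL = "http://my.feed.com/path/to/my/feed.xml"

# Ask the serializer to use the "atom" prefix for the Atom namespace.
ET.register_namespace("atom", ATOM_NS)

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "My RSS Feed"
ET.SubElement(channel, "link").text = "http://my.feed.com"
ET.SubElement(channel, "description").text = "A really basic RSS feed"

# Namespaced tags are written as {namespace}localname in ElementTree.
ET.SubElement(
    channel,
    f"{{{ATOM_NS}}}link",
    {"href": FEED_URL, "rel": "self", "type": "application/rss+xml"},
)

feed_text = ET.tostring(rss, encoding="unicode")
print(feed_text)
```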

Summary

Though a little outdated, RSS feeds are still all over the place and a valuable way to reach readers. They're also an invaluable way to follow your favorite online resources without having to rely solely on social media.

While XML itself is a little quirky and bulky, the feed itself is really easy to implement in Flask - just make sure to watch out for the gotchas I've outlined above. If you run into any issues or have questions, feel free to shoot me an email!