
How To: Automated Feeds:
Very Advanced Text Feed from Multiple Directories with File Name Filters and a Regular Expression Search

I'm going to assume that you already know how to use RSS DreamFeeder fairly well. If you are not already familiar with it then I would recommend that you first go through one of the other tutorials. If you're ready to proceed then we should talk about the scenario you are going to be working with.

The feed you're going to build is a promotional feed of press releases for both products and public relations. This will be an Automated RSS text feed, and you will build it with the Advanced interface.

In this scenario ACME has two groups that release content for the press. The first is Product Management, who issue press releases related to new and updated products. The second is Public Relations, who issue press content for promotional purposes, including both press releases and press notes. They both have their own directories which contain this content, in 04_ProductManagement/releases and 06_PublicRelations/content respectively. A further examination of the PR/content directory reveals that PR has mixed both releases and notes in the same directory. If you are to build a feed with just press releases then you will need to filter out the notes.

If you open an example release from both pm/releases and pr/content you'll find that they look very similar and that key areas are the same. The template region has the same name (PageContent), and the headline uses the same tag (H3) with no style class (the style sheet overrides the normal display properties of the H3 tag).

As I have said before, consistency in design is a key element of effectively conveying information -- and templates and style sheets let you do that well. Templates control page structure and style sheets control the graphical presentation of content. These two tools allow content to be restricted in placement within the document (template regions) and adherent to a predefined visual order (style selection - also called classing).

Moreover, classing for styles (or dropping content into a template region) is also tagging with metadata. If a headline is classed HomePageHeadline because the style sheet says so, then we can reverse that relationship and say that any text/data with the class HomePageHeadline is a headline. What something is determines what it looks like, so what it looks like tells you what it is.

You should also notice the names of the files. All releases are called WHATEVERRelease.html. The PM folks and the PR folks have slightly different naming conventions, but they both agree on ending a release's file name with Release.html. Enforcing naming conventions, especially on a very large website, can be difficult but it is absolutely worthwhile because you can then gather much information about content before the file is even opened. No serious attempt at a large-scale website should forgo naming conventions -- you'll suffer for it.

So start a new feed by pressing the new feed button in the RSS DreamFeeder floating panel.

When the dialog is displayed you are presented with the basic interface, so click on the Advanced tab.

The first panel of the Advanced interface provides fields for descriptive content for the feed. The only required fields are Title and Description. Now go to Feed Settings from the Category list.

In Feed Settings you will decide what type of a feed you are building (Text Feed) and what file format to use (RSS 2.0). Then tell it to collect content from Files and that you want to have your computer do the work of updating the feed (Local Processing). Next, provide the Site Settings.

Under Site Settings you'll give RSS DreamFeeder the Base URL that it will use to translate the local links to full URLs for the feed. Once entered move on to Summarize.
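To make the Base URL idea concrete, here is a minimal Python sketch of the kind of translation involved. It is only an illustration, not RSS DreamFeeder's own code: the Base URL http://www.example.com/ and the function name to_full_url are assumptions I made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical Base URL (an assumption); substitute your site's real address.
BASE_URL = "http://www.example.com/"


def to_full_url(local_path, base_url=BASE_URL):
    """Turn a site-relative file path into the full URL used in the feed."""
    # Normalize Windows separators and drop any leading slash so the path
    # is resolved relative to the Base URL.
    relative = local_path.replace("\\", "/").lstrip("/")
    return urljoin(base_url, relative)


print(to_full_url("pr/content/090528TastyRelease.html"))
```

Note that the Base URL ends with a slash; without it, urljoin would replace the last path segment of the base instead of appending to it.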

Under Summarize you will define where the files reside within the website that you want RSS DreamFeeder to extract content from. You'll be extracting content from two directories, so select Directories and then use the plus button on the right to add pm/releases and pr/content to the list of directories. Then tell it that the file names end with Release.html so that RSS DreamFeeder will only grab the files that match the naming convention.
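If it helps to see the file name filter spelled out, the selection above behaves roughly like this Python sketch. The function name collect_releases is hypothetical, and I am assuming the search looks only at files directly inside each listed directory (no subdirectories); RSS DreamFeeder itself may behave differently.

```python
from pathlib import Path


def collect_releases(site_root, directories, suffix="Release.html"):
    """Return the files in the listed directories whose names end with suffix."""
    root = Path(site_root)
    matches = []
    for directory in directories:
        folder = root / directory
        if not folder.is_dir():
            continue  # skip directories that do not exist
        # Assumption: only look at files directly inside each directory.
        for path in sorted(folder.iterdir()):
            if path.is_file() and path.name.endswith(suffix):
                matches.append(path)
    return matches


if __name__ == "__main__":
    # "." stands in for the local root folder of the site.
    for release in collect_releases(".", ["pm/releases", "pr/content"]):
        print(release)
```

Because the filter works on the file name alone, the press notes in pr/content are skipped without ever being opened.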

Under Elements you can decide which elements of the feed you are going to include. But in this case we're going to stick to the basic set.

Now launch the Content Sampler and sample the Headline and the Story (the whole PageContent template region). Then press the Done button to return to the Edit dialog.

When you return to the Edit dialog you'll come back to the same Elements panel. Now go to the panel for defining content extraction for Headline. You'll see that it already has the H3 tag defined, but I like to be more precise when I can. That way, if the page changes and an H3 is added before this one (in the template, for example), these settings will still work. The headline was within the story, so restrict the location to "Within the Story". Now move on to Story.
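Here is a rough Python sketch of why the "Within the Story" restriction matters. It assumes the PageContent region is delimited by Dreamweaver-style InstanceBeginEditable / InstanceEndEditable comments; that markup, the sample page, and the function name are my assumptions for illustration, not something taken from the ACME files or from RSS DreamFeeder.

```python
import re

# Assumed markup: the template region is wrapped in Dreamweaver-style comments.
STORY_RE = re.compile(
    r'<!--\s*InstanceBeginEditable\s+name="PageContent"\s*-->'
    r"(.*?)"
    r"<!--\s*InstanceEndEditable\s*-->",
    re.DOTALL | re.IGNORECASE,
)
HEADLINE_RE = re.compile(r"<h3\b[^>]*>(.*?)</h3>", re.DOTALL | re.IGNORECASE)


def extract_story_and_headline(html):
    """Return (story, headline); the headline is the first H3 inside the story."""
    story_match = STORY_RE.search(html)
    if story_match is None:
        return None, None
    story = story_match.group(1)
    headline_match = HEADLINE_RE.search(story)
    headline = headline_match.group(1).strip() if headline_match else None
    return story, headline


# Made-up sample page with an extra H3 outside the template region.
sample = """<h3>Site Navigation</h3>
<!-- InstanceBeginEditable name="PageContent" -->
<h3>Example Headline</h3>
<p>May 27, 2009 - Example body text.</p>
<!-- InstanceEndEditable -->"""

story, headline = extract_story_and_headline(sample)
print(headline)
```

Searching the whole page for the first H3 would have picked up the navigation heading; searching only inside the story does not.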

The Story settings from the Content Sampler are perfect so we will leave them alone. On to Link.

Link's default setting is to use the location of the current page that we are extracting from. That is exactly what you want to do here -- have the link point back to the original file that RSS DreamFeeder is pulling content from. So don't change that either. On to Date.

Date defaults to the Current Page's Modification Date, which is useful, but not really what we're after. If a page is modified, even for something as simple as fixing a typo, the modification date will no longer reflect the release date. So the right answer is to extract the date from the dateline text in the document. This is where a Dreamweaver datestamp would have proven useful (it is an option in the Match Type popup menu), but the authors of the page didn't provide one. So there is only one option left: an advanced text search called a Regular Expression. There may be multiple dates on a page, so to be sure of finding the right one, look for the first one after the Headline.

You want to match the dateline string that looks like
May 27, 2009
WORD-SPACE-NUMBER-COMMA-SPACE-TWOTHOUSANDSOMETHING

In regular expressions there are special strings that each stand for a particular kind of character: a word character (\w); a whitespace character (\s); a digit character (\d). These are usually followed by a modifier that indicates how many characters to include: zero or more (*); one or more (+); zero or one [maybe there, maybe not] (?). Any characters that are not special (these plus some others) are what they are: a means an a; R means an R; 2 means a 2. Of course this is just the tip of the regular expression iceberg and you can learn lots more about it in the documentation or by searching online. Regular expressions are one of the most powerful text manipulation tools you can use, and they're part of RSS DreamFeeder.
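If you want to play with these building blocks outside RSS DreamFeeder, here is a small Python sketch. Python's re module is only a stand-in here; the regular expression dialect inside RSS DreamFeeder may differ in its details, and the helper name matches is hypothetical.

```python
import re


def matches(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"\w", "a"))      # \w is a single word character
print(matches(r"\s", " "))      # \s is a single whitespace character
print(matches(r"\d+", "2009"))  # + means one or more digits
print(matches(r"\d+", ""))      # + needs at least one digit
print(matches(r"\d*", ""))      # * also accepts none at all
print(matches(r",?", ""))       # ? makes the comma optional
print(matches(r"R2", "R2"))     # ordinary characters match themselves
```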

So back to the match. To match something that looks like this
May 27, 2009
WORD-SPACE-NUMBER-COMMA(maybe)-SPACE-TWOTHOUSANDSOMETHING
\w+ \s+ \d+ ,? \s+ 20\d+
\w+\s+\d+,?\s+20\d+

The last line above is the final regular expression you want to use. If for any reason it doesn't match, the modification date of the file (the original default) will be used instead. With something like this, where even a small typo can happen easily and get you into a world of trouble, you have got to test it out. Point the Test tool at one of your press releases (I used 090528TastyRelease.html from pr/content) and give it a shot.
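You can also try the expression outside the Test tool. The Python sketch below searches for the first date-like string after the headline and falls back to a supplied value when nothing matches. The sample text, the function names, and the fallback string are made up for illustration, and Python's regular expression dialect may differ slightly from the one RSS DreamFeeder uses.

```python
import re

# The expression from the tutorial: word, space, number, optional comma,
# space, then a year starting with 20.
DATELINE_RE = re.compile(r"\w+\s+\d+,?\s+20\d+")


def find_dateline(text, headline):
    """Return the first date-like string after the headline, or None."""
    start = text.find(headline)
    if start == -1:
        return None
    match = DATELINE_RE.search(text, start + len(headline))
    return match.group(0) if match else None


def date_or_fallback(text, headline, fallback):
    """Use the dateline if one is found; otherwise use the fallback value."""
    dateline = find_dateline(text, headline)
    return dateline if dateline is not None else fallback


# Made-up page text with an earlier date that should be ignored.
text = "Updated April 2, 2009\nExample Headline\nMay 27, 2009 - Body text."
print(date_or_fallback(text, "Example Headline", "file modification date"))
```

Keep in mind that the expression is deliberately loose: it does not check that the word is a month name, so any "word number 20xx" sequence after the headline would match. That is one more reason to test it against real pages.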

Now on to the Author element. In this scenario the Author should always be the same thing (a Fixed Value): the text "ACME F&N".

The configuration is now complete so press Save and save it as releases.rss in the root directory.
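For a feel of what one entry in releases.rss might look like, here is a minimal Python sketch that builds an RSS 2.0 item from the pieces configured above. The mapping of Headline to title and Story to description, the midnight UTC time, the sample values, and the function name build_item are all assumptions for illustration; the output RSS DreamFeeder actually writes may differ.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(headline, link, story, dateline, author="ACME F&N"):
    """Build an RSS 2.0 <item> element from the extracted pieces."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = headline
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = story
    # Parse a dateline such as "May 27, 2009" (English month names assumed)
    # and treat it as midnight UTC, which is an assumption.
    when = datetime.strptime(dateline, "%B %d, %Y").replace(tzinfo=timezone.utc)
    # RSS 2.0 expects dates in the RFC 822 style that format_datetime produces.
    ET.SubElement(item, "pubDate").text = format_datetime(when)
    ET.SubElement(item, "author").text = author
    return item


item = build_item(
    "Example Headline",
    "http://www.example.com/pr/content/090528TastyRelease.html",
    "Example body text.",
    "May 27, 2009",
)
print(ET.tostring(item, encoding="unicode"))
```

Notice that the ampersand in "ACME F&N" is written as &amp;amp; in the output. Any tool that writes XML has to escape it that way, so don't be alarmed if you see it when you open the feed file in a text editor.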

You can see that there are 5 files to check -- more than in either directory alone. That means that it is finding content to collect in both directories.

Process the feed and try it in your news reader.

Congratulations -- You have created a very advanced RSS feed.
