RSS2Email in Less Than 50 Lines of PHP

Sep 16, 2005

This script is part of an eventual chain of loosely coupled scripts (mentioned in the post about my new ideas for RSS aggregation), and it reflects how I often write applications. Each script does one thing: in this case, send the contents of a single RSS feed to an email address. Other scripts will parse OPML, seek out additional content or do analysis (like the Yahoo keyword extractor), and eventually generate the RSS feed that will be fed to this script. Because it does one thing and only one thing, it becomes a private web service that you can reuse wherever you want it.

It's not intended for multi-user setups, or for sending the same item to an address twice. Instead, it will send each item in a feed only once and thereafter refuse to send that item/posting again unless you clear out the "items" directory. Make sure that directory is writable by your web server. You should also do what most web services out there do to prevent abuse: define a private key near the top of the script, such as $key = "someprivatekey", and wrap the whole script in a check that calls die() when the key supplied with the request isn't valid. I've done that on my installation of this script because I don't want the entire world routing their RSS to random email addresses via my server. I'd end up on some nasty spam blacklists pretty quickly.
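As a minimal sketch of that key check (the "key" request parameter and the is_authorized() helper are my own illustrative names, not something the script above defines), it might look roughly like this:

```php
<?php
// Hypothetical shared secret; replace with your own long random value.
$key = "someprivatekey";

/**
 * Returns true only when the request parameters carry the expected key.
 * The "key" parameter name is an assumption made for this sketch.
 */
function is_authorized(array $params, $expectedKey)
{
    if (!isset($params['key']) || !is_string($params['key'])) {
        return false;
    }
    // hash_equals() compares in constant time (needs PHP 5.6 or later).
    return hash_equals($expectedKey, $params['key']);
}

// At the top of emailfeed.php one might then write:
// if (!is_authorized($_GET, $key)) { die("Invalid key"); }
```

The caller would then append something like &key=someprivatekey to the URL. On older PHP versions a plain === comparison would do the same job, just without the constant-time property.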

You do need installations of MagpieRSS and PHPMailer on your PHP server. However, if you want to do much with either sending email or RSS reading, you're going to want both in your library directory anyway. I didn't need to do any special setup on this server for either script other than following the instructions.

The resulting script is called via a URL that looks like this:
http://yoursite/rss2email/emailfeed.php?feedurl=URL_ENCODED_URL&email=URL_ENCODED_EMAIL

Both the feedurl and email values need to be URL encoded, which turns spaces into %20, slashes into %2F, and so on. While that's pretty standard practice, I know that a lot of folks are just starting out in web development and that kind of information ends up tripping them up.
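If you are building that URL from another PHP script, urlencode() does the encoding for you. Here is a small sketch; build_emailfeed_url() is a hypothetical helper, and the host and email address are placeholders:

```php
<?php
/**
 * Builds the emailfeed.php URL with both parameters URL encoded.
 * $base is a placeholder for wherever the script is installed.
 */
function build_emailfeed_url($base, $feedUrl, $email)
{
    return $base . '?feedurl=' . urlencode($feedUrl)
                 . '&email=' . urlencode($email);
}

echo build_emailfeed_url(
    'http://yoursite/rss2email/emailfeed.php',
    'http://www.wynia.org/wordpress/feed/',
    'me@example.com'
);
// prints http://yoursite/rss2email/emailfeed.php?feedurl=http%3A%2F%2Fwww.wynia.org%2Fwordpress%2Ffeed%2F&email=me%40example.com
```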

On to the code.

This first part just pulls in the 2 libraries we need to get started. The paths need to be real and I suggest you set up a directory like "lib" to keep these shared libraries so that you can always just insert the same tidbit using the absolute filesystem path to pull it in. It also grabs the 2 parameters from the URL for our use.

require_once('/html/lib/magpierss/rss_fetch.inc');
require_once('/html/lib/phpmailer/class.phpmailer.php');
$url=$_GET['feedurl'];
$emailaddress = $_GET['email'];
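The two lines above use the request values unchecked. If you'd like a sanity check before fetching anything, a sketch along these lines might do, assuming PHP 5.2 or later for filter_var(); valid_request() is a hypothetical helper and not part of the script:

```php
<?php
/**
 * Returns true when $url is an http(s) URL and $email looks like an address.
 * Hypothetical helper for illustration only.
 */
function valid_request($url, $email)
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Only allow web feeds, not file:// or other schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Possible use right after reading the parameters:
// if (!valid_request($url, $emailaddress)) { die("Bad feedurl or email"); }
```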

Now that we have an RSS URL, we'll use Magpie to turn that XML into a nested array in PHP.

$rss = fetch_rss($url);
$items = array_slice($rss->items, 0);

Yes, that's it. $items now contains a nested array of the RSS feed. Do a "print_r($items);" after this point and you'll see the entire structure of the RSS feed. Use "feedurl=http%3A%2F%2Fwww.wynia.org%2Fwordpress%2Ffeed%2F" as your input and you'll see the most recent articles from this site in that structure.

OK. So, we've got an array. Now what?

We're going to loop through it and build an email to send to the email address for each item. There are comments in this next larger block of code for the parts which are less obvious.

foreach ($items as $item){
		$id = $item['guid'];
		//We're going to need a unique ASCII identifier for each item later, so we'll use md5 to get one
		$itemhash = md5($id);
		$link = $item['link'];
		$subject = $item['title'];
		//The quote entities (smart quotes) cause a problem in many mail clients
		//so strip them out as well as getting rid of the ? characters that can come in
		$subject = str_replace("?"," ",$subject);
		$subject = str_replace("’","'",$subject);
		$subject = str_replace("“","\"",$subject);
		$subject = str_replace("”","\"",$subject);
		$description = $item['content']['encoded'];
		$description = str_replace("’","'",$description);
		$description = str_replace("“","\"",$description);
		$description = str_replace("”","\"",$description);
		//Build the body with a link to the real page and the article itself
		$body = "<a href='$link'>View Page</a><br>$description<br>";

We've now got $body with the full text of the item and a link to read it on the web. The email is going to be HTML, so that link will work. If you were going to be sending to plain text, you wouldn't want to use the whole HTML link setup, but just print the URL instead.
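For the plain text case, here is a rough standalone sketch of what building the body could look like. build_plain_body() is a hypothetical helper, and stripping tags this way is a crude conversion rather than a proper HTML-to-text renderer:

```php
<?php
/**
 * Builds a plain-text body: the bare URL first, then the article text
 * with tags stripped and HTML entities decoded.
 */
function build_plain_body($link, $description)
{
    $text = html_entity_decode(strip_tags($description), ENT_QUOTES, 'UTF-8');
    return $link . "\n\n" . trim($text) . "\n";
}

echo build_plain_body('http://example.com/post', '<p>Hello &amp; welcome</p>');
```

With a body like that you would also call $mail->IsHTML(false) on the PHPMailer object instead of IsHTML(true).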

Now, we've already got enough to just start sending email. However, if we do that, we're going to end up with a mess the second time we run our script, because it's just going to send all of the items again. So we need a mechanism to mark which items we've already sent and avoid resending those. I frequently use a dedicated directory for this: for each item sent, I put an empty file with a unique name into that directory. It's simple, doesn't require a database and is easy to clear out during testing. Just create a directory next to this script called "items" and make it writable by your web server. We'll use the md5 hash of the item's guid as the filename. md5 hashes make decent filenames because they're all the same length (32 characters), consist only of ASCII letters and digits, which are no problem for filesystems, and are unique for practical purposes. Researchers have shown that md5 collisions can be constructed deliberately, but two feed items colliding by accident is vanishingly unlikely.

		//Check to see if we've already sent this item
		if(file_exists("items/$itemhash")){
			//don't send an email
			print("Already sent item: $subject<br>");
		} else {
			//We haven't sent this one yet, so go ahead with the sending of email.
			//Be sure to change the From email and From Name to your setup
			$mail = new PHPMailer();
			$mail->IsHTML(true);
			$mail->AddAddress($emailaddress, "RSS Reader");
			$mail->From = "feedhive@feedhive.org";
			$mail->FromName = "Onyx Cube";
			$mail->Subject = $subject;
			$mail->Body    = $body;
			if(!$mail->Send())
			{
				echo "There was an error sending the message $subject";
			} else {
				//Mark it as already sent by creating an empty file.
				//We do this only when sending succeeded, so that an item
				//isn't skipped forever because of a mail problem.
				touch("items/$itemhash");
			}
		}
}
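The marker-file idea can also be pulled out into a few small functions, which makes it easy to try on its own. This is only a sketch: the function names are mine, and it runs against a throwaway temp directory rather than the real "items" directory.

```php
<?php
/** Path of the marker file for an item; md5() yields 32 hex characters. */
function marker_path($dir, $guid)
{
    return rtrim($dir, '/') . '/' . md5($guid);
}

/** True when a marker file already exists for this guid. */
function already_sent($dir, $guid)
{
    return file_exists(marker_path($dir, $guid));
}

/** Creates the empty marker file; returns false if the directory isn't writable. */
function mark_sent($dir, $guid)
{
    return touch(marker_path($dir, $guid));
}

// Demonstration in a temporary directory.
$dir = sys_get_temp_dir() . '/rss2email_demo_' . uniqid();
mkdir($dir);
$guid = 'http://example.com/?p=1';    // hypothetical item guid
var_dump(already_sent($dir, $guid));  // bool(false)
mark_sent($dir, $guid);
var_dump(already_sent($dir, $guid));  // bool(true)
unlink(marker_path($dir, $guid));
rmdir($dir);
```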

The end result is that the target email address receives all of the new items from that feed. Magpie already caches the feed itself to avoid overloading the source, and our script will only send a given item once. If you just want to use the script as a simple forwarding mechanism for your own couple of feeds, you can easily add entries in your cron scripts or in the Windows scheduler to hit this URL every so often. Or, create a wrapper script that uses Magpie to take an OPML file and calls our RSS2Email script for each feed in turn, and put that into your scheduler.
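As a rough sketch of the wrapper idea, here is one way to pull feed URLs out of an OPML document. Note that this uses PHP 5's SimpleXML rather than Magpie, and it assumes each feed is an outline element with an xmlUrl attribute, which is the usual convention in aggregator exports. The function name, host and email address are placeholders.

```php
<?php
/**
 * Extracts feed URLs from an OPML document using SimpleXML.
 * Assumes feeds are <outline> elements carrying an xmlUrl attribute.
 */
function opml_feed_urls($opmlXml)
{
    $doc = simplexml_load_string($opmlXml);
    if ($doc === false) {
        return array();
    }
    $urls = array();
    foreach ($doc->xpath('//outline[@xmlUrl]') as $outline) {
        $urls[] = (string) $outline['xmlUrl'];
    }
    return $urls;
}

$opml = '<opml version="1.1"><body>'
      . '<outline text="Example" xmlUrl="http://example.com/feed/"/>'
      . '<outline text="Folder"><outline text="Other" xmlUrl="http://example.org/rss"/></outline>'
      . '</body></opml>';

foreach (opml_feed_urls($opml) as $feedUrl) {
    // A real wrapper would request this URL; here we only print it.
    echo 'http://yoursite/rss2email/emailfeed.php?feedurl=' . urlencode($feedUrl)
       . '&email=' . urlencode('me@example.com') . "\n";
}
```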

Also note that if you use $item['description'] instead of $item['content']['encoded'], you'll get the shorter summary for the item. If you just want to be notified via email and don't plan to do the actual reading there, that's the way to go.

 


5 Responses to “RSS2Email in Less Than 50 Lines of PHP”

  1. Citizen Keith Says:

    RSS2Email in Less Than 50 Lines of PHP Adium Xtras - Home EzStatic - Red Alt

  2. RSS Haciendo facil lo simple » Blog Archive » Rezzibo : Lector de feeds en español Says:

    [...] Bonus: RSS2email in less than 50 lines of PHP [...]

  3. RSS2Email Extending, RSS Analysis and Yahoo Keyword Extractor Example Fixed-- The Glass is Too Big - J Wynia Says:

    [...] I was grabbing that code because I wanted to start injecting metadata into the headers of the emails that the RSS2Email script sends out. By putting things like keywords into X-Headers, the sophisticated IMAP clients like Outlook and Thunderbird can filter, sort and extract emails according to purpose. Eventually, I intend to tag feeds with the names of blogs I write on where the feed is relevant. That way, I can open a saved filter for this site and exclude all of the stuff that's just for my personal edification. Additionally, if you've got things like Technorati rank for a site, Google PageRank, a manual credibility tag, etc. on the feeds themselves, you can filter through a breaking story to see whether the people you put your credence in have weighed in on the issue or not. Eventually, I'm going to be putting a database between the fetching of the feeds and the sending out of email as it's a much better system on bandwidth, decoupling the refreshing of feeds from the sending of the emails, etc. and just plain a better idea, but for a proof of concept, leaks are OK. [...]

  4. Eric Says:

    Thank you for this. I was able to use this with some modifications to get a new RSS2Email program working for my club. Because it is in PHP I am much more comfortable with the code.

    I want to point out 2 problems I had.

    I had to change the setting for $description. I don't know if Magpie changed or what. My change was as follows:

    $description = $item['description'];

    Also at the bottom of your code you are missing an "else" statement.

    The last few lines of code were changed as follows…

    echo "There was an error sending the message $subject";

    //Mark it as already sent by creating an empty file
    //Note that we do this only after we haven't failed
    //in sending. This avoids missing an item due to mail problems
    } else {
    touch("items/$itemhash");
    }
    }
    }
    ?>

    Other than that I did do some modifications, but they were just based on personal preferences, not on functionality.

    Thanks Again.

  5. luis Says:

    I wonder if someone has done the work and has packed this into a php tool for rss2email for multiple users for instance and to manage subscriptions.

    Please let me know at cordoval at the g mail

    thanks,


© 2003-2009 J Wynia. All original content is licensed under the terms of the Creative Commons Attribution license unless otherwise noted. Content from other sources is licensed under its original terms.