So in my current project at Shadowcat, I am using a CPAN module called Perlanet, written by Dave Cross. This module/program is mainly for aggregating web feeds (RSS or Atom feeds) and creating a new feed and web page from them.
Now, with the stuff I’m doing for Shadowcat, I have been refactoring a lot of my code from the IronMan codebase, and customising it to do what I need. However, this did lead to a lot of ‘How the hell did that actually run?’ moments, as I didn’t fully understand the underlying modules. So to fix this, I am going to do some messing with Perlanet of my own, and hopefully have a useful walkthrough at the end of it.
Either that, or this will be an entertaining read of how to bash your head against a desk… so. To Coding!
Stage one: Throw out the current docs
Now, as with most CPAN modules, there is some documentation shipped with each piece of code. And, as with all documentation, sometimes it’s not massively useful. Now, this might just be me being thick, but when coming to use Perlanet::Simple, I had several problems even getting off the starting block. The docs for Perlanet::Simple say that to use it, all you need to do is:
    my $perlanet = Perlanet::Simple->new_with_config('perlanet.yaml');
    $perlanet->run;
Now, of course you will need to include the module with ‘use Perlanet::Simple’, and you need a file called ‘perlanet.yaml’ in the same directory, but apart from that… what? Well, it turns out you need a few other things to make this not throw any errors…
    use Perlanet::Simple;

    my $perlanet = Perlanet::Simple->new_with_config(
        configfile => 'perlanet.yaml'
    );
    $perlanet->run;
(Note: I have left ‘use strict;’ and ‘use warnings;’ out of all code snippets to save space, but they are implied in all snippets… plus you’d be mad not to use them anyway.)
The main (and only) change is to pass a hash to ‘new_with_config’, using “configfile => 'perlanet.yaml'” instead of just the filename. (Note: ‘new_with_config’ is not defined in the Perlanet code; it is actually defined in MooseX::ConfigFromFile, and expects a hash to be passed to it.)
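As a rough sketch (my own, not something from the Perlanet docs), the same call can be wrapped in a small sub that checks the config file exists before handing it over. The name run_planet and the error message are made up for illustration; the only Perlanet calls used are new_with_config and run, as in the working snippet.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run_planet is not part of Perlanet, it just
# bundles the calls from the post with an up-front file check.
sub run_planet {
    my ($configfile) = @_;

    die "Config file '$configfile' not found\n" unless -f $configfile;

    # Loaded at run time so the file check above happens first.
    require Perlanet::Simple;

    # new_with_config comes from MooseX::ConfigFromFile and takes
    # named arguments, hence configfile => ... rather than a bare filename.
    my $perlanet = Perlanet::Simple->new_with_config(
        configfile => $configfile,
    );
    $perlanet->run;

    return 1;
}

# Only does anything when a filename is given on the command line.
run_planet($ARGV[0]) if @ARGV;
```

Saved as, say, planet.pl, this would be run as ‘perl planet.pl perlanet.yaml’.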
Phew. Well, as that now works, on to the rest!
Stage two: Configfile? What Configfile?!
The next stage in getting this fully working is to create the config files that Perlanet will understand, which also means knowing what you want to aggregate. There are some demo files given in the examples folder on CPAN, but it’s probably easier to see what’s going on when you create your own. So, to do this you need to have or know the following:
- A title for your page
- A subtitle or description of your page
- The URL for your website
- Your name (you do know this, right?) and an e-mail address
- The number of entries you want to have on the page
- Where you want the generated page stored and the template to use for it
- Where you want the generated feed stored and what format to use
- A URL, title, and website for each feed you want to aggregate
You will also need a template file to use, which is in the Template Toolkit format. I won’t go into how to create one of those, just use the one below…
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>[% feed.title %]</title>
      </head>
      <body>
        <h1>[% feed.title %]</h1>
        <p>[% feed.description %]</p>
        [% FOREACH entry IN feed.entries %]
        <h2><a href="[% entry.link | url | html %]">[% entry.title | html %]</a></h2>
        [% entry.content.body %]
        [% IF entry.author OR entry.issued %]
        <p>Published[% IF entry.author %] by [% entry.author | html; END %]
        [% IF entry.issued %] on [% entry.issued | html; END %]</p>
        [% END %]
        [% END %]
        <hr />
        <address>[% feed.author | html %] / [% feed.modified | html %]</address>
      </body>
    </html>
Now for this example, I will make something that aggregates 3 blogs – Makezine, Arduino, and Adafruit. These are mainly because I love making things, and these are 3 very popular blogs for Makers in general, but you could use any other blogs you want! All you need is the RSS or Atom feed link.
First, create a file called ‘perlanet.yaml’ in the same directory as your script. Then add the following lines, populating it with your own bits as required:
    title: Maker Planet
    description: Make all the things!
    url: http://tbsliver.wordpress.com
    author:
      name: Tom Bloor
      email: not.so.stupid@toputthis.here
This info is just the basic bits for the actual page you’re creating. The next few lines are for defining your input and output files:
    # previous bits go here
    entries: 20
    page:
      file: www/index.html
      template: index.tt
    feed:
      file: www/atom.xml
      format: Atom
In this part, you have defined the number of entries you want in your feed (here it’s 20; Perlanet actually defaults to a pretty sane 30, though of course you can set this as high or as low as you like), the output file you want the HTML in (here it will be put in a folder called ‘www’) and the template to use for this file (see the template part earlier). The last part is completely optional, but will output an Atom feed file that you can then subscribe to! (You can also make this RSS by changing the format to RSS.)
After these, you then add the feeds that you want!
    # previous bits go here
    feeds:
      - url: http://blog.makezine.com/feed/
      - url: http://arduino.cc/blog/feed/
        title: Arduino Blog
        web: http://arduino.cc/
      - url: http://www.adafruit.com/blog/feed/
        title: Adafruit Industries
Here it shows the 3 feeds I chose earlier. It also shows the extra information you can add to each, though the title and web parts are completely optional – I think those are only used if you have them in your template. (Or if you customise Perlanet, but that I’ll go into later…)
With all those bits, you can now run your Perlanet, and it should just work! (as always, mileage may vary…)
Stage three: Bug Stomping
Phew! You made it this far. Now then, if you have managed to run your script, and it didn’t throw any errors, and the web page that came out is respectable (OK, it may look pretty crap if you used the template I provided, but it IS very basic…), then you may not need the few things I’ve learnt while doing this. However, if you have a few bugs to work out, then here are the few things I found worked.
- Double check your perlanet.yaml file. Seriously, as you’ve created it by hand, there is most likely an error, a typo, a missed variable, something. Also check the whitespace – look at the example from CPAN to see how it should be spaced.
- If the web page looks especially bad, check whether it’s just some stray style bits from the feeds you have aggregated – they sometimes have spacing things in there that will screw with the layout. I will go into more detail later on how to filter that out; for now… sorry.
- Lastly, it may actually be an issue with the code I’ve posted. If nothing you try fixes the issue, or you find I have a mistake in my code, feel free to comment and I’ll get back to you when I can, or fix the issue in my code. Other options are to go to the many other Perl resources that are around, such as IRC, or the many other websites (too many to list…) that may be able to help.
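For the first point, a quick sanity check of perlanet.yaml can save some head-desking. The sketch below is only that, a sketch: config_problems is a made-up helper, and the keys it treats as required are simply the checklist from this post, not anything taken from Perlanet’s own validation. It uses LoadFile from the YAML module to read the file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of problems found in an
# already-loaded config hash. The "required" keys are an assumption
# based on the checklist in this post, not on Perlanet itself.
sub config_problems {
    my ($config) = @_;
    my @problems;

    return ('top level of the config is not a mapping')
        unless ref($config) eq 'HASH';

    # Simple top-level values.
    for my $key (qw(title description url)) {
        push @problems, "missing '$key'"
            unless defined $config->{$key} && length $config->{$key};
    }

    # Nested sections and the keys expected inside each one.
    my %sections = (
        author => [qw(name email)],
        page   => [qw(file template)],
    );
    for my $section (sort keys %sections) {
        my $value = $config->{$section};
        if (ref($value) ne 'HASH') {
            push @problems,
                "'$section' should be a mapping (check the indentation)";
            next;
        }
        for my $key (@{ $sections{$section} }) {
            push @problems, "missing '$section.$key'"
                unless defined $value->{$key} && length $value->{$key};
        }
    }

    # Every feed needs at least a url.
    my $feeds = $config->{feeds};
    if (ref($feeds) ne 'ARRAY' || !@$feeds) {
        push @problems, "'feeds' should be a non-empty list";
    }
    else {
        my $n = 0;
        for my $feed (@$feeds) {
            $n++;
            push @problems, "feed $n has no 'url'"
                unless ref($feed) eq 'HASH' && defined $feed->{url};
        }
    }

    return @problems;
}

# When given a filename, load it and report. LoadFile dies with a
# parse error if the file is not valid YAML at all.
if (@ARGV) {
    require YAML;
    my @problems = config_problems(YAML::LoadFile($ARGV[0]));
    if (@problems) {
        print "$_\n" for @problems;
    }
    else {
        print "Config looks OK\n";
    }
}
```

Run it with the config file as the only argument. The indentation message is the one to look out for, since a mis-indented ‘name:’ or ‘file:’ line usually ends up at the top level instead of inside its section.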
Well, that about sums it up! Next time, I will be going into more detail on changing the workings of Perlanet to do some more useful things. I will also probably go through some changes to the file and folder structure of this project to make it a bit more sane, though for now it isn’t a massive project so it probably doesn’t matter as much. Enjoy!
Edits:
Spelling mistake spotted by castaway (your -> you’re).
Removed the ‘use Perlanet::Trait::YAMLConfig;’ as it seems not to be needed… random bug I had that’s disappeared since.
Fixing highlighting items
Hi,
Thanks for posting this. It’s very useful. And sorry for the problem you found in the documentation for Perlanet::Simple. The code is on Github at https://github.com/davorg/perlanet, so please consider submitting a patch for this. Or, alternatively, just submit a bug report on RT (https://rt.cpan.org/Public/Dist/Display.html?Name=Perlanet) and I’ll fix it as soon as I can.
I don’t know how many of these tutorials you are planning. But it might be nice to include them in the Perlanet distribution. Would you be interested in letting me do that?
Hi Dave,
Glad you like it! I tend to find trying to teach something will teach you more about it than just reading through it anyway. Also turns up interesting problems sometimes!
I will hopefully post a pull request on Github shortly, just need to actually figure out how… not used Github much before.
As to how many of these I plan on doing, I really have no idea, although I hope to show how Ironman on EPO uses perlanet eventually (and possibly improve on that code as well!), but I would be happy to let you include this in the documentation – after all, it wouldn’t be much use without your original code!
Thanks for the article. I really enjoyed setting up a Perlanet site to replace a planetplanet (Python) site. I found Perlanet clear and simple in configuration. One aspect I really enjoyed was using Template::Toolkit to filter the feeds from the different sites aggregated, to normalise the typography and even some of the picture sizes. This is what I did:
[% USE Filter.HTMLScrubber %]
[% FOREACH entry IN feed.entries %]
[% entry.title | html %]
[% entry.content.body
| html_scrubber(['-span','+p','+h3','+h4','+ol','+ul','+li','+strong','+em','-style','-script','-iframe'])
| remove('style=".*?"')
| remove('class=".*?"')
| replace('<imgs', '<img class="right" width="50%" ')
| replace('<p', '<p class="alt"')
| remove('')
| remove('')
| remove('')
| remove('')
%]
The remove sections got rid of some annoying adverts. The HTML::Scrubber plugin for Template Toolkit was very handy!
Hi, glad to hear about your experience with Perlanet! Will have to take a closer look at those HTML::Scrubber and Template Toolkit bits, may be worth making a post on using things like that. Thanks for the comment though!
Above code better read at: https://gist.github.com/kevincolyer/5452267