Aggregator WebAccess

Aggregator WebAccess provides a basic web interface for the Aggregator feed reader, so one can read news articles from a mobile device or another machine.

Threaded storage in Aggregator

Storage operations in Aggregator have always been single-threaded and performed in the main GUI thread. In particular, feed updates also ran in the main thread: fetching existing items from the database, looking for modified ones, updating those and inserting new ones. The result was terrible overall performance during feed updates: one could tell when Aggregator decided to update its feeds just by how sluggish the LeechCraft interface suddenly became for a dozen seconds.

That's in the past now. Yesterday all that feed updating machinery was moved to a separate thread. Only the parsing of downloaded feeds is still done in the main GUI thread, and I doubt it is worth moving that to a separate thread as well: XML parsing is fast and is hardly a performance hotspot.

The threaded architecture allows moving arbitrary operations into the storage thread as long as it makes sense from the UI point of view. For example, marking whole channels as read or unread was also moved into that separate thread. It hardly makes sense, though, to move fetching the items of the currently selected channel there: the user would wait for this operation to complete anyway, and it's fast enough that the user won't start doing something else in the meantime.

Threaded storage has been tested with SQLite and PostgreSQL on Linux and generally seems to work. It should work with any PostgreSQL/MySQL installation and with most modern SQLite installations. However, if SQLite was compiled without threading support, Aggregator will misbehave, likely segfaulting. In this case, file a bug with your distro's SQLite maintainers. And, of course, the usual warning: threaded storage is still quite experimental and not thoroughly tested, so bugs may happen.

This change only applies to the master branch: it won't be included in the upcoming 0.5 release.

Writing recipes for BodyFetch

Introduction

This guide is an introduction to writing recipes: small scripts for fetching full news bodies in Aggregator BodyFetch.

Recipes are usually written in JavaScript or Python, and they contain all the required information and, possibly, algorithms for BodyFetch to get the full news stories.

Since only the Qrosp plugin provides scripting support for now, and it currently supports only JavaScript and Python, these are the only languages actually available. And since Python support in the Qross library (which Qrosp uses) is optional, it is recommended to write recipes in JavaScript. For that reason, the examples here use JavaScript.

Please note that the API is in its early stages and will surely be extended and upgraded. Don't hesitate to send us your suggestions and ideas.

File locations

Custom user recipes are searched for in the following directories:

  • ~/.leechcraft/data/scripts/aggregator/recipes/qtscript/ for recipes in JavaScript.
  • ~/.leechcraft/data/scripts/aggregator/recipes/python/ for recipes in Python.

Recipes can have any name, but using the site's domain name is recommended so that they are easy to tell apart.

JavaScript recipes may have a .js, .qs or .es extension, while Python ones must end with .py.
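
The directory and extension rules above can be summarised in a few lines. The function below is only an illustration of those rules: recipeDirFor is a made-up name, not something BodyFetch provides or calls, and it assumes extensions are matched exactly as written.

```javascript
// Illustration only: maps a recipe file name to the subdirectory of
// ~/.leechcraft/data/scripts/aggregator/recipes/ where it belongs,
// or returns null if the extension is not one of the listed ones.
function recipeDirFor(fileName)
{
        var dot = fileName.lastIndexOf(".");
        if (dot < 0)
                return null;

        var ext = fileName.substring(dot + 1);
        if (ext === "js" || ext === "qs" || ext === "es")
                return "qtscript";
        if (ext === "py")
                return "python";
        return null;
}
```

For instance, a recipe saved as habrahabr.ru.js would belong in the qtscript directory, and habrahabr.ru.py in the python one.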

Basic API

The basic API is suitable for most cases.

The simplest recipe needs only two functions, CanHandle(link) and KeepFirstTag().

CanHandle(link)

This function is called by BodyFetch to determine whether the recipe can handle news items from the given channel, which is identified by the link parameter. It should return true if the recipe is written for this channel and false otherwise.

An example function for the Habrahabr.ru website would look like:

function CanHandle(link)
{
        // Handle only channels whose link starts with the Habrahabr RSS prefix.
        return link.indexOf("http://habrahabr.ru/rss/") == 0;
}
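
If a site serves its feeds under more than one address, the same prefix check can be repeated for each of them. The sketch below is only an illustration: the https prefix and the helper name startsWithAny are assumptions, not part of the BodyFetch API.

```javascript
// Hypothetical list of feed link prefixes this recipe claims;
// the https variant is an assumption made for illustration.
var PREFIXES = [
        "http://habrahabr.ru/rss/",
        "https://habrahabr.ru/rss/"
];

// Helper (not part of BodyFetch): true if str begins with any of the prefixes.
function startsWithAny(str, prefixes)
{
        for (var i = 0; i < prefixes.length; ++i)
                if (str.indexOf(prefixes[i]) === 0)
                        return true;
        return false;
}

function CanHandle(link)
{
        return startsWithAny(link, PREFIXES);
}
```

Checking that the prefix sits at position 0, rather than anywhere in the string, keeps the recipe from claiming unrelated links that merely contain the feed address as a query parameter.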

KeepFirstTag()

This function should return the list of CSS2 selectors used to find the elements that make up the full news body. For each selector only the first element found is considered (hence KeepFirstTag). If the returned list contains several selectors, the result is built from the matched elements in the same order as the selectors appear in the list. This may be useful for constructing the news body out of several parts of the page.

The outer XML of the matched elements is used as the result.

An example function for Habrahabr.ru would look like:

function KeepFirstTag()
{
        // The first div with class "content" is taken as the news body.
        return [ 'div[class="content"]' ];
}
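
Put together, a whole recipe file might look like the sketch below. It returns two selectors to show how the body is assembled in list order. The second selector, div[class="tags"], and the file name habrahabr.ru.js are assumptions made for illustration and were not checked against the actual page markup.

```javascript
// ~/.leechcraft/data/scripts/aggregator/recipes/qtscript/habrahabr.ru.js
// (hypothetical file name, following the domain-name recommendation)

// Claim only channels whose link starts with the Habrahabr RSS prefix.
function CanHandle(link)
{
        return link.indexOf("http://habrahabr.ru/rss/") == 0;
}

// The first match of each selector is kept, in this order.
function KeepFirstTag()
{
        return [
                'div[class="content"]',
                'div[class="tags"]'     // assumed selector, for illustration only
        ];
}
```

With this list, the resulting body would be the outer XML of the first element matching the first selector, followed by that of the first element matching the second.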
