+FIXME: either fix or remove the sort/filter hooks, and document what
+their replacements are.
+
+- rawdog 2.12
+
+Add the "splitstate" option, which makes rawdog use a separate state
+file for each feed rather than one large one. This significantly reduces
+rawdog's memory usage at the cost of some more disk IO during --write.
+The old behaviour is still the default, but I would recommend turning
+splitstate on if you read a lot of feeds or if you're on a machine with
+limited memory.
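+
+For example, enabling it is a one-line change to the config file. (A
+sketch: the option name comes from the entry above, but the true/false
+value syntax is assumed from rawdog's other boolean options.)
+
+    splitstate true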
+
+Add the "useids" option, which makes rawdog respect article GUIDs when
+updating feeds; if an article's GUID matches one we already know about,
+we just update the existing article's contents rather than treating it
+as a new article (which is what most aggregators do). This is turned on in the
+default configuration, since the behaviour it produces is generally more
+useful these days -- many feeds include random advertisements, or other
+dynamic content, and so the old approach resulted in lots of duplicated
+articles.
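+
+In outline, the new behaviour looks something like this (a sketch only:
+the dict-of-articles and the guid_index helper are simplifications of
+rawdog's real data structures):
+
+    def update_with_ids(articles, guid_index, entry, new_hash, now):
+        """Merge a fetched entry into the article store, respecting GUIDs.
+
+        articles:   dict mapping article hash -> stored article data
+        guid_index: dict mapping GUID -> article hash (hypothetical helper)
+        entry:      parsed feed entry, with an optional "id" (GUID) field
+        """
+        guid = entry.get("id")
+        if guid is not None and guid in guid_index:
+            # Same GUID as an article we already know about: refresh its
+            # contents in place rather than adding a duplicate.
+            old_hash = guid_index[guid]
+            articles[old_hash]["entry"] = entry
+            articles[old_hash]["last_seen"] = now
+        else:
+            # Previously unseen (or GUID-less) article: store it as new.
+            articles[new_hash] = {"entry": entry, "added": now,
+                                  "last_seen": now}
+            if guid is not None:
+                guid_index[guid] = new_hash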
+
- rawdog 2.11
Avoid a crash when a feed's URL is changed and expiry is done on the
+++ /dev/null
-The objective here is to significantly reduce rawdog's memory usage in favour
-of IO. (Although the IO usage may actually go down, since we don't have to
-rewrite feed states that didn't change.)
-
-The plan is to enable split state while keeping regular behaviour around as the
-default (for now, to be removed in rawdog 3).
-
--- Stage 1: making update memory usage O(biggest #articles) --
-
-Feed stays as is -- i.e. persisted as part of Rawdog, containing the feed info,
-and so forth. (These may change in rawdog 3 -- there's a tradeoff, because if
-we store the update time/eTag/... in the feed state then we have to rewrite it
-every time we update, rather than just if the content's changed. Actually, we
-don't want to do this, since we don't want to read the FeedState at all if it
-doesn't need updating.)
-
-There's a new FeedState class, persisted into STATEDIR/feeds/12345678.state
-(where 12345678 is the feed URL hash as currently used).
-(FIXME: when changing feed URL, we need to rename the statefile too.)
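-
-Roughly, something like this (a sketch: hashlib/pickle and the truncated
-SHA-1 stand in for whatever rawdog actually uses for persistence and for
-"the feed URL hash as currently used"):
-
-    import hashlib
-    import os
-    import pickle
-
-    def feed_state_path(state_dir, feed_url):
-        # e.g. STATEDIR/feeds/12345678.state
-        h = hashlib.sha1(feed_url.encode("utf-8")).hexdigest()[:8]
-        return os.path.join(state_dir, "feeds", h + ".state")
-
-    class FeedState:
-        """Per-feed state: just this feed's articles."""
-        def __init__(self):
-            self.articles = {}
-
-    def load_feed_state(path):
-        if not os.path.exists(path):
-            return FeedState()
-        with open(path, "rb") as f:
-            return pickle.load(f)
-
-    def save_feed_state(path, state):
-        with open(path, "wb") as f:
-            pickle.dump(state, f)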
-
-Feed.update() takes an article-dict argument, which might be the existing
-Rawdog.articles hash or might be from a FeedState, just containing that feed's
-articles. (It doesn't care either way.)
-
-When doing updates, if we're in split-state mode, it loads and saves the
-FeedState around each feed's update.
-
-(FIXME: optimisation: only mark a FeedState as modified if it was actually
-modified, not if it was updated but nothing changed.)
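-
-In outline, building on the sketch above (the real Feed.update() takes
-more arguments than this, and the "did anything change" return value is
-exactly the optimisation the FIXME asks for):
-
-    def update_one_feed(feed, state_dir, split_state, global_articles):
-        """Update one feed, choosing where its articles live."""
-        if split_state:
-            # Only this feed's articles are in memory during the update.
-            path = feed_state_path(state_dir, feed.url)
-            state = load_feed_state(path)
-            articles = state.articles
-        else:
-            articles = global_articles
-
-        changed = feed.update(articles)  # doesn't care which dict it gets
-
-        if split_state and changed:
-            save_feed_state(path, state)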
-
--- Stage 2: making write memory usage O(#articles on page) --
-
-Article gets a new method to return the date that should be used for sorting
-(i.e. this logic gets moved out of the write code).
-
-Get the list of articles eligible for output -- as (sort-date, feed-hash,
-sequence-number, article-hash) tuples (for ease of sorting). Then fetch the
-articles for each feed.
-(FIXME: the implementation of this is rather messy; it should be done, perhaps,
-at the Feed level, then it would be sufficiently abstract to let us do this
-over a database at some point in the future...)
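-
-Per feed, that could be as simple as this (get_sort_date() is the new
-Article method described above; the sequence attribute is assumed to be
-whatever rawdog uses to keep feed order stable):
-
-    def eligible_article_keys(feed_hash, state):
-        """Return (sort-date, feed-hash, sequence-number, article-hash)
-        tuples, cheap to collect and sort without holding every article
-        object in memory."""
-        keys = []
-        for article_hash, article in state.articles.items():
-            keys.append((article.get_sort_date(), feed_hash,
-                         article.sequence, article_hash))
-        return keys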
-
-Rawdog.write() then collects the list of articles from all the feeds, sorts it,
-and retrieves only the appropriate set of articles from each feed state before
-writing them.
-(FIXME: optimisation: have a dict available at update and write time into which
-the current article lists get stashed as the update progresses, to avoid
-opening the state file three times when we update a feed.)
-(FIXME: the sort hook will need to be changed -- use a different hook when in
-split-state mode.)
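-
-So the write pass might look roughly like this (feed.hash and
-max_articles are placeholders; it reopens each needed state file once
-per pass, which is where the stashing optimisation above would help):
-
-    def collect_articles_for_output(feeds, state_dir, max_articles):
-        # Pass 1: gather cheap sort keys from every feed state.
-        keys = []
-        for feed in feeds:
-            state = load_feed_state(feed_state_path(state_dir, feed.url))
-            keys.extend(eligible_article_keys(feed.hash, state))
-        keys.sort(reverse=True)     # newest first
-        wanted = keys[:max_articles]
-
-        # Pass 2: work out which articles we need from which feed...
-        by_feed = {}
-        for sort_date, feed_hash, sequence, article_hash in wanted:
-            by_feed.setdefault(feed_hash, []).append(article_hash)
-
-        # ...and load only those, from only the feed states that matter.
-        loaded = {}
-        for feed in feeds:
-            if feed.hash not in by_feed:
-                continue
-            state = load_feed_state(feed_state_path(state_dir, feed.url))
-            for article_hash in by_feed[feed.hash]:
-                loaded[article_hash] = state.articles[article_hash]
-
-        # Hand back the articles in the already-sorted order.
-        return [loaded[h] for (_, _, _, h) in wanted]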
-
--- Stage 3: making fetch memory usage O(biggest #articles * #threads) --
-
-Give the fetcher threads a "shared channel" to the main thread that's doing the
-updates, so that updates and fetches can proceed in parallel, and the only
-buffers used are by active threads.
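-
-A sketch of that shape using a standard-library queue (the real rawdog
-threading code is organised differently):
-
-    import threading
-    try:
-        import queue           # Python 3
-    except ImportError:
-        import Queue as queue  # Python 2
-
-    def parallel_fetch(feeds, num_workers, fetch, handle_update):
-        """Fetch in worker threads; apply updates in the main thread."""
-        jobs = queue.Queue()
-        results = queue.Queue()
-        for feed in feeds:
-            jobs.put(feed)
-
-        def worker():
-            while True:
-                try:
-                    feed = jobs.get_nowait()
-                except queue.Empty:
-                    return
-                results.put((feed, fetch(feed)))
-
-        threads = [threading.Thread(target=worker)
-                   for _ in range(num_workers)]
-        for t in threads:
-            t.start()
-
-        # Only the fetches currently in flight are buffered; each result
-        # is processed (and its state written out) as soon as it arrives.
-        for _ in range(len(feeds)):
-            feed, data = results.get()
-            handle_update(feed, data)
-
-        for t in threads:
-            t.join()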
-
Make maxarticles work as a per-feed option.
-An idea for reducing rawdog's memory usage:
-- have a separate state file for each feed
-- have the update process for each feed return a list of articles to include in
- the output as (hash, time) pairs
-- the update process probably doesn't even need to read all the articles if
- it's got guids (or something equivalent) available
-- the write process then only needs to pull the articles that should be
- displayed from the database, rather than all of them
-
Plugin hook to allow the articles list to be sorted again after filtering -- so
you can filter out duplicates then sort by originally-published date.
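
If such a hook existed, a plugin might look like this (attach_hook is
rawdog's real plugin mechanism, but the "resort_articles" hook name and
its signature are hypothetical -- they're what this item proposes):

    import rawdoglib.plugins

    def resort_by_published_date(rawdog, config, articles):
        # Put the already-filtered list back into originally-published
        # order; this assumes each article carries a published date.
        articles.sort(key=lambda a: a.date, reverse=True)
        return True  # let other plugins see the hook too

    rawdoglib.plugins.attach_hook("resort_articles", resort_by_published_date)
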
-Option to do duplicate removal by more sensible article hashing: use a
-namespace for hashes where it could be hash:existing-hash or
-uid:uid-from-article (detecting articles that are already present).
-
Duplicate removal by article title.
gzip the state file.
Daemon mode -- keep a pidfile, and check the mtime of the state file to avoid
having to reread it.
-Option to quit if flocked
Option to limit update runtime
Fix rawdog -a https://www.fsf.org/blogs/rms/