Retrieving rss feed list from old hard drive quiterss

5/3/2023

For content which works well on an e-reader (text and pictures), it works really well: I get to read the articles away from all the distractions that you have on a computer or smartphone, on a screen that’s comfortable to look at, with my favourite font (Alegreya :) ), and all the style tweaks that KOReader provides (once downloaded, articles are written as EPUBs, so they are treated just like any other book). The best solution is for the publisher to include the full article text (and nothing else) in the RSS or Atom feed. Many blogs do this, and it makes it really simple for the client to produce something which is 100% useful content with zero effort (make sure to set download_full_article=false for feeds like this, so it doesn’t try to download the article instead). When the feed only contains a summary, we have to fetch the actual content from the website. As long as the website is fairly simple and focusses on the content, this still works well in most cases, but it can add a little bit of noise, such as a header, a footer and a comment section.
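To make the "full text versus summary" distinction concrete, here is a minimal Python sketch that checks each entry of a feed for a full-text element. It is only an illustration, not how KOReader decides anything: the function name `entry_bodies` is made up, and the heuristic (RSS `content:encoded` or Atom `content` means full text, otherwise fall back to the summary) is an assumption. It will misjudge feeds that put the whole article into `description`.

```python
import xml.etree.ElementTree as ET

# Namespaced tag of the common content:encoded RSS extension.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
ATOM = "{http://www.w3.org/2005/Atom}"


def entry_bodies(feed_xml):
    """Yield (title, body, has_full_text) for each entry in an RSS or Atom feed.

    Assumed heuristic: an entry counts as "full text" only if it has a
    non-empty content:encoded (RSS) or content (Atom) element.
    """
    root = ET.fromstring(feed_xml)

    # RSS 2.0 entries are un-namespaced <item> elements.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        full = item.findtext(CONTENT_ENCODED)
        summary = item.findtext("description", default="")
        has_full = bool(full and full.strip())
        yield title, (full if has_full else summary), has_full

    # Atom entries live in the Atom namespace.
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        full = entry.findtext(ATOM + "content")
        summary = entry.findtext(ATOM + "summary", default="")
        has_full = bool(full and full.strip())
        yield title, (full if has_full else summary), has_full
```

A feed where every entry comes back with `has_full_text` set to `True` is a reasonable candidate for download_full_article=false; anything else probably needs the article page fetched.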

It makes me happy to use every day, so it’s been a relative success. It works better with some feeds than others, so it really depends on individual feeds. Over time, I’ve probably ended up whittling the feeds I read down to a subset that works well with KOReader (I do use other feed readers on other platforms). I have a couple of feeds where it has to download the full article (because the feed only has a summary), and it ends up dragging in a load of extra stuff (comment sections, footers, etc.). However, at least for the feeds I’m reading, that extra material tends to appear at the end, so when I get to the end of the article text, I just stop.

Favicon fetching currently only tries sitename.tld/favicon.ico and then gives up; it needs some more work. Some sites expect you to follow the article links in the feed, download the HTML page and find the favicon URL from that.
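The "download the HTML page and find the favicon URL" step could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the tool's own code: it leans on Python's built-in HTML parser, and the names `IconLinkFinder` and `favicon_candidates` are invented for the example. Fetching the page is left out; the function takes HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel list contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several tokens, e.g. "shortcut icon".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def favicon_candidates(page_url, html):
    """Return absolute favicon URLs to try, declared icons first.

    /favicon.ico at the site root is always appended as the last resort.
    """
    finder = IconLinkFinder()
    finder.feed(html)
    urls = [urljoin(page_url, href) for href in finder.hrefs]
    fallback = urljoin(page_url, "/favicon.ico")
    if fallback not in urls:
        urls.append(fallback)
    return urls
```

Trying the declared icons first and keeping /favicon.ico as the last candidate means the existing behaviour still works for sites that never declare an icon.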

Oh god, favicons are hard if you don’t have a proper XML/HTML parser. Instead, the tool scrounges for just the small pieces of information needed using a variety of techniques. It’s taken a lot of tweaking to support all of my own feeds, and now it seems stable with anything I throw at it (suggestions welcome).

I’m not a fan of daemon feed services that constantly fetch feeds. I like my feed reading to be “do it now”, which also means I want it to be fast. Feeds must be fetched in parallel and processed in parallel. It is a mixture of C and shell, with no deps other than libc and standard Linux shell tools. It contains a “feed splitter”, written in C, that reads an Atom or RSS feed and extracts all of the useful bits (title, link, author, articles and their parts) into plaintext files and folders, i.e. it uses the filesystem as its database, not a relational database.

It does not fully date-sort incoming articles; instead they’re added to the top of the frontpage just as they’re discovered. This means that if a feed has 4 new articles since the last feed update, they will be displayed as a block of 4 articles on the frontpage, not split up. Originally I thought this was something I would want to fix; now I’ve come to like it.

It can display the raw article (see the ‘c’ link on each row), but I generally prefer to see the source site these days (especially since many feeds only provide a one-line summary in their content tag). I used to be the other way around in preference; this isn’t everyone’s cup of tea.
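For readers who want a feel for the "filesystem as database" idea, here is a rough Python sketch of a feed splitter. It is not the C implementation described above and does not copy its layout: the folder naming, the `title` and `link` file names, and the functions `split_feed` and `split_all` are all assumptions made for illustration. Unlike the real tool it uses a full XML parser, and it takes feed text that has already been fetched.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

ATOM = "{http://www.w3.org/2005/Atom}"


def _slug(text, fallback):
    """Turn a title into a short, filesystem-safe folder name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:60] or fallback


def split_feed(feed_xml, out_dir):
    """Write one folder per article, each holding small plaintext files."""
    root = ET.fromstring(feed_xml)
    out_dir = Path(out_dir)
    # RSS uses <item>; Atom uses namespaced <entry>.
    entries = list(root.iter("item")) or list(root.iter(ATOM + "entry"))
    written = []
    for n, entry in enumerate(entries):
        is_atom = entry.tag.startswith(ATOM)
        ns = ATOM if is_atom else ""
        title = (entry.findtext(ns + "title") or "").strip()
        if is_atom:
            # Atom keeps the URL in an href attribute, not in element text.
            link_el = entry.find(ATOM + "link")
            link = link_el.get("href", "") if link_el is not None else ""
        else:
            link = (entry.findtext("link") or "").strip()
        folder = out_dir / f"{n:03d}-{_slug(title, 'untitled')}"
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "title").write_text(title + "\n", encoding="utf-8")
        (folder / "link").write_text(link + "\n", encoding="utf-8")
        written.append(folder)
    return written


def split_all(feeds, base_dir, workers=8):
    """Split several feeds concurrently; feeds maps a name to its XML text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(split_feed, text, Path(base_dir) / name)
            for name, text in feeds.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Because every article ends up as plain files, ordinary shell tools such as `cat`, `ls` and `grep` can read the result without any database layer. The thread pool here only shows the shape of "one job per feed"; in CPython it would mostly pay off once the network fetch is moved into the same job.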
