On 26/05/2021 22:40, clipka wrote:
> Posts as far back as 1998 are fine, as are recent posts. What I'm
> SPECIFICALLY NOT seeing anymore are any posts dating between 2018-10-07
> and 2020-04-18. And I know there were quite a lot, even a couple I had
> read just before rebuilding the index. Even a couple I had posted myself.
I can confirm I now see this behavior locally as well. It's not *every*
message, but certainly a lot of them, predominantly in high-traffic groups.
In terms of what's causing it, I do note that the last message in one of
the internal spool files (db_312.itm) is dated 7 Oct 2018 17:36:06. The
next message in that thread, according to the web view, is
<firstname.lastname@example.org>, dated 7 Oct 2018 23:14:11.
The next db_*.itm file, sequentially, is db_314.itm (there's no 313),
but it contains posts from 2006. A grep through all the *.itm files
shows that <email@example.com> is, in fact, missing entirely.
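
For anyone wanting to repeat that check without grep, here is a minimal Python sketch. It assumes only that a Message-ID appears verbatim as ASCII bytes somewhere in a spool file, which is consistent with grep finding it; the function name, the directory argument and the chunk size are made up for illustration and have nothing to do with the news server itself.

```python
from pathlib import Path


def files_containing(spool_dir, needle, pattern="db_*.itm", chunk_size=1 << 20):
    """Return the names of spool files whose raw bytes contain `needle`.

    Files are read in chunks so large spool files need not fit in memory.
    """
    needle_bytes = needle.encode("ascii")
    overlap = len(needle_bytes) - 1
    hits = []
    for path in sorted(Path(spool_dir).glob(pattern)):
        tail = b""
        with path.open("rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                # Prepend the end of the previous chunk so a match that
                # straddles a chunk boundary is still found.
                window = tail + chunk
                if needle_bytes in window:
                    hits.append(path.name)
                    break
                tail = window[-overlap:] if overlap else b""
    return hits
```

Running it once against the live spool directory and once against the backup gives the same comparison as the two greps: an empty list means the Message-ID is not in any db_*.itm file there.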
(The db_*.itm files aren't always sequential; some are, but sometimes it
starts writing to a different sequence. I'm not sure of the trigger for
this, but suspect it is intentional).
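
Gaps like the missing 313 can be listed mechanically. The sketch below assumes nothing beyond the db_<number>.itm naming seen above; the function name is hypothetical, and a gap it reports is not necessarily a fault, given that the server sometimes skips numbers on its own.

```python
import re
from pathlib import Path

SEQ_RE = re.compile(r"db_(\d+)\.itm")


def missing_sequence_numbers(spool_dir):
    """List the sequence numbers absent between the lowest and highest db_N.itm."""
    present = set()
    for path in Path(spool_dir).iterdir():
        match = SEQ_RE.fullmatch(path.name)
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

Comparing the result for the live spool with the result for the pre-crash backup shows which numbers disappeared in between.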
I did the same grep on a backup I have locally of the server prior to
the crash and DID find the article: it's in db_1.itm, and is in fact
the very first article in that file.
So: at some point on 7 October 2018 the news server switched from
writing to db_312.itm to db_1.itm. There's nothing in the command log
file indicating I issued any particular command (e.g. re-index) on that
day, nor do the server logs show a reboot or anything interesting, and
finally the news server read log for the day shows articles being read
before and after the cutoff time right through the evening without any
apparent major interruption or break.
Returning to this year: grepping the nntp server log revealed that on 25
March during a re-index the server decided that db_1.idx (and a bunch of
others) was 'empty lost' (whatever that means) and nuked them.
Subsequently (31 March) it created a new db_1.itm and filled that with
new articles as they came in.
So ... the server nuked a bunch of article files for a reason that is
not clear and has subsequently started re-using the db sequence numbers.
The good news is that I have all the removed .itm files in a backup, with
one exception: on Mar 27, Mar 31 and May 6 the server also nuked,
respectively, db_0.itm, db_1.itm and db_2.itm again, and these have since
been re-used. As these deletions were post-crash, I only have the current
versions of those three files in my backup.
TL;DR a bunch of items were lost. I can try to restore them by copying
over the nuked .itm files from a backup, re-naming them where necessary,
in the hope that when I re-start the news server it will ingest them
rather than deleting them. I'll experiment with this on a test server
rather than risk breaking this one any further.
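
A rough sketch of how the copy-and-rename step for the test server could be scripted, in Python. Everything in it is an assumption for illustration: the function names, the directory arguments, and in particular the rule of giving a colliding file the next number above the highest one already in use. Whether the server will actually ingest renamed files, and what has to happen to the matching index files, is exactly what the experiment needs to establish. It should only be run against a stopped test instance.

```python
import re
import shutil
from pathlib import Path

SEQ_RE = re.compile(r"db_(\d+)\.itm")


def highest_sequence(directory):
    """Highest N among db_N.itm files in `directory`, or -1 if there are none."""
    nums = []
    for path in Path(directory).iterdir():
        match = SEQ_RE.fullmatch(path.name)
        if match:
            nums.append(int(match.group(1)))
    return max(nums, default=-1)


def stage_restore(backup_dir, test_spool_dir, names):
    """Copy the named .itm files from the backup into the test spool.

    An existing file is never overwritten: on a name collision the backup
    copy is given the next unused sequence number instead (hypothetical
    renaming rule). Returns {name in backup: name in test spool}.
    """
    dest = Path(test_spool_dir)
    mapping = {}
    for name in names:
        target = dest / name
        if target.exists():
            # Recomputed on every collision so earlier copies are counted.
            target = dest / f"db_{highest_sequence(dest) + 1}.itm"
        shutil.copy2(Path(backup_dir) / name, target)
        mapping[name] = target.name
    return mapping
```

The returned mapping records which backup file ended up under which name, which helps when checking the server log afterwards to see whether each file was ingested or deleted.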