On 2021-05-26 8:40 AM (-4), clipka wrote:
>
> Posts as far back as 1998 are fine, as are recent posts. What I'm
> SPECIFICALLY NOT seeing anymore are any posts dating between 2018-10-07
> and 2020-04-18. And I know there were quite a lot, even a couple I had
> read just before rebuilding the index. Even a couple I had posted myself.
>
> The same happens in each and every other group: Any posts from 2019 that
> are still in the index over here, but that I hadn't read yet, all just
> prompt a message saying they no longer exist. And if I rebuild the index
> of any group, all posts from 2019 - including ones I had read just
> minutes before - are just gone, without any trace whatsoever.
>
> The web interface still seems to have them, for some reason. But the
> news server apparently doesn't.
I have the same problem.
On 26-05-2021 at 16:12, Cousin Ricky wrote:
> I have the same problem.
>
As do I.
--
Thomas
On 26/05/2021 22:40, clipka wrote:
> The web interface still seems to have them, for some reason. But the
> news server apparently doesn't.
Sounds like the server has a problem with re-indexing. I know it did
have an issue with XOVER for more recent dates and hence I had to turn
that off.
If it turns out 2018-10-07 refers to some 'special' number in 32-bit
seconds since 1970 (or close to one) then it's likely an internal issue.
If it doesn't then I'll have to dig further.
If it turns out I can't fix it I'll have to change to another NNTP
server (and that assumes I can find a way to get the missing messages
imported).
The message spool files are basically large text blobs, so they are readable,
and I know I've had to do that once before, maybe 20 years ago. But
that's a last resort, as the spam filtering the server uses is a script
custom-written for the current server. If e.g. INN supports spam filters,
I expect there will be a way to get it ported.
-- Chris
On 31.05.2021 at 05:14, Chris Cason wrote:
> If it turns out 2018-10-07 refers to some 'special' number in 32-bit
> seconds since 1970 (or close to one) then it's likely an internal issue.
> If it doesn't then I'll have to dig further.
2018-10-07 00:00 UTC would be 1538870400 seconds since the beginning of
the Unix epoch (i.e. 1970).
The magic rollover for 32-bit signed values (or 31-bit unsigned) will be
in early 2038.
The magic rollover for 1 bit less would have been in early 2004.
So no, that doesn't quite fit the symptoms.
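The epoch arithmetic above can be double-checked with Python's standard `datetime` module; the cutoff date works out to 1538870400 seconds, nowhere near either rollover point:

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Seconds since the Unix epoch (1970-01-01 00:00 UTC) for the cutoff date
cutoff = int(datetime(2018, 10, 7, tzinfo=UTC).timestamp())
print(cutoff)  # 1538870400

# 32-bit signed rollover (the "Year 2038" point)
print(datetime.fromtimestamp(2**31 - 1, UTC))  # 2038-01-19 03:14:07+00:00

# One bit less (2**30) would indeed have rolled over in early 2004
print(datetime.fromtimestamp(2**30, UTC))  # 2004-01-10 13:37:04+00:00
```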
On 26/05/2021 22:40, clipka wrote:
> Posts as far back as 1998 are fine, as are recent posts. What I'm
> SPECIFICALLY NOT seeing anymore are any posts dating between 2018-10-07
> and 2020-04-18. And I know there were quite a lot, even a couple I had
> read just before rebuilding the index. Even a couple I had posted myself.
I can confirm I now see this behavior locally as well. It's not *every*
message, but certainly a lot of them, predominantly in high-traffic groups.
In terms of what's causing it, I do note that the last message in one of
the internal spool files (db_312.itm) is dated 7 Oct 2018 17:36:06. The
next message in that thread, according to the web view, is
<5bbacb83$1@news.povray.org>, dated 7 Oct 2018 23:14:11.
The next db_*.itm file, sequentially, is db_314.itm (there's no 313),
however it contains posts from 2006. A grep through all the *.itm files
shows that <5bbacb83$1@news.povray.org> is, in fact, missing entirely.
(The db_*.itm files aren't always sequential; some are, but sometimes it
starts writing to a different sequence. I'm not sure of the trigger for
this, but suspect it is intentional.)
I did the same grep on a backup I have locally of the server prior to
the crash and DID find the article: it's in db_1.itm, and in fact is
also the very first article.
So: at some point on 7 October 2018 the news server switched from
writing to db_312.itm to db_1.itm. There's nothing in the command log
file indicating I issued any particular command (e.g. re-index) on that
day, nor do the server logs show a reboot or anything interesting, and
finally the news server read log for the day shows articles being read
before and after the cutoff time right through the evening without any
apparent major interruption or break.
Returning to this year: grepping the nntp server log revealed that on 25
March during a re-index the server decided that db_1.idx (and a bunch of
others) was 'empty lost' (whatever that means) and nuked them.
Subsequently (31 March) it created a new db_1.itm and filled that with
new articles as they came in.
So ... the server nuked a bunch of article files for a reason that is
not clear and has subsequently started re-using the db sequence numbers.
The good news is I have all the removed .itm files, other than the
below, in a backup. The exception is that on Mar 27, Mar 31 and May 6
the server also nuked, respectively, db_0.itm, db_1.itm and db_2.itm
again, and these have subsequently been re-used. As these were
post-crash I only have the current versions of them in my backup.
TL;DR: a bunch of items were lost. I can try to restore them by copying
over the nuked .itm files from a backup, renaming them where necessary,
in the hope that when I restart the news server it will ingest them
rather than deleting them. I'll experiment with this on a test server
rather than risk breaking this one further.
-- Chris
Following on from my previous post about the cause of the missing articles:
In the long term given I really don't have any insights into how our
current news server does stuff internally (it's not open-source) and the
fact it's old and unsupported I think I have no choice but to migrate to
a different server (probably INN).
I should be able to code a means of converting articles between the
respective spool formats to avoid needing to have the new server take a
feed from the current one (this would add an unnecessary new component
to the path of each article, amongst other possible tweaks).
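If INN is chosen, one plausible target for such a conversion is its `rnews` batch format, in which each article is preceded by a `#! rnews <byte-count>` line. A minimal sketch, assuming the articles have already been extracted from the old spool as complete RFC-5322 messages (the extraction itself is the unknown part, and is not shown):

```python
def rnews_batch(articles):
    """Pack complete article texts (as bytes) into an INN rnews batch."""
    out = bytearray()
    for art in articles:
        # Each article is preceded by a line giving its length in bytes
        out += b"#! rnews %d\n" % len(art)
        out += art
    return bytes(out)

batch = rnews_batch([b"Subject: test\n\nbody\n"])
print(batch.decode())
```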
For any articles still missing after I process the current spool and the
previously nuked but saved db_*.itm files, I should be able to fetch
them from the webview database. Once done I just need to port the spam
filter and I should then be able to bring the new NNTP server up.
I will, if I can find a way to do it, attempt to keep the article
numbers within the groups identical. If that's not possible, though, a
full re-fetch of the groups will be necessary for NNTP users.
The priority I place on doing this will depend on whether or not I can
get the missing articles back into the current server without it
throwing a hissy fit and deleting them again.
-- Chris
Thank you very much for trying to fix this, Chris!
--
Tor Olav
On 15/04/2021 16:30, ingo wrote:
> Over the years I've lost my newsgroup archive several times and currently
> don't maintain it. Could it be made available as an (always up-to-date)
> download in some way?
If I end up migrating to another news server I will need to collate all
the messages I can find (including getting any missing ones from the
webview DB).
If I do that, I could provide the collated messages in some form of
download. I may be able to keep that updated by appending new messages
as they come in, provided there's enough interest in it.
One complication of a simple append scheme, though, is that
(intentionally) deleted messages won't get removed, and I think that any
archive of this sort should have deleted messages taken out of it.
It may be better to just dump the stored messages from the DB as they
will always reflect only undeleted messages. I'll decide when I look at
the migration.
BTW it's occurred to me that one useful outcome of having a downloadable
archive of this sort is that we could auto-submit it to archive.org on a
regular basis.
-- Chris