public inbox for goredo-devel@lists.cypherpunks.su
* Question/confusion about "size differs" for ood detection
@ 2025-01-08 20:55 Niklas Böhm
2025-01-09 12:13 ` Sergey Matveev
0 siblings, 1 reply; 2+ messages in thread
From: Niklas Böhm @ 2025-01-08 20:55 UTC (permalink / raw)
To: goredo-devel
Hello,
I ran into some weird behavior that I cannot quite make sense of. I
have a target which went ood and I redid it. But to my surprise,
targets depending on this one downstream reported that they are ood; the
reason given is that the size differs. But when I check it manually,
the size reported in the .dep file matches with what the file system
gives me, the hash also matches (which does not matter for this concrete
use case).
I found out that calling redo-depfix will somehow fix this mismatch and
make it go away. But I don't quite understand the reason for why this
showed up in the first place.
What I thought could be a reason is that I may have interrupted the
downstream target while doing this, which I suppose could lead to an
incomplete write to the .dep file. But I somehow think that is unlikely
and would also not explain why redo-depfix can then recover from this
situation. (I also tried to reproduce this with a toy example, but was
not able to do so.)
Basically, the "size differs" ood was triggered because the first 8 bytes
of the inode struct differ, i.e. the following evaluates to true:
!bytes.Equal(inode[:8], ifchange.Inode()[:8])
I don't understand why this would be the case, if the .dep file reports
the correct size. The file in question is only 122MB large, so that
shouldn't trigger any issues. Likewise, being on a little endian
architecture also shouldn't matter (I saw that the bytes are written as
big endian). This whole situation left me a bit stymied.
In any case, I was wondering if there is a straightforward explanation
for this behavior. Or is there something unexpected happening?
Best
Nik
PS: thanks for the randomized sleeping. I agree that this does not
require a release by itself.
* Re: Question/confusion about "size differs" for ood detection
2025-01-08 20:55 Question/confusion about "size differs" for ood detection Niklas Böhm
@ 2025-01-09 12:13 ` Sergey Matveev
0 siblings, 0 replies; 2+ messages in thread
From: Sergey Matveev @ 2025-01-09 12:13 UTC (permalink / raw)
To: goredo-devel
Greetings!
*** Niklas Böhm [2025-01-08 21:55]:
>I found out that calling redo-depfix will somehow fix this mismatch and make
>it go away. But I don't quite understand the reason for why this showed up
>in the first place.
redo-depfix just goes through all of the .dep's entries and recalculates
their information (inode number, *times, size, etc.) from scratch. It does
not do any kind of validity check (provided, of course, that the .dep file
is not completely unreadable/unparseable garbage).
>What I thought could be a reason is that I may have interrupted the
>downstream target while doing this, which I suppose could lead to an
>incomplete write to the .dep file.
There should not be any possibility of truncated/broken .dep files.
Instead of writing directly to the resulting .dep, a temporary file with
a different name is used. It is flushed and explicitly fsync-ed before
closing. Only after that is it renamed to the final .dep path. Moreover,
it is also fully read and parsed. In theory your child process can die
exactly at the time when it writes to that temporary .dep's file
descriptor, so that only a partial write is committed to the file. But
that will be detected during the read/parse stage after its renaming. All
.dep file structures have an explicit length suffix, so even truncation
will be detected.
Of course there is the possibility of bit rot and other misbehaviour,
but its probability is negligible.
>Basically, the reason the "size differs" ood was triggered is because the
>first 8 bytes of the inode struct differ, because the following evaluates to
>true: !bytes.Equal(inode[:8], ifchange.Inode()[:8])
Yeah, that just compares the 64-bit big-endian size field from the
.dep with the one generated from the file's stat() call.
How did you check that the .dep's entry contains the same size? Did you
call redo-dep2rec and check the target's Size: field?
If the .dep's entry has the same size as (for example) "ls -l" shows
for that file, then I have no idea how that can be possible.
redo-dep2rec parses the binary .dep and shows in human-readable form how
its size will be interpreted by the software. If it is correct, then how
can the Stat() call (constantly?) lie about the file? Of course there is
a probability of errors, hardware errors, OS/FS buggy misbehaviour, but it
is too hard to believe that this happened. I honestly do not know how that
can happen if the .dep really contains correct data.
>In any case, I was wondering if there is a straightforward explanation for
>this behavior. Or is there something unexpected happening?
According to your description, it seems to me that something unexpected
is happening, if redo-dep2rec really shows the same size as the file.
I do not see where goredo could make a mistake on the way to that
place in the code.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A