public inbox for goredo-devel@lists.cypherpunks.su
* redo-stamp
@ 2025-03-31 17:58 Christian G. Warden
2025-04-02 7:45 ` redo-stamp Sergey Matveev
0 siblings, 1 reply; 5+ messages in thread
From: Christian G. Warden @ 2025-03-31 17:58 UTC (permalink / raw)
To: goredo-devel
I'm exploring switching from apenwarr's redo to goredo, and found
that one of my common conventions doesn't work.
I frequently use redo for data pipelines. This typically involves
retrieving some data from a remote resource. Fetching lots of data
can be time-consuming. Working with data that's an hour or a day old
is often fine.
So the convention I follow in data.csv.do looks like this:
redo-ifchange date
... fetch data > $3
And date.do looks like this:
redo-always
date +%Y%m%d | redo-stamp
When I run `redo data.csv`, data is only fetched if I haven't already
fetched data today.
If I'm doing the same analysis with data from multiple remote sources,
I'll similarly have data.csv.do include `redo-ifchange user`, where
user.do looks like:
redo-always
force active | redo-stamp
So if I change my active user, data.csv will be out of date.
I can of course generate `date` and `user` files rather than use
redo-stamp, but is there any reason to intentionally not support this
functionality? Any other suggestions?
Thanks,
Christian
* Re: redo-stamp
2025-03-31 17:58 redo-stamp Christian G. Warden
@ 2025-04-02 7:45 ` Sergey Matveev
2025-04-02 10:54 ` redo-stamp spacefrogg
0 siblings, 1 reply; 5+ messages in thread
From: Sergey Matveev @ 2025-04-02 7:45 UTC (permalink / raw)
To: goredo-devel
Greetings!
*** Christian G. Warden [2025-03-31 12:58]:
>I can of course generate `date` and `user` files rather than use
>redo-stamp, but is there any reason to intentionally not support this
>functionality? Any other suggestions?
Your date.do case can (and should) be handled simply by actually
generating the "date" file:
data.csv.do:
redo-ifchange date
... fetch data > $3
date.do:
redo-always
date +%Y%m%d
That works in goredo as expected. redo-ifchange means "redo the given
target (data.csv) if the date file has changed". "date | redo-stamp" does
not change the contents of the "date" file, because the script's stdout
is empty.
As far as I remember, redo-stamp is just a hack to skip checksum/hash
computation of the file to determine if it is really changed.
apenwarr/redo thinks that "date" is changed if its inode's
metainformation (some of its fields) is altered, without checking if its
contents are still the same (unless redo-stamp was used of course).
Unlike apenwarr/redo, goredo always checks its contents (by comparing
cryptographic hashes) if the inode is altered.
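A minimal sketch of that idea in plain sh, not goredo's actual code: a
file counts as changed only when the checksum of its contents differs
from the recorded one, so rewriting it with identical bytes changes
nothing. cksum stands in for the cryptographic hash purely for
portability, and the names content_id, is_changed and report are
hypothetical.

```sh
#!/bin/sh
# Sketch only: cksum stands in for a cryptographic hash.

content_id() {
    # Print a checksum that depends only on the file's bytes.
    cksum < "$1"
}

is_changed() {
    # $1 = file, $2 = checksum recorded after the previous build
    [ "$(content_id "$1")" != "$2" ]
}

report() {
    if is_changed "$1" "$2"; then echo changed; else echo unchanged; fi
}

tmp=$(mktemp -d)

printf '20250402\n' > "$tmp/date"
recorded=$(content_id "$tmp/date")

# Same day: the file is rewritten, so its inode metadata is touched,
# but the bytes are identical.
printf '20250402\n' > "$tmp/date"
same_day=$(report "$tmp/date" "$recorded")

# Next day: the contents really differ.
printf '20250403\n' > "$tmp/date"
next_day=$(report "$tmp/date" "$recorded")

echo "same day: $same_day, next day: $next_day"
rm -rf "$tmp"
```

With a content check like this, anything that depends on "date" would
only be rebuilt after the day rolls over, which is the behaviour the
original date.do was after.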
You can treat goredo's default behaviour as "always feeding the target's
output to redo-stamp". apenwarr/redo stamps it only if explicitly
asked to. I assume that:
date.do:
redo-always
date +%Y%m%d >$3
redo-stamp <$3
will work in the same, expected way in both apenwarr/redo and goredo.
I am convinced that redo-stamp was just a hack to skip relatively
expensive SHA1 computation. And that hack should not exist at all.
It adds unnecessary complications to the out-of-date decision code.
redo-stamp command was left in goredo only to be able to write targets
that have to have redo-stamp-hack and be run under apenwarr/redo too.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
* Re: redo-stamp
2025-04-02 7:45 ` redo-stamp Sergey Matveev
@ 2025-04-02 10:54 ` spacefrogg
2025-04-03 22:45 ` redo-stamp Andrew Chambers
0 siblings, 1 reply; 5+ messages in thread
From: spacefrogg @ 2025-04-02 10:54 UTC (permalink / raw)
To: goredo-devel
I agree with Sergey's assessment.
> It adds unnecessary complications to the out-of-date decision code.
> redo-stamp command was left in goredo only to be able to write targets
> that have to have redo-stamp-hack and be run under apenwarr/redo too.
I want to add one thing, though. `redo-stamp` does add one small
convenience. It allows you to create the hash over *some* data that
describes the identity of your target, while using the original output
for further computation. This makes working with noisy data more
convenient (e.g. time stamped). But I grant you that this is a small
convenience with the potential of confusing tool chain users down the
line.
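To picture that convenience, here is a sketch under stated assumptions:
the noise is a leading "# generated:" comment line, and the identity
helper and file names are hypothetical. Under apenwarr/redo the
normalized stream would be piped to redo-stamp. Under goredo a similar
effect can probably be had by writing it to a small intermediate target
and depending on that instead.

```sh
#!/bin/sh
# Sketch: derive a stable identity from noisy output.
# cksum stands in for whatever hash the redo implementation uses.

identity() {
    # Drop the hypothetical "# generated: ..." line, keep the payload.
    grep -v '^# generated:' "$1"
}

tmp=$(mktemp -d)
printf '# generated: 2025-04-02\nid,value\n1,42\n' > "$tmp/run1.csv"
printf '# generated: 2025-04-03\nid,value\n1,42\n' > "$tmp/run2.csv"

# Checksums of the raw files differ because of the timestamp line.
raw1=$(cksum < "$tmp/run1.csv")
raw2=$(cksum < "$tmp/run2.csv")

# Checksums of the normalized payload are equal.
id1=$(identity "$tmp/run1.csv" | cksum)
id2=$(identity "$tmp/run2.csv" | cksum)

if [ "$id1" = "$id2" ]; then
    echo "same identity despite different raw bytes"
fi
rm -rf "$tmp"
```

The full, noisy file stays available for later steps, while only the
normalized part decides whether dependents are rebuilt.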
–Michael
* Re: redo-stamp
2025-04-02 10:54 ` redo-stamp spacefrogg
@ 2025-04-03 22:45 ` Andrew Chambers
0 siblings, 0 replies; 5+ messages in thread
From: Andrew Chambers @ 2025-04-03 22:45 UTC (permalink / raw)
To: goredo-devel
For a while I had wondered about an extension to goredo such as 'redo-impure' that disables stamps and reverts back to timestamps.
--
Andrew Chambers
* Re: redo-stamp
2021-11-09 13:43 ` goredo
@ 2021-11-10 12:22 ` Sergey Matveev
0 siblings, 0 replies; 5+ messages in thread
From: Sergey Matveev @ 2021-11-10 12:22 UTC (permalink / raw)
To: goredo-devel
*** goredo [2021-11-09 13:43]:
>I also was wondering what redo-stamp currently does, exactly. apenwarr/redo uses it to achieve the behaviour that the output of a target can be independent of its hash. They use it like the following:
Initially goredo tried to fully mirror the behaviour of apenwarr/redo,
and redo-stamp had (or should have had) exactly the same behaviour. But
I soon became convinced that redo-stamp is a useless and completely
unnecessary complication.
The main difference between apenwarr's view of redo and mine is that I
am confident that it is OK to always (cryptographically) checksum the
target.
https://redo.readthedocs.io/en/latest/FAQImpl/#why-not-always-use-checksum-based-dependencies-instead-of-timestamps
http://www.goredo.cypherpunks.ru/FAQ.html
In my practice, there was a huge number of .do files ending with
something like "command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3".
I realized (and I assume that applies to most redo users using it for
software building) that redo-stamping is what is nearly always wanted.
apenwarr/redo's documentation states somewhere that always-checksumming
is mainly useful for making fewer false-positive OOD decisions.
That is true. But I am confident that hashing can be considered a pretty
cheap operation. Even if it sometimes slows something down, it greatly
simplifies .do files and the overall redo implementation.
apenwarr/redo basically has two ways of determining if the target is changed:
* either it has different mtime+size+whatever metainformation
* or it used redo-stamp and has different hash
goredo, like redo-c, has a single way:
* it has different hash
* and just as an optimization, that check can be skipped, if ctime is
the same (goredo's REDO_INODE_NO_TRUST=1 can forcefully distrust
everything related to inode's metainformation and hash checking will
be done anyway -- most trustworthy OOD)
* and as another optimization, target is OOD if its size differs
1. Can we trust that mtime and the other metainformation are guaranteed
to change whenever the underlying file has changed? According to
https://apenwarr.ca/log/20181113 it is good enough in practice, but
can be broken on some FUSE filesystems. So if we want strong
confidence in the OOD determination, then we should check the
hash: it will definitely be different if something has changed
(let's forget about possible collisions of a long enough, strong
cryptographic hash, as their probability is negligible)
2. Or we can use the more "reliable" ctime check (again, that can also
fail on strange/broken FUSE filesystems/drivers, for example).
apenwarr/redo does not use ctime, because it could create too many
false positives (like changing the number of hard links). But ctime
can also be broken/untrusted, so cryptographic hashing will again
save us here
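The single-way check from the bullet list above can be sketched as a
small decision function. This is only an illustration of the order of
checks as described in this message, not goredo's source: is_ood and
verdict are hypothetical names, and the recorded and current size,
ctime and hash are passed in as plain strings instead of being read
from the filesystem.

```sh
#!/bin/sh
# Sketch of the out-of-date (OOD) decision described above.
# Arguments: old_size old_ctime old_hash new_size new_ctime new_hash
# Exit status 0 means "out of date", 1 means "up to date".
is_ood() {
    # Optimization: a different size implies different contents.
    [ "$1" = "$4" ] || return 0
    # Optimization: an identical ctime lets us skip hashing, unless
    # the user asked to distrust inode metainformation.
    if [ "${REDO_INODE_NO_TRUST:-0}" != "1" ] && [ "$2" = "$5" ]; then
        return 1
    fi
    # The actual criterion: compare content hashes.
    [ "$3" != "$6" ]
}

verdict() {
    if is_ood "$@"; then echo OOD; else echo ok; fi
}

verdict 9 100 aaa 9 100 aaa     # nothing moved
verdict 9 100 aaa 9 200 aaa     # ctime moved, contents identical
verdict 9 100 aaa 9 200 bbb     # ctime moved, contents differ
verdict 9 100 aaa 12 200 bbb    # size differs, no hashing needed
# A broken filesystem left ctime untouched: only distrust catches it.
(REDO_INODE_NO_TRUST=1; verdict 9 100 aaa 9 100 bbb)
```

The second call is the redo-always case: the inode was touched, but
equal hashes keep dependents from being rebuilt.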
As far as I have seen and understand, redo-stamp is used mainly with
redo-always targets. Because redo-always will change the inode enough to
satisfy the OOD decision anyway, people use redo-stamp to skip the
false-positive OOD decision and resource-wasting rebuilding.
redo-c/goredo's OOD determination based on inodes/hashes is very simple
from the implementation point of view. redo-always+redo-stamp hugely
complicates the overall logic and code. I look at redo-stamp as some
kind of hack to prevent redo-always targets from making everything they
touch OOD (which redo-always is intended to do by definition).
And I came to the conclusion that redo-always itself is just an ugly idea.
Not redo-always itself, but the huge complications aimed at skipping the
rebuilding of everything all the time, because OOD definitely should say
"it is OOD, because it depends on always-target, that is always OOD by
definition". redo-always just should be used. At least as a way many
people (I saw and I assume) use: to create some kind of target:
redo-always
env | sort
command -v redo-stamp > /dev/null || exit 0 ; redo-stamp <$3
# command check is for compatibility with implementations without redo-stamp
I used to do that all the time. But I got tired of those stamps (for
preventing the rebuilding of literally everything, because everything
depends on environment variables, for example) and of all the
complications introduced with redo-always. For me, redo-always is just
a harmful idea. I tried to note all of that in
http://www.goredo.cypherpunks.ru/FAQ.html
Another issue with hashes/stamps is that you do not always want to
checksum the target's value itself. If someone decides that the hash of
a nonexistent target equals that of an empty string, and if the redo
implementation creates the resulting file even if nothing was sent to
stdout, then of course there is no way to make that target always OOD
(possibly that was the reason people invented redo-always?). But with
goredo (and redo-c, as I remember) there are no problems: if nothing was
sent to stdout, then no output file is created, and a nonexistent file
is always OOD. But if you wish to explicitly create an empty file, then
you can just always touch "$3". Constant hashing won't harm you here
anyhow.
If you really, really wish to check only some metainformation (only
check the mtime), then nothing prevents you from creating an
intermediate target that contains the output of (stat -f %m $1) and
depending not on the (probably) huge file, but on that intermediate
metainformation file holding only the data you wish to check.
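A sketch of such an intermediate target, with assumptions labeled: the
helper mtime_of and the file names big.bin and big.bin.mtime.do are
hypothetical. The stat flags differ between platforms: -f %m is the BSD
form used above, -c %Y is the GNU/BusyBox equivalent. The .do body is
shown only as a comment so the block runs without redo installed.

```sh
#!/bin/sh
# Sketch: print a file's mtime as seconds since the epoch.
mtime_of() {
    # Try the GNU/BusyBox form first, fall back to the BSD form.
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Hypothetical big.bin.mtime.do, assuming the helper is defined in it.
# stdout becomes the contents of big.bin.mtime, and other targets
# depend on big.bin.mtime instead of on big.bin itself:
#
#   redo-always
#   mtime_of big.bin

tmp=$(mktemp -d)
: > "$tmp/big.bin"
stamp=$(mtime_of "$tmp/big.bin")
echo "mtime of big.bin: $stamp"
rm -rf "$tmp"
```

The redo-always line is an assumption of this sketch: without it the
tiny mtime file would never be regenerated. Since only that file is
hashed, the cost stays negligible however large big.bin is.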
>redo-ifchange $input_files
>cmd $input_files >$3
>for f in $input_files; do
> redo-stamp <$f
>done
I do not understand where the catch is :-). redo-ifchange "$input_files"
clearly and explicitly states: rebuild that target (do cmd $input_files)
and everyone who depends on it, if any of $input_files are changed. If
$input_files are not changed, then that target won't be OOD, won't be
rebuilt, and no one who depends on it will be rebuilt either (if it is
the only dependency, of course). In your example, redo-stamp literally
says: this target is OOD if the hash of all $input_files data is changed.
redo-ifchange $input_files (with implicit hashing) says exactly that
too. Doesn't it?
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
Thread overview: 5+ messages
2025-03-31 17:58 redo-stamp Christian G. Warden
2025-04-02 7:45 ` redo-stamp Sergey Matveev
2025-04-02 10:54 ` redo-stamp spacefrogg
2025-04-03 22:45 ` redo-stamp Andrew Chambers
-- strict thread matches above, loose matches on Subject: below --
2021-10-27 17:18 Multiple calls to redo-* for same target results in multiple .rec entries goredo
2021-10-31 8:21 ` Sergey Matveev
2021-11-04 15:35 ` goredo
2021-11-09 9:13 ` Sergey Matveev
2021-11-09 13:43 ` goredo
2021-11-10 12:22 ` redo-stamp Sergey Matveev