public inbox for goredo-devel@lists.cypherpunks.su
Atom feed
* Handling EINTR in unix.FcntlFlock
@ 2025-01-04 9:59 Niklas Böhm
2025-01-04 12:43 ` Sergey Matveev
0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04 9:59 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]
Greetings everyone,
I was using goredo on an NFS and noticed that I sometimes ran into
issues where my program would fail with the following error:
run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock
After doing some digging, it seems like the problem is that calling
unix.FcntlFlock with F_SETLKW can be too slow over an NFS and will get
interrupted (see `man 2 flock`, Section on errors [1]). Apparently
there is an automatic restart mechanism [2], but it's also unreliable,
so I thought it's better to handle it explicitly and basically extend
the error check from
if errors.Is(err, unix.EDEADLK) {
to
if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
This seems to resolve the interrupted system call error above.
Unfortunately I cannot realiably reproduce this error, but since the fix
is reasonaly easy, I was hoping that it could be incorporated into
goredo proper.
I have attached the small diff, and here it is also reproduced, in case
the explicit line is not clear:
diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced,
traced bool) error {
tracef(CLock, "LOCK_EX: %s", fdLock.Name())
LockAgain:
if err = unix.FcntlFlock(fdLock.Fd(),
unix.F_SETLKW, &flock); err != nil {
- if errors.Is(err, unix.EDEADLK) {
+ if errors.Is(err, unix.EDEADLK) ||
errors.Is(err, unix.EINTR) {
time.Sleep(10 * time.Millisecond)
goto LockAgain
}
Cheers and happy belated new year
Nik
[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]:
https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call
[-- Attachment #2: goredo-eintr.diff --]
[-- Type: text/x-patch, Size: 493 bytes --]
diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
tracef(CLock, "LOCK_EX: %s", fdLock.Name())
LockAgain:
if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
- if errors.Is(err, unix.EDEADLK) {
+ if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
time.Sleep(10 * time.Millisecond)
goto LockAgain
}
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: Handling EINTR in unix.FcntlFlock
2025-01-04 9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm
@ 2025-01-04 12:43 ` Sergey Matveev
2025-01-04 14:04 ` Niklas Böhm
0 siblings, 1 reply; 4+ messages in thread
From: Sergey Matveev @ 2025-01-04 12:43 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 282 bytes --]
Greetings, Niklas!
*** Niklas Böhm [2025-01-04 10:59]:
>[...] I have attached the small diff [...]
Thanks for finding the issue and fixing it! Released in 2.6.4.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Handling EINTR in unix.FcntlFlock
2025-01-04 12:43 ` Sergey Matveev
@ 2025-01-04 14:04 ` Niklas Böhm
2025-01-07 11:05 ` Sergey Matveev
0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04 14:04 UTC (permalink / raw)
To: goredo-devel
Thank you for the quick response and release. I have updated the
package on AUR.
Now that I was looking at the implementation, would it maybe make sense
to change the sleep time from 10ms to something a bit more randomized?
So that two processes don't repeatedly run into the same deadlock error?
(And I think it also would not hurt for an NFS to have the file calls
de-synchronized.)
Could take a uniform distribution between 0 and 20ms so that on average
they will sleep for 10ms still.
On 1/4/25 1:43 PM, Sergey Matveev wrote:
> Greetings, Niklas!
>
> *** Niklas Böhm [2025-01-04 10:59]:
>> [...] I have attached the small diff [...]
>
> Thanks for finding the issue and fixing it! Released in 2.6.4.
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Handling EINTR in unix.FcntlFlock
2025-01-04 14:04 ` Niklas Böhm
@ 2025-01-07 11:05 ` Sergey Matveev
0 siblings, 0 replies; 4+ messages in thread
From: Sergey Matveev @ 2025-01-07 11:05 UTC (permalink / raw)
To: goredo-devel
[-- Attachment #1: Type: text/plain, Size: 626 bytes --]
*** Niklas Böhm [2025-01-04 15:04]:
>Now that I was looking at the implementation, would it maybe make sense to
>change the sleep time from 10ms to something a bit more randomized?
I am not sure that goroutine scheduler, OS'es schedulers delays are not
jitter-ed enough, but adding randomised sleep time is easy thing, so why not.
Added in d0d5eb2605461569e985f9d502162b0f54392873 commit master branch.
Not sure if it is so important to make a release for that change, so it
will appear in the next one.
--
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967 FD75 CB82 0563 2107 AD8A
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-01-07 11:06 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-04 9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm
2025-01-04 12:43 ` Sergey Matveev
2025-01-04 14:04 ` Niklas Böhm
2025-01-07 11:05 ` Sergey Matveev