public inbox for goredo-devel@lists.cypherpunks.su
Atom feed
* Handling EINTR in unix.FcntlFlock
@ 2025-01-04  9:59 Niklas Böhm
  2025-01-04 12:43 ` Sergey Matveev
  0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04  9:59 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]

Greetings everyone,

I was using goredo on an NFS and noticed that I sometimes ran into 
issues where my program would fail with the following error:

	run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock

After doing some digging, it seems like the problem is that calling 
unix.FcntlFlock with F_SETLKW can be too slow over an NFS and will get 
interrupted (see `man 2 flock`, Section on errors [1]).  Apparently 
there is an automatic restart mechanism [2], but it's also unreliable, 
so I thought it's better to handle it explicitly and basically extend 
the error check from

	if errors.Is(err, unix.EDEADLK) {

to

	if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {

This seems to resolve the interrupted system call error above. 
Unfortunately I cannot realiably reproduce this error, but since the fix 
is reasonaly easy, I was hoping that it could be incorporated into 
goredo proper.

I have attached the small diff, and here it is also reproduced, in case 
the explicit line is not clear:

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, 
traced bool) error {
                         tracef(CLock, "LOCK_EX: %s", fdLock.Name())
                 LockAgain:
                         if err = unix.FcntlFlock(fdLock.Fd(), 
unix.F_SETLKW, &flock); err != nil {
-                               if errors.Is(err, unix.EDEADLK) {
+                               if errors.Is(err, unix.EDEADLK) || 
errors.Is(err, unix.EINTR) {
                                         time.Sleep(10 * time.Millisecond)
                                         goto LockAgain
                                 }


Cheers and happy belated new year
Nik

[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]: 
https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call

[-- Attachment #2: goredo-eintr.diff --]
[-- Type: text/x-patch, Size: 493 bytes --]

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
 			tracef(CLock, "LOCK_EX: %s", fdLock.Name())
 		LockAgain:
 			if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
-				if errors.Is(err, unix.EDEADLK) {
+				if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
 					time.Sleep(10 * time.Millisecond)
 					goto LockAgain
 				}

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Handling EINTR in unix.FcntlFlock
  2025-01-04  9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm
@ 2025-01-04 12:43 ` Sergey Matveev
  2025-01-04 14:04   ` Niklas Böhm
  0 siblings, 1 reply; 4+ messages in thread
From: Sergey Matveev @ 2025-01-04 12:43 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

Greetings, Niklas!

*** Niklas Böhm [2025-01-04 10:59]:
>[...] I have attached the small diff [...]

Thanks for finding the issue and fixing it! Released in 2.6.4.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Handling EINTR in unix.FcntlFlock
  2025-01-04 12:43 ` Sergey Matveev
@ 2025-01-04 14:04   ` Niklas Böhm
  2025-01-07 11:05     ` Sergey Matveev
  0 siblings, 1 reply; 4+ messages in thread
From: Niklas Böhm @ 2025-01-04 14:04 UTC (permalink / raw)
  To: goredo-devel

Thank you for the quick response and release.  I have updated the 
package on AUR.

Now that I was looking at the implementation, would it maybe make sense 
to change the sleep time from 10ms to something a bit more randomized? 
So that two processes don't repeatedly run into the same deadlock error? 
  (And I think it also would not hurt for an NFS to have the file calls 
de-synchronized.)

Could take a uniform distribution between 0 and 20ms so that on average 
they will sleep for 10ms still.

On 1/4/25 1:43 PM, Sergey Matveev wrote:
> Greetings, Niklas!
> 
> *** Niklas Böhm [2025-01-04 10:59]:
>> [...] I have attached the small diff [...]
> 
> Thanks for finding the issue and fixing it! Released in 2.6.4.
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Handling EINTR in unix.FcntlFlock
  2025-01-04 14:04   ` Niklas Böhm
@ 2025-01-07 11:05     ` Sergey Matveev
  0 siblings, 0 replies; 4+ messages in thread
From: Sergey Matveev @ 2025-01-07 11:05 UTC (permalink / raw)
  To: goredo-devel

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

*** Niklas Böhm [2025-01-04 15:04]:
>Now that I was looking at the implementation, would it maybe make sense to
>change the sleep time from 10ms to something a bit more randomized?

I am not sure that goroutine scheduler, OS'es schedulers delays are not
jitter-ed enough, but adding randomised sleep time is easy thing, so why not.
Added in d0d5eb2605461569e985f9d502162b0f54392873 commit master branch.
Not sure if it is so important to make a release for that change, so it
will appear in the next one.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-01-07 11:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-04  9:59 Handling EINTR in unix.FcntlFlock Niklas Böhm
2025-01-04 12:43 ` Sergey Matveev
2025-01-04 14:04   ` Niklas Böhm
2025-01-07 11:05     ` Sergey Matveev