public inbox for nncp-devel@lists.cypherpunks.ru
* Avoiding double writes
@ 2021-11-01 17:47 John Goerzen
  2021-11-01 19:46 ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread

From: John Goerzen @ 2021-11-01 17:47 UTC (permalink / raw)
To: nncp-devel

Hi,

A while back, we were discussing the temporary files that are needed when
reading stdin for nncp-exec or nncp-file. I believe the reason for this is
that the header contains a signature of the data that follows, and it's not
practical to seek back and write it later.

That raises a question: since the signature can't be verified without
reading the entirety of the data anyway, why not put the signature after
the data instead of before it?

To do that, there needs to be some way of recognizing the end of the data.
I'm not sure how that happens now, since we also don't know the size in
advance in those cases. Is there some sort of blocking for data chunks?

If time permits, I may see about adding this feature if there's a design
path that would be suitable.

- John

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: Avoiding double writes
  2021-11-01 17:47 Avoiding double writes John Goerzen
@ 2021-11-01 19:46 ` Sergey Matveev
  2021-11-02  0:11   ` John Goerzen
  0 siblings, 1 reply; 7+ messages in thread

From: Sergey Matveev @ 2021-11-01 19:46 UTC (permalink / raw)
To: nncp-devel

Greetings!

*** John Goerzen [2021-11-01 12:47]:
>A while back, we were discussing the temporary files that are needed when
>reading stdin for nncp-exec or nncp-file. I believe the reason for this is
>that the header contains a signature of the data that follows, and it's
>not practical to seek back and write it later.

http://www.nncpgo.org/Encrypted.html
It is not because of the signature, but mainly because of the SIZE field.
The signed header contains everything you need to authenticate the remote
side and create the encryption keys to process the ENCRYPTED data. There is
no signature over the whole packet. ENCRYPTED data contains the encrypted
size as a first short block, and then a pile of 128 KiB blocks. Each block
is AEAD-encrypted, so it is authenticated. SIZE holds the payload size,
which can be smaller than the whole packet because junk is added to hide
the actual payload size.

What can we do with the size, which has to be known somehow anyway?

Probably add it as the very last block. But that way we have to seek to the
end (to read/decrypt it), then seek back to decrypt and copy the actual
payload. We cannot do it in advance, because we do not know where the
actual payload stops and the junk begins. Not an option.

We can store some structure inside each block (a single signalling byte),
telling whether it is a payload block or a junk block. Now we can process
it sequentially. But the junk in that case has to be real data that we
really encrypt and authenticate. The junk can be zeroes, but AEAD
encryption is much more expensive than the current junk generator made
from BLAKE3-XOF output. Much complication, more CPU cost.
Ok, we know that once a junk block has started, everything after it will be
junk, so we have to authenticate only the single block where the junk
starts and then quickly generate the rest without real AEAD processing.
Except for the very last block with the size. This is an option. It adds
additional metadata to each block and moves the encrypted SIZE to the end.
However, we then cannot sequentially read the packet and determine its size
immediately: either we seek, or we decrypt the data while parsing it. So my
main uncertainty is: is it worth it?

Previously another stop-issue was the fact that the whole generated
encrypted packet was hashed immediately, so we could not, for example,
leave a fixed-size SIZE block and fill it in (seek back, then write) once
the whole data had been read from stdin. Currently with MTH
(http://www.nncpgo.org/MTH.html) it can be done pretty efficiently: we can
hash only part of the data, and then hash the other missing parts. But NNCP
allows encapsulating transitional packets. So when I do:

    nncp-file -via alice,bob - carol:dst

three encrypted packets are generated on the fly, each fed into the next:
one is for carol, and the other two are transitional. So that complicates
the task of rewriting the SIZE field after the header even more.

>To do that, there needs to be some way of recognizing the end of the data.
>I'm not sure how that happens now, since we also don't know the size in
>advance in those cases. Is there some sort of blocking for data chunks?

Storing the whole data in a temporary file is exactly what gains knowledge
of the resulting size in advance. The plaintext is split into blocks, which
are independently AEAD-encrypted:
1) Most AEAD cipher interfaces allow only the whole data to be processed,
   without intermediate updates, so we have to split it;
2) It allows exiting the decryption process immediately if one of the
   blocks has already failed (unauthenticated), without waiting until the
   whole packet has been processed and we see an invalid MAC.
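The block-splitting scheme described above can be sketched as a toy model.
This is not NNCP code: ChaCha20-Poly1305 is not in Python's standard
library, so an HMAC-SHA256 encrypt-then-MAC construction stands in for the
AEAD, with the per-block nonce counter made explicit. The point it shows is
the one in the list above: each 128 KiB block verifies independently, so a
reader can fail fast on the first bad block.

```python
import hashlib, hmac

BLOCK = 128 * 1024  # NNCP's 128 KiB plaintext block size

def _keystream(key, nonce, n):
    # Deterministic SHA-256-based keystream: a stdlib stand-in for ChaCha20.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce.to_bytes(8, "big")
                              + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal_block(key, nonce, pt):
    # Encrypt-then-MAC, standing in for one AEAD seal per block.
    ct = bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))
    tag = hmac.new(key, nonce.to_bytes(8, "big") + ct, hashlib.sha256).digest()
    return ct + tag

def open_block(key, nonce, blk):
    ct, tag = blk[:-32], blk[-32:]
    want = hmac.new(key, nonce.to_bytes(8, "big") + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, want):
        raise ValueError("block %d: bad tag" % nonce)  # fail fast, stop reading
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

def seal_stream(key, data):
    # Split the plaintext into 128 KiB blocks, each sealed independently.
    return [seal_block(key, n, data[i:i + BLOCK])
            for n, i in enumerate(range(0, len(data), BLOCK))]
```

The nonce counter in the MAC input is what prevents reordering blocks or
splicing a block in from elsewhere in the stream.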
Initially NNCP used an ordinary block cipher mode with an ordinary MAC
function at the very end -- modern AEAD like ChaCha20-Poly1305 is just
faster, and the overhead of 16*8 = 128 bytes of tags per MiB of payload is
negligible.

And of course, perhaps I am just missing some damn simple solution.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
* Re: Avoiding double writes
  2021-11-01 19:46 ` Sergey Matveev
@ 2021-11-02  0:11   ` John Goerzen
  2021-11-02 10:03     ` Sergey Matveev
  0 siblings, 1 reply; 7+ messages in thread

From: John Goerzen @ 2021-11-02 0:11 UTC (permalink / raw)
To: Sergey Matveev; +Cc: nncp-devel

Hi Sergey!

On Mon, Nov 01 2021, Sergey Matveev wrote:

> *** John Goerzen [2021-11-01 12:47]:
>>A while back, we were discussing the temporary files that are needed
>>when reading stdin for nncp-exec or nncp-file. I believe the reason for
>>this is that the header contains a signature of the data that follows,
>>and it's not practical to seek back and write it later.
>
> http://www.nncpgo.org/Encrypted.html
> It is not because of the signature, but mainly because of the SIZE
> field. The signed header contains everything you need to authenticate
> the remote side and create the encryption keys to process the ENCRYPTED
> data. There is no signature over the whole packet. ENCRYPTED data
> contains the encrypted size as a first short block, and then a pile of
> 128 KiB blocks. Each block is AEAD-encrypted, so it is authenticated.
> SIZE holds the payload size, which can be smaller than the whole packet
> because junk is added to hide the actual payload size.

Got it. So, from a glance at the code, this size is primarily used for:

- differentiating encrypted data from padding
- display output

> We can store some structure inside each block (a single signalling
> byte), telling whether it is a payload block or a junk block. Now we can
> process it sequentially. But the junk in that case has to be real data
> that we really encrypt and authenticate. The junk can be zeroes, but
> AEAD encryption is much more expensive than the current junk generator
> made from BLAKE3-XOF output. Much complication, more CPU cost. Ok, we
> know that once a junk block has started, everything after it will be
> junk, so we have to authenticate only the single block where the junk
> starts and then quickly generate the rest without real AEAD processing.
> Except for the very last block with the size.

Is it even really necessary to store a size? I was thinking of something
along these lines also, with the signaling byte (or, perhaps more
accurately, a blocksize indicator). For instance, each block would begin
with the size of the block of encrypted data, and after all the encrypted
data, a block size of zero could be used. There would be no need to
explicitly give a size because the stream of blocks would contain all the
needed information.

But I'm also confused about the signature - since that comes before the
encrypted blocks, isn't that also a problem?

Thanks for the conversation,

John
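The blocksize-indicator idea above can be sketched as plain framing,
leaving encryption aside: each chunk carries a u32 length prefix, and a
zero-length terminator marks the end of the payload, so no up-front SIZE
field is needed. The 128 KiB block size comes from the format description;
the layout itself is hypothetical, just the idea from the paragraph above.

```python
import io, struct

BLOCK = 128 * 1024  # assumed 128 KiB block size, as in the format description

def frame(data: bytes) -> bytes:
    # Prefix each chunk with a big-endian u32 length; a zero-length
    # record terminates the stream.
    out = io.BytesIO()
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        out.write(struct.pack(">I", len(chunk)))
        out.write(chunk)
    out.write(struct.pack(">I", 0))  # terminator: "end of payload"
    return out.getvalue()

def deframe(stream) -> bytes:
    # Read records until the zero-length terminator; anything after it
    # (e.g. padding) is simply never treated as payload.
    out = bytearray()
    while True:
        (n,) = struct.unpack(">I", stream.read(4))
        if n == 0:
            return bytes(out)
        out += stream.read(n)
```

The reader never needs to know the total size in advance; it learns the end
of the data from the stream itself, which is exactly what removes the need
for the temporary file on the writer's side.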
* Re: Avoiding double writes
  2021-11-02  0:11 ` John Goerzen
@ 2021-11-02 10:03   ` Sergey Matveev
  2021-11-02 15:26     ` John Goerzen
  0 siblings, 1 reply; 7+ messages in thread

From: Sergey Matveev @ 2021-11-02 10:03 UTC (permalink / raw)
To: nncp-devel

Greetings!

*** John Goerzen [2021-11-01 19:11]:
>So, from a glance at the code, this size is primarily used for:
>- differentiating encrypted data from padding
>- display output

Exactly.

>Is it even really necessary to store a size? I was thinking of something
>along these lines also, with the signaling byte (or, perhaps more
>accurately, a blocksize indicator).

Well, actually I currently do not see a strong need for the size either.
Of course it is nice to be able to quickly (by deciphering just a single
block of data) determine the real payload's size, but for years no
code/command did this, except for nncp-pkt for debugging. So I agree with
you that signalling will be more than enough, allowing "streaming"
creation of encrypted packets.

However, with MTH it is possible (if I am not wrong) to write everything
encrypted except for the first block(s), then go back and prepend those
first blocks with the now-known size. I am not sure, but that would
require changing a lot of NNCP internal code, which is aimed exclusively
at streaming. But it would not require changing the existing encrypted
format.

But! When I woke up, I realized that the junk (padding) is not
authenticated at all right now! If a packet has two bytes of padding, then
you can process it, then strip off the last byte; because the MTH hash
changes, it will bypass the .seen check and be processed successfully
again. Then you can strip another byte and see whether it was processed
successfully (for example, as an adversary, you see whether some
message/file appears somewhere after you sent the modified encrypted
packet). And you can do that until you have stripped off all the junk
padding, thus learning the real payload's size.
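The stripping attack described above can be demonstrated on a toy model
(16-byte blocks instead of 128 KiB, and HMAC-SHA256 tags standing in for
the per-block AEAD tags; none of this is NNCP's actual code): since the
trailing junk carries no authentication of its own, every truncation of it
still verifies, and the point where verification starts failing reveals the
real payload length.

```python
import hashlib, hmac

def auth_blocks(key, payload, block=16):
    # Payload blocks are individually tagged (stand-in for per-block AEAD);
    # note that junk appended after them gets no tag at all.
    out = b""
    for i in range(0, len(payload), block):
        chunk = payload[i:i + block]
        out += chunk + hmac.new(key, i.to_bytes(8, "big") + chunk,
                                hashlib.sha256).digest()
    return out

def verify(key, packet, payload_len, block=16):
    # Verify only the authenticated payload blocks; trailing junk is ignored.
    pos = 0
    for i in range(0, payload_len, block):
        n = min(block, payload_len - i)
        chunk, tag = packet[pos:pos + n], packet[pos + n:pos + n + 32]
        want = hmac.new(key, i.to_bytes(8, "big") + chunk,
                        hashlib.sha256).digest()
        if not hmac.compare_digest(tag, want):
            return False
        pos += n + 32
    return True

key = b"k" * 32
payload = bytes(range(40))
packet = auth_blocks(key, payload) + b"\xaa" * 30  # 30 bytes of unauthenticated junk
```

An adversary strips junk bytes one at a time and resubmits: every
truncation within the junk still verifies, and the first length at which
verification fails marks the boundary of the real payload.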
So a format change is definitely needed now :-). It is a vulnerability,
but of course not a crucial or seriously exploitable one, leading only to
a possible leak of the real payload's size. The padding length has to be
authenticated.

But as we are going to change the format anyway, I think it is safe to get
rid of the SIZE field completely, adding some "signalling"
metainformation to the blocks, not forgetting about padding
authentication. And that won't affect much code, I presume.

>But I'm also confused about the signature - since that comes before the
>encrypted blocks, isn't that also a problem?

The signature is made over the encrypted packet's header only. It comes
after the header itself, so I do not see any problems :-)

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
* Re: Avoiding double writes
  2021-11-02 10:03 ` Sergey Matveev
@ 2021-11-02 15:26   ` John Goerzen
  2021-11-02 17:49     ` Sergey Matveev
  2021-11-02 20:48     ` Sergey Matveev
  0 siblings, 2 replies; 7+ messages in thread

From: John Goerzen @ 2021-11-02 15:26 UTC (permalink / raw)
To: Sergey Matveev; +Cc: nncp-devel

On Tue, Nov 02 2021, Sergey Matveev wrote:

> Well, actually I currently do not see a strong need for the size either.
> Of course it is nice to be able to quickly (by deciphering just a single
> block of data) determine the real payload's size, but for years no
> code/command did this, except for nncp-pkt for debugging.

Yes, and for giving user output, the size of the encrypted packet could be
used too.

> However, with MTH it is possible (if I am not wrong) to write everything
> encrypted except for the first block(s), then go back and prepend those
> first blocks with the now-known size. I am not sure, but that would
> require changing a lot of NNCP internal code, which is aimed exclusively
> at streaming. But it would not require changing the existing encrypted
> format.

But I don't think even this is needed, since as we're saying, we don't
have a strong need for the size.

> So a format change is definitely needed now :-). It is a vulnerability,
> but of course not a crucial or seriously exploitable one, leading only
> to a possible leak of the real payload's size. The padding length has to
> be authenticated.

Interestingly, AFAIK, OpenPGP has no provision for padding and its packet
headers are unencrypted, so agreed that it isn't a big deal.

> But as we are going to change the format anyway, I think it is safe to
> get rid of the SIZE field completely, adding some "signalling"
> metainformation to the blocks, not forgetting about padding
> authentication. And that won't affect much code, I presume.

Yes, exactly. This metadata could be as simple as a u32 indicating how
much of the following block is actual data.
Any u32 value beneath 128K (including zero) would indicate that we've
reached EOF of the original data, and everything past that should be
authenticated but discarded, I think.

>>But I'm also confused about the signature - since that comes before the
>>encrypted blocks, isn't that also a problem?
>
> The signature is made over the encrypted packet's header only. It comes
> after the header itself, so I do not see any problems :-)

Ahh, that makes sense.

I'm not all up on my crypto algorithms, but if I understand correctly,
each encrypted block is authenticated with the BLAKE3 hash of that block
plus the unsigned portion of the header? And since that portion of the
header contains the public part of the session key, that prevents data
injection attacks, right?

So it would be possible with this streaming approach to still determine
with certainty whether we have received the entire file's data, and the
correct data, by processing the hash and header for each block, right?

Thanks,

- John
* Re: Avoiding double writes
  2021-11-02 15:26 ` John Goerzen
@ 2021-11-02 17:49   ` Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread

From: Sergey Matveev @ 2021-11-02 17:49 UTC (permalink / raw)
To: nncp-devel

*** John Goerzen [2021-11-02 10:26]:
>Yes, and for giving user output, the size of the encrypted packet could
>be used too.

Agreed.

>Yes, exactly. This metadata could be as simple as a u32 indicating how
>much of the following block is actual data. Any u32 value beneath 128K
>(including zero) would indicate that we've reached EOF of the original
>data, and everything past that should be authenticated but discarded, I
>think.

I assume something like that, indeed. Plus an included size of the
padding/junk, so we can be sure that we received all of it and the packet
was not stripped. We have holidays coming soon, so I hope to implement
that then.

>Ahh, that makes sense. I'm not all up on my crypto algorithms, but if I
>understand correctly, each encrypted block is authenticated with the
>BLAKE3 hash of that block plus the unsigned portion of the header? And
>since that portion of the header contains the public part of the session
>key, that prevents data injection attacks, right?

You are mostly right. Each (128 KiB and SIZE) block uses the BLAKE3 hash
of the unsigned part of the header as associated data -- an additional
input to the AEAD encryption of the block. So each block is "tied" to the
context it is used in: exactly that sender, recipient and ephemeral public
key. And each block uses an implicit increasing nonce counter, so blocks
cannot be reordered, thrown away or injected. Blocks cannot be taken from
another "context" (another encrypted packet), because of the associated
data.

>So it would be possible with this streaming approach to still determine
>with certainty whether we have received the entire file's data, and the
>correct data, by processing the hash and header for each block, right?
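The context binding described above can be sketched in a toy model (again
an HMAC-SHA256 encrypt-then-MAC stand-in for ChaCha20-Poly1305, with
SHA-256 standing in for BLAKE3; not NNCP's actual code): the hash of the
unsigned packet header is fed into every block's tag as associated data,
and the block counter serves as an implicit nonce.

```python
import hashlib, hmac

def seal(key: bytes, counter: int, ad: bytes, pt: bytes) -> bytes:
    # "ad" is the hash of the unsigned packet header; "counter" is the
    # implicit per-block nonce. Stand-in for an AEAD seal with AD.
    ks = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    ct = bytes(a ^ b for a, b in zip(pt, ks * (len(pt) // 32 + 1)))
    tag = hmac.new(key, counter.to_bytes(8, "big") + ad + ct,
                   hashlib.sha256).digest()
    return ct + tag

def opens(key: bytes, counter: int, ad: bytes, blk: bytes):
    # Returns the plaintext, or None if the block does not belong to this
    # context (wrong header hash) or position (wrong counter).
    ct, tag = blk[:-32], blk[-32:]
    want = hmac.new(key, counter.to_bytes(8, "big") + ad + ct,
                    hashlib.sha256).digest()
    if not hmac.compare_digest(tag, want):
        return None
    ks = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    return bytes(a ^ b for a, b in zip(ct, ks * (len(ct) // 32 + 1)))
```

Splicing a block into a packet with a different header, or replaying it at
a different position, changes the tag input, so verification fails: this is
the "tied to the context" property in the message above.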
I do not quite understand you here :-(. Yes, we can be sure whether we
have received the whole data or not, by checking whether we reached the
"final" payload block, which also contains the size of the padding.

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
* Re: Avoiding double writes
  2021-11-02 15:26 ` John Goerzen
  2021-11-02 17:49 ` Sergey Matveev
@ 2021-11-02 20:48 ` Sergey Matveev
  1 sibling, 0 replies; 7+ messages in thread

From: Sergey Matveev @ 2021-11-02 20:48 UTC (permalink / raw)
To: nncp-devel

*** John Goerzen [2021-11-02 10:26]:
>Yes, exactly. This metadata could be as simple as a u32 indicating how
>much of the following block is actual data. Any u32 value beneath 128K
>(including zero) would indicate that we've reached EOF of the original
>data, and everything past that should be authenticated but discarded, I
>think.

Well, after some thought I came to the following construction.

* I do not like the fact that each payload block would hold a constant
  u32/whatever integer that differs in only a single block of the whole
  stream. It is a waste of space. Of course I understand that we are
  talking about a mostly negligible 32 bits per every 128 KiB of data,
  but I still do not like that waste :-)

* It could be replaced with a "signalling" bit, or actually a single byte
  (for convenience), telling us that the current block either is fully
  "payloaded" or holds additional metadata signalling that the end of the
  payload stream has been reached.

* So actually we just have to differentiate a single special block with
  metadata inside. That can be done by using a different encryption key
  for it. This trick is used in the widely-used CMAC, for example: it
  uses one key to encrypt the block with the pad, and another to signal
  that the encrypted block has no padding. CMAC deals with just a single
  64-128 bit block, but NNCP with a huge 128 KiB one -- I think it is
  still an acceptable CPU burn, because an excess 128 KiB symmetric AEAD
  decryption is anyway much cheaper than any of the curve25519/ed25519
  operations.

So we derive two encryption keys: an "ordinary" one and a "signalling"
one. When a block is encrypted with the signalling key, that means it
holds two 64-bit integers at the beginning: the full payload size and the
padding size. Period.
That is a completely sufficient change to the packet format. Let's assume
that each block holds 128 bytes of plaintext:

* If we are sending 200 bytes of data, then we generate two blocks:
  0: key=ordinary,   128 bytes of payload
  1: key=signalling, 64-bit integer with value 200 (full payload size)
                     64-bit integer with value 0 (no padding)
                     72 bytes of remaining payload

* If we wish to pad it with 30 bytes, then:
  1: key=signalling, 64-bit integer with value 200
                     64-bit integer with value 30
                     72 bytes of remaining payload
                     30 bytes of zeros

* If we are sending 128 bytes of data, then:
  0: key=ordinary,   128 bytes of payload
  1: key=signalling, 64-bit integer with value 128
                     64-bit integer with value 0 (no padding)
                     nothing else, 0 payload bytes remaining to read

* If we are sending 10 bytes of data, then:
  0: key=signalling, 64-bit integer with value 10
                     64-bit integer with value 0 (no padding)
                     10 bytes of payload

* If we are sending 126 bytes of data, plus 50 padding bytes, then:
  0: key=signalling, 64-bit integer with value 126
                     64-bit integer with value 50
                     112 bytes of payload
  1: key=ordinary,   14 bytes of remaining payload
                     50 bytes of padding

If the pad size exceeds the free space inside the block, then I wish to
use the current BLAKE3-XOF as the generator of the random sequence. No
real AEAD-encrypted blocks, just a stream of XOF output. But we do not
need cryptographic authentication for it, because that XOF is completely
deterministic (when we know the session keys, of course; the adversary
does not), so we can simply generate that stream ourselves and compare the
two byte-by-byte: it is much faster. And we know the exact pad size, so we
can be sure that no one stripped it off.

Slightly bigger code (which is actually very simple, just some state
transitioning), slightly more CPU time spent on the failed (initial)
decryption of the signalled block, but minimal waste of additional space
in the packets.
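The two-key construction above can be sketched end-to-end as a toy model
under the 128-byte blocks assumed in the worked examples. None of this is
NNCP's actual code: `derive` is a hypothetical subkey derivation, and an
HMAC-SHA256 encrypt-then-MAC stands in for the real KDF and
ChaCha20-Poly1305. The reader trial-decrypts each block with the ordinary
key; a failure signals the metadata block carrying the payload and padding
sizes.

```python
import hashlib, hmac, struct

BLOCK = 128  # tiny block size, matching the worked examples above

def derive(session_key, label):
    # Hypothetical KDF for the "ordinary" and "signalling" subkeys.
    return hashlib.sha256(session_key + label).digest()

def _ks(key, nonce, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce.to_bytes(8, "big")
                              + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key, nonce, pt):
    ct = bytes(a ^ b for a, b in zip(pt, _ks(key, nonce, len(pt))))
    return ct + hmac.new(key, nonce.to_bytes(8, "big") + ct,
                         hashlib.sha256).digest()

def open_(key, nonce, blk):
    ct, tag = blk[:-32], blk[-32:]
    want = hmac.new(key, nonce.to_bytes(8, "big") + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, want):
        return None
    return bytes(a ^ b for a, b in zip(ct, _ks(key, nonce, len(ct))))

def encrypt_stream(session_key, payload, padlen=0):
    k_ord, k_sig = derive(session_key, b"ordinary"), derive(session_key, b"signalling")
    blocks, nonce, rest = [], 0, payload
    while len(rest) >= BLOCK:            # full payload blocks: ordinary key
        blocks.append(seal(k_ord, nonce, rest[:BLOCK])); rest = rest[BLOCK:]; nonce += 1
    # signalling block: both sizes, then remaining payload and zero padding
    tail = struct.pack(">QQ", len(payload), padlen) + rest + b"\x00" * padlen
    blocks.append(seal(k_sig, nonce, tail[:BLOCK])); nonce += 1; tail = tail[BLOCK:]
    while tail:                          # overflow (payload tail + padding): ordinary key
        blocks.append(seal(k_ord, nonce, tail[:BLOCK])); tail = tail[BLOCK:]; nonce += 1
    return blocks

def decrypt_stream(session_key, blocks):
    k_ord, k_sig = derive(session_key, b"ordinary"), derive(session_key, b"signalling")
    out, sizes = b"", None
    for nonce, blk in enumerate(blocks):
        pt = open_(k_ord, nonce, blk)    # trial decryption with the ordinary key
        if pt is None:                   # failure signals the metadata block
            pt = open_(k_sig, nonce, blk)
            if pt is None:
                raise ValueError("corrupt block %d" % nonce)
            sizes, pt = struct.unpack(">QQ", pt[:16]), pt[16:]
        out += pt
    if sizes is None or len(out) != sizes[0] + sizes[1]:
        raise ValueError("truncated packet or stripped padding")
    return out[:sizes[0]]
```

Because the padding length is carried inside an authenticated block, the
stripping attack from earlier in the thread is detected: removing any
trailing block makes the accumulated length disagree with the declared
payload-plus-padding size.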
If we see that the first block is already less than 128 (KiB), then we can
decrypt it with the signalling key immediately: so for very short packets
everything will be even more compact and faster than in the current
implementation, because in the "new" one there is only a single encrypted
block, instead of two (one for the SIZE encryption, another for the
payload).

-- 
Sergey Matveev (http://www.stargrave.org/)
OpenPGP: CF60 E89A 5923 1E76 E263 6422 AE1A 8109 E498 57EF
end of thread, other threads:[~2021-11-02 20:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-01 17:47 Avoiding double writes John Goerzen
2021-11-01 19:46 ` Sergey Matveev
2021-11-02  0:11   ` John Goerzen
2021-11-02 10:03     ` Sergey Matveev
2021-11-02 15:26       ` John Goerzen
2021-11-02 17:49         ` Sergey Matveev
2021-11-02 20:48         ` Sergey Matveev