Linux’s strange memory allocation strategy

As I learned today, Linux has a really strange memory allocation strategy. Have a look at this piece of code:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  while(1) {
    char *foo = (char*)malloc(1024);
    if(foo == NULL) {
      printf("Couldn't alloc\n");
      fflush(stdout);
      return 0;
    }
  }
  return 0;
}

According to the malloc() reference, it should return NULL if it is not able to allocate memory:

“If the function failed to allocate the requested block of memory, a null pointer is returned.”

So I thought my process would write Couldn't alloc to stdout and exit cleanly. But it didn't behave that way. When my system ran out of memory, the process was killed by the kernel with SIGKILL. So why does it behave like this? Wikipedia writes about Out of memory:

“A process which exceeds its per-process limit and then attempts to allocate further memory – with malloc(), for example – will return failure. A well-behaved application should handle this situation gracefully; however, many do not. An attempt to allocate memory without checking the result is known as an “unchecked malloc”.”

Well… Yes… Of course, you should always check whether malloc() returned NULL, but under normal conditions it will practically never return NULL because of Linux's memory overcommitment. By default, Linux has an optimistic memory allocation strategy: malloc() returns a pointer right away, but the space behind that pointer only gets actually allocated at the first read/write operation to that address. In my opinion, this is really strange behaviour, because memory that gets allocated will normally also be used. Here's a short example showing the effects of that behaviour:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MALLOC_SIZE (1024*1024*10)

int main(void) {
  int i = 0;
  while(1) {
    char *foo = (char*)malloc(MALLOC_SIZE);
    //memset(foo, 0xaa, MALLOC_SIZE);
    printf("Allocation No. %d\n", i++);
    if(foo == NULL) {
      printf("Couldn't alloc\n");
      fflush(stdout);
      return 0;
    }
  }
  return 0;
}

With the memset line commented out, this results in:

ralf@Pegasus:C$ ./test|tail
Allocation No. 1841101
Allocation No. 1841102
Allocation No. 1841103
Allocation No. 1841104
Allocation No. 1841105
Allocation No. 1841106
Allocation No. 1841107
Allocation No. 1841108
Allocation No. 1841109
Allocation Nzsh: killed     ./test |
zsh: done       tail

And with the memset line commented in, it results in:

ralf@Pegasus:C$ ./test|tail
Allocation No. 1275
Allocation No. 1276
Allocation No. 1277
Allocation No. 1278
Allocation No. 1279
Allocation No. 1280
Allocation No. 1281
Allocation No. 1282
Allocation No. 1283
Allocazsh: killed     ./test |
zsh: done       tail

This is really funny: you can allocate tons of space you don't actually have. That's creepy. But this behaviour can be changed with

echo 2 > /proc/sys/vm/overcommit_memory

“Since 2.5.30 the values are: 0 (default): as before: guess about how much overcommitment is reasonable, 1: never refuse any malloc(), 2: be precise about the overcommit – never commit a virtual address space larger than swap space plus a fraction overcommit_ratio of the physical memory. Here /proc/sys/vm/overcommit_ratio (by default 50) is another user-settable parameter. It is possible to set overcommit_ratio to values larger than 100. (See also Documentation/vm/overcommit-accounting.)” Source
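To get a feeling for that formula, here is a tiny sketch that computes the resulting commit limit for some made-up numbers (2 GiB of swap, 8 GiB of RAM and the default overcommit_ratio of 50 are assumptions for illustration, not values from my machine):

#include <stdio.h>

int main(void) {
  /* Made-up example values: 2 GiB swap, 8 GiB RAM, default ratio of 50. */
  unsigned long long swap_kib = 2ULL * 1024 * 1024;
  unsigned long long ram_kib  = 8ULL * 1024 * 1024;
  unsigned int overcommit_ratio = 50;

  /* overcommit_memory = 2: CommitLimit = swap + RAM * overcommit_ratio / 100 */
  unsigned long long limit_kib = swap_kib + ram_kib * overcommit_ratio / 100;

  printf("CommitLimit: %llu KiB (= %llu GiB)\n",
         limit_kib, limit_kib / (1024 * 1024));
  return 0;
}

With those numbers, the kernel would refuse to commit more than 6 GiB of address space, no matter whether the pages have already been touched or not.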

Then both runs (with and without the memset) lead to:

ralf@Pegasus:C$ ./test|tail
Allocation No. 433
Allocation No. 434
Allocation No. 435
Allocation No. 436
Allocation No. 437
Allocation No. 438
Allocation No. 439
Allocation No. 440
Allocation No. 441
Couldn't alloc

And this is exactly what I originally expected as the normal behaviour. Without this setting, malloc() will practically never return NULL, and a process will not be able to shut down gracefully if the system runs out of memory.

Here's a nice anecdote on that topic.

Authenticated block cipher mode of operation

Common symmetric block cipher modes of operation (such as CBC, OFB, CTR) only provide confidentiality. That means that your plaintext is not readable without knowledge of the secret key. But when the ciphertext gets changed by a third party, it is not possible to tell whether that change was made intentionally by an authorized party or by an unauthorized one.
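To make that concrete, here is a small sketch of the problem using AES in CTR mode via OpenSSL (key, IV and message are made up for illustration): an attacker can flip bits in the ciphertext without knowing the key, and decryption happily produces the modified plaintext without any error.

#include <stdio.h>
#include <openssl/evp.h>

static void ctr_crypt(const unsigned char *key, const unsigned char *iv,
                      const unsigned char *in, unsigned char *out, int n) {
  int len;
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
  EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
  EVP_EncryptUpdate(ctx, out, &len, in, n);
  EVP_EncryptFinal_ex(ctx, out + len, &len);
  EVP_CIPHER_CTX_free(ctx);
}

int main(void) {
  unsigned char key[16] = {0}, iv[16] = {0};   /* made-up key and IV */
  unsigned char pt[] = "pay 100 EUR";
  unsigned char ct[sizeof(pt)], out[sizeof(pt)];

  ctr_crypt(key, iv, pt, ct, sizeof(pt) - 1);

  /* The attacker flips bits in the ciphertext without knowing the key... */
  ct[4] ^= '1' ^ '9';

  /* ...and decryption silently yields "pay 900 EUR". */
  ctr_crypt(key, iv, ct, out, sizeof(pt) - 1);
  out[sizeof(pt) - 1] = '\0';
  printf("%s\n", out);
  return 0;
}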

But there are several ways to provide authenticity in addition to confidentiality. One possibility to guarantee both confidentiality and authenticity is to use message authentication codes (MACs). MACs tag messages with additional data. Only authorized parties who know the key that was used to generate the MAC are able to create and check the tag. If the tag or the message gets changed by an unauthorized party, this can be detected.

Opinions are divided on whether to tag the plaintext or the ciphertext. Both approaches have advantages and disadvantages: Another reason to MAC ciphertext, not plaintext, When authenticating ciphertexts, what should be HMACed?

As explained here, the correct usage of CBC-MAC is also a well-known issue 🙂 I personally prefer HMAC for authenticating messages. It is quite easy to use and fairly foolproof. But all MACs have one big disadvantage: when encrypting or decrypting data, one has to iterate over the data twice, no matter whether the plaintext or the ciphertext is tagged. One pass is for en-/decryption and one is for computing the MAC. This is fine for small amounts of data, but it leads to high load when working with bulk data. So this method is inappropriate for hard disk encryption.
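For illustration, here is a minimal encrypt-then-MAC style sketch using OpenSSL's one-shot HMAC() function (the key and the “ciphertext” are made-up placeholders): the tag is computed in a separate pass over the data, which is exactly the extra iteration mentioned above.

#include <stdio.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

int main(void) {
  /* Hypothetical key and already-encrypted message, for illustration only. */
  const unsigned char key[] = "secret-mac-key";
  const unsigned char msg[] = "ciphertext to be authenticated";

  unsigned char tag[EVP_MAX_MD_SIZE];
  unsigned int tag_len = 0;

  /* One extra pass over the data just to compute the tag;
     the en-/decryption itself needs another pass. */
  HMAC(EVP_sha256(), key, sizeof(key) - 1,
       msg, sizeof(msg) - 1, tag, &tag_len);

  for (unsigned int i = 0; i < tag_len; i++)
    printf("%02x", tag[i]);
  printf("\n");
  return 0;
}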

Therefore, “authenticated encryption with associated data” (AEAD) or simply “authenticated encryption” (AE) can be used. The data gets en-/decrypted AND tagged simultaneously. Here are some examples of authenticated encryption modes. These methods encrypt and authenticate a secret message m and can also authenticate “additional authenticated data” a, which is not encrypted (a can also stay empty). Unfortunately, those modes of operation are not very popular so far. Some authenticated modes of operation are not able to finish decryption if an (authentication) error occurs during decryption. This could lead to problems in some cases, especially when it is necessary to recover faulty data.
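As a rough sketch of how such a mode looks in practice, here is AES-256-GCM via OpenSSL's EVP interface (key, IV, associated data and message are made-up placeholders, and error handling is omitted): the message m is encrypted and authenticated in one pass, and the associated data a is only authenticated.

#include <stdio.h>
#include <openssl/evp.h>

int main(void) {
  /* Made-up key/IV/AAD/plaintext, for illustration only. */
  unsigned char key[32] = {0}, iv[12] = {0};
  unsigned char aad[] = "header (authenticated, not encrypted)";
  unsigned char msg[] = "secret message m";
  unsigned char ct[sizeof(msg)], tag[16];
  int len, ct_len;

  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
  EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv);

  /* The associated data a: authenticated but not encrypted. */
  EVP_EncryptUpdate(ctx, NULL, &len, aad, sizeof(aad) - 1);

  /* The message m: encrypted and authenticated in the same pass. */
  EVP_EncryptUpdate(ctx, ct, &len, msg, sizeof(msg) - 1);
  ct_len = len;
  EVP_EncryptFinal_ex(ctx, ct + len, &len);
  ct_len += len;

  /* The authentication tag, verified again during decryption. */
  EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof(tag), tag);
  EVP_CIPHER_CTX_free(ctx);

  printf("ciphertext bytes: %d\n", ct_len);
  return 0;
}

On the decryption side, EVP_DecryptFinal_ex() fails if the tag does not match, which is where the “cannot finish decryption on error” behaviour mentioned above comes from.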

Authenticated block cipher modes of operation could actually be used much more frequently instead of HMAC or other ways of providing authenticity. They are more performant, elegant and straightforward. But even modern disk encryption software like TrueCrypt or Linux's cryptsetup (LUKS) does not support authenticated encryption. I had a nice discussion on that topic on the dm-crypt mailing list: [dm-crypt] Authenticated Encryption for dm-crypt