Protecting the world

from criminally bad code

memcmp

Synopsis

#include <string.h>

int memcmp(const void* s1, const void* s2, size_t n);

Description

The memcmp function performs a byte-by-byte comparison of the first n bytes of the two blocks of raw memory pointed to by s1 and s2, with each byte interpreted as an unsigned char. It returns:

  • zero if the two blocks are identical,
  • a positive value if the first non-matching byte is greater in s1, or
  • a negative value if the first non-matching byte is greater in s2.
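As a minimal sketch of how the result can be interpreted, the helper below reduces it to -1, 0 or 1. The name compare_blocks is hypothetical and not part of the standard library. Only the sign of the value returned by memcmp is specified, so callers should not rely on its magnitude.

```c
#include <string.h>

/* Hypothetical helper: reduce memcmp's result to -1, 0 or 1.
   Only the sign of memcmp's return value is specified, not its size. */
int compare_blocks(const void *s1, const void *s2, size_t n)
{
    int r = memcmp(s1, s2, n);
    return (r > 0) - (r < 0);
}
```

For instance, compare_blocks("abc", "abd", 3) gives -1 because the first non-matching byte is greater in the second block. Because bytes are compared as unsigned char, a byte of 0x80 counts as greater than 0x7f even where char is signed.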

Pitfalls

x==y does not imply that memcmp(x,y)==0

It is possible for two objects of the same type to be equal according to operator== but have a different representation in memory. There are several ways in which this can happen:

  • Types can contain padding that does not contribute to their value[1]. The content of such padding is unspecified[2].
  • Numeric types can have positive and negative representations for zero[3].
  • In C++, operator== can be defined to have any behaviour that is expressible as a function if one or both of its arguments is a user-defined type.
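The first two cases can be sketched in C as follows. All of the names here (struct padded, padded_equal, padded_same_bytes, double_same_bytes) are hypothetical, and the sketch assumes a platform where int is more strictly aligned than char and where double follows IEEE 754, which has a negative zero. Neither assumption is guaranteed by the C standard.

```c
#include <string.h>

/* A struct that typically has padding between c and i
   (assumption: int is more strictly aligned than char). */
struct padded {
    char c;
    int  i;
};

/* Value comparison: looks only at the members. */
int padded_equal(const struct padded *a, const struct padded *b)
{
    return a->c == b->c && a->i == b->i;
}

/* Representation comparison: also looks at any padding bytes.
   Their content is unspecified, so this may return 0 for equal values. */
int padded_same_bytes(const struct padded *a, const struct padded *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Representation comparison for a floating-point value. */
int double_same_bytes(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Under the IEEE 754 assumption, 0.0 == -0.0 is true but double_same_bytes(0.0, -0.0) returns 0, because the two zeros differ in their sign bit. Likewise, two struct padded objects for which padded_equal returns 1 are not certain to satisfy padded_same_bytes.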

An example of a C++ type that cannot be safely compared using memcmp is std::string. Instances of this class usually contain a pointer through which the character data is accessed. Strings with the same value sometimes share a single copy of the character data, but they are not required to do so. If the character data is not shared then the strings will point to different locations in memory, so their representations as seen by memcmp can differ even though their values are the same.

There are a few types that can be safely compared because none of the issues listed above are applicable:

  • unsigned char[4],
  • unsigned exact-width integer types (such as uint32_t)[5],
  • arrays of types that can be safely compared using memcmp, and
  • structures of types that can be safely compared using memcmp, if they can be shown not to contain any padding.
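One way to show that a structure contains no padding is to check at compile time that its size equals the sum of the sizes of its members. The sketch below assumes a C11 compiler (for static_assert) and uses a hypothetical struct record built only from unsigned exact-width integers. It illustrates the idea and is not the only possible approach.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical record built only from unsigned exact-width integers. */
struct record {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
};

/* If the size of the struct equals the sum of the sizes of its members,
   it contains no padding bytes. Compilation fails otherwise. */
static_assert(sizeof(struct record) ==
                  sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint16_t),
              "struct record must not contain padding");

/* Safe only because of the compile-time check above. */
int record_equal(const struct record *a, const struct record *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

If a member is later added or reordered so that padding appears, the assertion stops the build instead of letting record_equal silently compare unspecified bytes.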

It is also safe to compare character strings (even if char is signed) because the null character used as a terminator is defined to be a byte with all bits set to zero[6].

memcmp(x,y)==0 does not imply that x==y

Objects with an identical representation in memory are usually considered to be equal by operator==, but there are some circumstances where this is not the case:

  • Floating-point types often have a value called NaN (‘not a number’) with the property that NaN==NaN is false.
  • In C++, operator== can be defined to have any behaviour that is expressible as a function if one or both of its arguments is a user-defined type.
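The NaN case can be sketched in C as follows. The helper names are hypothetical, and the sketch assumes an implementation that defines the NAN macro in <math.h>, which C99 provides only where quiet NaNs are supported.

```c
#include <math.h>    /* NAN, where quiet NaNs are supported */
#include <string.h>

/* Returns 1 if the two objects have the same bytes in memory. */
int double_bits_identical(const double *x, const double *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}

/* Returns 1 if the two objects compare equal using ==. */
int double_values_equal(const double *x, const double *y)
{
    return *x == *y;
}
```

Given double a = NAN, the call double_bits_identical(&a, &a) returns 1 because an object trivially has the same bytes as itself, whereas double_values_equal(&a, &a) returns 0 because a NaN does not compare equal to anything, itself included.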

Portability

  •  C90
  •  C99
  •  C++98
  •  C++11

In C++, use of the header <string.h> is deprecated in favour of <cstring>.

Notes

  • [1] C99 §6.7.2.1 ¶13,15
  • [2] C99 §6.2.6.1 ¶6
  • [3] C99 §6.2.6.2 ¶2
  • [4] C99 §6.2.6.1 ¶3 (implies that an unsigned char cannot have padding)
  • [5] C99 §7.18.1.1 ¶1,2 (implies that exact-width integer types cannot have padding)
  • [6] C99 §5.2.1 ¶2

Further reading

  • The memcmp function, Programming languages — C, ISO/IEC 9899:1999, §7.21.4.1, p327
  • memcmp, The Open Group Base Specifications, Issue 7, The Open Group, 2008
  • memcmp(3), Linux Programmer’s Manual, The Linux man-pages project
  • String/Array Comparison, The GNU C Library Reference Manual, The GNU Project