Bit operations

Problem to discuss

We need to sort a big file containing a lot of 7-digit integer numbers. The whole file cannot fit into the memory. The problem is to sort this file with the best performance. More precisely:
Input:   A file containing at most n positive integers, each less than n, where n=107. It is a fatal error if any integer occurs twice in the input. No other data is associated with the integer.
Output:   A sorted list in increasing order of the input integer.
Constraints:   At most (roughly) megabyte of storage is available in main memory; ample disk storage is available. The run time can be at most several minutes; a run time of ten seconds need not to be decreased.

Binary operation

In this section we will discuss bitwise operations we can perform against integer numbers. Note that all those operators are applied separately to each bit of the integer number. For example, if we talk about an integer number 103 stored in one byte memory cell:
unsigned char n = 103;
then this number has binary representation 01100111. Notice the leading zero. We need to remember about it because bitwise operations will be applied to this bit as well.

Bit OR

Binary operation OR (we use one vertical line for this operation) should not be confused with logical OR, for which we use two vertical lines. This operation takes two integer numbers and performs bitwise OR. The results of the bit operations are below:
  • 0 | 0 = 0
  • 0 | 1 = 1
  • 1 | 0 = 1
  • 1 | 1 = 1
    Thus, if we have two 1-byte numbers 103 and 77, then binary OR will produce
    n = 01100111 = 103
    m = 01001101 = 77
    n | m = 01101111 = 111

    Bit AND

    Binary AND works similar way, but performs AND operation on bits. The result of the bit-AND operations are:
  • 0 & 0 = 0
  • 0 & 1 = 0
  • 1 & 0 = 0
  • 1 & 1 = 1
    Please note that we use singe ampersand (&) for binary AND and double ampersand (&&) for logical AND. Being used on the same numbers n=103 and m=77 this operation produces
    n = 01100111 = 103
    m = 01001101 = 77
    n & m = 01000101 = 69

    Bit exclusive-OR

    Exclusive-OR does not have a logical analogy. This operation being performed on two bits produces 1 only if the values of these bits are different, and 0 otherwise. C/C++ uses carrot sign (^) for this operation. The values for all possible bit combinations are:
  • 0 ^ 0 = 0
  • 0 ^ 1 = 1
  • 1 ^ 0 = 1
  • 1 ^ 1 = 0
    Being used on the same numbers n=103 and m=77 this operation produces
    n = 01100111 = 103
    m = 01001101 = 77
    n ^ m = 00101010 = 42

    Bit NOT

    We use tilde sign ~ for binary NOT, for logical NOT the exclamation sign is used. The binary NOT operation substitutes each bit with its opposite:
  • ~ 0 = `
  • ~ 1 = 0
    Note that this is unary operator. It's applied to one integer number only. Being applied to each of the numbers 103 and 77 it produces:
    n = 01100111 = 103
    ~ n = 10011000 = 152
    m = 01001101 = 77
    ~ m = 10110010 = 178

    Bit shifts

    Two more bitwise operations are left and right shifts. C/C++ uses operators << (left shift) and >> (right shift) for them. As you can guess from their names, these operators shift all the bits of the number to either left or right. These operators take two integer arguments:
  • the left one - is the number to perform the shift on
  • and the right one - how many positions to shift for. For example, the following operations produce
    n = 01100111 = 103
    n << 2 = 10011100 = 156
    m = 01001101 = 77
    m >> 4 = 00000100 = 4

    Please note that the result of bitwise operations depend on how many bits are allocated to store the number. For example, if we perform the same left shift operation on the same number 103 stored in 16-bit memory cell, then the result would be 00000000 01100111 << 2 = 00000001 10011100 = 412.

    Using combinations of these bitwise operators we can read/write each bit in a memory location. Here you can find a simple HTML page that allows you to play with binary operators.

    Accessing bits through a structure

    Another way to access every bit of a memory location is to use the union construct of C/C++. This construct allows us to access the same memory location in a different way. Sometimes it's convenient to think that the union creates two (or more) different variables, but stores them in the very same memory cells. For example, we can define a new type TInt by:
    union TInt {
       int int_value;
       struct {
          unsigned char byte_0;  
          unsigned char byte_1;  
          unsigned char byte_2;  
          unsigned char byte_3;  
       } bytes ;
    };
    
    Now, each variable data of the new type TInt can be accessed either as an integer value by data.int_value or bytewise through the structure bytes located in the same physical memory. For example, to access the lowest byte of the number we need to write data.bytes.byte_0.

    If we also use the additional opportunity given by the C++ language and specify the amount of bits each element of a structure takes, we can provide bitwise access to the memory. To specify the bit capacity of a structure element, we need to put a colon followed by the desired number of bits after the declaring a structure member. For example, in the following structure element a takes only 3 bits, and element b only 5 bits:

    struct Test {
      unsigned int a:3;
      unsigned int b:5;
    }:
    
    Using both of these constructions together we can define the following data type:
    union ByteBitMap {
       unsigned char byte_value;
       struct {
          unsigned int b0:1;
          unsigned int b1:1;
          unsigned int b2:1;
          unsigned int b3:1;
          unsigned int b4:1;
          unsigned int b5:1;
          unsigned int b6:1;
          unsigned int b7:1;
       } bits;
    };
    
    Now, we can access elements of the type ByteBitMap as byte values through the byte_value member
    ByteBitMap b;
    b.byte_value = 59;
    
    or bitwise through the bits stucture:
    b.bits.b4 = 1;
    
    Follow the link to find the complete example.

    Sieve of Eratosthenes. Count all prime numbers less than 128,000,000. Hint: create a bitmap of 128M bits and use each bit as an indication if the corresponding number prime or not.