TAGS :Viewed: 5 - Published at: a few seconds ago

[ Hash function for good sorting/comparison for 6 integers ]

I was playing with a problem here... I managed to solve it using the following:

 struct wrap
 {
    char grid[7];
    bool operator==(const wrap &segundo) const
    {
        return !strcmp(segundo.grid, grid);
    }
    bool operator<(const wrap &segundo) const
    {
         return strcmp(grid, segundo.grid) < 0;
    }
    //other stuff here
 }

Now I'm trying to see if working with 6 integers instead of that array would improve the speed:

 struct wrap
 {
    int x1, x2, x3, y1, y2, y3; 

    bool operator==(const wrap &segundo) const
    {
        return x1 == segundo.x1 && x2 == segundo.x2 && x3 == segundo.x3 && y1 == segundo.y1 && y2 == segundo.y2 && y3 == segundo.y3;
    }
    bool operator<(const wrap &segundo) const
    {       
        return ??? // I want to know a good thing to put here... 
                       // I have some Ideas but would appreciate suggestions
    }

Let me explain a little better the situation: I'm using a set and I started playing with sets yesterday so I don't know much about it...

I know it requires operator< to know how to sort the elements... but does it use my definition of operator== when using the find method? or it would be doing inversion in the order of elements while comparing and that is it?

If the answer to the previous question is second possibility, will it behave differently if the struct implement operator== (since the < is provided by the programmer, it is feasable that it might take advantage of other operator that might be provided to avoid some negations?)

Going back to my code, I thought about:

 return 100000*x1 + 10000*y1 + 1000*x2 + 100*y2 + 10*x3 + y3 - (same sum for the second) &gt; 0; 

One last question: the values that those 6 ints can have is in [0, 9], changing them to shorts or chars will give any speed improvement or just memory?

Answer 1


does it use my definition of operator== when using the find method?

No.

would be doing inversion in the order of elements while comparing and that is it?

That's right. It finds a key that's equivalent under the comparison operation, not necessarily equal, to its argument. Keys k1 and k2 are equivalent if

!(k1 < k2) && !(k2 < k1)

will it behave differently if the struct implement operator==

No. Ordered associative containers only use a single comparison operation - by default std::less<key_type>, which by default uses operator<. Even if you provide an equality operator, it won't necessarily give the same result as the equivalence test that the container needs, so the container can't use it.

changing them to shorts or chars will give any speed improvement or just memory?

There's a good chance that it will, but it's something you'll need to measure. Likewise, the only way to tell whether the branch-free arithmetic comparison you suggest would be better than a lexicographical comparison (like strcmp/memcmp or tuple comparisons) is by measuring.

Another possibility is to compare them all in a single 64-bit operation:

union {
    uint8_t bytes[8]; // Make sure unused bytes are set to zero
    uint64_t all;
} grid;

bool operator<(const wrap &segundo) const {
    return grid.all < segundo.grid.all;
}

You might also like to investigate std::unordered_set, a hashed container which can be more efficient if you don't care about the order of the elements.

Answer 2


All the container knows about your object is that it has an operator< so. operator== if it exists will not be used.

The operator< that you supply must be a vaild one in the sense that:

if not x<y and not y<x ==> y==x

The simplest operator< for a series is a dictionary compare.

The sorted_set (default c++ set) will use operator< keep the set in order and will find from the set in O(lgN) unorderd_set will use a hash function which assuming a decent hash function will perform better then set on all but the smallest sets and will not allow to make any assumptions about item order.

EDIT: The problem with your operator< is likely one of overflow. unless you know the MAX value for x1, x2, x3 etc.. then your suggestion is likely to produce wrong results.

x1*1000000 might pass MAX_INT in which case it will become a huge negative value. x2 might be larger then 10,000 in case it is big and x1 is small then again you will get wrong results.

My first attempt will be to use something like

   std::tie(x1, x2, x3 ... ) < std::tie(o.x1, o.x2, o.x3, ...) 

which keeps the code readable and in essence it is a dictionary compare:

if (x1== o.x1) {
    if (x2== o.x2) { 
       if (x3 == o.x3 {
          ...
       } else { 
           return x3 < o.x3;
       }
    } else {
      return x2 < o.x2;
    }
} else {
    return x1 < o.x1;
}