We’re kicking off a programming contest today that is sure to challenge even the most comp-sci heavy engineers out there, and we’re excited to see what you all come up with. With the difficulty of the challenge in mind, we’ve got some great prizes for the winner: an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit. Now to jump right in and answer all your questions…
What is the contest?
You must tweet a sequence of twelve words that when hashed is bit-wise closest to a hash of a challenge phrase that we will announce the morning of July 20th. All words must be from a 1,000 word dictionary we will provide at that same time. You are allowed to append up to five random characters to the end of your entry. We’re pretty confident you’ll want to write a program to automate the finding of close matches, so announcing this a week in advance should give you enough time to get your programs up and running.
How do I enter?
To enter the contest, follow @engineyard on Twitter and tweet your best candidate word sequence before 6pm Pacific Time on July 21st. This means you have about 30 hours between the availability of the challenge phrase and dictionary, and the entry submission cut-off time.
As previously mentioned, the winner of the contest will get an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit). [You can also choose an alternative of load test credits worth $2,000 from browsermob]. Second prize is another iPhone 3GS.
So how does it work exactly? (Update! Example Now Clearer!)
Let’s take an example: a dictionary excerpt is: “Cloud, Ruby, DHH, one, eight, six, active, record, controller, data, rspec, mongrel, MySQL, postgresSQL, tokyo, MRI, jruby, rubinius, memcached, exception, metaprogramming, reflection.” Let’s also say we announce that the challenge phrase is
I am not a big believer in fortune telling
To submit a contest entry, you would follow us on Twitter and tweet your best entry, e.g: “@engineyard Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3”
We will take the SHA-1 hash of this phrase:
Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3 which hashes to
cd36b6dc8d4ed51b36dd7fce08f500392a7fb782 and compare it to the SHA-1 hash of
I am not a big believer in fortune telling (which hashes to:
When we say “compare,” we mean that we will take the Hamming distance between the two hashes; the sum of the count of dissimilar bits when the hex hashes are converted to binary.
For example, here the binary of
1100110100110110...etc. and the binary of
So calculating the Hamming distance is done as follows:
- first two bits (1 vs 0) don't match -> +1 to Hamming distance
- second two bits (1 vs. 1) do match -> no change to Hamming distance
- third two bits (0 vs. 1) don't match -> +1 to Hamming distance
In the case of the complete example hashes above, the Hamming difference is 74. If you are the submitter with the lowest Hamming distance, you win the prizes - it’s that simple ;)
Extra Prize: If you manage to achieve a Hamming distance of zero, we’ll throw in a MacBook Pro: you are either highly improbable, have mad algorithm cracking skills or you work for the NSA, any of which makes you cool enough to deserve random goodness and recognition. Note: we know the probability of anyone getting to a zero Hamming distance is truly vanishingly small, but we wanted to acknowledge anyone making it there!
There are some obvious brute force strategies to win this prize. We’d suggest building a really fast word permutation algorithm, and finding a fast SHA-1 hash algorithm. Then find a way to get your hands on a whole bunch of computation for the 24 hours that the contest will run (hmm… perhaps the cloud would be useful).
More details and conditions:
- Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)
- The words in your string must be single space separated; no other punctuation is allowed
- Spamming new entries as you find better ones is a fail whale; limit your contest tweets to a maximum of five
- The dictionary will be a Macintosh TextEdit file in RTF format, with each word on a separate line, and no white-space.
- In the case of a tied Hamming distance (entirely possible), the winner will be chosen by lottery among people with the best distance
- You may permute capitalization for the dictionary words (i.e. you may use Ruby, rUby, RUBY, and RUBy)
- Please scrub your custom five character string for the five words you can’t say on television
- If the exact same string is submitted multiple times, only the first submission counts
- Employees and contractors of Engine Yard, and their family members are not eligible
Okay, so maybe “It’s that simple” was a bit, well, over-simplified, but we’re confident we’ll get some great submissions. The Ruby community is nothing if not persistent, creative and intelligent – so show us what you’ve got!
1) When you convert the hexadecimal hash to binary – you need to convert the hex to the equivalent number in binary e.g. “c” = “12” in decimal = “1100” in (big endian) binary. Make sure that your binary conversion function is NOT treating “c” as ASCII letter “c” – which gets you the completely different answer of “63” in decimal or “01100011” in binary. 2) Be careful if you end up using string hashing functions in C. Remember that C strings are null terminated, and from reports we’re getting, at least some string functions out there take the Null string terminator ( “�”) as an input to the hash function. This will get you in trouble because we will not be including a null terminator when we calculate the hash of your tweets. Naturally, we will treat your tweet as a (sane) Ruby string.
Another example for folks
People have asked for another example to test their hashing algorithms so here it is:
Example challenge phrase #2:
What you write today will become legacy which hashes to:
7f83e6b422af5ca4e3112486aea3e702e98a894e or in hex to binary (big-endian):
0111 1111 1000 0011 1110 0110 1011 0100 0010 0010 1010 1111 0101 1100 1010 0100 1110 0011 0001 0001 0010 0100 1000 0110 1010 1110 1010 0011 1110 0111 0000 0010 1110 1001 1000 1010 1000 1001 0100 1110
Example contest entry tweet #2: @engineyard RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E
We will take the hash of
RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E which hashes to:
075a32acb1816b570607189475ebbbaccce8b79f or in hex to binary (big-endian): ` 0000 0111 0101 1010 0011 0010 1010 1100 1011 0001 1000 0001 0110 1011 0101 0111 0000 0110 0000 0111 0001 1000 1001 0100 0111 0101 1110 1011 1011 1011 1010 1100 1100 1100 1110 1000 1011 0111 1001 1111`
The hamming distance between these two hashes (again, remember treating the hash as HEXADECIMAL, NOT ASCII) is 80.
Example dictionary file: sampledictionary