The scrypt() algorithm has at its core a routine called ROMix. Basically, it defines
V(1) = hash(message)
V(2) = hash(hash(message))
V(3) = hash(hash(hash(message)))
...
and then it calculates a nested chain of lookups, V(V(V(…V(message)…))), where each lookup index is derived from the previous result.
Since computing V(n+1) requires computing V(n) first, and the lookup indices arrive in an unpredictable order, the most efficient way to do this is to precompute and cache all of the values. Once you’ve generated a large enough table, evaluating V(V(V(…))) is just a bunch of lookups.
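A toy sketch of that idea in Python, using SHA-256 as the hash. This is not the real scrypt ROMix (which uses BlockMix/Salsa20 and different index derivation); the function name, table size `n`, and `rounds` parameter are illustrative assumptions:

```python
import hashlib


def H(data: bytes) -> bytes:
    """Stand-in hash function (real scrypt uses BlockMix/Salsa20)."""
    return hashlib.sha256(data).digest()


def toy_romix(message: bytes, n: int = 1024, rounds: int = 1024) -> bytes:
    # Phase 1: build the table. V[0] = hash(message), V[i] = hash(V[i-1]).
    # This is the "cache all previously-computed values" step.
    V = [H(message)]
    for _ in range(n - 1):
        V.append(H(V[-1]))

    # Phase 2: a chain of table lookups. Each index j is derived from the
    # current value, so every lookup depends on the previous one -- this is
    # what makes the routine latency-sensitive.
    x = V[-1]
    for _ in range(rounds):
        j = int.from_bytes(x[:4], "little") % n
        x = H(x + V[j])
    return x
```

The table costs n hash outputs of memory; skipping the cache would force you to recompute the V chain from scratch on every lookup, which is the time/memory trade-off scrypt is built around.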
Caching all the previously computed values requires lots of memory, and since each lookup depends on the previous one it’s sensitive to memory latency (although if you’re mining you can work on several blocks in parallel and pipeline the requests to get around this).
GPUs can perform far more integer operations per second than a normal CPU, but have roughly the same memory bandwidth/latency as a CPU. So an algorithm that is memory-dominated should “level the playing field” between CPUs and GPUs.
I still don’t understand why the Tenebrix folks consider this to be an important goal. It just “equalizes” GPUs and CPUs, but you can still build custom hardware that does scrypt() much faster and cheaper than a CPU. So it’s just going from “GPUs are best” to “custom printed circuit boards covered in memory buses are best”. Nobody’s been able to explain why this change is worth all the trouble.
