I would be careful with performance expectations. You may see the same effect that sometimes happens when you use inline assembler.
The compiler has to work around your machine code, so some optimizations become impossible, and you can end up losing more performance than you gain from your asm magic.
It is really difficult to beat modern compilers.
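
To illustrate the inline assembler effect, here is a minimal sketch, assuming GCC or Clang on x86 (other targets fall back to plain C). The function names `sum_plain` and `sum_asm` are made up for this example. Both compute the same sum, but in the second one the addition is hidden inside an asm statement that the optimizer has to treat as a black box.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: plain C. The compiler sees the whole loop and is
 * free to unroll or vectorize it. */
uint32_t sum_plain(const uint32_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hypothetical example: the same sum, but the addition is done in inline
 * asm. The compiler cannot look inside the asm statement, so it has to
 * keep one scalar add per element. */
uint32_t sum_asm(const uint32_t *v, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
        /* AT&T syntax: s += v[i] */
        __asm__("addl %1, %0" : "+r"(s) : "r"(v[i]) : "cc");
#else
        s += v[i]; /* fallback so the sketch still builds elsewhere */
#endif
    }
    return s;
}
```

If you want to check this on your own setup, compare the generated code for the two functions, for example with `gcc -O3 -S`. The plain version will typically come out as a vectorized loop, while the asm version stays scalar. Measure before you trust either one.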