The delay depends on how the player plug-in works. Some decoder APIs can render directly into a render target; others return frame buffers on an arbitrary thread.
We do not currently have a decoder that decodes directly on the GPU. I believe that Bink is the only solution that can do this at the moment, but it is not free.
I think that with the current implementation the problem is not so much frame delay as performance, because we’re copying frame buffers several times. The delay should be at most one frame either way.
The new Media API will allow for two different video sink modes: triple-buffered render targets, and direct-write to render target. Direct-write is used for decoders that can render on the Render thread, or that need to do post-processing there; the triple-buffered mode is used for everything else. When writing to render targets on the Render thread, there should be no frame delay. When using the triple-buffered mechanism, there can be a delay of at most one frame: the texture resource grabs the latest buffer the next time the Render Tick is executed, which may be before or after the current frame has finished rendering.
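To illustrate why the triple-buffered mode costs at most one frame, here is a minimal sketch of a generic triple buffer. This is not the Media Framework's actual code; `Frame`, `TripleBuffer`, `WriteSlot`, `Publish`, `Fetch`, and `ReadSlot` are hypothetical names made up for the example, and it assumes exactly one decoder thread and one Render thread.

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical decoded frame; a real sink would hold a texture or sample buffer.
struct Frame {
    int id = -1;
    std::vector<std::uint8_t> pixels;
};

// Single-producer / single-consumer triple buffer (illustrative only).
class TripleBuffer {
public:
    // Decoder thread: the slot that is currently safe to overwrite.
    Frame& WriteSlot() { return slots_[write_]; }

    // Decoder thread: hand the written slot over and mark it as new.
    // If the previous frame was never fetched, it is simply dropped.
    void Publish() {
        const unsigned old = shared_.exchange(write_ | kDirty,
                                              std::memory_order_acq_rel);
        write_ = old & kIndexMask;
    }

    // Render thread (once per Render Tick): pick up the latest frame, if any.
    // Returns false when nothing new has been published since the last call.
    bool Fetch() {
        if ((shared_.load(std::memory_order_acquire) & kDirty) == 0) {
            return false;
        }
        const unsigned old = shared_.exchange(read_, std::memory_order_acq_rel);
        read_ = old & kIndexMask;
        return true;
    }

    // Render thread: the frame most recently picked up by Fetch().
    const Frame& ReadSlot() const { return slots_[read_]; }

private:
    static constexpr unsigned kIndexMask = 3;  // low two bits: slot index
    static constexpr unsigned kDirty = 4;      // bit 2: unread frame waiting

    std::array<Frame, 3> slots_;
    unsigned write_ = 0;               // touched only by the decoder thread
    unsigned read_ = 1;                // touched only by the Render thread
    std::atomic<unsigned> shared_{2};  // slot currently in flight between them
};
```

The decoder never waits for the renderer and the renderer never waits for the decoder. The cost is that a frame published just after a Render Tick is not shown until the next tick, which is where the one-frame worst case comes from.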
The main focus for Media 2.0 is to eliminate extra frame buffer copies.