Speeding up heavy math calculations in a compute shader

I am trying to write a custom post-processing shader.

It includes a for loop that runs 20,000 times, and the loop body contains sin, cos, and all sorts of other math.

Currently it takes almost 150 ms per frame.

Is there any way to speed up this calculation? I have already done some optimization of the math.

But I think there is more to try.

Any suggestion, however wild, is welcome.

Thank you.

Here is my shader code in a nutshell:

Texture2D SceneColorInput;
Texture2D SceneDepthInput;
FooMainCS {
  // 200 x 100 = 20,000 iterations in total
  for (uint i = 0; i < 200; i++) {
    for (uint j = 0; j < 100; j++) {
      colorTmp = SceneColorInput[uint2(i, j) * stride];
      depthTmp = SceneDepthInput[uint2(i, j) * stride];

      finalColor += cos(sin(/* all sorts of math */));
    }
  }
  OutColor = finalColor;
}

That is the situation.
I am not sure what else I could do besides building a distributed system.

Thank you.

Can you show the shader? How did you get a loop in a material?
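
One direction that might be worth trying, going only by the sketch as posted: nothing inside the loop seems to depend on the thread ID, so every thread would be computing the same 20,000-term sum. If that is also true of the real shader (an assumption on my part), the sum could be computed once per frame with a parallel reduction instead of once per pixel. Below is a rough sketch of that idea, not a drop-in replacement. `ReduceSamplesCS`, `PartialSums`, `Params`, `Stride` and `EvaluateSample` are hypothetical names, and the body of `EvaluateSample` is only a placeholder for the real math.

```hlsl
// Sketch only. Assumes the per-sample term does NOT depend on the output pixel.
Texture2D SceneColorInput;
Texture2D SceneDepthInput;

// Hypothetical output: one partial sum per thread group.
RWStructuredBuffer<float4> PartialSums;

// Hypothetical constant buffer holding the stride from the original loop.
cbuffer Params
{
    uint Stride;
};

#define GROUP_SIZE 256
#define SAMPLES_X  200
#define SAMPLES_Y  100

groupshared float4 Shared[GROUP_SIZE];

// Placeholder for the original "all sorts of math"; not the real formula.
float4 EvaluateSample(float4 color, float depth)
{
    return cos(sin(color * depth));
}

[numthreads(GROUP_SIZE, 1, 1)]
void ReduceSamplesCS(uint3 dispatchId : SV_DispatchThreadID,
                     uint  groupIndex : SV_GroupIndex,
                     uint3 groupId    : SV_GroupID)
{
    // Each thread evaluates exactly one of the 200 x 100 samples.
    uint sampleIndex = dispatchId.x;
    float4 value = 0;
    if (sampleIndex < SAMPLES_X * SAMPLES_Y)
    {
        // Recover (i, j) from the flat index, matching the original loop order.
        uint2 coord = uint2(sampleIndex / SAMPLES_Y, sampleIndex % SAMPLES_Y) * Stride;
        value = EvaluateSample(SceneColorInput[coord], SceneDepthInput[coord].r);
    }

    Shared[groupIndex] = value;
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction in group-shared memory: 256 -> 128 -> ... -> 1.
    for (uint offset = GROUP_SIZE / 2; offset > 0; offset >>= 1)
    {
        if (groupIndex < offset)
        {
            Shared[groupIndex] += Shared[groupIndex + offset];
        }
        GroupMemoryBarrierWithGroupSync();
    }

    // Thread 0 writes this group's partial sum.
    if (groupIndex == 0)
    {
        PartialSums[groupId.x] = Shared[0];
    }
}
```

With 256 threads per group this would be dispatched as 79 groups (78 groups cover only 19,968 samples), and the final post-processing pass would then add up the 79 entries of `PartialSums` instead of looping 20,000 times. If the math does depend on the pixel, this does not apply as written; in that case it may still help to precompute whatever part of each term depends only on the sample into a buffer once per frame.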