View Projection Matrix -> How to use

Hello there,
first of all, I should say that I am pretty much a total beginner with UE, but I have to use it for a project I am working on.

Background:
I have a translating object in my UE world that is being filmed by a static camera. The camera exports frames that I process further. I need to track certain points of the object in the exported frames, in pixel coordinates. So basically I need to convert world coordinates from UE into pixel coordinates of my frames, since I already have a function describing the movement.

Problem:
I want to use the view-projection matrix of the camera view, which I obtained via a blueprint, since in theory this transformation should give me exactly the relation between the two coordinate systems. I could not find any proper documentation on these matrices, by the way. But I am struggling to use this matrix: when I multiply it with my coordinate vector, the elements of the output vector are in the thousands. From my understanding, all of the values should lie between -1 and 1 to represent the camera space. Scaling the result down does not give reasonable values either.

Additional info:
My camera is facing in the positive x-direction, and this is my view-projection matrix:
[0, 0, 0, 1]
[3.53535, 0, 0, 0]
[0, 6.28507, 0, 0]
[-3535.35, -754.209, 10, -1475]

It would be really nice if anyone could help me or tell me where I went wrong :))
Basically, I just want to get the relation between world coordinates and the pixels of my recorded frames.

I am working on something very similar. I am running a MetaHuman simulation, capturing it with a CineCamera, and rendering the frames out. I need to extract the pixel coordinates of the bones to train a CV model.

First, I extracted the world coordinates of the bones. Then, to convert them to camera view space, I took the camera's world transform, inverted it, and multiplied it with the world coordinates of the bone.
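To make that step concrete, here is a minimal sketch in plain Python. It assumes the camera transform is rigid (rotation plus translation, no scale), in which case applying the inverse is the same as subtracting the camera location and taking dot products with the camera's axes. The function name `world_to_view`, the camera location, and the sample point are made-up values for illustration, not something UE provides.

```python
def world_to_view(point, cam_location, right, up, forward):
    """Express a world-space point in camera view space.

    right, up, forward are the camera's unit axes in world coordinates.
    Assumes a rigid camera transform, so the inverse is a subtraction
    followed by a projection onto the camera axes.
    """
    d = [p - c for p, c in zip(point, cam_location)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Result is ordered (right, up, forward); forward is the depth.
    return (dot(d, right), dot(d, up), dot(d, forward))


# Hypothetical camera looking along +X (UE: X forward, Y right, Z up).
cam_location = (1475.0, 1000.0, 120.0)
view_point = world_to_view(
    (1975.0, 1100.0, 170.0),
    cam_location,
    right=(0.0, 1.0, 0.0),
    up=(0.0, 0.0, 1.0),
    forward=(1.0, 0.0, 0.0),
)
print(view_point)
```

The third component is the distance in front of the camera, which is what later ends up in the w component after projection.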

Now I have the coordinates in camera view space, but what we need is clip space.

So, multiply the view-space coordinates by the projection matrix of the camera (for which you need the camera's intrinsic parameters, such as the aspect ratio, FOV, and the near and far planes).
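As a rough sketch of what such a projection matrix can look like: the layout below assumes row vectors (point times matrix) and a reversed-Z projection with an infinite far plane, which is what the matrix posted in the question appears to follow. The function names and the FOV, aspect ratio, and near-plane values are assumptions for illustration. The far plane is left out because, under this assumption, it would only change the depth column and not the x/y position on screen.

```python
import math


def perspective_matrix(hfov_deg, aspect, near):
    """Row-vector perspective matrix, reversed-Z, infinite far plane (assumed layout)."""
    t = math.tan(math.radians(hfov_deg) / 2.0)
    return [
        [1.0 / t, 0.0, 0.0, 0.0],     # view right   -> clip x
        [0.0, aspect / t, 0.0, 0.0],  # view up      -> clip y
        [0.0, 0.0, 0.0, 1.0],         # view forward -> clip w (depth)
        [0.0, 0.0, near, 0.0],        # constant     -> clip z
    ]


def to_clip(view_point, proj):
    """Multiply a view-space point (right, up, forward) by the matrix as a row vector."""
    v = (view_point[0], view_point[1], view_point[2], 1.0)
    return tuple(sum(v[i] * proj[i][j] for i in range(4)) for j in range(4))


# Illustrative values: 90 degree horizontal FOV, 16:9 frames, near plane at 10 units.
proj = perspective_matrix(90.0, 16.0 / 9.0, 10.0)
clip = to_clip((100.0, 50.0, 500.0), proj)
print(clip)
```

Note that the clip-space x and y are not yet in the -1 to 1 range; that only happens after dividing by the w component.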

Once the clip-space coordinates are calculated, normalize them (divide by the w component) and scale the result to the frame resolution to get the pixel coordinates.
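Putting the last steps together with the matrix posted in the question, a sketch could look like the following. It assumes the matrix is meant for row vectors (the world point is multiplied from the left), that the frames are 1920x1080, and that the pixel origin is the top-left corner with y pointing down. All three are assumptions, and `world_to_pixel` is a made-up helper name. Values in the thousands before the divide are expected; only after dividing by w do x and y fall between -1 and 1 for points inside the view.

```python
# View-projection matrix copied from the question.
VIEW_PROJ = [
    [0.0, 0.0, 0.0, 1.0],
    [3.53535, 0.0, 0.0, 0.0],
    [0.0, 6.28507, 0.0, 0.0],
    [-3535.35, -754.209, 10.0, -1475.0],
]


def world_to_pixel(point, view_proj, width, height):
    """Project a world point to pixel coordinates; returns None if behind the camera."""
    v = (point[0], point[1], point[2], 1.0)
    # Row-vector convention: clip = v * M.
    clip = [sum(v[i] * view_proj[i][j] for i in range(4)) for j in range(4)]
    w = clip[3]
    if w <= 0.0:
        return None  # point is behind the camera
    # Perspective divide gives normalized device coordinates in [-1, 1].
    ndc_x = clip[0] / w
    ndc_y = clip[1] / w
    # Map to pixels; y is flipped because image rows grow downwards.
    px = (ndc_x * 0.5 + 0.5) * width
    py = (1.0 - (ndc_y * 0.5 + 0.5)) * height
    return px, py


center = world_to_pixel((1975.0, 1000.0, 120.0), VIEW_PROJ, 1920, 1080)
off_axis = world_to_pixel((1975.0, 1100.0, 120.0), VIEW_PROJ, 1920, 1080)
behind = world_to_pixel((1000.0, 1000.0, 120.0), VIEW_PROJ, 1920, 1080)
print(center, off_axis, behind)
```

With this matrix, w works out to the world x coordinate minus 1475, so it is simply the distance in front of the camera along the viewing direction.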

I am currently stuck at converting from camera view space to clip space, since the Minimal View Info node, which is where the intrinsic parameters can be fed in, does not output a matrix. I am working on a solution and will post an update if the whole thing works.

Hope this is of some help, and does not confuse you more!