Some ECS questions

I'm studying the theory of entity component systems. I get the core concept (or at least I think I do) of its data-oriented design, why a struct of arrays can be better than an array of structs, and so on.
But I have several questions that I wasn't able to find answers to.
Most examples on Google use 1 million entities as their test case, so all my questions assume this number of components as the "default".
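
For reference, this is how I understand the two layouts. It's only a minimal sketch in C++; `ParticleAoS`, `ParticlesSoA` and `integrate_x` are names I made up for illustration, not taken from any particular engine.

```cpp
#include <cstddef>
#include <vector>

// Array of structs (AoS): all fields of one particle sit next to each other.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
};

// Struct of arrays (SoA): each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;

    explicit ParticlesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n) {}
};

// SoA version: the loop reads and writes only the x and vx arrays.
void integrate_x(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
    }
}

// AoS version: same update, but each element also carries y, z, vy, vz,
// which share cache lines with the fields actually used.
void integrate_x(std::vector<ParticleAoS>& p, float dt) {
    for (auto& e : p) {
        e.x += e.vx * dt;
    }
}
```

Both functions compute the same result; the only difference is which bytes end up next to each other in memory.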

  1. What about component access? Say I have a Collision component with 3 floats for position, 3 floats for velocity, 3 floats for rotation and 3 floats for the size of the object. That's (3 + 3 + 3 + 3) * 4 = 48 bytes per component, and 48 * 1,000,000 = 48 megabytes of raw data. That's larger than the L3 cache of most current CPUs, so even counting the lower cache levels it can't fit into the CPU caches all at once.
    To move a component properly, I need to check it for collisions against other Collision components in a dedicated collision detection subsystem (an octree or any other structure). The problem is that the components referenced from a single octree cell are not necessarily the ones currently loaded in the cache. I mean, the components listed in one octree cell may live in a different part of the component array than the part that is currently cached, so the CPU has to keep fetching data from main memory to find out whether a collision is possible.
    Or do such cases happen so rarely that they wouldn't cause any noticeable performance issues?

  2. What about different types of components? Say I have 2 components for each entity: a Transform component (6 floats for position and rotation) and a Model component (two integers, the model id and the current frame of that model). What is the proper way to send all that data to the video card for rendering? I mean, the CPU has to work with two completely different component types in this case, so it has to keep evicting and reloading cache lines to get the matching data. And since this happens every frame, doesn't it create a performance problem?
    Yes, it's possible to put the Transform and Model components into a single contiguous stream in memory, but nothing except the renderer will use those two component types together. Or, for example, the renderer could wait until the other threads finish their work and only then pull these components into the cache, to keep the cache miss cost to a minimum. But isn't that a missed opportunity to speed things up? Or is it a necessary evil?

  3. Doesn't ECS, at its core, create **HUUUUUGE** problems for the future development of a system? I mean structural problems, like scalability.
    Like yeah, you grouped each type of component together, great job man, thumbs up. But now you need to read data from 2 different component types in a single iteration. So you either break the entire system and rebuild it from scratch, or suffer performance issues because of cache misses. Done that already? Great job, but now you need to read variables from 3 component types.
    So each such change cuts through the entire codebase and can, and will, introduce new bugs into previously debugged code.
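
To make the numbers and the "two component types in one iteration" case concrete, here is a rough sketch of what I have in mind. The component layouts follow the sizes from questions 1 and 2, but `footprint_bytes`, `DrawItem` and `gather_draw_items` are hypothetical names of mine, and the sketch assumes both arrays are dense and that index `i` refers to the same entity in each of them (real ECS libraries may map entities to components differently).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical components with the sizes described in questions 1 and 2.
struct Collision {
    float position[3];
    float velocity[3];
    float rotation[3];
    float size[3];
};
static_assert(sizeof(Collision) == 12 * sizeof(float), "no padding expected");

struct Transform {
    float position[3];
    float rotation[3];
};

struct Model {
    int model_id;
    int frame;
};

// Raw size in bytes of n tightly packed components of type T.
template <typename T>
constexpr std::size_t footprint_bytes(std::size_t n) {
    return n * sizeof(T);
}

// What the renderer would consume per entity (made-up layout).
struct DrawItem {
    Transform transform;
    Model model;
};

// Walks two separate component arrays in lockstep and packs them into one
// buffer that could then be uploaded for rendering.
// Assumption: transforms[i] and models[i] belong to the same entity.
std::vector<DrawItem> gather_draw_items(const std::vector<Transform>& transforms,
                                        const std::vector<Model>& models) {
    const std::size_t n =
        transforms.size() < models.size() ? transforms.size() : models.size();
    std::vector<DrawItem> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(DrawItem{transforms[i], models[i]});
    }
    return out;
}
```

With 4-byte floats, `footprint_bytes<Collision>(1000000)` comes out to 48,000,000 bytes. The loop in `gather_draw_items` is the access pattern I'm asking about in questions 2 and 3: one pass that reads from two different component arrays.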

Another problem is code readability. With simple OOP (if you follow clean code guidelines) it's simple: here's int i, here it gets incremented, here it's used, EZ.
But with ECS it would become a mess. Here the components are created, there something checks whether i == null, over there it jumps five times or else the networking falls apart, what's going on AAAAAAAAAAAAAAAAAA

And the main question, IMO: is it even worth it? All code gets transformed by the compiler and linker from human-readable form into assembly jmp, call, mov and other interesting instructions. Every compiler has some built-in optimizations. In C++, as far as I know, some optimizations are even forced and you can't turn them off without rewriting the whole compiler. So isn't ECS over-engineering?