I think the problem is not the solutions we can come up with, but the large map you want to have. Is it really ideal?
Streaming levels are the way to handle large environments under performance constraints, but I am not sure what you will put in each tile, so the map size you are asking for may be overestimated.
If I were building a similar game, I would use a quadtree if I only cared about the plane (X and Y) and not the depth (Z), with each quad being a streaming level. But again, I would define a maximum size for the map if it is multiplayer: big maps are taxing in multiplayer, and some solutions run one server per map just for the sake of consistency and performance.
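To make the quadtree idea a bit more concrete, here is a rough, engine-agnostic sketch in Python. It is not engine code: the `Quad` class, the `Tile_X_Y` naming, and the 4000-unit map size are all made up for illustration. It only shows how a bounded map can be split into quads on the X/Y plane and how you could ask which leaf tiles (each one standing in for a streaming level) fall within a load radius around the player.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quad:
    """Square region on the X/Y plane; each leaf stands in for one streaming level."""
    x: float        # minimum X corner
    y: float        # minimum Y corner
    size: float     # edge length
    depth: int = 0
    children: List["Quad"] = field(default_factory=list)

    def subdivide(self, max_depth: int) -> None:
        """Split into four children recursively until max_depth is reached."""
        if self.depth >= max_depth:
            return
        half = self.size / 2
        for dx in (0, 1):
            for dy in (0, 1):
                child = Quad(self.x + dx * half, self.y + dy * half,
                             half, self.depth + 1)
                child.subdivide(max_depth)
                self.children.append(child)

    def intersects_circle(self, cx: float, cy: float, radius: float) -> bool:
        """True if the circle around (cx, cy) touches this quad."""
        # Clamp the circle center to the quad to find the nearest point.
        nearest_x = min(max(cx, self.x), self.x + self.size)
        nearest_y = min(max(cy, self.y), self.y + self.size)
        return (nearest_x - cx) ** 2 + (nearest_y - cy) ** 2 <= radius ** 2

    def level_name(self) -> str:
        """Hypothetical naming scheme for the tile's streaming level."""
        return f"Tile_{int(self.x)}_{int(self.y)}"

    def tiles_to_load(self, cx: float, cy: float, radius: float) -> List[str]:
        """Names of the leaf tiles within the load radius of the player."""
        if not self.intersects_circle(cx, cy, radius):
            return []  # whole subtree is out of range, skip it
        if not self.children:
            return [self.level_name()]
        names: List[str] = []
        for child in self.children:
            names.extend(child.tiles_to_load(cx, cy, radius))
        return names


def count_leaves(quad: Quad) -> int:
    """Number of leaf tiles under this quad."""
    if not quad.children:
        return 1
    return sum(count_leaves(child) for child in quad.children)


# Assumed fixed maximum map size: 4000 x 4000 units, split into 16 tiles.
world = Quad(0, 0, 4000)
world.subdivide(max_depth=2)

# Player in the middle of the first tile: only that tile is needed.
print(world.tiles_to_load(500, 500, 100))
```

The point of the tree is that whole branches get rejected early, so the cost of the query depends on the load radius rather than on the total number of tiles. In a real project, the returned names would be handed to whatever load/unload mechanism the engine provides for streaming levels.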