It needs a fair bit of implementation.
I think the best bet is level streaming with a tiled landscape that has your interiors as sublevels. Set all the levels to non-distance-based streaming, so that streaming is triggered by player overlap instead, and have the server keep a list of which levels each player is streaming. Using that list, the server handles the visibility of actors in other levels by replicating to each player only the actors in that player's own level(s). You can of course do all the graphical stuff locally and have the dedicated server replicate only collision and other important events, which lets it handle a lot more. This is basically the same as your number two, but with a bit more finesse.
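To make the bookkeeping part concrete, here is a minimal, engine-agnostic sketch of the server-side list. It is an illustration under assumptions, not Unreal code: `LevelStreamingRegistry`, `OnPlayerEnterLevel`, `OnPlayerLeaveLevel` and `ShouldReplicate` are hypothetical names, and players and levels are plain ints/strings instead of engine types. In an actual project you would presumably call the enter/leave functions from your overlap events and consult `ShouldReplicate` from the engine's per-actor relevancy check (for example an override of `AActor::IsNetRelevantFor`).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for engine types (player controller, streaming level).
using PlayerId = std::uint32_t;
using LevelName = std::string;

// Server-side record of which sublevels each player currently has streamed in.
class LevelStreamingRegistry {
public:
    // Call when a player starts overlapping a level's streaming trigger.
    void OnPlayerEnterLevel(PlayerId player, const LevelName& level) {
        streamed_[player].insert(level);
    }

    // Call when the overlap ends and the level is streamed out for that player.
    void OnPlayerLeaveLevel(PlayerId player, const LevelName& level) {
        auto it = streamed_.find(player);
        if (it == streamed_.end()) return;  // unknown player: nothing to do
        it->second.erase(level);
        if (it->second.empty()) streamed_.erase(it);  // drop empty entries
    }

    // Relevancy test: replicate an actor to a player only if the actor's
    // level is one that player has streamed in.
    bool ShouldReplicate(PlayerId player, const LevelName& actorLevel) const {
        auto it = streamed_.find(player);
        return it != streamed_.end() && it->second.count(actorLevel) > 0;
    }

private:
    std::unordered_map<PlayerId, std::unordered_set<LevelName>> streamed_;
};
```

With this shape, a player standing in a landscape tile never receives actors from an interior sublevel until they overlap its trigger, which is the per-player filtering described above.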