We are working with a multipurpose room that is used for lectures and showing movies. It’s messy from a control standpoint. We can use an A/V receiver and switch to a separate audio input during computer presentations, such as spreadsheets, but this does not allow talking over HDMI video. There is a projector and a large-screen TV, each with external speakers, possibly showing different sources.
I’m not aware of a ‘box’ that will allow me to mix an audio feed with HDMI audio. Fortunately we don’t need to deal with HDMI ARC. Using the A/V receiver we can easily accommodate whatever the presenters show up with, such as Apple TV, AirPlay 2, Bluetooth, or a direct input. It’s the audio mixing that is the challenge.
How many speakers will be used? If the room is fairly large, does it really need to be a surround system? It would seem very difficult to provide a real surround experience when the listeners are far from some of the speakers.
If you agree that it doesn't need to be a surround system, this should be fairly easy, since HDMI audio extractors are easy to find. If several HDMI sources will be used, a 2-channel receiver like the Denon DRA-800H can be used as a switching device, sending the audio to a mixer so mics, BT, AirPlay, etc. can have their own channels in the mixer.