The ECL EKF chooses the reference frame implicitly based on the used heading source. If you want to use the ev_yaw heading the reference frame will be orientated such that its matches the body frame at 0 degree yaw, (and zero pitch and roll of course). Your provided position information should then agree with the EKF reference frame. Meaning that for example if you move your vehicle with 0 degree yaw in forward direction (this is then also x axis of reference frame), your x position measurement should increase, not your y value.
If the reference frame to which you measure the heading does not roughly align with the reference chosen by the EKF, the current implementation of the Kalman Filter will be confused since he will have trouble to align current position and veloctiy information with the heading information.
For this reason if you use GPS the heading should be computed with respect to north. Therefore GPS fusion is not performed when using ev_yaw. See here.
If you can compute the heading angle with respect to the VO reference frame, I would proceed as follows:
Use new EV interface and set ev_sample.pos and ev_sample.vel as well as ev_sample.q together with uncertainty. Build the quaternion from the heading angle, zero pitch and roll should be fine.
Set EKF_AID_MASK bits for EV_YAW, EV_VEL, EV_POS. Your position information should be in your local reference frame (e.g. [0.1,0.3,-0.4]), keep in mind z axis is pointing down.
I would also suggest to use your VO system for primary height: vdist_sensor_type = 3. I think that should do it for you.
I agree for this use case it does not make sense to use the mag for initialization. But as soon as you receive ev_data it will reset the heading to your information, and will stop using the mag at all. See here
Let me know if this works!