Hi,
I think that’s tricky, because position or velocity control uses a different message (SET_POSITION_TARGET_LOCAL_NED) than attitude control (SET_ATTITUDE_TARGET). In theory you could send both messages, with every field marked as ignored in the type_mask except the altitude in the first message and the attitude in the second. If I were you I would try that in SITL and see what happens, or alternatively look at the position controller to figure out how it deals with that.
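To make the "most fields ignored" idea concrete, here is a rough sketch of how the two type_mask values could be built. The bit values are the ones from the MAVLink common message set (POSITION_TARGET_TYPEMASK and ATTITUDE_TARGET_TYPEMASK); the constant and function names are just my own for illustration, and the Euler-to-quaternion helper is the standard ZYX conversion, since SET_ATTITUDE_TARGET takes a quaternion. Whether the autopilot actually accepts both setpoints at the same time is exactly the untested part, so treat this as something to try in SITL, not as a confirmed recipe.

```python
import math

# POSITION_TARGET_TYPEMASK bits (MAVLink common message set).
POS_X_IGNORE = 1
POS_Y_IGNORE = 2
POS_Z_IGNORE = 4
POS_VX_IGNORE = 8
POS_VY_IGNORE = 16
POS_VZ_IGNORE = 32
POS_AX_IGNORE = 64
POS_AY_IGNORE = 128
POS_AZ_IGNORE = 256
POS_YAW_IGNORE = 1024
POS_YAW_RATE_IGNORE = 2048

# ATTITUDE_TARGET_TYPEMASK bits (MAVLink common message set).
ATT_BODY_ROLL_RATE_IGNORE = 1
ATT_BODY_PITCH_RATE_IGNORE = 2
ATT_BODY_YAW_RATE_IGNORE = 4
ATT_THROTTLE_IGNORE = 64
ATT_ATTITUDE_IGNORE = 128


def altitude_only_position_mask():
    """type_mask for SET_POSITION_TARGET_LOCAL_NED: ignore all but z."""
    return (POS_X_IGNORE | POS_Y_IGNORE
            | POS_VX_IGNORE | POS_VY_IGNORE | POS_VZ_IGNORE
            | POS_AX_IGNORE | POS_AY_IGNORE | POS_AZ_IGNORE
            | POS_YAW_IGNORE | POS_YAW_RATE_IGNORE)


def attitude_only_mask():
    """type_mask for SET_ATTITUDE_TARGET: ignore body rates and throttle."""
    return (ATT_BODY_ROLL_RATE_IGNORE | ATT_BODY_PITCH_RATE_IGNORE
            | ATT_BODY_YAW_RATE_IGNORE | ATT_THROTTLE_IGNORE)


def euler_to_quaternion(roll, pitch, yaw):
    """Convert ZYX Euler angles in radians to a (w, x, y, z) quaternion."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)


if __name__ == "__main__":
    print("position type_mask:", altitude_only_position_mask())
    print("attitude type_mask:", attitude_only_mask())
    # Hypothetical setpoint: 5 deg roll, level pitch, 90 deg yaw.
    print("attitude quaternion:",
          euler_to_quaternion(math.radians(5), 0.0, math.radians(90)))
```

Keep in mind that z in the NED frame points down, so a setpoint 10 m above the origin is z = -10.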
What I’m reading from your question is that you basically want the same inputs as altitude control: roll, pitch, yaw and altitude. Could you explain the thought or use case behind this? Is the problem that you don’t have GPS or optical flow, or is the movement under velocity control too jerky?