Change detection precision

We have just completed a drone-based wildlife detection project in which DJI Phantoms flew over the same region on two days. We pixel-aligned pairs of orthophotos covering the same ground to detect animal presence vs. absence. For a follow-up project, we want flight 1 snapshots to align more closely with flight 2 snapshots. Suppose we flew a drone with the best available combination of Pixhawk flight controller, intervalometer, RTK GNSS, etc., at a ground sample distance (GSD) of 0.5 cm. How closely (in pixels) could we expect flight 1 snapshots to align with flight 2 snapshots, with both flights timed and controlled so that each pair of snapshots covers the same region as closely as possible? What would be the best combination?
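One way to frame the pixel question is as an error budget. Every source of footprint displacement (horizontal camera position error, camera pointing error from attitude or gimbal, and trigger timing jitter at a given ground speed) is converted to metres on the ground, the terms are combined, and the total is divided by the GSD. The sketch below assumes the terms are independent and combines them by root-sum-square. The camera parameters (8.8 mm focal length, 13.2 mm sensor width, 5472 px image width) and every error value are hypothetical placeholders, not specifications of any particular Pixhawk, RTK, or camera setup. Substitute figures from your own hardware.

```python
import math


def altitude_for_gsd(gsd_m, focal_mm, sensor_width_mm, image_width_px):
    """Flight altitude (m) giving the requested GSD for a nadir-pointing camera.

    From GSD = sensor_width * altitude / (focal_length * image_width).
    """
    return gsd_m * focal_mm * image_width_px / sensor_width_mm


def misalignment_px(gsd_m, altitude_m, pos_err_m, pointing_err_deg,
                    trigger_jitter_s, ground_speed_mps):
    """Root-sum-square footprint offset between two snapshots, in pixels.

    Assumes the three error terms are independent; all inputs are
    hypothetical placeholders to be replaced with real hardware figures.
    """
    # Horizontal camera-position error shifts the footprint one-for-one.
    pos_m = pos_err_m
    # A pointing (attitude/gimbal) error tilts the optical axis and moves
    # the footprint centre by roughly altitude * tan(angle).
    pointing_m = altitude_m * math.tan(math.radians(pointing_err_deg))
    # Trigger timing jitter displaces the exposure along the flight line.
    timing_m = ground_speed_mps * trigger_jitter_s
    total_m = math.sqrt(pos_m ** 2 + pointing_m ** 2 + timing_m ** 2)
    return total_m / gsd_m


if __name__ == "__main__":
    gsd = 0.005  # 0.5 cm GSD, from the question
    # Hypothetical camera geometry.
    alt = altitude_for_gsd(gsd, focal_mm=8.8, sensor_width_mm=13.2,
                           image_width_px=5472)
    # Hypothetical error budget: 2 cm position, 0.5 deg pointing,
    # 1 ms trigger jitter at 2 m/s ground speed.
    px = misalignment_px(gsd, alt, pos_err_m=0.02, pointing_err_deg=0.5,
                         trigger_jitter_s=0.001, ground_speed_mps=2.0)
    print(f"altitude {alt:.2f} m, expected offset {px:.1f} px")
```

With these placeholder inputs the pointing term dominates the position and timing terms, because at low altitude even a fraction of a degree of tilt moves the footprint by many 0.5 cm pixels. Under these assumptions, gimbal and attitude accuracy matter at least as much as RTK positioning.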