Error-state Kalman filter (mismatch between state prediction and covariance prediction)


I am interested in developing my own version of EKF-based UAV state estimator, and currently analyzing PX4-Autopilot’s implementation, especially the optimization(codegen-ed) part using python symforce.

It seems that covariance prediction part of EKF is substituted with error-state version since this (and this), as well as all codegen-ed H matrices and Kalman gains. (adapted to 23-dim)

The attitude part was mainly corrected (from 4-dim quaternion to 3-dim) while the rest are mostly remained the same (since they are all in Euclidean space). You can see the equivalence inside by comparing the two elements: A[3:, 3:] and A_old[4:, 4:] where,

  • A (latest version)

= VTangent(state_error_pred.to_storage()).jacobian(state_error).subs(zero_state_error).subs(zero_noise)

  • A_old (old version)

= VState(state_pred.to_storage()).jacobian(vstate)

Here, I denote the first argument of predict_covariance() function as vstate, that is before modified to its tangent-space counterpart, (i.e., before state = vstate_to_state(state))


Nonetheless, the state is predicted still in its total-state version as in here (Ekf::predictState(…)).

My questions are:

  1. Shouldn’t the state (of EKF of PX4-Autopilot) be defined as error-state for the exactitude?

  2. Accordingly, almost all models should be re-written in terms of error-state (perhaps?)

  • For instance, the prediction model

[from]: x_{k+1} = f(x_{k}) + w_{k} →
[to]: e_{k+1} = f_error(e_{k};x_{k}^{INS}) + w_{k}

where error-state (=x_k^true - x_k^estimate) is denoted in e_k. I guess f_error() is what “state_error_pred” of should produce when codegen-ed. Also, as widely known, f_error() exhibits more linearity than f() as stated in here. I used {INS} for an independent integrator, i.e., dead-reckoning.

  • measurement model

[from] y_{k} = h(x_k) + v_{k} →
[to] either (1) y_{k} = h(x_k^{INS} + e_k^{filter_state}) + v_{k} or (2) y_error{k} = h_error(e_k) + v’_{k}

  1. In summary, I don’t get the reason why state itself is not converted to its error-state form while covariance matrix is presented in error-state form and treated accordingly.
    The “mismatch” I wrote in the topic title means this: total-state form implementation of state and its propagation vs. error-state form implementation of covariance and all related uncertainty matrices, gains, etc. Current covariance matrix P seems not (exactly) presenting the uncertainty characteristic of StateSample _state of Ekf:EstimatorInterface, or am I missing something?
1 Like

Hi, good to see someone interested in the derivation.
I’ve seen that on the internet that there is a lot of confusion with error-state Kalman Filter and that good and clear resources are not easy to find. (I added 2 of them below)

There is actually no difference on the state prediction part of an error-state EKF vs a total state EKF.
In the error-state version we often say that we:

  1. start with zero error: dx = 0
  2. predict the total state using the IMU: x = f(x, u)
  3. estimate the error: dx = K*(y - h(x))
  4. and then do the “reset” step: x[k+1] = x[k] + dx ; dx = 0

However, this is exactly the same as predicting and correcting the total state:

  1. predict the total state using the IMU: x = f(x, u)
  2. correct the total state x[k+1] = x[k] - K*(y - h(x))

The real difference lies on how the jacocbians are derived. For the covariance prediction, we use the fact that the covariance of the state evolves as the covariance of the error. Since the error is always small, we can do a lot of simplifications and the covariance prediction is -almost- invariant from the state.

For the measurement update, the same nonlinear measurement model can be used and the only difference is that the jacobian needs to be taken with respect to the error state. To do this, one can use the chain rule as described in section 6.1 (h is function of x) or with other tricks (symforce uses an approximation of the derivative equation). Note that in SymForce, the function .jacobian(x) is by default on the tangent space (i.e.: w.r.t. the error state). One should use .jacobian(x, tangent_space=False) to compute the “usual” jacobian).

I would encourage you to read this really, really good article:
And another one that made me understand the beauty of Lie groups: (Here you’ll see in section V. A. that the error-state filter is just a total state filter where the jacobians are computed on the tangent space)

I hope this helped, good luck!


I appreciate your quick response.
Now I get the point thanks to the comparison and the key references.
Few years back, I actually read the articles you suggest. Yet, I kind of skimmed it :frowning: shame on me. I should carefully go through them again.

The chain rule really helped, and thank you for the notice about the API

which I was not aware of.

Thank you!

For those who may have similar question, I leave here a note.

Maybe my question and the misconception were derived from a combination of

  1. error dynamics equation of strapdown INS.
    e.g., Chapter 12.3 of the book “Strapdown Inertial Navigation Technology”: [R1]
    or Chapter 14.2 of the book “Principles of GNSS, Inertial, and Multi-sensor Integrated Navigation Systems (2nd)”: [R2]
    where error states are actually propagated using linearized equations.
  2. (error-state) particle filtering, where (for instance) N(0, R) is approximated using nonzero particles that is distributed across the state-space and propagate.

Regarding 1,
In the field of traditional GPS/INS fusion, I guess linearizing the equation
dx[k+1] = f(x[k]+dx[k], u[k]+du[k]) - f(x[k], u[k])
dx[k+1] ~= F dx[k] + G du[k]
was quite common [R1, R2]. They were developed analytically without aids of symbolic computation tools.
As far as I know, the filter tracks dx using the above (linearized) prediction model and is corrected by measurement equation (something like) dy[k] = [I_{3,3}, 0_{3, ~}]dx[k], which seems a little bit obsolete. Or I suppose it is the matter of scale difference of domains of dynamics model between MAVs and airplanes/missiles.

The point here is that, in those designs, there is an independent INS: x-tilde, and a filter that tracks dx, and the output of the whole system is a composition of both for the best estimate x-hat of the true state, i.e., x-hat = x-tilde (+) dx . A feedback structure where dx resets to 0 and the INS shifts by dx is mostly applied.

Regarding 1+2,
when it comes to implement INS/XXX fusion using error-state particle filter, each particle dx is nonzero and propagates (simulated) even if to represent a zero-mean distribution. The whole collection of particles and associated weights is equivalent to a distribution.
Individual (error-state) particle must propagates to nonzero values (unlike dx=0 of (E)KF setup), and the measurement update step.

should become somewhat like reweighting i-th particle dx[k+1]^(i) (nonzero, and propagated from k to k+1) using residual y[k+1]-h(x[k+1]+dx[k+1]^(i)) to represent N(0, (I-KH)P). Take this study as a collective illustration, though not directly applied to this matter.