Kalman filter prototype #75

Merged: 20 commits, Feb 19, 2015
105 changes: 77 additions & 28 deletions include/chilitags.hpp
@@ -363,10 +363,9 @@ typedef std::map<std::string, TransformMatrix> TagPoseMap;

To first detect the tags in the image, Chilitags3D creates a Chilitags
instance, which can be accessed through the getChilitags() accessors. This
Chilitags instance is set to have a persistence of 0, but Chilitags3D sets
a default persistence of 5 frames for poses it estimates. It also sets a
gain of .5 to keep the detected objects from "shaking". Please refer to the
documentation of Chilitags3D::setFilter() for more detail.
Chilitags instance is set to have a persistence of 0, because Chilitags3D
uses a more advanced Kalman filter. See enableFilter() and setPersistence()
for more details.

You can also create a separate instance of Chilitags yourself for the 2D
detection of tags and use it by calling
@@ -384,28 +383,6 @@ typedef std::map<std::string, TransformMatrix> TagPoseMap;
*/
Chilitags3D_(cv::Size cameraResolution = cv::Size(640, 480));
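As a quick illustration of the behaviour described above, here is a minimal usage sketch (not code from this PR; the webcam index, resolution and output formatting are assumptions). The filter3d-gui sample added later in this diff is the full, interactive version of the same idea.

```cpp
#include <iostream>

#include <chilitags.hpp>

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp> // for cv::VideoCapture

int main() {
    cv::VideoCapture capture(0); // assumption: webcam at index 0
    chilitags::Chilitags3D chilitags3D(cv::Size(640, 480));

    cv::Mat frame;
    while (capture.read(frame)) {
        // estimate() detects the tags and returns Kalman-filtered 6D poses,
        // one 4x4 transform per tag or per rigid object
        for (const auto &kv : chilitags3D.estimate(frame)) {
            std::cout << kv.first << ":\n" << cv::Mat(kv.second) << std::endl;
        }
    }
    return 0;
}
```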

/**
Parameters to palliate the imperfections of the detection.

\param persistence the number of frames in which a 3d object, i.e. a tag or
a rigid object containing several tags, should be absent before being
removed from the output of estimate().

\param gain a value between 0 and 1 corresponding to the weight of the
previous (filtered) position in the new filtered position. 0 means that the
latest position of the object is returned.

Chilitags3D manages persistence separately from the 2D detection of
Chilitags, because an object can be composed of several tags. If one or
more of these tags are not detected, the pose of the object is estimated
with the remaining, detected tags. In this case, it would hurt the pose
estimation to make the individual tags persistent. Chilitags3D thus sets
the 2D detection to have no persistence, and applies its persistence
mechanism only after the estimation of the pose. The same goes for the
position filtering. Note that for independent tags, it should not matter.
*/
void setFilter(int persistence, float gain);

/** Accessor to the underlying (2D) Chilitags detection. */
const Chilitags &getChilitags() const;

@@ -424,8 +401,18 @@ Chilitags &getChilitags();
0 , 0 , 0 , 1 }
\endverbatim
\param tags a list of tags, as returned by Chilitags::find().

\param camDeltaR Rotation from the previous camera frame to
the current camera frame, i.e. rotation of the current camera frame in the
last camera frame. Quaternion format (scalar, vx, vy, vz).

\param camDeltaX Translation from the previous camera frame
to the current camera frame, i.e. position of the current camera frame in
the last camera frame.
*/
Member: It would be better to have the same 4x4 transform matrix that we use in the rest of the API, and convert it to quaternions behind the scenes.

Member Author: Well, the only interaction with this now is qml-imu, and this would only incur vec+quat -> mat4x4 and mat4x4 -> vec+quat conversion for a user API benefit for the moment.

Member Author: In fact, the only interaction with this ever would be things like IMU, visual odometry, depth sensor odometry etc. At no point would the user ever enter values by hand there.

Member:

> the only interaction with this now is qml-imu

You're not exactly giving other users a chance of being represented here ;)

> At no point would the user ever enter values by hand there.

How about a scenario where users would film a rotating object, for example (e.g. a 3D scanner)? Or tests, actually ;) where you would have inputs simulating camera movement.

Even if the user does not set values by hand, the question is whether we're 100% sure that all these IMU/odometries will have quaternions as output. If not, it can be quite confusing to have a mixed quaternion/4x4 interface... Is the conversion of one matrix per frame so expensive?

Member Author: Well, I'm being quite realistic here in saying that I'll be the one and only one developing those things (IMU, visual odometry, depth odometry) in the following year :)

If it becomes very problematic and "users" start to complain (I won't :) we can make it a mat4x4 in the future. Otherwise, I don't want to incur any unnecessary computation.

cool stuff :))))
so many good memories.

2014-12-19 13:58 GMT+01:00 Ayberk Özgür [email protected]:

> IMU works with quaternions because it comes directly from the state. There
> is no 4x4 transform matrix at any stage.

Member: It's not about looking nice, it's about being coherent. If you use chilitags to measure the move of one camera for example, this output is not usable as input.
And no, you're not the only user of chilitags.

Member Author: Not to mention the loss of precision due to sqrt and divisions when converting quat to 3x3 rot matrix back and forth...

Member Author: My wish is to choose the option with the most benefits:

1. Mat4x4 that is coherent
2. Vec3 + quat that plugs very nicely into IMU (and most probably any other odometry estimator) that uses its own KF that dictates quat in its state vector; and that bypasses quat-rotmat conversion (and conversion back)

If someone else comes along and complains that this input would be better suited to their application as a mat4x4, I'll be happy to change it.

Member Author:

> If you use chilitags to measure the move of one camera for example, this output is not usable as input.

I tried that by the way in https://github.com/chili-epfl/cellulo; either I failed to do it properly or it's not suitable for a handheld camera. This remark was also made by Stéphane Magnenat of Thymio, who did it with feature tracking on image tags. The solution to this is visual/depth odometry, partially inertial odometry or any kind of solid odometry.
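For readers weighing the two options in this thread: a caller who already tracks camera motion as a 4x4 transform could bridge the gap on their side with a few lines like the sketch below (an illustration, not code from this PR). It uses double precision, so the types would need to match the RealT of the Chilitags3D_ instantiation, and the trace-based quaternion formula assumes the inter-frame rotation stays well below 180 degrees, which holds for consecutive video frames.

```cpp
#include <cmath>
#include <opencv2/core/core.hpp>

// Convert a 4x4 camera-motion matrix (pose of the current camera frame
// expressed in the previous camera frame) into the quaternion + translation
// pair expected by Chilitags3D::estimate().
void deltaFromMatrix(const cv::Matx44d &camDelta,
                     cv::Vec4d &camDeltaR,  // (scalar, vx, vy, vz)
                     cv::Vec3d &camDeltaX)
{
    // Translation: last column of the transform
    camDeltaX = cv::Vec3d(camDelta(0,3), camDelta(1,3), camDelta(2,3));

    // Rotation: upper-left 3x3 block, converted to a unit quaternion.
    // Valid while 1 + trace > 0, i.e. for rotations well below 180 degrees.
    double trace = camDelta(0,0) + camDelta(1,1) + camDelta(2,2);
    double w = std::sqrt(1.0 + trace) / 2.0;
    camDeltaR = cv::Vec4d(w,
                          (camDelta(2,1) - camDelta(1,2)) / (4.0 * w),
                          (camDelta(0,2) - camDelta(2,0)) / (4.0 * w),
                          (camDelta(1,0) - camDelta(0,1)) / (4.0 * w));
}
```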

TagPoseMap estimate(const TagCornerMap & tags);
TagPoseMap estimate(const TagCornerMap & tags,
cv::Vec<RealT, 4> const& camDeltaR = cv::Vec<RealT, 4>(1,0,0,0),
cv::Vec<RealT, 3> const& camDeltaX = cv::Vec<RealT, 3>(0,0,0));
Member: So what happens if the camera moves but we don't know how? Only a degradation of the filtering? (That's a candid question.)

Member Author: By "not knowing how" do you mean that we don't know the source? Or how much the camera moved?

Member: How the camera moves. I think you actually answer that in the documentation of the Filter (which I saw later), but maybe we can reassure users here that if they don't know camDeltaR and camDeltaX, it's fine.
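To make that reassurance concrete, both call styles look like this (a sketch only; chilitags3D is assumed to be a Chilitags3D instance, tags the output of a previous Chilitags::find(), and the float vector types assume the default single-precision instantiation):

```cpp
// No knowledge of camera motion: omit the deltas. The defaults (identity
// rotation, zero translation) simply assume a still camera.
auto poses = chilitags3D.estimate(tags);

// With an external motion estimate (IMU, visual odometry, ...), pass the
// rotation and translation of the current camera frame in the previous one.
cv::Vec4f camDeltaR(1.f, 0.f, 0.f, 0.f); // (scalar, vx, vy, vz); placeholder values
cv::Vec3f camDeltaX(0.f, 0.f, 0.f);      // placeholder values
poses = chilitags3D.estimate(tags, camDeltaR, camDeltaX);
```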


/**
This is a convenience variant of estimate() which also takes care of the
@@ -449,10 +436,20 @@ TagPoseMap estimate(const TagCornerMap & tags);
best return tags previously found; it won't find new ones, but
can lose some. See Chilitags::DetectionTrigger for a description of the
possible values.

\param camDeltaR Rotation from the previous camera frame to
the current camera frame, i.e. rotation of the current camera frame in the
last camera frame. Quaternion format (scalar, vx, vy, vz).

\param camDeltaX Translation from the previous camera frame
to the current camera frame, i.e. position of the current camera frame in
the last camera frame.
*/
TagPoseMap estimate(
const cv::Mat &inputImage,
Chilitags::DetectionTrigger detectionTrigger = Chilitags::DETECT_ONLY);
Chilitags::DetectionTrigger detectionTrigger = Chilitags::DETECT_ONLY,
cv::Vec<RealT, 4> const& camDeltaR = cv::Vec<RealT, 4>(1,0,0,0),
cv::Vec<RealT, 3> const& camDeltaX = cv::Vec<RealT, 3>(0,0,0));

/**
Chilitags3D can also detect rigid assemblies of tags. This allows for a
@@ -494,6 +491,58 @@ bool readTagConfiguration(
*/
void setDefaultTagSize(RealT defaultSize);

/**
* @brief Enables/disables Kalman filtering on tag pose (enabled by default)
*
* @param enabled Whether to enable Kalman filtering
*/
void enableFilter(bool enabled);

/**
* @brief Sets the persistence of tags against being discarded when not
* observed (10 by default)
*
* @param persistence Persistence value, roughly corresponds to the number of frames
Member: roughly?

Member Author: Rather than "roughly corresponds to", "on the order of" would be a more correct statement.

Member: You accidentally linked the Chinese version of Wikipedia I think ;) By "in the order of", do you mean proportional? Or order of magnitude?

Member Author: Something like persistence = c*number_of_frames where c is a "small" constant, like 3-4 at most.

*/
void setPersistence(RealT persistence);
Member: What's the default? (10 if I get it right, but it would be nice to document it here and in the constructor.)

Member Author: Yes.
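Taken together, driving the filtering knobs would look roughly like this (a sketch; chilitags3D is assumed to be a Chilitags3D instance):

```cpp
// Kalman filtering is on by default; switch it off to get raw per-frame poses.
chilitags3D.enableFilter(false);

// Keep estimating an object's pose for roughly this many frames after its
// tags stop being observed (10 is the default).
chilitags3D.setPersistence(10.f);
```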


/**
* @brief Sets the process noise covariance matrix, a.k.a. Q, for the Kalman filter
*
* The state is described by (x,y,z,qw,qx,qy,qz) where x,y,z is the tag
* position and qw,qx,qy,qz is the tag orientation in quaternion
* representation. Therefore, the input matrix should correspond to
* cov((x,y,z,qw,qx,qy,qz)).
Member: Again, ideally the rotation noise would be given coherently with the rest of the API, i.e. a conversion to quaternion happens behind the scenes.

Member Author: There is no conversion of rotation noise to quaternion?

*
* When IMU data is present, the process moves the tag according to camera
* movement. When IMU data is not present, the process tries to keep the tag
Member: IMU data are the two matrices that are default parameters of estimate, right? I don't think we can assume everyone knows about IMUs, so a slower explanation could be nice (imagine you're explaining it to me ;))

Member Author: IMU data is completely external to chilitags; it is the orientation/linear motion estimate given by inertial sensors such as an accelerometer, gyroscope or magnetometer.

Member Author: Yes, I see what you're talking about with the default parameters of estimate now. Yes, that's correct. But there's a detailed explanation of this here: https://github.com/chili-epfl/chilitags/pull/75/files#diff-2e8946adbafc893038750d7bc1b975e8R26. We can add a "see Filter3D.hpp documentation" note.

* still. Smaller values in the process covariance matrix cause the tags to
* resist motion more. By default, this matrix has
* 1e-3, 1e-3, 1e-3, 1e-4, 1e-4, 1e-4, 1e-4
* on its diagonal.
*
* @param covariance 7x7 covariance matrix
*/
void setFilterProcessNoiseCovariance(cv::Mat const& covariance);

/**
* @brief Sets the observation noise covariance matrix, a.k.a. R, for the Kalman filter
*
* The observation (done by image processing on the camera image) is described
* by (x,y,z,qw,qx,qy,qz) where x,y,z is the tag position and qw,qx,qy,qz is
* the tag orientation in quaternion representation. Therefore, the input
* matrix should correspond to cov((x,y,z,qw,qx,qy,qz)).
*
* Larger values in the observation noise covariance matrix indicate noisier
* measurements and causes observations to affect the state less, making tags
* more resistant to motion. By default, this matrix has
* 1e-3, 1e-3, 1e-1, 1e-3, 1e-2, 1e-2, 1e-5
* on its diagonal.
*
* @param covariance 7x7 covariance matrix
*/
void setFilterObservationNoiseCovariance(cv::Mat const& covariance);
Member: Again, candid question: might users want to set Q without R (or vice versa), or can we merge the two methods into a setFilterNoiseCovariance(Q,R)? At least the documentation is partially common.

Member Author: That was my initial plan, but I scrapped it. The reason is that in some cases one might want to set the observation noise and process noise at every frame before calling estimate(), independently of one another.
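Whether or not the two setters ever get merged, the call pattern is the same for both; the sketch below rebuilds the documented default diagonals and scales them, much as the filter3d-gui sample later in this diff does with its keyboard shortcuts (chilitags3D again being an assumed Chilitags3D instance):

```cpp
// Default diagonals of Q (process noise) and R (observation noise) for the
// (x, y, z, qw, qx, qy, qz) state, as documented above.
cv::Mat qDiag = (cv::Mat_<float>(7,1) << 1e-3f, 1e-3f, 1e-3f, 1e-4f, 1e-4f, 1e-4f, 1e-4f);
cv::Mat rDiag = (cv::Mat_<float>(7,1) << 1e-3f, 1e-3f, 1e-1f, 1e-3f, 1e-2f, 1e-2f, 1e-5f);

// Scaling Q up lets tags follow motion more readily; scaling R up declares the
// observations noisier, so tags resist motion more.
chilitags3D.setFilterProcessNoiseCovariance(cv::Mat::diag(10.f * qDiag));
chilitags3D.setFilterObservationNoiseCovariance(cv::Mat::diag(10.f * rDiag));
```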


/**
For accurate results, Chilitags3D can be provided the calibration data of
the camera detecting the chilitags. See
209 changes: 209 additions & 0 deletions samples/3dfiltering/filter3d-gui.cpp
@@ -0,0 +1,209 @@
/*******************************************************************************
* Copyright 2013-2014 EPFL *
* Copyright 2013-2014 Quentin Bonnard *
* *
* This file is part of chilitags. *
* *
* Chilitags is free software: you can redistribute it and/or modify *
* it under the terms of the Lesser GNU General Public License as *
* published by the Free Software Foundation, either version 3 of the *
* License, or (at your option) any later version. *
* *
* Chilitags is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU Lesser General Public License for more details. *
* *
* You should have received a copy of the GNU Lesser General Public License *
* along with Chilitags. If not, see <http://www.gnu.org/licenses/>. *
*******************************************************************************/

#include <cstring> // for strcmp
#include <iostream>

#include <chilitags.hpp>

#include <opencv2/core/core.hpp> // for cv::Mat
#include <opencv2/core/core_c.h> // CV_AA
#include <opencv2/highgui/highgui.hpp> // for camera capture
#include <opencv2/imgproc/imgproc.hpp> // for drawing (putText, line)

using namespace std;
using namespace cv;

int main(int argc, char* argv[])
{
cout
<< "Usage: "<< argv[0]
<< " [-c <tag configuration (YAML)>] [-i <camera calibration (YAML)>]\n";

const char* intrinsicsFilename = 0;
const char* configFilename = 0;

for( int i = 1; i < argc; i++ )
{
if( strcmp(argv[i], "-c") == 0 )
configFilename = argv[++i];
else if( strcmp(argv[i], "-i") == 0 )
intrinsicsFilename = argv[++i];
}

/*****************************/
/* Init camera capture */
/*****************************/
int cameraIndex = 0;
cv::VideoCapture capture(cameraIndex);
if (!capture.isOpened())
{
cerr << "Unable to initialise video capture.\n";
return 1;
}

/******************************/
/* Setting up pose estimation */
/******************************/
#ifdef OPENCV3
double inputWidth = capture.get(cv::CAP_PROP_FRAME_WIDTH);
double inputHeight = capture.get(cv::CAP_PROP_FRAME_HEIGHT);
#else
double inputWidth = capture.get(CV_CAP_PROP_FRAME_WIDTH);
double inputHeight = capture.get(CV_CAP_PROP_FRAME_HEIGHT);
#endif

chilitags::Chilitags3D chilitags3D(Size(inputWidth, inputHeight));

if (configFilename) chilitags3D.readTagConfiguration(configFilename);

if (intrinsicsFilename) {
Size calibratedImageSize = chilitags3D.readCalibration(intrinsicsFilename);
#ifdef OPENCV3
capture.set(cv::CAP_PROP_FRAME_WIDTH, calibratedImageSize.width);
capture.set(cv::CAP_PROP_FRAME_HEIGHT, calibratedImageSize.height);
#else
capture.set(CV_CAP_PROP_FRAME_WIDTH, calibratedImageSize.width);
capture.set(CV_CAP_PROP_FRAME_HEIGHT, calibratedImageSize.height);
#endif
}

cv::Mat projectionMat = cv::Mat::zeros(4,4,CV_64F);
chilitags3D.getCameraMatrix().copyTo(projectionMat(cv::Rect(0,0,3,3)));
cv::Matx44d projection = projectionMat;
projection(3,2) = 1;

/*****************************/
/* Go! */
/*****************************/
cv::namedWindow("Pose Estimation");

char keyPressed;
bool filterEnabled = true;

cv::Mat Q = (cv::Mat_<float>(7,7) <<
1e-3f, 0, 0, 0, 0, 0, 0,
0, 1e-3f, 0, 0, 0, 0, 0,
0, 0, 1e-3f, 0, 0, 0, 0,
0, 0, 0, 1e-4f, 0, 0, 0,
0, 0, 0, 0, 1e-4f, 0, 0,
0, 0, 0, 0, 0, 1e-4f, 0,
0, 0, 0, 0, 0, 0, 1e-4f);
float alphaQ = 1.0f;

cv::Mat R = (cv::Mat_<float>(7,7) <<
1e-3f, 0, 0, 0, 0, 0, 0,
0, 1e-3f, 0, 0, 0, 0, 0,
0, 0, 1e-1f, 0, 0, 0, 0,
0, 0, 0, 1e-3f, 0, 0, 0,
0, 0, 0, 0, 1e-2f, 0, 0,
0, 0, 0, 0, 0, 1e-2f, 0,
0, 0, 0, 0, 0, 0, 1e-5f);
float alphaR = 1.0f;

while ('q' != (keyPressed = (char) cv::waitKey(1))) {

switch(keyPressed){
case 'f':
filterEnabled = !filterEnabled;
chilitags3D.enableFilter(filterEnabled);
break;
case 'w':
alphaQ *= 10.0f;
chilitags3D.setFilterProcessNoiseCovariance(alphaQ*Q);
break;
case 's':
alphaQ /= 10.0f;
chilitags3D.setFilterProcessNoiseCovariance(alphaQ*Q);
break;
case 'e':
alphaR *= 10.0f;
chilitags3D.setFilterObservationNoiseCovariance(alphaR*R);
break;
case 'd':
alphaR /= 10.0f;
chilitags3D.setFilterObservationNoiseCovariance(alphaR*R);
break;
}

cv::Mat inputImage;
capture.read(inputImage);
cv::Mat outputImage = inputImage.clone();

cv::putText(outputImage, cv::format("Filtering %s, press 'f' to toggle",filterEnabled ? "ENABLED" : "DISABLED"),
cv::Point(8,20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255));

cv::putText(outputImage, cv::format("Process covariance multiplier: %f, press 'w' or 's' to modify", alphaQ),
cv::Point(8,36), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255));

cv::putText(outputImage, cv::format("Observation covariance multiplier: %f, press 'e' or 'd' to modify", alphaR),
cv::Point(8,52), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255));

for (auto& kv : chilitags3D.estimate(inputImage, chilitags::Chilitags::DETECT_PERIODICALLY)) {

static const float DEFAULT_SIZE = 20.f;
static const cv::Vec4d UNITS[4] {
{0.f, 0.f, 0.f, 1.f},
{DEFAULT_SIZE, 0.f, 0.f, 1.f},
{0.f, DEFAULT_SIZE, 0.f, 1.f},
{0.f, 0.f, DEFAULT_SIZE, 1.f},
};

cv::Matx44d transformation = kv.second;
cv::Vec4f referential[4] = {
projection*transformation*UNITS[0],
projection*transformation*UNITS[1],
projection*transformation*UNITS[2],
projection*transformation*UNITS[3],
};

std::vector<cv::Point2f> t2DPoints;
for (auto homogenousPoint : referential)
t2DPoints.push_back(cv::Point2f(
homogenousPoint[0]/homogenousPoint[3],
homogenousPoint[1]/homogenousPoint[3]));

static const int SHIFT = 16;
static const float PRECISION = 1<<SHIFT;
static const std::string AXIS_NAMES[3] = { "x", "y", "z" };
static const cv::Scalar AXIS_COLORS[3] = {
{0,0,255},{0,255,0},{255,0,0},
};
for (int i : {1,2,3}) {
cv::line(
outputImage,
PRECISION*t2DPoints[0],
PRECISION*t2DPoints[i],
AXIS_COLORS[i-1],
#ifdef OPENCV3
2, cv::LINE_AA, SHIFT);
#else
2, CV_AA, SHIFT);
#endif
}
}
cv::imshow("Pose Estimation", outputImage);
}

cv::destroyWindow("Pose Estimation");
capture.release();

return 0;
}

4 changes: 4 additions & 0 deletions samples/CMakeLists.txt
@@ -48,3 +48,7 @@ if(WITH_PTHREADS)
target_link_libraries( async-detection ${OpenCV_LIBS} )
endif()

add_executable( filter3d-gui 3dfiltering/filter3d-gui.cpp)
target_link_libraries( filter3d-gui chilitags )
target_link_libraries( filter3d-gui ${OpenCV_LIBS} )

11 changes: 4 additions & 7 deletions src/Chilitags.cpp
@@ -60,8 +60,8 @@ Impl() :
mDetect(),
mTrack(),

mCallsBeforeDetection(15),
mCallsBeforeNextDetection(0)
mCallsBeforeNextDetection(0),
mCallsBeforeDetection(15)
Member: About that, how about switching to the

: member1()
, member2()
, member3()

style? I mean "what do you think of it?", not "do it!" ;)

Member Author: It would be a pain in the ass for my autoindenter :)

Member: but vim's ddp would love it so much ;)
whatever, it was just to chat ;)

{
setPerformance(FAST);
}
@@ -260,7 +260,7 @@ EnsureGreyscale mEnsureGreyscale;

Decode mDecode;

Filter<int, Quad> mFilter;
Filter mFilter;

Detect mDetect;

@@ -296,10 +296,7 @@ void Chilitags::setMinInputWidth(int minWidth) {
mImpl->setMinInputWidth(minWidth);
}

TagCornerMap Chilitags::find(
const cv::Mat &inputImage,
DetectionTrigger trigger
) {
TagCornerMap Chilitags::find(const cv::Mat &inputImage, DetectionTrigger trigger) {
return mImpl->find(inputImage, trigger);
}
