Solving general MRF MAP problems on heterogeneous systems (BSc / MSc)

Our general-purpose MRF MAP solver “mapMAP” [1] solves large-scale, irregularly shaped Markov Random Fields in a short amount of time by exploiting the massive parallelism available in today's hardware.

Although mapMAP was originally designed with the GPU in mind [and implemented thereon], our current release uses SIMD units and multiple cores on the CPU.

The tree-decomposition scheme that mapMAP is based on decomposes the MRF graph in each iteration and solves each piece separately.

This natural “higher-level” parallelism enables us to make use of a relatively fresh area of computing: heterogeneous systems.

In this thesis, you will enable mapMAP to do just that: using our CPU and your GPU code, you will dynamically distribute work between CPU and GPU and find optimal tradeoffs for large-scale MRF datasets.
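One simple way to distribute work dynamically is a shared queue from which each device pulls the next subproblem as soon as it is idle. The sketch below only illustrates that idea and is not part of mapMAP: the two workers are plain Python threads standing in for the CPU and GPU back ends, and `solve_on_cpu`, `solve_on_gpu` and the integer "subproblems" are hypothetical placeholders.

```python
import queue
import threading

def solve_on_cpu(subproblem):
    # Hypothetical stand-in for the CPU solver.
    return subproblem * 2

def solve_on_gpu(subproblem):
    # Hypothetical stand-in for the GPU solver; it must return the same result.
    return subproblem * 2

def distribute(subproblems):
    """Let two workers pull subproblems from a shared queue until it is empty."""
    work = queue.Queue()
    for item in subproblems:
        work.put(item)

    results = {}                      # subproblem -> (device name, result)
    lock = threading.Lock()

    def worker(name, solve):
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return                # no work left for this device
            result = solve(item)
            with lock:
                results[item] = (name, result)

    threads = [
        threading.Thread(target=worker, args=("cpu", solve_on_cpu)),
        threading.Thread(target=worker, args=("gpu", solve_on_gpu)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = distribute(range(8))
```

With a pull-based queue, the faster device automatically ends up solving more subproblems, so no static split ratio has to be chosen in advance.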

mapmap Link


Evaluating the Optimization Potential of Compute-intensive Open Source Libraries (BSc / MSc)

Compute-intensive open source libraries such as OpenCV, PCL and Eigen are very popular and widely used in many applications. They provide hand-tuned algorithms for a wide range of computer vision, 3D geometry processing and linear algebra tasks. Unfortunately, tuning an algorithm based on its runtime on a single CPU can lead to suboptimal performance on other CPUs. Moreover, the best choice of optimizations may also depend on the data set the algorithm operates on. Therefore, it is very difficult to manually optimize these libraries for all use cases.

We suggest the use of auto-tuning to improve the situation. Using this technique, the library authors can implement the algorithm with potential optimization dimensions in mind. An auto-tuner can then be used to automatically generate many implementation variants. By profiling these variants on multiple CPUs and data sets we can automatically adapt our optimization choices to them.
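The basic loop of such an auto-tuner can be sketched as follows. This is a minimal illustration under our own assumptions, not an existing auto-tuning system: `blocked_sum` is a hypothetical kernel whose only optimization dimension is its block size, and `autotune` simply times every candidate, whereas a real auto-tuner would profile only a small subset of the variants.

```python
import time

def blocked_sum(data, block_size):
    """Sum `data` block by block; `block_size` is the tuning parameter."""
    total = 0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total

def autotune(kernel, data, candidates, repeats=3):
    """Return the candidate parameter with the lowest measured runtime."""
    timings = {}
    for value in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(data, value)
            best = min(best, time.perf_counter() - start)
        timings[value] = best         # keep the fastest of the repeated runs
    return min(timings, key=timings.get), timings

data = list(range(100_000))
candidates = [16, 256, 4096]
best_block, timings = autotune(blocked_sum, data, candidates)
```

Running the same loop on a different CPU or a different data set may select a different `best_block`, which is exactly the effect the evaluation is meant to quantify.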

The main goal of this thesis is to evaluate the potential speed-up gained by using this approach. This involves finding a set of interesting algorithms to optimize and potential optimizations to apply to them. Additionally, the algorithms need to be implemented in an existing auto-tuning system, which is able to find a very good implementation variant by profiling a small subset of all variants. Finally, the speed-ups compared to the original versions need to be evaluated on multiple CPUs and data sets. This evaluation is expected to provide insights into the following questions:

Does auto-tuning provide a significant speed-up compared to hand-tuned code?

How strongly do the CPU and the data set influence the runtime?

What optimizations are important across many different algorithms?

What optimizations are often missed by hand-tuned code?


Massively-parallel primal heuristics for (M)IP solvers

Globally optimal mixed integer programming (MIP) solvers such as Gurobi or CPLEX rely on primal heuristics to

i. create decent starting solutions

ii. create decent upper bounds on the objective for early bounding.

Some of these heuristics rely on LP solvers, but several especially popular ones, such as genetic algorithms, local search or tabu search, are purely combinatorial.
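As an illustration of a purely combinatorial primal heuristic, the following sketch runs a 1-flip local search on a tiny binary covering problem (minimize the cost of the selected variables so that every constraint contains at least one selected variable). The problem data and the function names are hypothetical and chosen only for this example; it is sequential Python and not one of the heuristics to be implemented in the thesis.

```python
def is_feasible(x, rows):
    """Each row lists the variables of one covering constraint sum(x_j) >= 1."""
    return all(any(x[j] for j in row) for row in rows)

def objective(x, costs):
    return sum(c for c, xi in zip(costs, x) if xi)

def local_search(costs, rows):
    """1-flip local search: switch off variables as long as x stays feasible."""
    x = [1] * len(costs)              # trivially feasible starting solution
    improved = True
    while improved:
        improved = False
        # Try the most expensive variables first.
        for j in sorted(range(len(costs)), key=lambda k: -costs[k]):
            if x[j]:
                x[j] = 0
                if is_feasible(x, rows):
                    improved = True
                    break
                x[j] = 1              # undo the infeasible move
    return x

costs = [4, 3, 2, 5]
rows = [[0, 1], [1, 2], [2, 3]]
x = local_search(costs, rows)
upper_bound = objective(x, costs)
```

Any feasible `x` found this way yields an upper bound on the optimal objective of the minimization problem, which is what a branch-and-bound solver uses for early bounding.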

In this thesis, your goal is to implement a given set of heuristics for general-purpose (M)IPs on GPUs and compare your implementation to those packages with respect to solution quality and processing time. Ideally, this results in a software package that can be added to academic solvers.


Live SfM on a UAV on Tegra (BSc / MSc)

Cheaply available drones with high-resolution cameras have become quite popular recently. Despite all their downsides, they are excellent at flying over an area and quickly mapping it using SLAM. This way, rough maps are created quickly for in- and outdoor navigation.

SfM (structure from motion) creates a point cloud from a set of images or a video. In this project, you will pair our drone with an NVIDIA Tegra device and implement live SfM on the drone's video stream.

Ideally, the resulting point cloud (its density and shape) will give the user an idea of where they need to fly next and take more images when reconstructing a larger (outdoor) scene.

The algorithm itself will be implemented in CUDA.


Static / dynamic QR-preconditioning on the GPU (BSc / MSc)

Solving hard least-squares problems or normal equations in general using iterative [linear algebra] methods such as conjugate gradient requires good preconditioners to keep the spectrum of the problem matrix under control and guarantee timely convergence.
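To make the role of the preconditioner concrete, here is a minimal sketch of preconditioned conjugate gradient with the simplest possible choice, a diagonal (Jacobi) preconditioner. It is plain sequential Python on a dense toy matrix; the function names and the 3x3 example system are assumptions made for illustration, and an incomplete factorization would replace the `inv_diag` scaling step.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient for SPD A with a diagonal (Jacobi) preconditioner."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

The only place the preconditioner enters is the computation of `z`, so a better preconditioner changes that one step while the rest of the iteration stays the same.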

Today, simple diagonal preconditioning or incomplete Cholesky factorizations are the state of the art. The incomplete QR factorization, which in theory offers some benefits, has received very little attention. In fact, only one widely known software package is available [1], and it does not live up to expectations [2].

In this thesis, you will develop an alternative approach using fine-grained parallelization on static fill-in patterns on the GPU. You will learn how sparse factorizations work and -- hopefully -- help us provide the community with a mature software package.


[1] Li, Na, and Yousef Saad. “MIQR: A multilevel incomplete QR preconditioner for large sparse least-squares problems.” SIAM Journal on Matrix Analysis and Applications 28.2 (2006): 524-550.

[2] Gould, Nicholas, and Jennifer Scott. “The State-of-the-Art of Preconditioners for Sparse Linear Least-Squares Problems.” ACM Transactions on Mathematical Software (TOMS) 43.4 (2017): 36.


Triple-iterative matrix-free normal equation solvers on the GPU (BSc / MSc)

Large sparse linear systems of normal equations are solved by state-of-the-art iterative methods such as preconditioned conjugate gradient (PCG).

Quite often, preconditioners such as an incomplete Cholesky factorization are computed once and reused via triangular solves in every iteration. While the sparse matrix-vector multiplication inside PCG parallelizes naturally, the computation of the preconditioner itself relies on level-scheduling, whose success strongly depends on the input matrix pattern.

Recently, researchers have investigated replacing these methods by purely iterative approaches -- the factorization [1] as well as the triangular solves [2]. All three combined result in an interesting method that almost fully relies on matrix multiplication. This should be especially interesting for very large linear least squares problems.
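The idea behind an iterative triangular solve can be illustrated with Jacobi sweeps on a small lower-triangular system. The sketch below is a sequential Python toy under our own assumptions (dense matrix, hypothetical function names) and is not the implementation from [2]; it only shows why such a solve reduces to repeated matrix-vector-like updates.

```python
def forward_substitution(L, b):
    """Reference: sequential solve of L x = b for lower-triangular L."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

def jacobi_triangular_solve(L, b, sweeps):
    """Approximate L x = b with Jacobi sweeps x <- D^{-1} (b - E x).

    D is the diagonal and E the strictly lower part of L. Every entry of
    the new iterate depends only on the previous iterate, so all rows of
    one sweep could be updated in parallel.
    """
    n = len(b)
    x = [b[i] / L[i][i] for i in range(n)]         # initial guess D^{-1} b
    for _ in range(sweeps):
        x = [(b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
             for i in range(n)]
    return x

L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [0.5, 1.0, 5.0]]
b = [2.0, 6.0, 11.0]
exact = forward_substitution(L, b)
approx = jacobi_triangular_solve(L, b, sweeps=2)
```

Because the strictly lower part is nilpotent, the sweeps reproduce the exact solution after at most n - 1 iterations for an n x n system; in a preconditioner, a few sweeps are typically used to obtain an approximate solve instead.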

In this thesis, you will combine the three methods (developing two building blocks yourself) purely on the GPU and compare the resulting performance to traditional approaches -- in both accuracy and speed.


[1] Chow, Edmond, and Aftab Patel. “Fine-grained parallel incomplete LU factorization.” SIAM Journal on Scientific Computing 37.2 (2015): C169-C193.

[2] Anzt, Hartwig, Edmond Chow, and Jack Dongarra. “Iterative sparse triangular solves for preconditioning.” European Conference on Parallel Processing. Springer Berlin Heidelberg, 2015.


Image Correspondence Matching via Convolutional Neural Network (BSc / MSc)

Correspondence matching is one of the important processes in image-based 3D reconstruction pipelines. It is usually employed in structure from motion (SfM) or multi-view stereo (MVS) to establish spatial relations (such as geometric or semantic matches) among multiple images. For example, in the initial step of SfM, feature detectors and descriptors such as SIFT find distinctive points in the input images. However, SIFT has some flaws that make it impossible to find enough correspondences in certain cases, e.g. for shaking objects, non-Lambertian surfaces, weakly textured surfaces, and so on.

Recently, a number of works have addressed this problem. [2] uses a random forest to classify SIFT feature vectors, then predicts the image keypoints and matches them across image pairs. However, due to the limitations of the SIFT feature representation, this approach cannot replace SfM or MVS in terms of general performance. [3] learns a similarity measure using a convolutional neural network, [4] proposes a deep learning framework for accurate visual correspondences and demonstrates its effectiveness for both geometric and semantic matching, and [5] shows that correspondence can be learned by a convolutional neural network.

In this thesis, you will perform correspondence matching using deep learning techniques and overcome the defects of SfM/MVS. Our goal is to replace the correspondence matching parts of image-based 3D reconstruction pipelines.


[1] Furukawa, Yasutaka, and Carlos Hernández. “Multi-view stereo: A tutorial.” *Foundations and Trends® in Computer Graphics and Vision* 9.1-2 (2015): 1-148.

[2] Hartmann, Wilfried, Michal Havlena, and Konrad Schindler. “Predicting matchability.” *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*. 2014.

[3] Zbontar, Jure, and Yann LeCun. “Stereo matching by training a convolutional neural network to compare image patches.” *Journal of Machine Learning Research* 17.1-32 (2016): 2.

[4] Choy, Christopher B., et al. “Universal correspondence network.” *Advances in Neural Information Processing Systems*. 2016.

[5] Long, Jonathan L., Ning Zhang, and Trevor Darrell. “Do convnets learn correspondence?” *Advances in Neural Information Processing Systems*. 2014.


Multi-view 3D Model Completion with Generative Adversarial Networks (BSc / MSc)

[1] proposes a method to infer a 3D representation of an object from a single image. From the single image, it generates unseen views and their depth maps, fuses them into a full 3D point cloud, and then further optimizes the point cloud to obtain a mesh.

Generative adversarial networks (GANs) [3] have recently shown impressive performance on semi-supervised and unsupervised image generation. Given images and noise, a GAN can generate plausible-looking images, for example faces, scenes or digits. We hope to use GANs to improve the prediction of previously unseen views.

In this thesis, you will improve the method of [1], focusing on more applied datasets and on unseen view prediction via GANs (although the work is not limited to these parts), and compare the resulting performance with [1] and existing work.


[1] Tatarchenko M., Dosovitskiy A., Brox T. (2016) Multi-view 3D Models from Single Images with a Convolutional Network. In: Leibe B., Matas J., Sebe N., Welling M. (eds) Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol 9911. Springer, Cham

[2] Code from [1],

[3] Goodfellow, Ian, et al. “Generative adversarial nets.” *Advances in Neural Information Processing Systems*. 2014.