Google Summer of Code 2020:
Implement Mesh R-CNN in TensorFlow Graphics

Project Summary

GSoC Project Page

During my Google Summer of Code experience, I implemented the main components of Mesh R-CNN for TensorFlow Graphics, together with a TensorFlow Datasets provider for Pix3D. It was a great experience and I learned a lot over the three months, not least because of the excellent support from my mentors. The following sections give an overview of my contributions, together with links to all work that I submitted to the TensorFlow Graphics GitHub Repository.

Objectives
  • Pix3D Dataset feature connectors (PR #386)
  • Pix3D Dataset implementation (PR #410)
  • Mesh R-CNN Core ops: Cubify and VertAlign (PR #437)
  • Mesh R-CNN Network heads: Mesh Refinement (PR #441)
  • Mesh R-CNN Loss functions: Mesh Losses (PR #449)
  • Mesh R-CNN Network heads: Voxel branch (PR #450)
  • Mesh R-CNN 3D Prediction integration (PR #454)

Note: Since the project was very extensive, I had to limit myself to implementing the main components of Mesh R-CNN. In consultation with my mentors, we decided to focus on high-quality, well-tested implementations of the core ops of the 3D part of Mesh R-CNN rather than on setting up an end-to-end pipeline with a 2D backbone.

Main Challenges

The following list contains some of the main challenges I faced during the summer:

  1. Pix3D Dataset FeatureConnectors: The encoding schema for TensorFlow Datasets FeatureConnectors only supports one unknown dimension per feature. This is a problem for datasets with multiple objects per image, which would usually store a variable-length list of all objects and their labels together with each image. Nesting features into such a list of unknown length (the number of objects may vary from image to image) forces the TFDS encoding to introduce additional dimensions of unknown size. Since Pix3D has only one object per image, this is not a problem for the dataset itself, but most 2D backbones, e.g. those in the TensorFlow Object Detection API, expect the data in a nested structure as defined by the COCO Dataset Provider. These models can therefore not be used out of the box with the Pix3D Dataset provider. A small sketch of the two feature layouts follows this list.
  2. Errors in the Pix3D Dataset: While exploring the Pix3D dataset and preparing the TFDS DatasetBuilder, we found that two samples of Pix3D are flawed; more precisely, two of the segmentation masks were rendered incorrectly. I raised an issue in the official Pix3D GitHub Repository and hopefully the creators of Pix3D will fix this. Removing those samples in the new DatasetBuilder works around the issue and keeps the dataset usable. However, we still wanted to give users the ability to use the full dataset. Thus, we rendered the two incorrect masks with the correct parameters (see here) and provide documentation on how users can integrate them into their dataset via the DatasetInfo specification.
  3. Working with meshes: Using meshes in a deep learning approach comes with many difficulties, as meshes representing different objects may have different topologies and thus a different number of vertices and faces. One way to mitigate this is to pad the vertex and face tensors to the same length and pass them through the ops. However, simply padding the meshes is not enough: face and edge lists in particular contain indices into the vertex list, and padding these lists naively can produce degenerate meshes. To address this, I implemented a Meshes class which supports transformations of meshes into different representations while keeping track of all padded elements. This allows meshes of different sizes to be used in one batch. The class also contains additional functionality to efficiently extract vertex adjacencies from the batch of meshes. For more information, have a look at the implementation; a minimal padding sketch also follows this list.
  4. High quality implementations: Writing computationally efficient, generalizable (i.e. supporting arbitrary batch dimensions) and well-tested code can be very challenging. Especially the implementation of cubify, which converts voxel occupancy probabilities into triangle meshes, and the Mesh R-CNN mesh loss functions were demanding. The Meshes class simplified many of these conversions between different 3D data representations, and I definitely learned a lot during the discussions with my mentors. A naive cubify sketch follows this list.
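
To make the FeatureConnector limitation from challenge 1 more concrete, here is a rough sketch of a flat, one-object-per-image feature specification next to a COCO-style nested one. The field names and label set are simplified and hypothetical; the actual specification lives in PR #386 and PR #410, and the exact tfds.features API may differ between TFDS versions.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Flat, Pix3D-style spec (illustrative field names, not the real PR spec):
# one object per image, so every feature needs at most one unknown dimension
# (e.g. the number of mesh vertices).
flat_features = tfds.features.FeaturesDict({
    "image": tfds.features.Image(shape=(None, None, 3)),
    "mask": tfds.features.Image(shape=(None, None, 1)),
    "vertices": tfds.features.Tensor(shape=(None, 3), dtype=tf.float32),
    "faces": tfds.features.Tensor(shape=(None, 3), dtype=tf.int64),
    "label": tfds.features.ClassLabel(names=["bed", "chair", "sofa"]),
})

# COCO-style spec: a variable-length list of objects per image. Adding a
# per-object feature of unknown size (such as "vertices" above) inside the
# Sequence would introduce a second unknown dimension, which the TFDS
# encoding does not support.
nested_features = tfds.features.FeaturesDict({
    "image": tfds.features.Image(shape=(None, None, 3)),
    "objects": tfds.features.Sequence({
        "bbox": tfds.features.BBoxFeature(),
        "label": tfds.features.ClassLabel(names=["bed", "chair", "sofa"]),
    }),
})
```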
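
For challenge 3, the sketch below illustrates the padding problem, assuming eagerly known tensor shapes: vertex tensors can simply be zero-padded, while face tensors are padded with -1 and the true per-mesh sizes are kept so that padded rows are never interpreted as real triangles. The actual Meshes class in the pull requests does considerably more (different representations, vertex adjacency extraction); this is only a minimal illustration, and pad_mesh_batch is a hypothetical helper.

```python
import tensorflow as tf

def pad_mesh_batch(verts_list, faces_list):
  """Pads variable-size meshes to a common size and tracks the real sizes.

  Naively zero-padding the face lists would create extra (0, 0, 0) triangles
  that reference a real vertex, i.e. degenerate faces, so the true number of
  vertices and faces per mesh has to be stored alongside the padded tensors.
  """
  max_verts = max(v.shape[0] for v in verts_list)
  max_faces = max(f.shape[0] for f in faces_list)
  verts_sizes = tf.constant([v.shape[0] for v in verts_list], dtype=tf.int32)
  faces_sizes = tf.constant([f.shape[0] for f in faces_list], dtype=tf.int32)
  # Vertices are padded with zeros, faces with -1 so padded rows can never be
  # mistaken for valid indices into the vertex list.
  verts = tf.stack([
      tf.pad(v, [[0, max_verts - v.shape[0]], [0, 0]]) for v in verts_list])
  faces = tf.stack([
      tf.pad(f, [[0, max_faces - f.shape[0]], [0, 0]], constant_values=-1)
      for f in faces_list])
  return verts, faces, verts_sizes, faces_sizes

# Two meshes of different size batched together.
verts_a, faces_a = tf.random.uniform((10, 3)), tf.zeros((16, 3), tf.int32)
verts_b, faces_b = tf.random.uniform((25, 3)), tf.zeros((40, 3), tf.int32)
batch = pad_mesh_batch([verts_a, verts_b], [faces_a, faces_b])
```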
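
Finally, for challenge 4, the following heavily simplified sketch (in NumPy, for clarity) shows the basic idea behind cubify: every voxel whose occupancy probability exceeds a threshold is replaced by a unit cube made of 12 triangles. The real op in PR #437 additionally merges shared vertices, removes faces between adjacent occupied voxels, and operates on batched tensors; naive_cubify is a hypothetical toy version.

```python
import numpy as np

# The 8 corners of a unit cube (index = 4*x + 2*y + z) and 12 triangles
# (two per cube face) indexing into those corners.
_CUBE_VERTS = np.array(
    [[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
    dtype=np.float32)
_CUBE_FACES = np.array([
    [0, 1, 3], [0, 3, 2],  # x = 0 face
    [4, 6, 7], [4, 7, 5],  # x = 1 face
    [0, 4, 5], [0, 5, 1],  # y = 0 face
    [2, 3, 7], [2, 7, 6],  # y = 1 face
    [0, 2, 6], [0, 6, 4],  # z = 0 face
    [1, 5, 7], [1, 7, 3],  # z = 1 face
], dtype=np.int32)

def naive_cubify(voxel_probs, threshold=0.5):
  """Emits one full cube per occupied voxel of a (D, H, W) probability grid."""
  verts, faces = [], []
  for cube_id, index in enumerate(np.argwhere(voxel_probs > threshold)):
    verts.append(_CUBE_VERTS + index.astype(np.float32))
    faces.append(_CUBE_FACES + 8 * cube_id)  # shift indices past earlier cubes
  if not verts:
    return np.zeros((0, 3), np.float32), np.zeros((0, 3), np.int32)
  return np.concatenate(verts), np.concatenate(faces)

# A 2x2x2 grid with two occupied voxels yields 16 vertices and 24 triangles.
probs = np.zeros((2, 2, 2))
probs[0, 0, 0] = probs[1, 1, 1] = 0.9
vertices, triangles = naive_cubify(probs)
```
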
Future Work

Although Google Summer of Code 2020 is officially over, I would love to continue working on this project. A few things are still left to do, such as connecting the model to a 2D backbone and running experiments on the Pix3D dataset. Additionally, I will continue to address all comments in the pull requests and provide minor refactorings of the Meshes class, as I think it would be a really cool feature to have in the core TF Graphics repository.

Special thanks

My Google Summer of Code experience was great, and a large part of that is owed to the excellent mentoring of Avneesh Sud and Abhijit Kundu, and, last but not least, to Paige Bailey, who made me aware of the program and encouraged me to apply.