China’s latest AI research presents ‘OMMO’: a large-scale outdoor multimodal dataset and benchmark for novel view synthesis and implicit scene reconstruction

Recent advances in implicit neural representations have made photorealistic novel view synthesis and high-fidelity surface reconstruction possible. Unfortunately, most existing approaches focus on a single object or indoor scene, and their synthesis quality degrades when applied to outdoor settings. Today's outdoor scene datasets are built at modest geographic scale, either by rendering virtual scenes or by capturing simple scenes with few objects. Although some fairly recent approaches are well designed for large scenes and attempt to address this issue, the absence of standard benchmarks and large-scale outdoor scene datasets makes it impossible to assess their performance.

The BlendedMVS and UrbanScene3D collections contain photographs of reconstructed or virtual scenes, which differ from genuine scenes in texture and appearance. Gathering images from the Internet can produce highly effective datasets such as ImageNet and COCO, but the constantly changing objects and lighting conditions in such imagery make this technique unsuitable for evaluating NeRF-based tasks. Tanks and Temples, for example, provides a benchmark of realistic outdoor scenes captured with a high-precision industrial laser scanner. However, its scene scale is still too small (463 m² on average), and it concentrates only on a single object or exterior structure.

Source: https://arxiv.org/pdf/2301.06782.pdf

An illustration of a city scene from the dataset, captured along a circular camera path in low lighting. The figure shows the camera trajectory, written descriptions of the scene, and calibrated multi-view photographs. The dataset delivers high-fidelity, realistic texture details; some regions in colored boxes are magnified to show this.

The dataset's collection approach is comparable to Mega-NeRF's use of drones to record broad real-world scenes. However, Mega-NeRF offers only two reproducible scenes, preventing it from serving as a generally accepted baseline. Large-scale NeRF research for outdoor environments therefore lags behind work on individual objects or indoor scenes because, to the authors' knowledge, no standard, well-recognized large-scale scene dataset has been developed for NeRF benchmarking. To address this paucity of large-scale real-world outdoor scene datasets, they present a carefully curated multimodal aerial-view dataset. As seen in the figure above, the dataset consists of 33 scenes with prompt annotations, tags, and 14K calibrated photos. Unlike the existing approaches mentioned above, its scenes come from various sources, including videos acquired from the Internet and footage the authors captured themselves.
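To make "calibrated photos with prompt annotations" concrete, the snippet below sketches how one might represent a single scene record of this kind when feeding it to a NeRF pipeline. The field names (`images`, `intrinsics`, `poses`, `prompt`, `tags`) are illustrative assumptions, not the dataset's actual schema, which is defined on the project page.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SceneRecord:
    """Hypothetical container for one OMMO-style scene (field names assumed)."""
    images: list[str]       # paths to the calibrated RGB frames
    intrinsics: np.ndarray  # (3, 3) shared pinhole camera matrix
    poses: np.ndarray       # (N, 4, 4) camera-to-world extrinsics
    prompt: str             # free-text description of the scene
    tags: list[str] = field(default_factory=list)  # e.g. scene type, lighting

# Example: a city scene captured on a circular drone path in low lighting.
scene = SceneRecord(
    images=[f"frames/{i:05d}.jpg" for i in range(3)],
    intrinsics=np.array([[1000.0, 0.0, 960.0],
                         [0.0, 1000.0, 540.0],
                         [0.0, 0.0, 1.0]]),
    poses=np.stack([np.eye(4)] * 3),
    prompt="a city scene captured along a circular camera path in low lighting",
    tags=["city", "circular-path", "low-light"],
)
print(len(scene.images), scene.tags)
```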

The collection criteria aim for completeness and representativeness: a variety of scene types, scene scales, camera paths, lighting conditions, and multimodal data, a combination that previous datasets do not cover. The authors also provide comprehensive dataset-based benchmarks for novel view synthesis, scene reconstruction, and multimodal synthesis to assess how well the dataset supports the evaluation of standard NeRF approaches. More importantly, they offer a general process for producing real-world NeRF training data from online drone videos, making it easy for the community to expand the dataset (a sketch of such a pipeline follows below). To offer a detailed evaluation of each approach, they also include several sub-benchmarks for each of the aforementioned tasks, split by scene type, scene scale, camera path, and lighting condition.
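As a rough illustration of what a video-to-NeRF data pipeline can look like, the sketch below extracts frames with ffmpeg and recovers camera calibration with COLMAP's standard structure-from-motion commands. This is a minimal, generic recipe assuming both tools are installed; the paper's actual pipeline (including any frame filtering and quality checks) may differ.

```python
import subprocess
from pathlib import Path

def video_to_nerf_data(video: str, workdir: str, fps: int = 2) -> None:
    """Turn a drone video into posed images via ffmpeg + COLMAP (generic sketch)."""
    work = Path(workdir)
    frames = work / "images"
    frames.mkdir(parents=True, exist_ok=True)

    # 1. Sample frames at a fixed rate; smooth drone footage rarely
    #    needs every frame.
    subprocess.run(
        ["ffmpeg", "-i", video, "-vf", f"fps={fps}", str(frames / "%05d.jpg")],
        check=True,
    )

    # 2. Standard COLMAP structure-from-motion: features -> matches -> poses.
    db = work / "database.db"
    sparse = work / "sparse"
    sparse.mkdir(exist_ok=True)
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", str(db), "--image_path", str(frames)],
                   check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", str(db)], check=True)
    subprocess.run(["colmap", "mapper",
                    "--database_path", str(db), "--image_path", str(frames),
                    "--output_path", str(sparse)], check=True)
    # sparse/0 now holds camera intrinsics and per-image poses that
    # NeRF training code can consume.

if __name__ == "__main__":
    video_to_nerf_data("drone_clip.mp4", "scene_01")
```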

In summary, the authors' main contributions are the following:

• To promote large-scale NeRF research, they present an outdoor scene dataset with multimodal data that is more abundant and diverse than any comparable outdoor dataset currently available.

• They provide several benchmarks for popular outdoor NeRF approaches to establish a unified evaluation standard. Extensive tests show that the dataset can support typical NeRF-based tasks and provides prompt annotations for future research (a sketch of the image-quality metrics commonly used in such benchmarks follows this list).

• To make the dataset easily scalable, they offer a low-cost pipeline for converting freely downloadable Internet videos into NeRF training data.
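NeRF benchmarks of this kind are typically scored with PSNR, SSIM, and LPIPS between rendered and ground-truth views. The snippet below sketches PSNR and SSIM for images normalized to [0, 1], using NumPy and scikit-image; LPIPS would additionally require a pretrained network (e.g. the `lpips` PyTorch package). The paper's exact evaluation protocol is not reproduced here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(rendered: np.ndarray, reference: np.ndarray) -> float:
    """Peak signal-to-noise ratio for float images in [0, 1]."""
    mse = np.mean((rendered - reference) ** 2)
    return float(10.0 * np.log10(1.0 / mse))

def ssim(rendered: np.ndarray, reference: np.ndarray) -> float:
    """Structural similarity; channel_axis=-1 handles HxWx3 color images."""
    return float(structural_similarity(
        rendered, reference, data_range=1.0, channel_axis=-1))

# Toy check: a noisy copy of a random "rendering".
rng = np.random.default_rng(0)
ref = rng.random((64, 64, 3))
out = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0.0, 1.0)
print(f"PSNR: {psnr(out, ref):.2f} dB, SSIM: {ssim(out, ref):.3f}")
```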


Check out the Paper and project page. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.


Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.

