Google AI Introduces TensorStore, a High-Performance Open-Source Library for Array Storage

Daniel Dominguez

Article originally posted on InfoQ.

Google has introduced TensorStore, an open-source C++ and Python library designed for reading and writing large multidimensional arrays.

Many modern applications in computer science and machine learning work with multidimensional datasets that span a single large coordinate system. Such datasets are difficult to work with because users may read and write data at unpredictable intervals and at varying scales, and they often want to run analyses on many machines in parallel.

To address these problems of data storage and manipulation, Google Research created TensorStore, a library whose API can handle massive datasets without requiring powerful hardware. The library supports several storage systems, including Google Cloud Storage and local and network filesystems.

TensorStore offers a straightforward Python API for loading and manipulating huge arrays of data. Because no data is read or held in memory until a specific slice is requested, arbitrarily large underlying datasets can be loaded and edited without storing the entire dataset in memory.

The syntax for indexing and manipulation, which is much the same as that used for NumPy operations, makes this possible. TensorStore also supports advanced indexing features, including alignment, broadcasting, and virtual views (data type conversion, downsampling, and arrays generated lazily on the fly).
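The lazy-slicing idea can be illustrated with a small standard-library sketch. This is not TensorStore's API: the `LazyArray` class and the `fetch` callback are hypothetical stand-ins, assumed here only to show that slicing can narrow a view without touching storage, and that data is read only when it is explicitly requested.

```python
fetched = []  # records every cell actually read from the simulated storage


def fetch(row, col):
    """Hypothetical storage read: returns a value computed from the coordinates."""
    fetched.append((row, col))
    return row * 10 + col


class LazyArray:
    """Hypothetical 2-D array view that defers all reads until read() is called."""

    def __init__(self, fetch, rows, cols):
        self.fetch = fetch
        self.rows = rows  # a range of row indices; no data is held
        self.cols = cols  # a range of column indices; no data is held

    @property
    def shape(self):
        return (len(self.rows), len(self.cols))

    def __getitem__(self, key):
        # Slicing only narrows the index ranges; nothing is loaded here.
        row_slice, col_slice = key
        return LazyArray(self.fetch, self.rows[row_slice], self.cols[col_slice])

    def read(self):
        # Only the cells inside the current view are fetched.
        return [[self.fetch(r, c) for c in self.cols] for r in self.rows]


# A "dataset" with 10^12 cells that never exists in memory.
big = LazyArray(fetch, range(1_000_000), range(1_000_000))
view = big[100:102, 5:8]        # NumPy-style slicing, still no reads
reads_before = len(fetched)     # 0: nothing has been fetched yet
data = view.read()              # six cells are fetched here
```

After `view.read()`, `fetched` holds exactly the six coordinates inside the slice, which is the property that lets arbitrarily large datasets be handled with modest memory.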

TensorStore also has an asynchronous API that lets a read or write operation continue in the background while the program completes other tasks, as well as customizable in-memory caching that reduces interactions with slower storage systems for frequently accessed data.
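The combination of background reads and caching can be sketched with Python's standard `concurrent.futures` module. This is only an illustration of the pattern, not TensorStore code: `slow_read`, `read_chunk`, and `read_async` are hypothetical names, and the 50 ms delay is an assumed stand-in for a slow storage system.

```python
import time
from concurrent.futures import ThreadPoolExecutor

storage_reads = []  # records every trip to the simulated slow storage
cache = {}          # simple in-memory cache keyed by chunk id
executor = ThreadPoolExecutor(max_workers=2)


def slow_read(chunk_id):
    """Hypothetical slow storage access."""
    storage_reads.append(chunk_id)
    time.sleep(0.05)
    return [chunk_id] * 4


def read_chunk(chunk_id):
    # Serve from the cache when possible; otherwise go to storage once.
    if chunk_id not in cache:
        cache[chunk_id] = slow_read(chunk_id)
    return cache[chunk_id]


def read_async(chunk_id):
    # Returns a Future immediately; the read continues in the background.
    return executor.submit(read_chunk, chunk_id)


future = read_async(7)            # the read starts in a worker thread
other_work = sum(range(1000))     # the program keeps working meanwhile
first = future.result()           # block only when the data is needed
second = read_async(7).result()   # served from the cache, no storage trip
executor.shutdown()
```

The second request for chunk 7 never reaches `slow_read`, so `storage_reads` contains a single entry.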

Processing and analyzing large numerical datasets requires significant computational resources. Typically, this is achieved by parallelizing the work across a large number of CPU or accelerator cores spread over many machines. A core objective of TensorStore has been to enable parallel processing of individual datasets while maintaining high performance.
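As a rough illustration of splitting one dataset among parallel workers, the sketch below divides an index range into regions and processes each region independently. It uses threads on a single machine purely for brevity, whereas the scenario described above involves many machines; `read_region`, `process_region`, and `chunk_bounds` are hypothetical helpers, not part of TensorStore.

```python
from concurrent.futures import ThreadPoolExecutor


def read_region(start, stop):
    """Hypothetical stand-in for reading one region of a shared dataset."""
    return [i * i for i in range(start, stop)]


def process_region(bounds):
    # Each worker reads and reduces only its own region.
    start, stop = bounds
    return sum(read_region(start, stop))


def chunk_bounds(total, chunk):
    # Split [0, total) into consecutive regions of at most `chunk` elements.
    return [(s, min(s + chunk, total)) for s in range(0, total, chunk)]


regions = chunk_bounds(total=1000, chunk=128)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_region, regions))
total = sum(partial_sums)  # combine the per-region results
```

Because the regions do not overlap, the workers need no coordination beyond the final reduction.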

Use cases for TensorStore include PaLM and other sophisticated large-scale machine learning models, as well as brain mapping.
