Operational AI WIKI


Apache Arrow

Apache Arrow is a software development framework designed to speed up analytical processing and to make moving data between systems and programming languages more efficient. Apache Arrow’s in-memory columnar format is a standardized, language-agnostic specification for representing structured, table-like datasets in memory.

 

Arrow’s libraries implement the format and provide building blocks for a range of use cases focused on high-performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

 

Apache Arrow is an open-source project licensed under the Apache License 2.0. Notable contributors to Apache Arrow include Dremio, Voltron Data, InfluxData, DataStax, Cloudera, MapR, Anyscale, and others.

Chart featuring systems that use or support Apache Arrow

Without a standard columnar data format, every database and language has to implement its own internal data format, and moving data from one system to another involves costly serialization and deserialization. https://arrow.apache.org/overview/


Systems that use or support Arrow can transfer data between them at little-to-no cost. They don’t need to implement custom connectors for every other system, and a standardized memory format facilitates reuse of libraries of algorithms. https://arrow.apache.org/overview/
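The idea of sharing data "at little-to-no cost" can be illustrated with a minimal sketch. It uses only Python's standard library, not Arrow itself: the `prices` column and the `column_total` helper are hypothetical names chosen for illustration, and `memoryview` stands in for the way two components that agree on a memory layout can read the same buffer without copying or re-encoding it.

```python
from array import array

# A "producer" holds one column of 64-bit integers in a contiguous, typed buffer.
# (Illustrative stand-in for a columnar buffer; this is not the Arrow API.)
prices = array("q", [120, 95, 310, 42])

# A "consumer" that understands the same layout can view the bytes directly.
# memoryview does not copy; it exposes the producer's existing buffer.
shared = memoryview(prices)


def column_total(view):
    """Sum a column through a view, without copying the underlying buffer."""
    return sum(view)


total_before = column_total(shared)

# A write by the producer is visible through the view, which shows that
# both sides are looking at the same memory rather than separate copies.
prices[0] = 100
total_after = column_total(shared)

print(total_before, total_after)
```

Because both sides agree on the layout, no connector or conversion step sits between them; in this sketch the only thing exchanged is a reference to the buffer.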

 

At Molecula

Traditionally, there have been two main types of database structures: row-oriented and column-oriented. Each system has pros and cons, and the appropriate format will depend on the specifications of any given project. Arrow is an in-memory application of the column-oriented format and is typically used for large analytical workloads.
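To make the row-oriented versus column-oriented distinction concrete, here is a small sketch in plain Python. The table contents, the field names (`id`, `qty`, `price`), and the helper functions are all made up for illustration, and the layout is a simplification rather than Arrow's actual format.

```python
from array import array

# Row-oriented: all fields of one record are stored together.
rows = [
    ("a", 3, 1.5),
    ("b", 7, 2.0),
    ("c", 5, 0.5),
]

# Column-oriented: all values of one field are stored together,
# with numeric fields held in contiguous typed buffers.
columns = {
    "id": ["a", "b", "c"],
    "qty": array("q", [3, 7, 5]),
    "price": array("d", [1.5, 2.0, 0.5]),
}


def total_qty_rows(table):
    """Aggregate over rows: every record is visited to read a single field."""
    return sum(record[1] for record in table)


def total_qty_columns(table):
    """Aggregate over columns: only the one relevant buffer is scanned."""
    return sum(table["qty"])


def fetch_record_columns(table, index):
    """Rebuild one record from columns: one lookup per field."""
    return tuple(table[name][index] for name in ("id", "qty", "price"))


print(total_qty_rows(rows), total_qty_columns(columns))
print(fetch_record_columns(columns, 1))
```

The trade-off shows up in the helpers: an aggregate over one field only needs the `qty` buffer in the columnar layout, while fetching a whole record is a single access in the row layout but one lookup per field in the columnar one.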

 

Molecula has developed a database platform that is neither row-oriented nor column-oriented. According to Molecula, its feature-oriented database, FeatureBase, performs better on massive, real-time, highly complex analytical workloads. FeatureBase is aimed in particular at ML and AI workloads, which combine large data size and volume with high speed and real-time requirements. To learn more, see FeatureBase.

 

Related Terms

Column-oriented database

Row-oriented database

Feature-oriented database

Apache

Protobuf

Parquet

FlatBuffers

 

Learn More About Apache Arrow 

Apache Arrow website

Wikipedia entry: Apache Arrow

 
