Kodsnack 567 - Arrow straight through, with Matt Topol and Lars Wikman


Episode 64 of 89
Duration: 1 h 23 min
Language: English
Category: Non-fiction

Fredrik has Matt Topol and Lars Wikman over for a deep and wide chat about Apache Arrow and many, many topics in the orbit of the language-independent columnar memory format for flat and hierarchical data. What does that even mean? What is the point? And why does Arrow only feel more and more interesting and useful the more you think about deeply integrating it into your systems?
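What follows is a rough illustration of what "columnar memory format" means once nulls are involved. It is a minimal, stdlib-only Python sketch, not the Arrow library or its API. The `Int32Column` class and its methods are hypothetical.

The sketch mimics only two ideas from the Arrow format:

- A column's values sit contiguously in one buffer.
- A separate validity bitmap records which slots are null. A set bit means the slot is non-null, and bits are numbered least-significant first.

When two columns are added, their bitmaps are ANDed together, so null plus null is null.

```python
from array import array


class Int32Column:
    """Hypothetical, simplified Arrow-style column: one contiguous values
    buffer plus a validity bitmap (bit i set means slot i is non-null)."""

    def __init__(self, values: array, validity: bytearray, length: int):
        self.values = values        # contiguous buffer, one slot per row
        self.validity = validity    # LSB-first bitmap: 1 = valid, 0 = null
        self.length = length

    @classmethod
    def from_list(cls, items):
        n = len(items)
        # "i" is a C int, which is 32 bits on common platforms.
        values = array("i", [0] * n)          # null slots just hold 0 here
        validity = bytearray((n + 7) // 8)    # one bit per row, rounded up to bytes
        for i, item in enumerate(items):
            if item is not None:
                values[i] = item
                validity[i // 8] |= 1 << (i % 8)
        return cls(values, validity, n)

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def to_list(self):
        return [self.values[i] if self.is_valid(i) else None
                for i in range(self.length)]

    def __add__(self, other: "Int32Column") -> "Int32Column":
        assert self.length == other.length
        # Null propagation: a result slot is valid only where both inputs are
        # valid, i.e. a byte-wise AND of the two bitmaps.
        validity = bytearray(a & b for a, b in zip(self.validity, other.validity))
        # Add every slot unconditionally; the bitmap decides which results count.
        values = array("i", (a + b for a, b in zip(self.values, other.values)))
        return Int32Column(values, validity, self.length)


a = Int32Column.from_list([1, None, 3, None])
b = Int32Column.from_list([10, 20, None, None])
print((a + b).to_list())  # [11, None, None, None]
```

The sketch combines the bitmaps a byte at a time and adds the values without checking individual nulls, so the loop never branches per row. In a compiled implementation, that branch-free shape is what lets columnar kernels benefit from SIMD/vectorization.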

Feeding data to systems fast enough is a problem that gets far less attention than it deserves. With Arrow you can send data over the network, process it on the CPU (or the GPU, for that matter) and pass it along to the database, all without parsing, transformation, or copies unless absolutely necessary.
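The "no parsing, no copies" point can be made concrete with a minimal sketch. It uses only Python's standard library, not Arrow itself. The `produce` and `consume` functions are hypothetical stand-ins for a sender and a receiver.

The sketch rests on two simplifications:

- It assumes both sides share the same native int32 byte order. Arrow's format is little-endian by default.
- It ignores the metadata header that real Arrow IPC messages use to describe where each buffer starts.

```python
import struct
from array import array


def produce(values):
    """Hypothetical producer: lays out int32 values contiguously and hands over
    the raw bytes, as they might arrive in the body of a network message."""
    return bytearray(array("i", values).tobytes())


def consume(buf: bytearray) -> memoryview:
    """Hypothetical consumer: reinterprets the received bytes as int32 values.
    memoryview.cast neither parses nor copies; it references the same memory."""
    return memoryview(buf).cast("i")


buf = produce([5, 6, 7, 8])
view = consume(buf)
total = sum(view)                  # computed directly over the received bytes

# Overwrite the first value in the producer's buffer; the consumer's view sees
# the change immediately, which shows that no copy was made.
struct.pack_into("i", buf, 0, 100)
first = view[0]
print(total, first)  # 26 100
```

A JSON payload would work differently: it would first have to be parsed into freshly allocated objects before any arithmetic could start.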

Thank you Cloudnet for sponsoring our VPS!

Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write something longer. We read everything we receive.

If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi.

Links

Lars
Matt
Øredev
Matt’s Øredev presentations: State of the Apache Arrow ecosystem: How your project can leverage Arrow! and Leveraging Apache Arrow for ML workflows
Kallbadhuset
Apache Arrow
Lars talks about his Arrow rabbit hole in Regular programming
SIMD/vectorization
Spark
Explorer - builds on Polars
Null bitmap
ZeroMQ
Airbyte
Arrow Flight
Dremio
Arrow Flight SQL
InfluxDB
Arrow Flight RPC
Kafka
Pulsar
OpenTelemetry
Arrow IPC format - also known as Feather
ADBC - Arrow Database Connectivity
ODBC and JDBC
Snowflake
dbt - SQL to SQL
Jinja
DataFusion
Ibis
Substrait
Meta’s Velox engine
Arrow’s Project Management Committee (PMC)
Voltron Data
Matt’s Arrow book - In-Memory Analytics with Apache Arrow
RAPIDS and cuDF
The Theseus engine - accelerator-native distributed compute engine using Arrow
The Composable Codex
The standards chapter
Dremio
Hugging Face
Apache Hop - orchestration and data scheduling thing
Directed acyclic graph
UCX - libraries for finding fast routes for data
InfiniBand
NUMA
CUDA
gRPC
Foam bananas
Turkish pepper - Tyrkisk peber
Plopp
Marianne

Titles

• For me, it started during the speaker’s dinner

• Old, dated, and Java

• A real nerd snipe

• Identical representation in memory

• Working on columns

• It’s already laid out that way

• Pass the memory, as is

• Null plus null is null

• A wild perk

• Arrow into the thing

• So many curly brackets you need to store

• Arrow straight through

• Something data people like to do

• So many backends

• The SQL string is for people

• I’m rude, and he’s polite

• Feed the data fast enough

• A depressing amount of JSON

• Arrow the whole way through

• These are the problems in data

• Reference the bytes as they are

• Boiling down to Arrow

• Data lakehouses

• Removing inefficiency

