Nonfiction
"Apache DataFusion: Building Custom Analytics Engines in Rust"
This book is for experienced Rust developers and data infrastructure engineers who want to build fast, embeddable analytics systems without writing a query engine from scratch. Using Apache DataFusion as the core, it shows how to turn Arrow’s zero-copy columnar memory model into production-grade pipelines, and how to make deliberate architectural choices around extensibility, isolation, and predictable performance in real services.
You’ll learn how DataFusion’s SQL and DataFrame front ends map into logical plans and expressions, where semantic analysis and type coercion boundaries belong, and how to shape and optimize plans safely. The book then goes deeper into physical planning and the ExecutionPlan contract: partitioning and ordering requirements, streaming execution semantics, and physical optimization techniques that reduce data movement. You’ll also implement the key extension points (TableProviders and catalogs, object-store and file-format I/O, Arrow-native UDFs, UDAFs, and UDWFs, and custom physical operators) while avoiding the common “wrong answer” and performance pitfalls.
Prerequisites include strong Rust fluency, comfort with async/concurrency, and basic familiarity with relational query processing. The emphasis is on engine-authoring workflows: capability contracts, cost and correctness trade-offs, and operational readiness through metrics, spilling, concurrency control, and rigorous testing and diagnostics.
© 2026 NobleTrex Press (Ebook): 6610001179229
Release date
Ebook: March 9, 2026