🚜 Query.Farm Update 2025-07-18
Progress at Query.Farm continues. This week's work focused heavily on advancing the Airport extension. Here's what was accomplished:
Python Ecosystem Expansion
Two new Python modules have been released to support the Query.Farm ecosystem:
query-farm-sql-manipulation - A Python library for SQL predicate manipulation using SQLGlot. This library provides tools to safely remove specific predicates from SQL WHERE clauses and filter SQL statements based on column availability.
query-farm-sql-scan-planning - A Python library for file filtering using SQL expressions and metadata-based scan planning. This library enables efficient data lake query optimization by determining which files need to be scanned based on their statistical metadata.
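To make the scan-planning idea concrete, here is a minimal sketch of pruning files by their min/max statistics. It does not use the query-farm-sql-scan-planning API; the `FileStats` class, the `may_match` and `plan_scan` helpers, and the file names are all hypothetical, and the predicate is reduced to a single comparison on one integer column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min and max of one integer column."""
    path: str
    min_value: int
    max_value: int


def may_match(stats: FileStats, op: str, literal: int) -> bool:
    """Return True when the file could hold a row satisfying `column <op> literal`."""
    if op == "=":
        return stats.min_value <= literal <= stats.max_value
    if op == ">":
        return stats.max_value > literal
    if op == ">=":
        return stats.max_value >= literal
    if op == "<":
        return stats.min_value < literal
    if op == "<=":
        return stats.min_value <= literal
    # An operator we cannot reason about must never prune a file.
    return True


def plan_scan(files: list[FileStats], op: str, literal: int) -> list[str]:
    """Keep only the files whose statistics cannot rule out a match."""
    return [f.path for f in files if may_match(f, op, literal)]


# Made-up statistics for three files in a data lake.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

print(plan_scan(files, ">", 150))
```

The important property is that pruning is conservative: a file is skipped only when its statistics prove no row can match, so an unsupported predicate simply means more files get scanned.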
Airport Testing Infrastructure
A comprehensive Airport test server has been developed in Python and integrated into the build process to test all Airport functionality. The server is deployed on Google Cloud using Google Cloud Functions for scalability and reliability.
Initially, the test server kept data in memory, but concurrent GitHub builds created scaling challenges. To address this, new serialization code has been implemented that persists data to disk, enabling proper handling of concurrent requests.
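The update does not describe the serialization format or how writes are coordinated, so the following is only a sketch of one common way to persist state safely: write to a temporary file, then atomically swap it into place so a concurrent reader never sees a half-written file. The `save_state` and `load_state` names and the use of JSON are assumptions for illustration, not the test server's actual code.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state to a temporary file, then atomically replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path: str) -> dict:
    """Read persisted state, treating a missing file as empty state."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

A request handler would call `load_state` at the start of a request and `save_state` after any mutation. Readers then see either the old file or the new one, never a partial write.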
Developers working with Airport will benefit from using this test server, as it powers all Airport testing workflows.
Continuous Integration Improvements
The v1.3 branch experienced Windows build issues that have been resolved. The root cause was forward declarations of C++ namespaces causing linking problems on Windows platforms. The v1.3 branch now builds cleanly across all supported platforms.
The main branch currently fails to build because of recent changes to Arrow functionality in DuckDB; a fix is expected soon.
Telemetry Infrastructure Creation
To better understand Airport adoption patterns, telemetry has been implemented within the extension. When the extension loads, minimal tracking information is transmitted to a Query.Farm server. The collected data includes:
extension_name
airport_version
airport_user_agent
duckdb_platform
duckdb_library_version
duckdb_release_codename
duckdb_source_id
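As a rough illustration of what such a record could look like, the sketch below assembles the seven fields listed above into a JSON document. Only the field names come from this update; the `build_payload` helper, the JSON encoding, and every example value are placeholders, and the real wire format may differ.

```python
import json

# Field names as listed in this update.
TELEMETRY_FIELDS = (
    "extension_name",
    "airport_version",
    "airport_user_agent",
    "duckdb_platform",
    "duckdb_library_version",
    "duckdb_release_codename",
    "duckdb_source_id",
)


def build_payload(**values: str) -> str:
    """Serialize exactly the listed fields, refusing incomplete input."""
    missing = [name for name in TELEMETRY_FIELDS if name not in values]
    if missing:
        raise ValueError(f"missing telemetry fields: {missing}")
    # Anything outside the listed fields is dropped, keeping the payload minimal.
    return json.dumps({name: values[name] for name in TELEMETRY_FIELDS})


# Placeholder values only; none of these are real telemetry data.
payload = build_payload(
    extension_name="airport",
    airport_version="example-version",
    airport_user_agent="example-user-agent",
    duckdb_platform="example-platform",
    duckdb_library_version="example-library-version",
    duckdb_release_codename="example-codename",
    duckdb_source_id="example-source-id",
)
print(payload)
```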
This telemetry data flows through a Cloudflare pipeline and is stored in R2, making it queryable via DuckDB for analysis. This infrastructure is planned for rollout across other Query.Farm extensions, as download counts alone don't provide sufficient usage insights.
As part of this telemetry implementation, the cURL dependency has been removed in favor of DuckDB's built-in HTTPUtils. This change makes the httpfs extension a required dependency since it provides the necessary HTTPS support.
Stay tuned for more updates!