Rethinking GeoDX

Trends and enabling technologies driving innovation in geospatial platforms and workflows

TL;DR: GeoDX will blossom at the intersection of low-code, Large Language Models, DuckDB, and Web Assembly.

Remote sensing data and cloud compute will become commoditized and promise to unlock billions through bespoke geospatial products — but suboptimal tools and processes still hold back developers and analysts. It’s time to improve the developer experience and reduce time to insight.

A Call To Improve GeoDX

We need to talk…

Today, data scientists manually conduct multistep processes fragmented across tools and standards just to render data on a map. As a representative workflow, they might do the following:

  • Load satellite imagery from Google Earth Engine and georeferenced timeseries from Snowflake
  • Align CRS and temporal units with a combination of GDAL CLI commands and Python, then export them as shapefiles to import into QGIS to visually confirm the data makes sense
  • Load everything into a Jupyter Notebook to train one of several candidate ML models to be deployed to production.
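As a small illustration of the alignment step above, here is a sketch (plain Python, no GDAL) of resampling an irregular georeferenced timeseries into daily buckets so it can be joined against daily satellite observations on a common temporal unit. The field names and data are hypothetical:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def resample_daily(records):
    """Average irregular (timestamp, value) observations into daily buckets.

    records: iterable of (ISO-8601 timestamp string, float value).
    Returns {date: mean value}, a series that can be joined against
    daily satellite-derived data on a shared temporal unit.
    """
    buckets = defaultdict(list)
    for ts, value in records:
        day = datetime.fromisoformat(ts).date()
        buckets[day].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}

# Example: two readings on the same day collapse to their daily mean.
series = [
    ("2023-04-17T06:00:00", 10.0),
    ("2023-04-17T18:00:00", 14.0),
    ("2023-04-18T12:00:00", 20.0),
]
daily = resample_daily(series)
```

In a real workflow this step sits alongside CRS reprojection (e.g., via GDAL or pyproj), but the temporal half alone already shows how much glue code EDA demands today.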

If it’s complex to begin Exploratory Data Analysis (EDA), it’s even more cumbersome to productionize. Data scientists hand off algorithms to data engineers, who then need to make Python code talk to lineage tools, orchestrators, and a Spark system maintained by an entire DevOps team. GeoDX is a design philosophy that either eliminates or tucks away the complexities that historically characterized the field of geospatial analytics.

Industry Today

How data and compute became commoditized

Today, geospatial software flourishes at the intersection of GIS, cloud compute, and AI. In the last decade, the landscape evolved from point-and-click tooling to draw polygons over rendered satellite imagery toward cloud-native capabilities that promise automated insights from raw terabyte-scale datasets. In the same decade, climate awareness and maturing industries sparked a demand for improved earth observation data to fuel critical high-frequency business decisions.

Nascent startups are leveraging the plummeting costs of remote sensing and cloud compute to provide georeferenced, analysis-ready datasets to burgeoning markets. These include autonomous transport fleets (looking to understand, e.g., the effect of road conditions), insurtech and private equity pressured by climate adaptation (think ESG and disclosures), and spatially aware consumer technologies (think the Garmins, Ubers, and Instacarts). You can read more about these trends in “Approaching Geospatial 2.0” by Sust Global’s Josh Gilbert.

The commoditization of data is evident in a representative list of 2023’s “top geospatial companies,” which can be segmented into the following broad categories:

  • Data as a service — collecting and selling high-level data to multiple sectors.
  • Sector-specific data aggregators — focused partnerships to source, harmonize, and analyze data in specific verticals.
  • Platforms — offering GIS analysis tools orbited by auxiliary services such as analysis-ready datasets and collaborative offerings.

The geospatial sector is seeing a virtuous cycle of technological push and market pull, and the big opportunities to create and capture value lie in empowering developers and analysts to quickly and repeatedly derive vertical-specific insights. Geospatial data volumes, and the demand for faster and more accessible analytics, will continue to rise, so the tools that serve these needs must be accessible to a broader range of user archetypes looking to interrogate geo data. In the chain from data collection to end data products, value will accrue at the intermediate points where humans interact with the data.

If one accepts the premise that geospatial data and cloud infrastructure will increasingly be commoditized (and is averse to incumbent crowding), it stands to reason that new opportunities to unlock spatial analytics at scale lie in foundational capabilities that derive the last mile of vertical-specific insight with less code, in a way that is generalizable across industries.

There Must Be a Better Way

How the next wave of geospatial will enhance how developers and analysts think

Empowering domain-expert yet non-coding personas

To map the next frontier, one can look at recent trends that empower domain experts who are not developers. Imagine an organization’s science or finance advisor with an academic background who derives algorithms and insights to feed a data product or business strategy. These personas sit close to executives and data-product leads, guiding the programmatic design of workflows that answer business-critical questions.

While they are deeply versed in their field and understand data structures (and maybe SQL), their time is not geared toward the onerous setup of data workflows or dashboard design. They depend on their team to do the heavy lifting.

Their ability to enhance the speed and accuracy of decision-making is therefore augmented by ancillary teams that deploy infrastructure and algorithms to surface complex data relationships across disparate datasets. When done correctly, this accelerates cycles of conjecture and refutation that advance science and discovery — as posited by Karl Popper — and creates actionable business value.

In human-computer interaction terms, tools that enable localized reasoning address psychological needs that nourish human creative enquiry. When interrogating maps and data, humans benefit from the following:

  • Immediate and unambiguous feedback.
  • Negligible penalty of being wrong — by reducing the penalty of “clicking the wrong thing” or “loading the ‘wrong’ data structure.”
  • Sensorial and embodied cognition — with dynamic graphic elements that interpret and interact with underlying data.

Meeting these human needs saves time and money but, most importantly, nourishes creativity and promotes the quality of insight. Humans with these needs will gravitate to software that introduces immediacy to the process of interrogating data by creating a great UI that simplifies otherwise-complex setups and scripts. The next generation of geospatial software will emphasize these needs with features that provide rapid iteration, localized reasoning, minimal user setup, and cross-functional collaboration.

Rapid iteration

Traditional software development cycles can take days or weeks to modify a dashboard or refactor and rerun a data pipeline. The turnaround time of a data engineer impacts the cadence at which an analyst in a multifaceted organization can ask questions. Tools that empower the data engineer and every other profile to update interfaces and workflows enable the team to refine and deepen inquiries by quickly adapting their tooling to answer the next layer of questions.

In 2019, Unfolded (now Foursquare Studio) showed it was possible to bring beautiful geospatial visuals to the browser and run powerful client-side transformations to interrogate data ad hoc. It proved that map analytics could actually be enjoyable to use (sorry, QGIS) and that today’s web browsers can power complex geographic visuals.

Unfolded Studio.

Localized reasoning

Software that tightens the feedback loop with quick turnaround times and minimal context switching helps analysts enter and remain in a state of flow. Pausing to refactor code and rerun the workflows that feed dashboards (or chasing down engineers to help) disrupts that flow.

Traditional software often requires users to understand the entire system to make even small changes. Today’s low-code tools allow users to reason about analytical results and the code that produced them on the same screen, then make modifications. Jupyter computational notebooks are a paragon of a UX that promotes localized reasoning: they show code and visual results side by side, so users see the outcome of code modifications immediately.

Minimal user setup

The day will come when analysts can quickly find, access, and harmonize datasets without friction, free from scampering across data swamps or fussing with API and database credentials. Developers increasingly expect a seamless startup and a set-it-and-forget-it configuration that a single team member can handle for the rest of the team. This also means interoperability: platforms should meet developers where they are and natively integrate with data tooling across the modern data stack.

Platforms that address this low-hanging fruit with remarkable UX will stand out for their ability to reduce friction and time to insight.

Cross-functional collaboration

Today’s organizations expect tools to facilitate collaboration and cross-functional workflows by providing a common platform for users with different skill sets to work together. These tools often offer collaboration features, such as version control, sharing and commenting, and role-based access control, which enable teams to work collaboratively. In the context of geospatial, domain experts, data scientists, and data engineers collaborate seamlessly, leveraging their respective skills and expertise to compose powerful insights that lead to more informed decision-making and better outcomes.

Technical enablers

How these four trends and tools will change GeoDX

Data warehouses and geospatial SQL provide foundational tooling for spatial analytics on the cloud. These four technologies will make it easier for users of all skill levels and roles to interact with these tools. (You can read more on this in the State of Spatial SQL Report.)

Low-code

Low-code tools for geospatial analysis let teams rapidly update interfaces and workflows and reason about the underlying data transformations with ease. They enable rapid application development through flexibility and customization options that let users tailor applications to bespoke and shifting requirements. Apps like Retool and Bubble offer pre-built components and widgets that can be configured or extended with custom logic to suit specific use cases, and users can import data from their own data sources or external APIs. This flexibility enables organizations to create highly customized geospatial applications that align with their specific business processes and workflows, enhancing their ability to visualize and interact with data in new ways.

On the geospatial front, CARTO Workflows provides a visual language to design and run multistep spatial transformations. This gives cross-functional teams a common interface to contribute their skills and reason about each other’s work. Platforms will strive to make workflows easy to productionize by providing features such as RBAC, lineage, and scheduling out of the box.

CARTO workflows enable localized reasoning on geo data transformation.

Large language models (LLM)

LLMs are task-agnostic pre-trained models that can be adapted to downstream tasks. Unlike traditional machine learning approaches that require large amounts of labeled data for training, one-shot (and few-shot) learning models aim to learn from limited data, typically just a handful of examples.

Despite the success of LLMs in language and vision tasks, it will still be a few years before we see foundational models dedicated to geospatial, given geo’s multimodal nature. Notwithstanding, computer vision and Text 2 SQL models will rapidly make their way into geospatial tools, offering to simplify workflows drastically.

Computer vision

Traditionally, annotating geospatial data with labels has been expensive and time-consuming. One-shot models offer the potential to achieve high accuracy in geospatial analysis tasks, such as image classification, object detection, and segmentation for land cover mapping. They can be a foundational building block to help analysts parse terabytes of satellite imagery and automatically identify emerging patterns.

As an example, Meta recently released the Segment Anything Model, which offers accurate image segmentation with a single click. This is transformational for anyone who’s done image segmentation or land-use classification with QGIS.

CARTO CEO’s LinkedIn comment on “Segment Anything” applied to geo imagery.

Text 2 SQL

Data scientists, GIS analysts, and developers use Spatial SQL to work with geospatial data types such as geometry and geography. Non-coders will gravitate to tools that help them structure and run SQL queries and immediately visualize the output. Tools such as Hex already feed project schemas to Text 2 SQL models such as OpenAI Codex to create SQL queries from natural language. We can expect 2023’s generative AI boom to trickle into geospatial applications soon.

Hex Magic.
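To make the Text 2 SQL pattern concrete, here is a minimal sketch of the prompt-assembly step: the project schema plus the analyst’s natural-language question are packed into a single prompt. The function name, schema, and question are hypothetical, and the actual model call (to Codex or a similar code-generation model) is deliberately left out:

```python
def build_text2sql_prompt(schema_ddl, question):
    """Assemble the prompt a Text 2 SQL model typically receives:
    the project schema plus the analyst's natural-language question.
    The model call itself (OpenAI, etc.) is not shown here."""
    return (
        "Given the following table schema, write a single SQL query.\n\n"
        f"{schema_ddl}\n\n"
        f"Question: {question}\n"
        "SQL:"
    )

# Hypothetical spatial schema and a question an analyst might type.
schema = "CREATE TABLE stores (id INT, name TEXT, geom GEOMETRY);"
prompt = build_text2sql_prompt(
    schema, "How many stores are within 5 km of each other?"
)
# `prompt` would then be sent to a code-generation model,
# and the returned SQL executed against the warehouse.
```

Feeding the schema alongside the question is what lets the model emit column names and spatial functions that actually exist in the user’s project.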

DuckDB

Spatial SQL is foundational for geospatial analytics, but the amount of data processed and rendered in interactive geospatial analytics apps is almost certainly smaller than overall dataset sizes. This results from visualizations often being built from aggregate data or showing only data for a single date at a time. Since most data is rarely brought to the screen, modern geospatial apps should, in theory, query data selectively to mitigate the costs of compute and data transfer.

DuckDB is a modern SQL-based relational database system with unique capabilities that stand out in the context of geospatial analytics.

  • DuckDB uses vectorized query execution, which leverages modern CPU architectures to process data in parallel. Combined with efficient compression and encoding, this yields faster query performance and makes DuckDB well suited to large geospatial datasets.
  • DuckDB is highly portable because it’s an embedded database that runs in-process.
  • DuckDB makes it easy to connect to datasets, whether local files or remote data warehouses.

These three properties make DuckDB a great option for cloud-based deployment scenarios and result in simplified architectures. Data engineers will be relieved when they no longer need to set up Spark or scale-out infrastructure, and CFOs will smile at the reduced dependence on cloud warehouses for compute. In the words of the serverless DuckDB offering MotherDuck, “Big Data is dead.”

While these are promising benefits, it’s still the early days. DuckDB’s support for geospatial data types is still underway.

Notwithstanding, as DuckDB extensions mature, they make it easy to bring popular geospatial (think GDAL) and AI frameworks into the database, radically simplifying the process of moving transformations and ML closer to the data. Furthermore, downstream tooling such as DuckDB-Wasm will bring these capabilities to the browser behind an efficient analytical SQL engine.

Web Assembly (WASM)

Lastly, let’s discuss the unspoken enabler that most people already use unwittingly. Visualizing the results of geospatial transformations interactively calls for heavy compute and graphics processing.

Web Assembly simplifies how questions are answered by enabling cloud-first applications to run compute on users’ laptops. It’s a binary instruction format that lets code written in a variety of languages run in web browsers at near-native speeds.

Web Assembly 101.

Many modern web apps power their UIs with Web Assembly. Figma, a popular UI design tool, runs its graphics rendering engine on local resources to handle large, complex designs with ease. Other examples include ESRI’s client-side projection engine and AutoDesk’s AutoCAD web app.

Traditionally, complex simulations or data analyses would call for the data visuals UI to run on a separate environment from the infrastructure and workflows that produced the data. Today’s laptops are powerful enough to handle heavy workloads, especially when bypassing the traditional challenges of “Big Data” with optimized database operations, as done by DuckDB above.

Serving full-fledged applications client-side eliminates the need to download software and data to the desktop. This gives app creators flexibility in security, collaboration, and pricing models.

Delegating data transformations to the client also avoids the latency of passing data over the wire and the compute cost otherwise incurred in data warehouses.

Looking Ahead

How platforms and workflows will make GeoDX feel like magic

The commoditization of data and compute has opened new opportunities for geospatial analytics platforms. To unlock their full potential, it’s crucial to empower more cross-functional roles to use these tools and simplify the job of the supporting data engineers. That’s why I posit that the next frontier, and differentiator, is GeoDX: design that either eliminates or tucks away the complexities that historically characterized the field.

Low-code tools, LLMs, DuckDB, and Web Assembly will revolutionize platforms and workflows with their ability to reduce barriers to entry and enable faster and more accessible insights from geospatial data.

This blog post was originally published on Medium on April 17th, 2023.
