Apache Geospatial Track
Tuesday 14:10 UTC
Twelve OGC/ISO standards used in Apache SIS and other projects
Martin Desruisseaux
The Open Geospatial Consortium (OGC), jointly with the International Organization for Standardization (ISO), defines international standards that make interoperability possible between different geospatial applications. Many standards are articulated around data formats (Well Known Text, Geography Markup Language, etc.) and web services (Web Map Service, Web Feature Service, etc.). But there is also an increasing number of standards for APIs.
Apache SIS is a Java library that helps developers create their own geospatial applications. SIS closely follows a dozen OGC/ISO standards. This presentation gives a quick overview of the following standards and, for each of them, a few entry points in the API of Apache SIS 1.0 or later, together with PROJ (via JNI) and the UCAR netCDF library when applicable.
- ISO 19115 — Metadata, for answering “what, where, when”.
- ISO 19157 — Data Quality, for answering “how reliable”.
- ISO 19111 — Referencing by coordinates, which includes map projections.
- ISO 19112 — Spatial referencing by geographic identifiers.
- ISO 19162 — Well-Known Text representation of coordinate reference systems.
- ISO 19136 — Geography Markup Language (reference systems only in Apache SIS).
- OGC GeoAPI 3.0.1 — Java interfaces derived from OGC/ISO abstract models.
The standards cited above are well established in Apache SIS 1.0. The following standards are either new or have received substantial improvements in the SIS development branch:
- ISO 19143 — Filter encoding, for specifying a subset of resources.
- ISO 19109 — Rules for application schema (i.e., “Features”).
- OGC 18-075 — Moving Features, adding temporal variability to the above.
- OGC 10-092 — NetCDF binary encoding, for raster data and moving features.
- OGC 19-008 — GeoTIFF format, for raster data.
Martin holds a Ph.D. in oceanography, but has continuously focused on developing tools for analysis work. He programmed in C/C++ before switching to Java in 1997. He has been developing geospatial libraries for more than 25 years and has contributed to Apache SIS since 2013. He has attended Open Geospatial Consortium (OGC) meetings since 2002 in the hope of improving Apache SIS conformance to those standards. Martin works in a small IT services company (Geomatys) specialized in the development of geoportals, which uses Apache SIS as a foundation.
Tuesday 15:00 UTC
An introduction to the OGC API - Processes and practical examples
Benjamin Pross
The Open Geospatial Consortium (OGC) API - Processes - Part 1: Core is a standard that specifies a Web API that enables the execution of computing processes and the retrieval of metadata describing their purpose and functionality.
Typically, these processes combine raster, vector, coverage and/or point cloud data with well-defined algorithms to produce new raster, vector, coverage and/or point cloud information.
The standard builds on the OGC Web Processing Service (WPS) 2.0 Standard and defines the processing interface to communicate over a RESTful protocol using JSON encodings.
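As a rough illustration of the JSON-over-HTTP interaction described above, the following Python sketch builds (without sending) an execute request. The server URL, the process identifier `ndvi` and the input names are hypothetical placeholders. The `/processes/{processID}/execution` path and the `inputs` member are meant to follow Part 1: Core, but the published standard should be consulted for the exact request schema.

```python
import json
import urllib.request


def build_execute_request(base_url, process_id, inputs):
    """Build (but do not send) a POST request asking a server to run a process."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical server, process identifier and inputs, for illustration only.
request = build_execute_request(
    "https://example.org/ogcapi",
    "ndvi",
    {
        "red": {"href": "https://example.org/data/red.tif"},
        "nir": {"href": "https://example.org/data/nir.tif"},
    },
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would actually submit the job; that step is left out here because the endpoint does not exist.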
Like all OGC APIs, the API - Processes is a modular specification consisting of multiple parts. Besides the Core, two additional Parts are currently being worked on:
Part 2: Transactions for deployment
Part 3: Workflows and Chaining
This talk gives an overview of the API and shows practical examples using the 52°North javaPS framework. The framework enables users to deploy new processes based on Docker images. Examples will be given using Docker images for processing Earth observation data.
Benjamin Pross holds a diploma degree in geoinformatics. He specialises in Web-based geoprocessing. He is the chair of the API - Processes standard working group within the Open Geospatial Consortium (OGC), editor of the OGC API - Processes - Part 1: Core and co-editor of the OGC WPS 2.0 standard.
Tuesday 15:50 UTC
Datacubes: Enabling Space/Time Analytics and AI
Peter Baumann
Datacubes today are an accepted cornerstone in Earth Science, acting as an enabling paradigm for offering massive spatio-temporal Earth data - such as 1D sensor, 2D imagery, 3D x/y/t image timeseries and x/y/z geological data, 4D x/y/z/t atmospheric simulation, and nD statistics data - in an analysis-ready way. This is achieved by combining very large numbers of individual files into single, homogenized objects offering uninhibited navigation in space and time, analytics, and fusion. On top of this, APIs and clients can be crafted which save experts substantial work and unleash these data for non-experts in IT and Remote Sensing. Recent research investigates a tight coupling of spatio-temporal AI with datacubes.
In OGC and ISO standardization, coverages provide the unifying concept for spatio-temporal datacubes, with the streamlined service model of Web Coverage Service (WCS) including Web Coverage Processing Service (WCPS), OGC's geo datacube analytics language. A large, continuously growing number of open-source and proprietary tools support the coverage standards. In parallel, in 2019 ISO enhanced the SQL standard with domain-independent datacube functionality.
The notion of actionable datacubes was coined by the rasdaman Array Database engine, an idea subsequently taken up by many epigones, often by extending libraries like xarray or adding array layers on top of Hadoop and Spark. Operational datacube services exist on multi-petabyte assets. EarthServer represents the first location-transparent federation of massive SAR, optical, and simulation datacubes.
In our talk we introduce key concepts of datacube services with emphasis on analytics and AI, inspect relevant standards, and discuss implementation in distributed, federated contexts. This will be exemplified with real-life demos which participants can replay and modify on their Internet-connected laptops.
Peter Baumann is Professor of Computer Science at Jacobs University, researching datacube services and their application in science and engineering. With the rasdaman engine he and his team have pioneered datacubes and Array Databases, with over 160 scientific publications and international patents. The rasdaman datacube engine is successfully commercialized internationally and has received a series of innovation awards. Peter Baumann is editor of the core datacube standards in ISO and OGC.
Tuesday 17:10 UTC
Introduction to new GeoSPARQL features with Apache Jena
Marco Neumann
Linked Data has now been widely adopted as the preferred option for creating, publishing and reusing data on the Web. Spatial searches were introduced to the Jena project almost 20 years ago. With the most recent Apache Jena release, the project continues to consolidate and streamline the use of GeoSPARQL in a simple-to-use package.
In the presentation we will give you a brief introduction to geospatial searches with Apache Jena and an update on current developments.
Marco Neumann is an Information Scientist with keen interest in distributed information syndication and contexts for the Semantic Web, dynamic schema evolution in structured data, information visualization, ontology based knowledge management, reputation based ranking in Semantic Social Networks (augmented collaborative online communities such as http://www.lotico.com), and last but not least the Semantic GeoSpatial Web. Since 2005 Marco has applied his experience to large-scale information management projects in international cultural heritage institutions and the private sector.
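To give a flavour of what such a geospatial search can look like, here is a minimal Python sketch that assembles a GeoSPARQL query selecting features whose geometry lies within a bounding box. It only builds the query text; sending it to a Jena endpoint is out of scope. The helper names and the bounding-box coordinates are illustrative assumptions, and the data is assumed to use `geo:hasGeometry` and `geo:asWKT` with longitude/latitude WKT literals.

```python
GEOSPARQL_PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
)


def bbox_to_wkt(minx, miny, maxx, maxy):
    """Return a closed WKT polygon ring for a bounding box (x = lon, y = lat)."""
    corners = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


def features_within_query(minx, miny, maxx, maxy):
    """Build a SPARQL query for features whose geometry is within the box."""
    wkt = bbox_to_wkt(minx, miny, maxx, maxy)
    return (
        GEOSPARQL_PREFIXES
        + "SELECT ?feature ?wkt WHERE {\n"
        + "  ?feature geo:hasGeometry ?geom .\n"
        + "  ?geom geo:asWKT ?wkt .\n"
        + f'  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))\n'
        + "}"
    )


# Illustrative bounding box only.
print(features_within_query(0, 0, 1, 1))
```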
Tuesday 18:00 UTC
Processing Geospatial Data at Scale with Python and Apache Sedona (incubating)
Paweł Kociński
Nowadays we collect a lot of data from IoT sensors and mobile applications. We need to manage and process this data to acquire knowledge, which in a data-driven world leads to correct decisions and predictions.
Apache Sedona (incubating) is an open source library which extends Apache Spark with geospatial capabilities. It has Scala, Java, Python and R APIs, which help to process and analyze geospatial data at scale.
This presentation focuses on the Python API, with examples (Spark 2.4 and 3.0):
- How to read the data from various spatial data sources, including GeoJSON, Shapefile, WKT and WKB.
- How to integrate other Python libraries with Apache Sedona (incubating), such as geopandas, shapely, pandas, numpy and folium.
- How to join the data based on spatial relations such as intersecting, containing or within distance, using OSM data.
- How to read the data from external databases like PostgreSQL.
- Geospatial indexes, spatial partitioning and why it is important to use them.
- Best practices when using the Python API to avoid additional computing and memory issues.
Paweł Kociński currently works at allegro.pl. He is an ASF and Apache Sedona (incubating) committer, the main author of the Sedona Python API, and a Big Data, stream processing, Scala and Python enthusiast.
He has worked on various projects to help European and US companies expand their businesses based on geospatial data. He is a graduate of Warsaw University of Technology.
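The point about spatial partitioning in the list above can be sketched in plain Python, independently of Sedona. The example below is not Sedona's API: it is a simplified, single-machine illustration of the general idea, using a uniform grid and made-up sample data. Boxes are bucketed by the grid cells they overlap, so each point is compared only with the boxes in its own cell rather than with every box.

```python
from collections import defaultdict


def grid_cell(x, y, cell_size):
    """Return the integer grid cell containing point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def cells_for_box(box, cell_size):
    """Yield every grid cell overlapped by box = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = box
    cx0, cy0 = grid_cell(minx, miny, cell_size)
    cx1, cy1 = grid_cell(maxx, maxy, cell_size)
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            yield (cx, cy)


def spatial_join(points, boxes, cell_size=1.0):
    """Pair each named point with every named box that contains it."""
    # Partition step: bucket the boxes by the grid cells they overlap.
    partitions = defaultdict(list)
    for name, box in boxes.items():
        for cell in cells_for_box(box, cell_size):
            partitions[cell].append((name, box))
    # Join step: a point is only tested against boxes sharing its cell.
    result = []
    for pname, (x, y) in points.items():
        candidates = partitions.get(grid_cell(x, y, cell_size), ())
        for bname, (minx, miny, maxx, maxy) in candidates:
            if minx <= x <= maxx and miny <= y <= maxy:
                result.append((pname, bname))
    return sorted(result)


# Made-up sample data, for illustration only.
points = {"cafe": (0.5, 0.5), "museum": (2.5, 2.5), "harbour": (5.0, 5.0)}
boxes = {"old_town": (0.0, 0.0, 1.0, 1.0), "city": (0.0, 0.0, 3.0, 3.0)}
print(spatial_join(points, boxes))
```

In a distributed engine the same grouping decides which records are shipped to which worker, which is why a partitioning scheme that fits the data distribution matters for both run time and memory.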
Streaming Geospatial Vector Data with Apache Projects
Jim Hughes
As IoT-based use cases increase, so does the need to create real-time geospatial views of the data generated. In this talk, we will describe how Apache Kafka and Apache NiFi can be leveraged to enable spatial data streaming and data management.
The first part of the talk will dive into the details of indexing observation data for entities moving through space and time. Examples will show how a web-tier can be polled to show a live picture of multiple moving entities.
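The abstract does not say which indexing scheme the talk covers. One widely used technique for observations that move through space and time is a space-filling curve, and the Python sketch below shows a simplified Z-order (Morton) key over longitude, latitude and time. The 10-bit resolution, the one-week time bin and all function names are assumptions made for the illustration; this is not GeoMesa's implementation.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    ratio = (value - lo) / (hi - lo)
    return min(cells, max(0, int(ratio * cells)))


def interleave3(a, b, c, bits):
    """Interleave the low `bits` bits of three integers into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((a >> i) & 1) << (3 * i)
        code |= ((b >> i) & 1) << (3 * i + 1)
        code |= ((c >> i) & 1) << (3 * i + 2)
    return code


SECONDS_PER_WEEK = 7 * 24 * 3600.0  # assumed time bin, for illustration


def morton_key(lon, lat, seconds_in_week, bits=10):
    """Return a single sortable key for a (lon, lat, time) observation."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    t = quantize(seconds_in_week, 0.0, SECONDS_PER_WEEK, bits)
    return interleave3(x, y, t, bits)


# Made-up observations: (entity, lon, lat, seconds since start of week).
observations = [
    ("truck-1", -78.48, 38.03, 3600.0),
    ("truck-2", 21.01, 52.23, 7200.0),
    ("truck-1", -78.47, 38.04, 3660.0),
]
for entity, lon, lat, t in sorted(observations, key=lambda o: morton_key(*o[1:])):
    print(entity, morton_key(lon, lat, t))
```

Because the three dimensions are interleaved bit by bit, observations that are close in space and time tend to receive nearby keys, so a range scan over a sorted key-value store can answer many bounding-box-plus-time-window queries.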
The second part of the talk will focus on using NiFi to route data through an enterprise. Apache NiFi provides a visual programming interface to create, represent, and monitor data flows. The GeoMesa-NiFi project provides Processors which allow for handling spatial data with GeoMesa and GeoTools DataStores.
Jim Hughes applies training in mathematics and computer science to build distributed, scalable systems capable of supporting data science and machine learning. He is a core committer for GeoMesa, which leverages HBase, Accumulo and other distributed database systems to provide distributed computation and query capabilities. He is also a committer for the LocationTech projects JTS and SFCurve and serves as a mentor for other LocationTech and Eclipse projects. He serves on the LocationTech Project Management Committee and Steering Committee. Through work with LocationTech and OSGeo projects like GeoTools and GeoServer, he works to build end-to-end solutions for big spatio-temporal problems.
Jim received his Ph.D. in Mathematics from the University of Virginia for work studying algebraic topology. He enjoys playing outdoors and swing dancing.
Enabling Mars Exploration with AHTSE
Dr. Lucian Plesea
Apache HTTPD Tile Serving Environment (AHTSE) is a loose collection of Apache HTTPD modules that can be combined to build highly scalable geospatial raster tile services. Cloud storage is a core feature of AHTSE, and when necessary, lightweight image processing and even map projections are done on the fly, in memory, greatly reducing the cost and effort to manage large collections of maps and support client applications with different requirements.
Last year's AHTSE presentation was a very quick overview of existing capabilities. For this year, in addition to an AHTSE status update, details of the AHTSE architecture and configuration will be presented, as used to build the Esri Explore Mars application, available online at https://explore-mars.esri.com. These AHTSE-based Mars services are used as a testing ground for AHTSE, with the goal of preserving the qualities of the original data for applications that can make use of those qualities, while preserving compatibility with legacy applications.
Dr. Lucian Plesea's career has always been intertwined with geospatial mapping and high performance web services. During his 20 years of work at the Jet Propulsion Laboratory, he transitioned from working on scientific supercomputing applications to processing satellite imagery and remote sensing data, and ultimately to developing high performance web services. While at JPL, he participated in some of the early OGC standardization efforts, especially the OGC WMS.
While still contributing to NASA Earth and Planetary mapping projects, Dr. Plesea has transitioned to Esri, where he is the architect of the public basemap services, used by millions of users worldwide every day.
He is also an active contributor to various geospatial open source projects, especially GDAL.