Humanitarian GIS Data Cleaning for Crowdsourcing Data Capture

From Random Hacks of Kindness

GIS Data cleaning algorithms for Humanitarian Crowd-Sourcing data capturing

Owner

Proposed by: United Nations World Food Programme/Logistics Cluster

Contact (name, email, phone, skype): Christophe.bois@wfp.org, peter.singler@wfp.org, patrick.fitzgerald@wfp.org

Best way and times to contact during RHoK 2.0 Dec 4/5 2010: peter.singler@wfp.org, skype: kartopete

Summary

With the United Nations Logistics Cluster Geoportal, we are trying to establish a public crowd-sourcing portal for emergency GIS data. Freelancers and NGO partners can help collect data for the humanitarian community's corporate logistics infrastructure and transportation database. The pilot project is available at geoportal.logcluster.org, with public access using the username and password public/public. To see the editing capabilities of the portal, please contact peter.singler@wfp.org for a generic account.

The aim is a data integration and synchronization platform for a Humanitarian Interagency Database (PostgreSQL/PostGIS).

Use Case/User Story/Scenario

In an emergency, inter-agency data capture and dissemination on a standardized basis is always a bottleneck, and because of the lack of standardization, efforts are often duplicated. Data sources vary, and data arrives at the main database in all kinds of formats. By the time this data has been integrated and synchronized with the inter-agency master database and brought to the common standard, it is often already out of date.

Scenario 1

During an emergency like the one in Pakistan, many NGOs drive to remote mountain areas for which they have no geographical data or precise maps for navigation. They start with outdated maps and use their GPS units to record whatever situation they find. Once back at base, they integrate these tracks into their core dataset and use it to print new maps. In the Logistics Cluster, we try to coordinate these NGOs, sometimes hundreds of them (as in Haiti). They all need maps, but only six or seven have dedicated GIS officers in the field who can integrate and use GIS data.

To supply everyone with the latest data from the ground, there needs to be a common datastore from which the latest maps and data can be obtained on demand. This requires a data centre with download and upload options available to any user, GIS or non-GIS. The Logistics Cluster, with its hundreds of partners, is trying to achieve this with a platform for getting the latest data from the community. This UN and NGO crowd-sourcing tool has two major constraints: a corporate standard (UNSDI-T) must be used, and people need a really simple way to contribute data to the common pool so that it can be shared with all partners and anyone else who needs it. The Logistics Cluster Geoportal is our attempt to achieve this.

However, because the resources for integrating and standardizing all this information are limited, we cannot always provide near-real-time information to the field; with up to 10 operations running at a time, this is not feasible. What is needed is a platform where a user can upload a file, do some attribute mapping, and then, at the click of a button, integrate and synchronize it with the main database, which in turn serves the data to everyone through the Geoportal as shapefiles, KML, GPX or a simple operational map. This lets people share their data in the most efficient way during the emergency, so that they can focus on helping its victims.
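
The attribute-mapping step mentioned in this scenario could look roughly like the sketch below. It is only an illustration: the target field names in `TARGET_FIELDS` are hypothetical and are not taken from the actual UNSDI-T schema, and the helper names `split_fields` and `apply_mapping` are made up for this example. The idea is that fields which match the target schema (directly or via a user-supplied mapping) are carried over, while everything else is reported back to the user for manual assignment.

```python
from __future__ import annotations

from typing import Any

# Hypothetical target schema: illustrative field names only,
# NOT taken from the real UNSDI-T specification.
TARGET_FIELDS = {"name", "surface", "max_weight_t", "status"}


def split_fields(
    source_fields: list[str], user_mapping: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Resolve each uploaded field to a target field, or list it as unmatched.

    A field is resolved through the user's explicit mapping first; if there is
    none, its lower-cased name is tried against the target schema.
    """
    resolved: dict[str, str] = {}
    unmatched: list[str] = []
    for field in source_fields:
        target = user_mapping.get(field, field.lower())
        if target in TARGET_FIELDS:
            resolved[field] = target
        else:
            unmatched.append(field)
    return resolved, unmatched


def apply_mapping(record: dict[str, Any], resolved: dict[str, str]) -> dict[str, Any]:
    """Rename the keys of one uploaded record to the target schema."""
    return {
        target: record[source]
        for source, target in resolved.items()
        if source in record
    }


if __name__ == "__main__":
    fields = ["NAME", "Surf_Type", "MaxTons", "comment"]
    mapping = {"Surf_Type": "surface", "MaxTons": "max_weight_t"}
    resolved, unmatched = split_fields(fields, mapping)
    print("still to be assigned by the user:", unmatched)
    row = {"NAME": "Valley road", "Surf_Type": "gravel", "MaxTons": 10, "comment": "x"}
    print(apply_mapping(row, resolved))
```

In a real upload form, the unmatched list would be what the user sees and assigns (or discards) before pressing the integrate button.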

Scenario 2

With my NGO, I drive a truck up into the mountains to deliver food. On the way, I have a GPS unit switched on and record the route. We arrive at a damaged bridge that only trucks of less than 10 tons can cross. We cross it and drive further up the valley. By the time we arrive at the food distribution point, I have recorded a few landslides as well. Back at base, I have all this useful information but don't know how to integrate it with existing data, because I don't know how to work with a GIS or how to connect to a common datastore. So I grab the phone and call three other NGOs that I know want to go to the same area in two days' time. I tell them that there are 2 broken bridges and a few landslides, and describe verbally where they are. I offer them my GPS tracks, but they tell me they don't know how to use them. I won't have time to integrate my data and produce a map until the end of the week, so that's the end of it. If I could simply upload the tracks to the community database and tell the others that I have new data, they could just go to the Geoportal, print a map from the latest available data, and have a reference.

Description and Constraints


To solve this issue, the UN humanitarian sector in the field of logistics, represented by the Logistics Cluster Support Cell of the World Food Programme based in Rome, needs a data integration platform and synchronization algorithms. The Logistics Cluster maintains the inter-agency database of the transportation layer of the UN spatial data infrastructure, which is the basic source for all humanitarian agencies participating in emergency response. The web-based platform should allow users to import and synchronize data from ESRI Shapefiles, KML, GPX, OSM and Excel sheets. It should check whether the attributes of the incoming data match those of the database and offer the option of assigning non-matching fields to existing ones. Most of the data in the Spatial Data Infrastructure – Transport (SDI-T) is POINT data. The road dataset is the exception, however, and should be handled with priority. Here, synchronization needs to be combined with a data cleaning algorithm that takes both attributes and geometry into account, so that the resulting single segments are as long as possible (with respect to attributes and geometric topology) and conform to a road network standard that can serve as a basis for routing algorithms. Automated data integration would save a lot of time during the critical and intense first weeks of an emergency, especially for GIS staff in the field and at headquarters around the world. It would allow a major shift of focus toward data analysis and mapping for decision making, and away from data capture and synchronization.
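
One possible reading of the road cleaning requirement is sketched below: two road segments are joined whenever they meet at a node that no third segment touches and their attributes are identical, which keeps junctions intact for routing while making the remaining segments as long as possible. This is an assumption about what the algorithm should do, not a specification from the Logistics Cluster. The function name `merge_segments` and the attribute keys are hypothetical, and the sketch assumes endpoints match exactly; real GPS data would first need snapping within a tolerance, and duplicate parallel segments are not handled.

```python
from __future__ import annotations

from collections import defaultdict

Point = tuple[float, float]
Segment = tuple[list[Point], dict]


def merge_segments(segments: list[Segment]) -> list[Segment]:
    """Join segments that share a degree-2 node and carry identical attributes."""
    segs: list[Segment] = [(list(coords), dict(attrs)) for coords, attrs in segments]
    changed = True
    while changed:
        changed = False
        # Index every segment by its two endpoints.
        ends: dict[Point, list[int]] = defaultdict(list)
        for idx, (coords, _) in enumerate(segs):
            ends[coords[0]].append(idx)
            ends[coords[-1]].append(idx)
        for node, idxs in ends.items():
            # Only merge where exactly two different segments meet;
            # three or more means a junction, which must be preserved.
            if len(idxs) != 2 or idxs[0] == idxs[1]:
                continue
            i, j = idxs
            (ci, ai), (cj, aj) = segs[i], segs[j]
            if ai != aj:
                continue  # attributes differ (e.g. surface changes): keep the break
            # Orient both so that ci ends at the node and cj starts there.
            if ci[-1] != node:
                ci = ci[::-1]
            if cj[0] != node:
                cj = cj[::-1]
            joined: Segment = (ci + cj[1:], ai)
            segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [joined]
            changed = True
            break  # endpoint index is stale now; rebuild it
    return segs


if __name__ == "__main__":
    roads = [
        ([(0, 0), (1, 0)], {"surface": "paved"}),
        ([(1, 0), (2, 0)], {"surface": "paved"}),
        ([(2, 0), (3, 0)], {"surface": "gravel"}),
        ([(2, 0), (2, 1)], {"surface": "paved"}),
    ]
    for coords, attrs in merge_segments(roads):
        print(attrs["surface"], coords)
```

In this example the first two paved pieces are joined into one segment, while the node at (2, 0) stays a break point because three segments meet there.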

The main constraint is the use of PostgreSQL/PostGIS, which is the platform the UNSDI-T is built on. The main focus should be on the roads layer, as editing functionality for POINT features such as bridges, ports and helipads is already available on the Geoportal. Access to the main data source will be provided only on demand.
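
Since the target store is PostGIS and the priority is the roads layer, a first step for GPS uploads could be turning GPX track segments into WKT LINESTRINGs, which PostGIS can read with `ST_GeomFromText(wkt, 4326)` (GPX coordinates are WGS84). The sketch below uses only the Python standard library and assumes GPX 1.1 input; the function name `gpx_tracks_to_wkt` is hypothetical, and loading the result into the actual SDI-T tables is deliberately left out because their structure is not described here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the GPX 1.1 schema; GPX 1.0 files use a different one.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_tracks_to_wkt(gpx_text: str) -> list[str]:
    """Return one WKT LINESTRING per track segment with at least two points."""
    root = ET.fromstring(gpx_text)
    lines: list[str] = []
    for seg in root.findall(".//gpx:trkseg", GPX_NS):
        # WKT uses x y order, i.e. longitude first, then latitude.
        points = [
            (float(pt.get("lon")), float(pt.get("lat")))
            for pt in seg.findall("gpx:trkpt", GPX_NS)
        ]
        if len(points) >= 2:
            body = ", ".join(f"{lon} {lat}" for lon, lat in points)
            lines.append(f"LINESTRING({body})")
    return lines


if __name__ == "__main__":
    sample = (
        '<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">'
        "<trk><trkseg>"
        '<trkpt lat="34.5" lon="73.1"/>'
        '<trkpt lat="34.6" lon="73.2"/>'
        "</trkseg></trk></gpx>"
    )
    for wkt in gpx_tracks_to_wkt(sample):
        print(wkt)
```

Track segments with fewer than two points are skipped because they cannot form a line; waypoints such as a damaged bridge would need separate handling as POINT features.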

Extra Credit

Fabulous extensions that would rock the world if completed

Similar projects and Resources

Public version of our project: http://geoportal.logcluster.org

The SDI-T is a PostgreSQL/PostGIS database and will be opened to all participants. Technologies can be chosen freely.

There is some overlap between this project and the following RHoK project with Red Cross, CARITAS & Oxfam:

  • http://wiki.rhok.org/Sahana_Eden_Offline_Mapping
    • We should aim to be able to share data easily
      • WFS best for being able to edit the data live?
      • KML best for being able to import/export the data to/from offline instances?
    • We should aim to share code where possible (we're both using PostGIS/GeoServer/GeoExt)

Here are some general platforms for sharing GIS data which could be used or ideas borrowed from:

What next and Sustainability

If this approach or implementation is successful and looks promising, it will be made available within the community platform of the new Geoportal version (release expected around May/June 2011), alongside blogs, forums, a data centre and the mapping platform of the Logistics Cluster. Once there, it will be maintained and extended by the Logistics Cluster GIS Developing Unit within the World Food Programme.

Current State and Solutions

For everyone who is interested, let's collect Skype contacts and email addresses here to see who would like to talk during RHoK. The main purpose is to see what can be done, how to organize ourselves and what a structure could look like, and to collect ideas about which features should be implemented and what the main platform should look like. If someone can set up a communication platform for the weekend, that would be great. Otherwise, we can agree here on a way to communicate at RHoK.

Peter Singler Skype: kartopete Email: peter.singler@wfp.org Private: psingler@gmx.org

Sara Farmer Skype: sj.farmer Email: sara.farmer at btinternet dot com

Om Goeckermann Skype: okomaloko Email: om.imap at gmail


Chat channel available on webchat.freenode.net Channels: #crowdsourcing and #rhok
