CLAVIN (Cartographic Location And Vicinity INdexer) is an award-winning open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution. It extracts location names from unstructured text and resolves them against a gazetteer to produce data-rich geographic entities.
CLAVIN does not simply “look up” location names. It uses intelligent heuristics to determine which “Springfield” (for example) the author intended, based on the context of the document. CLAVIN also employs fuzzy search to handle misspelled location names, and it recognizes alternative names (e.g., “Ivory Coast” and “Côte d’Ivoire”) as referring to the same geographic entity.
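To make the three ideas above concrete (context-based disambiguation, alternative names, and tolerance for misspellings), here is a minimal, self-contained sketch. It is not CLAVIN's API or its actual heuristics: the `ToyGeoResolver` class, the `Place` fields, the "region mentioned in the text" rule, the edit-distance threshold of 1, and the population figures are all assumptions invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration only: not CLAVIN's API, index format, or actual heuristics. */
class ToyGeoResolver {

    /** Hypothetical gazetteer entry, far simpler than a real gazetteer record. */
    static final class Place {
        final String name;
        final String region;    // state or province; empty if not applicable
        final long population;  // illustrative placeholder, not real data

        Place(String name, String region, long population) {
            this.name = name;
            this.region = region;
            this.population = population;
        }
    }

    private final Map<String, List<Place>> byName = new HashMap<>();

    /** Indexes a place under its primary name and every alternate name. */
    void add(Place place, String... alternateNames) {
        index(place.name, place);
        for (String alt : alternateNames) {
            index(alt, place);
        }
    }

    private void index(String name, Place place) {
        byName.computeIfAbsent(normalize(name), k -> new ArrayList<>()).add(place);
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** Exact name match first; otherwise names within one edit (a crude stand-in for fuzzy search). */
    private List<Place> candidates(String name) {
        String key = normalize(name);
        List<Place> exact = byName.get(key);
        if (exact != null) {
            return exact;
        }
        List<Place> fuzzy = new ArrayList<>();
        for (Map.Entry<String, List<Place>> e : byName.entrySet()) {
            if (editDistance(key, e.getKey()) <= 1) {
                fuzzy.addAll(e.getValue());
            }
        }
        return fuzzy;
    }

    /** Prefers a candidate whose region appears in the document; breaks ties by population. */
    Place resolve(String locationName, String documentText) {
        String text = normalize(documentText);
        Place best = null;
        int bestContext = -1;
        for (Place c : candidates(locationName)) {
            int context = (!c.region.isEmpty() && text.contains(normalize(c.region))) ? 1 : 0;
            if (context > bestContext || (context == bestContext && c.population > best.population)) {
                best = c;
                bestContext = context;
            }
        }
        return best; // null when nothing in the gazetteer matches
    }

    /** Standard Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Builds a tiny sample gazetteer; populations are made-up round numbers. */
    static ToyGeoResolver sample() {
        ToyGeoResolver r = new ToyGeoResolver();
        r.add(new Place("Springfield", "Illinois", 115_000));
        r.add(new Place("Springfield", "Massachusetts", 155_000));
        r.add(new Place("Springfield", "Missouri", 170_000));
        r.add(new Place("C\u00f4te d'Ivoire", "", 0), "Ivory Coast"); // \u00f4 is "ô"
        return r;
    }

    public static void main(String[] args) {
        ToyGeoResolver r = sample();
        // Context picks the Illinois entry.
        System.out.println(r.resolve("Springfield", "The legislature met in Springfield, Illinois.").region);
        // Both names resolve to the same entry, so this prints true.
        System.out.println(r.resolve("Ivory Coast", "") == r.resolve("C\u00f4te d'Ivoire", ""));
        // A misspelling one edit away still resolves; context picks Massachusetts.
        System.out.println(r.resolve("Springfeld", "a rally in Massachusetts").region);
    }
}
```

The sketch deliberately uses naive substring matching for context and a linear scan for fuzzy lookup. A real geoparser would score candidates against all locations mentioned in the document and query an indexed gazetteer, so treat this only as a picture of the idea, not of CLAVIN's internals.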
The CLAVIN ecosystem continues to grow and contains four open source projects:
- CLAVIN: An open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution.
- CLAVIN-REST: A quick-and-dirty Dropwizard RESTful microservice demonstration of CLAVIN, GeoNames, and either OpenNLP or CLAVIN-NERD (with Stanford NER).
- CLAVIN-NERD: The Stanford NLP implementation of the CLAVIN LocationTagger.
- CLAVIN-CONFIG: A collection of experimental CLAVIN releases: a Scala server, a Python script, a GDELT (http://gdeltproject.org/) parser, Apache Tika integration, an IP Tweet georesolver, and language identification.