firmmatchr

firmmatchr is an R package for linking messy, user-generated company names to official firm registries such as Zefix and Orbis. It implements a “waterfall” matching strategy: names are first matched exactly after normalization, then by token-based and fuzzy string matching, and finally by LLM-assisted verification for the cases that remain uncertain. The normalization is tuned for the DACH market (umlauts, German legal forms such as GmbH and AG, etc.), but the logic generalises to other countries.
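
To make the waterfall idea concrete, here is a minimal base-R sketch. It is not firmmatchr's actual API: the function names (`normalize_name`, `token_jaccard`, `match_waterfall`), the short list of legal forms, and the two thresholds are all assumptions chosen for illustration. The fuzzy step uses normalised Levenshtein similarity via `utils::adist()`, which may differ from what the package does, and the LLM step is only represented by a "review" flag.

```r
# Hypothetical sketch of a waterfall matcher; names and thresholds are assumptions.

normalize_name <- function(x) {
  # Transliterate German umlauts and sharp s (both cases) before lowercasing.
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "ae", "oe", "ue", "ss")
  for (i in seq_along(from)) x <- gsub(from[i], to[i], x, fixed = TRUE)
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]+", " ", x)                      # punctuation to spaces
  x <- gsub("\\b(gmbh|ag|kg|sa|sarl|ltd)\\b", " ", x)   # illustrative legal forms
  trimws(gsub("\\s+", " ", x))
}

token_jaccard <- function(a, b) {
  # Jaccard similarity of the two names' word sets (ignores word order).
  ta <- unique(strsplit(a, " ", fixed = TRUE)[[1]])
  tb <- unique(strsplit(b, " ", fixed = TRUE)[[1]])
  u <- union(ta, tb)
  if (length(u) == 0) return(0)
  length(intersect(ta, tb)) / length(u)
}

match_waterfall <- function(queries, registry,
                            fuzzy_accept = 0.85, fuzzy_review = 0.75) {
  reg_norm <- normalize_name(registry)
  row <- function(q, m, step, score) {
    data.frame(query = q, match = m, step = step, score = score,
               stringsAsFactors = FALSE)
  }
  out <- lapply(queries, function(q) {
    qn <- normalize_name(q)

    # Step 1: exact match on normalized names.
    hit <- match(qn, reg_norm)
    if (!is.na(hit)) return(row(q, registry[hit], "exact", 1))

    # Step 2: token match, i.e. the same set of words in any order.
    jac <- vapply(reg_norm, function(r) token_jaccard(qn, r), numeric(1),
                  USE.NAMES = FALSE)
    best <- which.max(jac)
    if (jac[best] >= 1) return(row(q, registry[best], "token", 1))

    # Step 3: fuzzy match on normalised Levenshtein similarity.
    d <- as.vector(adist(qn, reg_norm))
    sim <- 1 - d / pmax(nchar(qn), nchar(reg_norm), 1)
    best <- which.max(sim)
    if (sim[best] >= fuzzy_accept) return(row(q, registry[best], "fuzzy", sim[best]))

    # Step 4: uncertain candidates are flagged for review (e.g. by an LLM).
    if (sim[best] >= fuzzy_review) return(row(q, registry[best], "review", sim[best]))

    row(q, NA_character_, "unmatched", sim[best])
  })
  do.call(rbind, out)
}

# Made-up example data, not taken from any registry extract.
registry <- c("M\u00fcller Maschinenbau AG", "Novartis AG", "Z\u00fcrcher Kantonalbank")
queries  <- c("mueller maschinenbau ag", "Maschinenbau M\u00fcller", "Novartiss",
              "Zurich Kantonalbank", "Acme Widgets")
res <- match_waterfall(queries, registry)
print(res[, c("query", "step")])
```

Each query stops at the first step that resolves it, so the cheap exact and token comparisons absorb most names and only the leftover borderline candidates reach the expensive review stage.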

I go into much more detail on the approach in the blog post.