Microplastics and Trash Cleaning and Harmonization: Semantic Data Ingestion and Harmonization Using Artificial Intelligence

Figure: screenshot of the deployed MaTCH RShiny web app interface.

Overview of MaTCH

Microplastics and trash datasets are notoriously difficult to combine due to inconsistent terminology, reporting units, size classes, and categorical descriptors. Even when high-quality methods are used, semantic heterogeneity across studies severely limits synthesis, modeling, and risk assessment.

The Microplastics and Trash Cleaning and Harmonization (MaTCH) framework addresses this challenge using artificial intelligence, semantic embeddings, and rule-based standardization to harmonize heterogeneous datasets into a unified, interoperable structure (DOI: 10.1021/acs.est.4c02406).

Importantly, MaTCH is not just a concept or algorithm: it is a publicly deployed web application with fully open-source code, so the research and monitoring community can use it immediately.

Live app: https://hannahhapich.shinyapps.io/match/
Source code: https://github.com/hannahhapich/MaTCH
Publication: DOI: 10.1021/acs.est.4c02406


What Makes MaTCH Distinct

1. Deployed, Interactive Web App

MaTCH is available as a live Shiny application, allowing users to upload datasets, harmonize fields, and export standardized outputs without writing code:

🔗 MaTCH App: https://hannahhapich.shinyapps.io/match/

The app supports:

  • Upload of microplastics or trash datasets with arbitrary column names
  • Automated semantic matching of descriptors (e.g., polymer, shape, size)
  • Unit normalization and categorical alignment
  • Download of cleaned, harmonized datasets ready for analysis or database ingestion

This makes MaTCH accessible to field scientists, regulators, students, and data managers, not just computational users.
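
To make the unit normalization step listed above more concrete, here is a minimal Python sketch. It is not taken from the MaTCH codebase: the conversion table, the target unit (particles per cubic metre), and the function names are all assumptions chosen for illustration.

```python
# Hypothetical conversion table (not from MaTCH): factor that converts each
# recognized unit to particles per cubic metre.
TO_PARTICLES_PER_M3 = {
    "particles/m3": 1.0,
    "particles/l": 1000.0,        # 1 m3 = 1000 L
    "particles/ml": 1_000_000.0,  # 1 m3 = 1,000,000 mL
}


def normalize_unit_label(label: str) -> str:
    """Lowercase the label, drop spaces, and unify a few common spellings."""
    cleaned = label.strip().lower().replace(" ", "")
    cleaned = cleaned.replace("m^3", "m3").replace("m³", "m3")
    cleaned = cleaned.replace("items", "particles").replace("#", "particles")
    return cleaned


def to_particles_per_m3(value: float, unit: str) -> float:
    """Convert a concentration to particles/m3, or fail loudly on unknown units."""
    key = normalize_unit_label(unit)
    if key not in TO_PARTICLES_PER_M3:
        raise ValueError(f"Unrecognized unit: {unit!r}")
    return value * TO_PARTICLES_PER_M3[key]


# Three differently labeled inputs end up on one common scale.
for value, unit in [(2.5, "items/L"), (3, "#/m^3"), (0.5, "Particles / mL")]:
    print(value, unit, "->", to_particles_per_m3(value, unit), "particles/m3")
```

Raising an error for unrecognized units, rather than passing the value through, is a deliberate choice in this sketch: a silent pass-through would mix scales in the harmonized output.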

2. Fully Open-Source Codebase

All MaTCH functionality is implemented in an open GitHub repository:

🔗 MaTCH GitHub Repository: https://github.com/hannahhapich/MaTCH

The repository includes:

  • Source code for semantic ingestion and harmonization
  • Documentation for local deployment and customization
  • Transparent mapping logic for categories and units

This enables:

  • Reproducibility and peer review of the harmonization process,
  • Integration into custom pipelines (e.g., R, Shiny, or database workflows),
  • Extension of the ontology as new descriptors emerge.

How MaTCH Works

Semantic Harmonization via AI Embeddings

MaTCH uses natural language processing and semantic embeddings to map disparate terminology into a shared conceptual space. For example, the terms within each of the following groups are recognized as semantically equivalent and harmonized into a single standardized category:

  • “LDPE”, “low-density polyethylene”, and “polyethylene (low density)”
  • “fiber”, “microfiber”, and “thread-like fragment”

This approach allows MaTCH to handle:

  • Non-standard field names
  • Misspellings and legacy terminology
  • Previously unseen descriptors
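
The matching idea described above can be sketched as a nearest-neighbor search in a vector space. The Python example below is an illustration only, not the MaTCH implementation: it swaps in character-trigram count vectors for a learned semantic embedding, and the alias table, function names, and 0.5 threshold are all assumptions. Trigram vectors tolerate misspellings and formatting variants of known aliases, but unlike a learned embedding model they cannot recognize true synonyms that share no characters with any known alias.

```python
import math
from collections import Counter

# Hypothetical reference vocabulary: known alias -> harmonized category.
ALIASES = {
    "ldpe": "LDPE",
    "low-density polyethylene": "LDPE",
    "polyethylene (low density)": "LDPE",
    "fiber": "fiber",
    "microfiber": "fiber",
    "thread-like fragment": "fiber",
}


def embed(text: str) -> Counter:
    """Stand-in embedding: character-trigram counts of the padded, lowercased text."""
    padded = f"  {text.strip().lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def harmonize(term: str, threshold: float = 0.5):
    """Return (category, score) for the closest alias; category is None below threshold."""
    query = embed(term)
    best_alias, best_score = max(
        ((alias, cosine(query, embed(alias))) for alias in ALIASES),
        key=lambda pair: pair[1],
    )
    category = ALIASES[best_alias] if best_score >= threshold else None
    return category, best_score


for term in ["Low density polyethelene", "microfibre", "glass bottle"]:
    print(term, "->", harmonize(term))
```

Returning the similarity score alongside the category lets a reviewer inspect low-confidence matches instead of accepting them blindly. A term such as “glass bottle”, which resembles nothing in the alias table, falls below the threshold and is left unmatched.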

Key Use Cases of MaTCH

Cross-Study Synthesis & Meta-Analysis

MaTCH enables aggregation of datasets collected using different reporting conventions, unlocking large-scale analyses across regions, matrices, and time.
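
As a rough illustration of what aggregation across reporting conventions involves, the Python sketch below renames study-specific column names to one shared schema and stacks the rows. The alias table, the schema names, and the sample records are hypothetical and do not come from MaTCH or from any real dataset.

```python
# Hypothetical alias table: column names as reported by individual studies
# mapped to one shared schema.
COLUMN_ALIASES = {
    "polymer": "polymer",
    "polymer_type": "polymer",
    "material": "polymer",
    "shape": "shape",
    "morphology": "shape",
    "particle_form": "shape",
    "conc": "concentration",
    "concentration": "concentration",
}


def harmonize_record(record: dict, study: str) -> dict:
    """Rename known columns to the shared schema and tag the row with its source study."""
    row = {"study": study}
    for column, value in record.items():
        key = COLUMN_ALIASES.get(column.strip().lower().replace(" ", "_"))
        if key is not None:  # columns with no known alias are dropped in this sketch
            row[key] = value
    return row


def combine(studies: dict) -> list:
    """Stack the harmonized rows from every study into one list."""
    return [
        harmonize_record(record, name)
        for name, records in studies.items()
        for record in records
    ]


# Made-up records from two studies that use different column names.
studies = {
    "study_a": [{"Polymer Type": "LDPE", "Shape": "fiber", "Conc": 2.5}],
    "study_b": [{"material": "PET", "particle_form": "fragment", "notes": "x"}],
}
for row in combine(studies):
    print(row)
```

Keeping the source study on every row preserves provenance, so a synthesis can still account for differences between studies after the columns have been aligned.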

Modeling, Exposure, and Risk Assessment

Harmonized inputs reduce structural uncertainty in fate, transport, and risk models that depend on consistent size, polymer, and concentration data.

Improving Future Reporting

By revealing where datasets diverge semantically, MaTCH also reinforces the value of standardized reporting and complements existing QA/QC and reporting guidance.


Why This Matters for Plastiverse

Plastiverse is designed to connect tools, data, and standards across the plastics research ecosystem. MaTCH fills a critical gap between data generation and data reuse by making heterogeneous datasets interoperable.

Together with:

  • reporting guidelines,
  • QA/QC frameworks, and
  • shared databases,

MaTCH helps move the field toward FAIR (findable, accessible, interoperable, reusable), synthesis-ready microplastics data.


Citation

Hapich, H. R., Cowger, W., & Gray, A. B. (2024). Microplastics and Trash Cleaning and Harmonization (MaTCH): Semantic Data Ingestion and Harmonization Using Artificial Intelligence. Environmental Science & Technology, 58, 20502–20512. https://doi.org/10.1021/acs.est.4c02406