Overview of MaTCH
Microplastics and trash datasets are notoriously difficult to combine due to inconsistent terminology, reporting units, size classes, and categorical descriptors. Even when high-quality methods are used, semantic heterogeneity across studies severely limits synthesis, modeling, and risk assessment.
The Microplastics and Trash Cleaning and Harmonization (MaTCH) framework addresses this challenge using artificial intelligence, semantic embeddings, and rule-based standardization to harmonize heterogeneous datasets into a unified, interoperable structure (DOI: 10.1021/acs.est.4c02406).
Importantly, MaTCH is not just a concept or algorithm: it is a publicly deployed web application with fully open-source code, so the research and monitoring community can use it immediately.
Live app: https://hannahhapich.shinyapps.io/match/
Source code: https://github.com/hannahhapich/MaTCH
Publication: https://doi.org/10.1021/acs.est.4c02406
What Makes MaTCH Distinct
1. Deployed, Interactive Web App
MaTCH is available as a live Shiny application, allowing users to upload datasets, harmonize fields, and export standardized outputs without writing code:
🔗 MaTCH App: https://hannahhapich.shinyapps.io/match/
The app supports:
- Upload of microplastics or trash datasets with arbitrary column names
- Automated semantic matching of descriptors (e.g., polymer, shape, size)
- Unit normalization and categorical alignment
- Download of cleaned, harmonized datasets ready for analysis or database ingestion
This makes MaTCH accessible to field scientists, regulators, students, and data managers, not just computational users.
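As a rough illustration of what accepting "arbitrary column names" involves, the sketch below maps uploaded column headers onto a small target schema. It is a minimal sketch, not MaTCH's implementation: the field list in STANDARD_FIELDS is hypothetical, and plain string similarity from Python's standard library (difflib) stands in for the semantic matching that MaTCH is described as using.

```python
import difflib

# Hypothetical target schema; the field names MaTCH actually uses may differ.
STANDARD_FIELDS = ["polymer", "shape", "size", "color", "concentration"]


def normalize(name: str) -> str:
    """Lowercase a column name and turn common separators into spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def match_columns(columns, cutoff=0.6):
    """Map each uploaded column name to the closest standard field, or None."""
    mapping = {}
    for col in columns:
        hits = difflib.get_close_matches(
            normalize(col), STANDARD_FIELDS, n=1, cutoff=cutoff
        )
        mapping[col] = hits[0] if hits else None
    return mapping


uploaded = ["Polymer_Type", "Shapes", "colour", "Site ID"]
print(match_columns(uploaded))
# {'Polymer_Type': 'polymer', 'Shapes': 'shape', 'colour': 'color', 'Site ID': None}
```

Columns that fall below the similarity cutoff are left unmapped (None) rather than forced into a field, which keeps uncertain matches visible for manual review.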
➡️ Related Plastiverse resource:
- Microplastics Data Crosswalk
https://www.plastiverse.org/tools/microplastics-data-crosswalk
2. Fully Open-Source Codebase
All MaTCH functionality is implemented in an open GitHub repository:
🔗 MaTCH GitHub Repository: https://github.com/hannahhapich/MaTCH
The repository includes:
- Source code for semantic ingestion and harmonization
- Documentation for local deployment and customization
- Transparent mapping logic for categories and units
This enables:
- Reproducibility and peer review of the harmonization process
- Integration into custom pipelines (e.g., R, Shiny, or database workflows)
- Extension of the ontology as new descriptors emerge
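To make the idea of transparent, extensible mapping logic for units concrete, here is a minimal sketch in Python of a lookup-table conversion for particle size. The table LENGTH_TO_UM and the function to_micrometers are hypothetical names chosen for illustration; they are not taken from the MaTCH repository, which should be consulted for the actual mappings.

```python
# Illustrative conversion factors to micrometers; not MaTCH's actual tables.
LENGTH_TO_UM = {
    "nm": 0.001,
    "um": 1.0,
    "\u00b5m": 1.0,  # micro sign + m
    "\u03bcm": 1.0,  # Greek small letter mu + m
    "mm": 1000.0,
    "cm": 10000.0,
}


def to_micrometers(value: float, unit: str) -> float:
    """Convert a length to micrometers, failing loudly on unknown units."""
    key = unit.strip().lower()
    if key not in LENGTH_TO_UM:
        raise ValueError(f"Unknown length unit: {unit!r}")
    return value * LENGTH_TO_UM[key]


records = [(0.5, "mm"), (250, "um"), (2, "cm")]
print([to_micrometers(value, unit) for value, unit in records])
# [500.0, 250.0, 20000.0]
```

Because the rules live in a plain table, a reviewer can audit every conversion, and supporting a new unit is a one-line addition.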
How MaTCH Works
Semantic Harmonization via AI Embeddings
MaTCH uses natural language processing and semantic embeddings to map disparate terminology into a shared conceptual space. For example:
- “LDPE”, “low-density polyethylene”, and “polyethylene (low density)”
- “fiber”, “microfiber”, “thread-like fragment”
Within each group, the terms are recognized as semantically equivalent and harmonized into a single standardized category.
This approach allows MaTCH to handle:
- Non-standard field names
- Misspellings and legacy terminology
- Previously unseen descriptors
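The general pattern behind embedding-based matching can be sketched as follows: turn each term into a vector, then assign an incoming term to the category of its nearest reference term by cosine similarity. This is a sketch under stated assumptions, not MaTCH's code. The embed function here is a toy stand-in that counts character trigrams, so it captures spelling similarity only; the REFERENCE vocabulary and the 0.5 threshold are likewise hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': character-trigram counts (a stand-in for a real model)."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    padded = f"  {' '.join(cleaned.split())}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical reference vocabulary: known term -> standardized category.
REFERENCE = {
    "ldpe": "LDPE",
    "low-density polyethylene": "LDPE",
    "fiber": "fiber",
    "microfiber": "fiber",
}
REFERENCE_VECTORS = {term: embed(term) for term in REFERENCE}


def harmonize(term: str, threshold: float = 0.5):
    """Return (category, score) for the nearest reference term, or (None, score)."""
    query = embed(term)
    best_term, best_score = max(
        ((ref, cosine(query, vec)) for ref, vec in REFERENCE_VECTORS.items()),
        key=lambda pair: pair[1],
    )
    category = REFERENCE[best_term] if best_score >= threshold else None
    return category, best_score


for raw in ["polyethylene (low density)", "microfibre", "thread-like fragment"]:
    print(raw, "->", harmonize(raw)[0])
```

With this toy vectorizer, "polyethylene (low density)" and the variant spelling "microfibre" land in the expected categories, while "thread-like fragment" is left unmatched because it shares almost no spelling with the reference terms. Closing that gap is exactly what a learned semantic embedding is for: it places terms with similar meaning near each other even when their spelling differs.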
Key Use Cases of MaTCH
Cross-Study Synthesis & Meta-Analysis
MaTCH enables aggregation of datasets collected using different reporting conventions, unlocking large-scale analyses across regions, matrices, and time.
➡️ See also:
- Atlas of Ocean Microplastics
https://www.plastiverse.org/tools/atlas-of-ocean-microplastics
Modeling, Exposure, and Risk Assessment
Harmonized inputs reduce structural uncertainty in fate, transport, and risk models that depend on consistent size, polymer, and concentration data.
Improving Future Reporting
By revealing where datasets diverge semantically, MaTCH also reinforces the value of standardized reporting and complements existing QA/QC and reporting guidance.
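One simple way to see where datasets diverge is to list how many distinct raw labels collapse into each standardized category. The sketch below assumes a harmonization step has already produced (raw label, category) pairs; the example data and the function name variant_report are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical output of a harmonization step: (raw label, standardized category).
harmonized = [
    ("LDPE", "LDPE"),
    ("low-density polyethylene", "LDPE"),
    ("polyethylene (low density)", "LDPE"),
    ("fiber", "fiber"),
    ("microfiber", "fiber"),
    ("fiber", "fiber"),
]


def variant_report(pairs):
    """Return {category: sorted list of distinct raw labels mapped to it}."""
    variants = defaultdict(set)
    for raw, category in pairs:
        variants[category].add(raw)
    return {category: sorted(raws) for category, raws in variants.items()}


report = variant_report(harmonized)
for category, raws in report.items():
    print(f"{category}: {len(raws)} distinct labels -> {raws}")
```

Categories with many raw variants are natural candidates for clearer guidance in future reporting templates.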
Why This Matters for Plastiverse
Plastiverse is designed to connect tools, data, and standards across the plastics research ecosystem. MaTCH fills a critical gap between data generation and data reuse by making heterogeneous datasets interoperable.
Together with reporting guidelines, QA/QC frameworks, and shared databases, MaTCH helps move the field toward FAIR, synthesis-ready microplastics data.
Citation
Hapich, H. R., Cowger, W., & Gray, A. B. (2024). Microplastics and Trash Cleaning and Harmonization (MaTCH): Semantic Data Ingestion and Harmonization Using Artificial Intelligence. Environmental Science & Technology, 58, 20502–20512. https://doi.org/10.1021/acs.est.4c02406