
What is the DEBBIE pipeline?

Powered by software container technology (Docker) and a workflow manager (Nextflow), the DEBBIE pipeline automatically and continuously retrieves research abstracts, filters them according to relevance using the DEBBIE_BioBERT model, annotates concepts using the Biomaterials Annotator, and stores the information within a document-oriented database named the Database of Experimental Biomaterials and their Biological Effect (DEBBIE).


Pipeline

As a starting point for the gathering of relevant biomaterials abstracts, the following PubMed search query is executed: ((((((((Biomedical and dental materials[MeSH Terms]) OR (Prostheses and implants[MeSH Terms])) OR (Materials testing[MeSH Terms])) OR (Tissue engineering[MeSH Terms])) OR (Tissue scaffolds[MeSH Terms])) OR (Equipment safety[MeSH Terms])) OR (Medical device recalls[MeSH Terms])) OR (Biomaterials)) OR (Cell scaffolds).
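The query above could be submitted programmatically through the NCBI E-utilities `esearch` endpoint. The following is a minimal sketch, not the pipeline's own retrieval code: the helper names `build_query` and `build_esearch_url`, the date window, and the `retmax` value are assumptions made for illustration. It only builds the request URL and does not contact PubMed.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search clauses copied from the query above. Joining them with OR in a flat
# list is logically equivalent to the nested parentheses of the original.
CLAUSES = [
    "Biomedical and dental materials[MeSH Terms]",
    "Prostheses and implants[MeSH Terms]",
    "Materials testing[MeSH Terms]",
    "Tissue engineering[MeSH Terms]",
    "Tissue scaffolds[MeSH Terms]",
    "Equipment safety[MeSH Terms]",
    "Medical device recalls[MeSH Terms]",
    "Biomaterials",
    "Cell scaffolds",
]


def build_query(clauses):
    """Wrap each clause in parentheses and join the clauses with OR."""
    return " OR ".join(f"({clause})" for clause in clauses)


def build_esearch_url(term, mindate, maxdate, retmax=100):
    """Build an esearch URL restricted to a publication-date window.

    Dates use the YYYY/MM/DD format; retmax=100 is an arbitrary example value.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url(build_query(CLAUSES), "2022/01/01", "2022/01/31"))
```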

The DEBBIE_BioBERT model performs multiclass classification of abstracts, labeling each one as relevant to the field of biomaterials (either a clinical or a non-clinical study) or not relevant. It is built on the Transformer architecture, which underlies most current state-of-the-art NLP models. We took the pre-trained BioBERT model (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora, and continued training it on our biomaterials abstracts dataset. This technique is known as fine-tuning.
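To illustrate what a three-class relevance decision looks like downstream of such a model, the sketch below turns a vector of raw classifier scores (logits) into a label. It does not load or call DEBBIE_BioBERT: the label names, their order, and the `classify` and `is_relevant` helpers are hypothetical, since the model's actual output format is not described here.

```python
import math

# Hypothetical label names and ordering for the three classes described above.
LABELS = ["not_relevant", "relevant_clinical", "relevant_non_clinical"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def is_relevant(label):
    """An abstract is kept when it falls in either of the two relevant classes."""
    return label != "not_relevant"


if __name__ == "__main__":
    label, probability = classify([0.1, 2.0, -1.0])
    print(label, round(probability, 3), is_relevant(label))
```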

Watch the DEBBIE video to learn more about its capabilities.

What are annotations?

Automated text annotation is a Natural Language Processing (NLP) technique that computationally identifies and extracts relevant concepts from large collections of text. In DEBBIE, annotations are performed using the Biomaterials Annotator, a tool that combines several open lexical resources and is built using the General Architecture for Text Engineering (GATE) software and the Stanford Core Natural Language Processing (CoreNLP) framework. Below is an example of an annotated abstract in GATE, with identified terms labeled with their respective categories.


[Figure: example of an annotated abstract in GATE]

Quick Start

DEBBIE's User Interface Search

  1. Enter a single term into the search field on the DEBBIE homepage and click the Submit Query button to begin the search.
    Example search terms: Fibroin, Encapsulation, Bone

  2. The search results page opens with a quick summary showing mentions of the term over the years, other commonly associated terms, and the study types in which your search term has appeared.

  3. Below the quick summary, there are optional annotation categories. Select the categories you wish to explore further, and a drop-down box will appear with further information, such as the top associated terms.

DEBBIE's RESTful API

  1. The contents of DEBBIE are programmatically accessible through the RESTful API located at http://debbie.bsc.es/search/api/v1

  2. Users can submit different searches through the RESTful API:

    • Returns the frequency of usage of the term of interest over time, normalized per abstract:
      https://debbie.bsc.es/search/api/v1/search/[term]/years

    • Returns the thirteen most frequently annotated terms that are associated (co-occur) with the term of interest in different abstracts:
      https://debbie.bsc.es/search/api/v1/search/[term]/top_terms

    • Returns the thirteen most frequently annotated terms that co-occur with the term of interest within a particular category:
      https://debbie.bsc.es/search/api/v1/search/[term]/top_terms/[type]
      Possible values for [type]:
      Biomaterial - BiologicallyActiveSubstance - ManufacturedObject - ManufacturedObjectComponent - MedicalApplication - ManufacturedObjectFeatures - Structure - AssociatedBiologicalProcess - MaterialProcessing - EffectOnBiologicalSystem - Cell - AdverseEffects - Species - ResearchTechnique - Tissue

    • Returns a network describing the relationships between the term of interest and its associated terms:
      https://debbie.bsc.es/search/api/v1/search/[term]/network

    • Examples with the term "silk":
      https://debbie.bsc.es/search/api/v1/search/silk/years
      https://debbie.bsc.es/search/api/v1/search/silk/top_terms
      https://debbie.bsc.es/search/api/v1/search/silk/top_terms/Biomaterial
      https://debbie.bsc.es/search/api/v1/search/silk/network
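The endpoint patterns listed above can be assembled with a small helper. This is a sketch under stated assumptions rather than an official client: `build_url` and `fetch` are hypothetical names, and `fetch` assumes the API responds with JSON, which is not stated above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://debbie.bsc.es/search/api/v1/search"
ENDPOINTS = {"years", "top_terms", "network"}


def build_url(term, endpoint, category=None):
    """Build a DEBBIE API URL following the patterns listed above."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    url = f"{BASE_URL}/{quote(term, safe='')}/{endpoint}"
    if category is not None:
        # Only the top_terms endpoint is documented with a [type] suffix.
        if endpoint != "top_terms":
            raise ValueError("a category is only valid with top_terms")
        url += f"/{quote(category, safe='')}"
    return url


def fetch(url, timeout=30):
    """Download and decode a response, assuming (unverified) a JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_url("silk", "years"))
    print(build_url("silk", "top_terms", "Biomaterial"))
```

Calling `fetch(build_url("silk", "network"))` would then perform the actual request; the structure of the returned data is not documented here, so inspect it before relying on specific fields.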

Annotation Categories

Category | Definition | Examples
Adverse Effects | An unfavorable or unintended disease, sign, or symptom (including an abnormal laboratory finding) that is temporally associated with the use of a medical device or biomaterial | Cytotoxicity, Inflammatory reaction, Abscesses
Associated Biological Process | A cellular or biological process that the manufactured object is designed to cause or support, or is measured to affect | Adipogenesis, Angiogenesis, Cell attachment
Biomaterial | A non-drug raw material or substance suitable for inclusion in systems which augment or replace the function of bodily tissues or organs | Polydioxanone, Polyglycolide, Hydroxyapatite
Biomaterial Types | Classification or nature of biomaterials | Polymer, Ceramic, Metal
Biologically Active Substance | Substance included in a manufactured object in order to impart a biological activity | Collagen, Heparin, RGD
Cell | The reported cell line or primary cell type | Fibroblast, Type II Pneumocyte, Osteocyte
Effect on Biological System | The effect associated with a manufactured object in a specific test system (cells, tissue or organism) | Biocompatibility, Cytocompatibility, Immunomodulatory
Manufactured Object | A physical object created by hand or machine | Experimental scaffold, Medical device, Surgical implant
Manufactured Object Component | A part, region or component referred to as a distinct unit, such as a surface or a layer | Core, Shell, Coat
Material Processing | A planned process which results in physical changes in a specified input material | Biofabrication, Coating, Knitting
Manufactured Object Features | Characteristics inherent or given during processing to a manufactured object or its components | Geometry, Mechanical Property, Physical Property
Medical Application, Disease or Condition | Intended use, context, function or outcome of the manufactured object | Artificial organs, Encapsulation, Diabetes, Injury
Research Technique | The reported laboratory technique used in an experimental study | Scanning electron microscope, High Performance Liquid Chromatography
Species | The species and/or breed used in the study | Rat, Rabbit, Mouse
Structure | The configuration, form or texture associated with a manufactured object or its components | Fiber, Gel, Mesh
Tissue | A tissue or an organ mentioned in the study as the target or test system for the biomaterial object or medical device | Lung epithelium, Nerve plexus, Elastic cartilage tissue

Ontology/Terminology Sources


Individual Pipeline Components and Databases

Name | Availability | Brief Description
PubMed Retrieval Tool | | Takes a desired time frame and retrieves the PMID, title, abstract, and publication date (month and year) of all records archived by PubMed over that period.
PubMed Standardization Tool | | Takes a PubMed abstract collection in XML format stored in a working directory and standardizes the content, generating an individual plain text file for each abstract.
Gold Standard Literature Set | | The gold standard set is a list of PMIDs for abstracts selected to represent the biomaterials literature, with a focus on biological evaluation of biomaterials and biocompatibility.
Background Literature Set | | The background set is a list of PMIDs representing non-biomaterials abstracts for the purpose of relevance classification.
Classifier | | This component performs binary classification of abstracts to determine if they are relevant or not relevant to the field of biomaterials.
OWL2DICT Tool | | Maps out the entire OWL file to retrieve all child classes found within the selected categories as terms, each term’s class memberships, as well as any associated synonyms and properties.
OWL2DICT-lite Tool | | A simpler implementation of owlready2. It retrieves all classes (but no properties) for given ancestors from an .owl ontology provided locally.
Biomaterials Annotator | | The Biomaterials Annotator is a lexical resource for performing annotations on the biomaterials literature.
GATE-TO-JSON Tool | | This component exports DEBBIE annotations from GATE XML format to JSON format.
Import JSON-TO-MONGO Tool | | This component inserts JSON files into a designated MongoDB database.
Complete Pipeline Tool | | The automated pipeline retrieves biomaterials abstracts from PubMed, annotates them using multiple lexical assets, and deposits the annotated abstracts in a MongoDB database.
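As an illustration of the standardization step described in the table, the sketch below splits a PubMed XML collection into one plain text file per abstract. It is not the PubMed Standardization Tool itself: the function name `standardize`, the `<PMID>.txt` file naming, and the title-then-abstract layout are assumptions. The element names (`PubmedArticle`, `PMID`, `ArticleTitle`, `AbstractText`) follow the standard PubMed XML export.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _text(element):
    """Return the full text of an element, including inline markup children."""
    if element is None:
        return ""
    return "".join(element.itertext()).strip()


def standardize(xml_text, out_dir):
    """Write one plain text file per abstract and return the written paths.

    The file naming (<PMID>.txt) and layout (title line, then abstract) are
    assumptions made for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = _text(article.find("MedlineCitation/PMID"))
        title = _text(article.find(".//ArticleTitle"))
        # Structured abstracts contain several AbstractText sections.
        sections = [_text(section) for section in article.iter("AbstractText")]
        abstract = " ".join(section for section in sections if section)
        if not pmid or not abstract:
            continue  # skip records without an identifier or an abstract
        path = out_dir / f"{pmid}.txt"
        path.write_text(f"{title}\n{abstract}\n", encoding="utf-8")
        written.append(path)
    return written
```

Records without an abstract are skipped here because the downstream steps operate on abstract text; whether the actual tool does the same is not stated.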

Citing DEBBIE

Corvi, J., McKitrick, A., Fernández, J., Fuenteslópez, C., Gelpi, J., Ginebra, M.-P., Capella-Gutierrez, S., Hakimi, O. DEBBIE: the open access database of experimental scaffolds and biomaterials with an automated information retrieval pipeline. Manuscript in preparation (2022).

Corvi, J., Fuenteslópez, C., Fernández, J., Gelpi, J., Ginebra, M.-P., Capella-Gutierrez, S., Hakimi, O. The biomaterials annotator: a system for ontology-based concept annotation of biomaterials text. In: Proceedings of the Second Workshop on Scholarly Document Processing, pp. 36–48. Association for Computational Linguistics, Online (2021). https://www.aclweb.org/anthology/2021.sdp-1.5

Hakimi, O., Gelpi, J., Krallinger, M., Curi, F., Repchevsky, D., Ginebra, M.-P. The devices, experimental scaffolds, and biomaterials ontology (DEB): A tool for mapping, annotation, and analysis of biomaterials’ data. Adv. Funct. Mater. (2020).


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 751277.