What is the DEBBIE pipeline?

Powered by software container technology (Docker) and a workflow manager (Nextflow), the DEBBIE pipeline automatically and continuously retrieves research abstracts, filters them according to relevance using a supervised classifier, annotates concepts using pre-configured lexical resources, and stores the information within a document-oriented database named the Database of Experimental Biomaterials and their Biological Effect (DEBBIE).


What are annotations?

Automated text annotation is a Natural Language Processing (NLP) technique that computationally identifies and extracts relevant concepts from large collections of text. In DEBBIE, annotations are performed using the Biomaterials Annotator, a tool that combines several open lexical resources and is built using the General Architecture for Text Engineering (GATE) software and the Stanford Core Natural Language Processing (CoreNLP) framework. Below is an example of an annotated abstract in GATE, with identified terms labeled with their respective categories.
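
Separately from the GATE example, the core idea of lexicon-based annotation can be illustrated with a toy Python sketch. This is not the Biomaterials Annotator and does not use GATE or CoreNLP; it only shows plain dictionary lookup. The `LEXICON` mapping, the `Annotation` type and the `annotate` function are hypothetical names introduced for illustration, with terms and categories borrowed from the Annotation Categories table.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int     # character offset where the match begins
    end: int       # character offset just past the match
    text: str      # matched text as it appears in the abstract
    category: str  # category assigned by the lexicon


# Hypothetical toy lexicon (assumption): a handful of terms taken from the
# Annotation Categories table, not the real lexical resources.
LEXICON: Dict[str, str] = {
    "hydroxyapatite": "Biomaterial",
    "coating": "MaterialProcessing",
    "angiogenesis": "AssociatedBiologicalProcess",
    "rat": "Species",
    "fibroblast": "Cell",
}


def annotate(text: str, lexicon: Dict[str, str] = LEXICON) -> List[Annotation]:
    """Return case-insensitive whole-word lexicon matches, ordered by offset."""
    found: List[Annotation] = []
    for term, category in lexicon.items():
        # \b keeps "rat" from matching inside words such as "separate".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            found.append(
                Annotation(match.start(), match.end(), match.group(0), category)
            )
    return sorted(found)  # tuples sort by start offset first


if __name__ == "__main__":
    abstract = (
        "A hydroxyapatite coating promoted angiogenesis "
        "in rat fibroblast cultures."
    )
    for ann in annotate(abstract):
        print(ann.start, ann.end, ann.text, ann.category)
```

A production annotator additionally has to handle synonyms, multi-word terms, inflection and overlapping matches, which is where dedicated frameworks come in.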


Quick Start

DEBBIE's User Interface Search

  1. Enter a single term into the search field on the DEBBIE homepage and click the Submit Query button to begin the search.
    Example search terms: Fibroin, Encapsulation, Bone

  2. The search results page contains a quick summary showing mentions of the term over the years, other commonly associated terms, and the study types in which your search term has appeared.

  3. Below the quick summary, there are optional annotation categories. Select the categories you wish to explore further, and a drop-down box will appear with further information, such as the top associated terms.


  1. The contents of DEBBIE are programmatically accessible through the RESTful API located at

  2. Users can submit different searches through the RESTful API:

    • Returns the frequency of usage of the term of interest over time, normalized per abstract: [term]/years

    • Returns the thirteen most frequently annotated terms that are associated (co-occur) with the term of interest in different abstracts: [term]/top_terms

    • Returns the thirteen most frequently annotated terms that co-occur with the term of interest within a particular category: [term]/top_terms/[type]
      Possible values for the category [type]:
      Biomaterial - BiologicallyActiveSubstance - ManufacturedObject - ManufacturedObjectComponent - MedicalApplication - ManufacturedObjectFeatures - Structure - AssociatedBiologicalProcess - MaterialProcessing - EffectOnBiologicalSystem - Cell - AdverseEffects - Species - ResearchTechnique - Tissue

    • Returns a network that describes the relationships between the term of interest and its associated terms: [term]/network

    • Examples with the term "silk":
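
The endpoint patterns listed above can be sketched in Python as a small URL builder. The API root is not given in this document, so `BASE_URL` below is a placeholder assumption that must be replaced with the real address; the `endpoint` function and `CATEGORIES` set are likewise hypothetical names introduced for illustration. Only the path suffixes (`years`, `top_terms`, `top_terms/[type]`, `network`) and the category values come from the list above.

```python
from typing import Optional
from urllib.parse import quote

# Placeholder only (assumption): the real API root is not stated here.
BASE_URL = "https://debbie.example/api"

# Category values accepted for [type], as listed above.
CATEGORIES = {
    "Biomaterial", "BiologicallyActiveSubstance", "ManufacturedObject",
    "ManufacturedObjectComponent", "MedicalApplication",
    "ManufacturedObjectFeatures", "Structure", "AssociatedBiologicalProcess",
    "MaterialProcessing", "EffectOnBiologicalSystem", "Cell",
    "AdverseEffects", "Species", "ResearchTechnique", "Tissue",
}

RESOURCES = {"years", "top_terms", "network"}


def endpoint(term: str, resource: str, category: Optional[str] = None,
             base_url: str = BASE_URL) -> str:
    """Build '<base>/[term]/<resource>' or '<base>/[term]/top_terms/[type]'."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    if category is not None:
        if resource != "top_terms":
            raise ValueError("a category is only valid with 'top_terms'")
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
    # Percent-encode the term so spaces or slashes cannot alter the path.
    parts = [base_url.rstrip("/"), quote(term, safe=""), resource]
    if category is not None:
        parts.append(category)
    return "/".join(parts)


if __name__ == "__main__":
    # Request URLs for the term "silk", relative to the placeholder root.
    print(endpoint("silk", "years"))
    print(endpoint("silk", "top_terms"))
    print(endpoint("silk", "top_terms", "Cell"))
    print(endpoint("silk", "network"))
```

Validating the category on the client side gives an early, readable error instead of relying on however the server responds to an unknown `[type]`.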

Annotation Categories

Category | Definition | Examples
Biomaterial | A non-drug raw material or substance suitable for inclusion in systems which augment or replace the function of bodily tissues or organs | Polydioxanone, Polyglycolide, Hydroxyapatite
Biomaterial Types | Classification or nature of biomaterials | Polymer, Ceramic, Metal
Biologically Active Substance | Substance included in a manufactured object in order to impart a biological activity | Collagen, Heparin, RGD
Chemical | Any material with a definite chemical composition | Calcium, Hydrogen peroxide, Ethanol
Manufactured Object | A physical object created by hand or machine | Experimental scaffold, Medical device, Surgical implant
Manufactured Object Component | A part, region or component referred to as a distinct unit, such as a surface or a layer | Core, Shell, Coat
Structure | The configuration, form or texture associated with a manufactured object or its components | Fiber, Gel, Mesh
Shape | The external form, contours, or outline of a manufactured object or its components | Cube, Tube, Disk
Architectural Organization | Characteristics related to the three-dimensional organization, orientation and assembly of a manufactured object or its components | Multilayered, Unidirectional, Hierarchical
Degradation Features | A process or characteristic related to the long-term integrity of a manufactured object or its components | Degradability, Delayed Resorption, Fragmentation
Manufactured Object Features | Characteristics inherent or given during processing to a manufactured object or its components | Geometry, Mechanical Property, Physical Property
Material Processing | A planned process which results in physical changes in a specified input material | Biofabrication, Coating, Knitting
Medical Application, Disease or Condition | Intended use, context, function or outcome of the manufactured object | Artificial organs, Encapsulation, Diabetes, Injury
Associated Biological Process | A cellular or biological process that the manufactured object is designed to cause or support, or is measured to affect | Adipogenesis, Angiogenesis, Cell attachment
Effect on Biological System | The effect associated with a manufactured object in a specific test system (cells, tissue or organism) | Biocompatibility, Cytocompatibility, Immunomodulatory
Adverse Effects | An unfavorable or unintended disease, sign, or symptom (including an abnormal laboratory finding) that is temporally associated with the use of a medical device or biomaterial | Cytotoxicity, Inflammatory reaction, Abscesses
Study Type | The study setup, such as in vitro, in vivo, or clinical | Clinical study, in vitro, in vivo
Tissue | A tissue or an organ mentioned in the study as the target or test system for the biomaterial object or medical device | Lung epithelium, Nerve plexus, Elastic cartilage tissue
Cell | The reported cell line or primary cell type | Fibroblast, Type II Pneumocyte, Osteocyte
Species | The species and/or breed used in the study | Rat, Rabbit, Mouse
Research Technique | The reported laboratory technique used in an experimental study | Scanning electron microscope, High Performance Liquid Chromatography

Ontology/Terminology Sources

Individual Pipeline Components and Databases

Name | Availability | Brief Description
PubMed Retrieval Tool | | Takes a desired time frame and retrieves the PMID, title, abstract, and publication date (month and year) of all records archived by PubMed over that period.
PubMed Standardization Tool | | Takes a PubMed abstract collection in XML format stored in a working directory and standardizes the content, generating an individual plain text file for each abstract.
Gold Standard Literature Set | | The gold standard set is a list of PMIDs for abstracts selected to represent the biomaterials literature, with a focus on biological evaluation of biomaterials and biocompatibility.
Background Literature Set | | The background set is a list of PMIDs representing non-biomaterials abstracts for the purpose of relevance classification.
Classifier | | This component performs binary classification of abstracts to determine whether they are relevant to the field of biomaterials.
OWL2DICT Tool | | Maps out the entire OWL file to retrieve all child classes found within the selected categories as terms, each term’s class memberships, as well as any associated synonyms and properties.
OWL2DICT-lite Tool | | A simpler implementation of owlready2. It retrieves all classes (but no properties) for given ancestors from an .owl ontology provided locally.
Biomaterial Annotator | | The Biomaterials Annotator is a lexical resource for performing annotations on the biomaterials literature.
GATE-TO-JSON Tool | | This component exports DEBBIE annotations from GATE XML format to JSON format.
Import JSON-TO-MONGO Tool | | This component inserts JSON files into a designated MongoDB database.
Complete Pipeline Tool | | The automated pipeline retrieves biomaterials abstracts from PubMed, annotates them using multiple lexical assets, and deposits the annotated abstracts in a MongoDB database.


McKitrick A, Corvi J and Hakimi O. Manuscript in preparation (2020)

Hakimi O, Gelpi JL, Krallinger M, Curi F, Repchevsky D and Ginebra MP. The devices, experimental scaffolds, and biomaterials ontology (DEB): A tool for mapping, annotation, and analysis of biomaterials’ data. Adv. Funct. Mater. (2020)


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 751277.