Tools

Here is a list of R libraries I developed to support data manipulation and analyses.

Visualization

Build SVG Custom User Interface

The aim of the bscui R package is to render any SVG image as an interactive figure and convert identified elements into an actionable interface. This figure can be seamlessly integrated into R Markdown and Quarto documents, as well as Shiny applications, allowing manipulation of elements and reporting actions performed on them.

Data knowledge management

Modeling data with ReDaMoR

The ReDaMoR (Relational Data ModeleR) package allows the manipulation of relational data models in R. It provides functions to create, import and save relational data models. These functions are also accessible through a graphical user interface made with Shiny.

Managing data with TKCat

The aim of TKCat (Tailored Knowledge Catalog) is to facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In TKCat, knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detailed data model, generated with ReDaMoR, that documents the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored and shared. TKCat provides tools to subset, filter and combine MDBs and to create new catalogs suited to specific needs.

Concept dictionaries

Biological Entity Dictionary (BED)

The aim of the BED (Biological Entity Dictionary) R package is to retrieve and explore mappings between identifiers of biological entities (BE). This package provides a way to connect to a BED Neo4j database in which the relationships between identifiers from different sources are recorded.

Dictionary Of Disease Ontologies (DODO)

The aim of DODO (Dictionary Of Disease Ontologies) is to make it easier to interact with and explore disease ontologies and their identifiers. The database is built on Neo4j and incorporates different ontologies, with an accompanying R package that allows easy access, exploration, and definition of disease concepts of interest. It can serve as an intermediary to facilitate access to, and exhaustive extraction of, information from other life science databases without the need to harmonize them up front.

Bioinformatics

Phenotype Consensus Analysis (PCAN)

Phenotype Consensus Analysis (PCAN) is an indirect phenotype-based method that quantifies the consensus similarity of genetic disorders linked to the mechanism of a putative disease-causing gene. PCAN makes use of widely adopted knowledge resources for protein-protein interactions (e.g. STRING) and signaling pathways (e.g. Reactome) and the comprehensive HPO (Human Phenotype Ontology) annotation resource. This approach supports the discovery of novel disease genes and naturally lends itself to the mechanistic deconvolution of diverse phenotypes.

Backend libraries

neo2R

The aim of neo2R is to provide simple, low-level connectors for querying Neo4j graph databases from R. The objects returned by the query functions are either lists or data.frames with very little post-processing. This allows fast processing of queries returning many records and lets users handle post-processing according to the data model and their needs.

ClickHouseHTTP

ClickHouse is an open-source, high-performance columnar database management system for online analytical processing (OLAP) and real-time analytics using SQL. The ClickHouseHTTP package is an R DBI backend that relies on the ClickHouse HTTP interface and supports the HTTPS protocol.