The aim of TKCat (Tailored Knowledge Catalog) is to facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In TKCat, knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detailed data model, generated with ReDaMoR, that documents the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored and shared. TKCat provides tools to subset, filter and combine MDBs and to create new catalogs suited to specific needs.

This package was presented at the useR!2022 conference, and the slides are available.

The TKCat R package is licensed under GPL-3.

Installation

From CRAN
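
The released version can be installed with the standard R installer. This is a minimal sketch, assuming a CRAN mirror is already configured for the session:

```r
# Install the released version of TKCat and its required dependencies from CRAN
install.packages("TKCat")
```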

Dependencies

The following R packages available on CRAN are required:

  • ReDaMoR: Relational Data Modeler
  • magrittr: A Forward-Pipe Operator for R
  • DBI: R Database Interface
  • visNetwork: Network Visualization using ‘vis.js’ Library
  • dplyr: A Grammar of Data Manipulation
  • ClickHouseHTTP: A Simple HTTP Database Interface to ‘ClickHouse’
  • rlang: Functions for Base Types and Core R and ‘Tidyverse’ Features
  • tidyselect: Select from a Set of Strings
  • getPass: Masked User Input
  • shiny: Web Application Framework for R
  • shinydashboard: Create Dashboards with ‘Shiny’
  • DT: A Wrapper of the JavaScript Library ‘DataTables’
  • htmltools: Tools for HTML
  • readr: Read Rectangular Text Data
  • jsonlite: A Simple and Robust JSON Parser and Generator for R
  • jsonvalidate: Validate ‘JSON’ Schema
  • base64enc: Tools for base64 encoding
  • markdown: Render Markdown with ‘commonmark’
  • promises: Abstractions for Promise-Based Asynchronous Programming
  • future: Unified Parallel and Distributed Processing in R for Everyone
  • xml2: Parse XML
  • Matrix: Sparse and Dense Matrix Classes and Methods
  • uuid: Tools for Generating and Handling of UUIDs
  • crayon: Colored Terminal Output
  • roxygen2: In-Line Documentation for R

The following packages are suggested:

  • knitr: A General-Purpose Package for Dynamic Report Generation in R
  • rmarkdown: Dynamic Documents for R
  • stringr: Simple, Consistent Wrappers for Common String Operations
  • RClickhouse: ‘Yandex Clickhouse’ Interface for R with Basic ‘dplyr’ Support
  • data.tree: General Purpose Hierarchical Data Structure
  • BED: Biological Entity Dictionary (BED)

From GitHub

devtools::install_github("patzaw/TKCat")

Documentation

Documentation is available in the following vignette: TKCat user guide

Alternatives

  • The dm package provides similar features but with different implementation choices. Here are the main differences:

    • The dm data model feature is built upon the datamodelr package whereas TKCat relies on ReDaMoR.
    • Both dm and TKCat provide mechanisms to check that the data model constraints are fulfilled, and tools to automatically take advantage of them.
    • dm supports connections to many different DBMS. It also takes advantage of constraints that are documented in the DBMS when available. TKCat only supports the ClickHouse system, through the ClickHouseHTTP or RClickhouse packages.
    • TKCat implements 3 main types of MDB, based on files, memory tables or a ClickHouse database. It also provides mechanisms to automatically convert from and to any of these implementations.
    • TKCat supports catalogs of MDBs, facilitating the exploration of existing data. It also allows the integration of different MDBs through the automatic identification of similar concepts (Collections) and the automatic conversion of the different vocabularies on which they rely.
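
As an illustration of the file-based and in-memory MDB implementations and of catalogs mentioned above, here is a minimal sketch. It assumes that the installed package ships an example file-based MDB under `examples/ClinVar` and that the functions `read_fileMDB()`, `as_memoMDB()`, `db_info()`, `TKCat()` and `list_MDBs()` behave as described in the TKCat user guide. Please check the user guide for the authoritative usage.

```r
library(TKCat)

# Assumption: an example file-based MDB is bundled with the package here
clinvar_path <- system.file("examples/ClinVar", package = "TKCat")
stopifnot(nzchar(clinvar_path))  # fail early if the example is not installed

# Read the resource as a file-based MDB (data stay on disk)
file_clinvar <- read_fileMDB(path = clinvar_path)

# Convert it to an in-memory MDB (tables loaded as data frames)
memo_clinvar <- as_memoMDB(file_clinvar)

# General description of the resource and names of its tables
info <- db_info(memo_clinvar)
table_names <- names(memo_clinvar)

# Gather MDBs in a local catalog and list its content
k <- TKCat(memo_clinvar)
catalog_content <- list_MDBs(k)
```

The same object could in principle be converted to a ClickHouse-based MDB, but that requires a running ClickHouse instance and is not shown here.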

Acknowledgments

This work was entirely supported by UCB Pharma (Early Solutions department).