FDSA Information for Data Contributors

Seamlessly Integrated Data Sharing

The Federated and Distributed data Sharing Appliance (FDSA) is engineered to facilitate secure, remote data sharing. FDSA operates as a plug-in attached directly to data contributors’ data stores, allowing them to make their datasets discoverable and analyzable through the AD Workbench. The data remains securely stored within the contributor's environment while permissioned researchers access it remotely.

How It Works

Data contributors install the FDSA plug-in, which is designed to work with a wide range of infrastructure setups, and then publish their metadata in the AD Workbench FAIR Catalog. When researchers request access to a dataset, data contributors approve or reject the request using their existing data governance procedures. When an analysis is completed, data contributors approve or reject the release of the results to the researcher.

Two Levels of Data Sharing

FDSA supports two distinct levels of data sharing, with data contributors deciding which level meets their requirements:

Distributed Sharing (Level 1): At this level, researchers can analyze record-level data that is made available in an airlocked workspace, allowing for more detailed analysis.
Federated Sharing (Level 2): At this level, researchers see only derived data and results in the workspace. Record-level data is analyzed at the data source; the output goes through a rigorous quarantine process and is aggregated and anonymized to ensure privacy.

Versatile Deployment Options

FDSA is optimized for flexibility, offering seamless deployment in both on-premise and cloud environments, intranet or internet accessibility, and easy integration into various infrastructure setups.

Cloud or on-premise deployment: FDSA is designed to be cloud-agnostic, providing organizations the freedom to choose the cloud provider that best suits their needs. However, for organizations that require direct oversight of their infrastructure due to regulatory compliance, security policies, or operational preferences, FDSA can also be deployed on-premise.
Intranet or internet accessibility: FDSA can be deployed within an intranet environment, restricting access to internal networks for heightened security. It can also be configured with a fully qualified domain name and a public IP address, making it accessible over the internet.
Tailored to your infrastructure: Regardless of your existing infrastructure, FDSA can be tailored to fit. It can be deployed and managed according to your specific operational needs based on your setup, without tying you down to any particular provider or environment.

Key Features and Functionality

1. Secure Data Sharing

FDSA ensures that all shared data is governed by the originating organization, maintaining compliance with data privacy and security regulations. Organizations have the ability to set and enforce governance policies over their data. Granular access control mechanisms allow organizations to define who can access specific datasets. All data shared between entities is encrypted and transmitted securely. The appliance includes a built-in mobile authenticator that provides two-factor authentication (2FA) to enhance the security of data access.

2. Data Access Approvals

FDSA offers a simplified, streamlined system for reviewing and managing data access requests. This system is integrated with the AD Workbench FAIR data access request framework, enabling efficient approval or denial of requests. Fine-tuned controls allow administrators to base access on detailed parameters.

3. FDSA Job/Task Query Management

FDSA manages and tracks the status of queries submitted by researchers, providing visibility into the entire query lifecycle. The system tracks queries through various states: Queued, Initializing, Running, Quarantined, Approved/Rejected, Complete/Rejected. It also logs and tracks the history of each query, enabling auditing and review.
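
The lifecycle above can be pictured as a small state machine. The sketch below is illustrative only: the function names are hypothetical, and the transitions are an assumed linear reading of the listed states (a quarantined query is either approved and then completed, or rejected), not FDSA's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the query lifecycle. The transitions are assumed
# from the order of the listed states, not taken from FDSA itself.

# Print the states a query may move to from the given state (space-separated).
next_states() {
  case "$1" in
    Queued)            echo "Initializing" ;;
    Initializing)      echo "Running" ;;
    Running)           echo "Quarantined" ;;
    Quarantined)       echo "Approved Rejected" ;;
    Approved)          echo "Complete" ;;
    Complete|Rejected) : ;;  # terminal states: nothing follows
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Succeed if moving from state $1 to state $2 is allowed by next_states.
can_transition() {
  local candidate
  for candidate in $(next_states "$1"); do
    [ "$candidate" = "$2" ] && return 0
  done
  return 1
}
```

For example, `can_transition Quarantined Approved` succeeds under these assumptions, while `can_transition Running Complete` fails because results must pass through quarantine first.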

4. User-Friendly Design

FDSA provides a simple UI for uploading and managing datasets, operations, access, and maintenance. A centralized dashboard covers all user operations and actions, including data access approvals and audits of processed queries. It also includes features for adding new users and admins, automatically creating end users when data access is requested, and managing user actions (e.g., role changes, disabling accounts, resetting MFA).

5. Easy Integration

FDSA supports seamless integration with the data contributor's infrastructure and internal systems, including data access request decision tools and applications. It can be extended and integrated with external systems and internal applications through webhooks for managing data access decisions. It is also compatible with existing server environments, data management tools, and traffic restrictions such as IP whitelisting and limited IP traffic.

6. Data Quarantine and Audit Process

FDSA provides a quarantine process for data results, granting administrators the authority to review and audit processed data before release. They can either approve or reject it for release and provide feedback on their decision.

7. Docker Registry - Model Read

During data processing, FDSA can read models published by researchers from a secure Azure Docker registry and run them inside a container.

8. Data Connectivity

FDSA includes database connectors that enable easy connections to remote data sources served as federated data. It features built-in connectors for PostgreSQL, supporting structured datasets and facilitating complex data queries across multiple databases.

Get the FDSA

Onboarding

Email fdsa.support@alzheimersdata.org to start the process of signing a contract agreement. Include your contact information, organization information, reasons for using the FDSA product, and a signing authority contact.
AD Data Initiative will review your request and, if approved, send an electronic contract agreement for signature. 
As soon as you sign the license, you will receive an email with installation instructions and a license code.

Note: Data contributors are expected to complete the prerequisites listed below before installation.

Installation Guide

Prerequisites for Installation

Prerequisite 1: System requirements

OS: Linux (Ubuntu 20.04 LTS or higher; Rocky OS version 8+)
CPU: 4 cores or more
Memory: 8 GB or higher
Storage: 100 GB minimum
Privileged access (root access)
Libraries and Tools: Git 1.8+
Open Port 443 accessible from the internet
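
Most of these requirements can be checked on the target server before installation. The following is a minimal sketch, not an official FDSA tool: the function names, rounding rules, and output format are assumptions, and it cannot confirm that port 443 is reachable from the internet, which has to be tested from outside your network.

```shell
#!/usr/bin/env bash
# Preflight sketch. Thresholds follow the requirements list; helper names,
# rounding, and output format are assumptions made for illustration.

# Succeed if dotted version $1 is greater than or equal to $2 (needs sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print PASS or FAIL for a named check, given an actual and a minimum integer.
report_min() {
  if [ "$2" -ge "$3" ]; then
    echo "PASS $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"
  fi
}

# Run every check against the current machine.
preflight() {
  local mem_kb disk_kb git_version
  ( . /etc/os-release && echo "INFO OS: $ID $VERSION_ID" )
  report_min "CPU cores" "$(nproc)" 4
  # Round to the nearest GiB because the kernel reports slightly less than
  # the nominal RAM size.
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  report_min "Memory (GiB)" "$(( (mem_kb + 524288) / 1048576 ))" 8
  # Total size of the root filesystem in decimal GB.
  disk_kb=$(df -Pk / | awk 'NR == 2 {print $2}')
  report_min "Storage on / (GB)" "$(( disk_kb * 1024 / 1000000000 ))" 100
  report_min "Root access (1 = yes)" "$(( $(id -u) == 0 ))" 1
  if command -v git >/dev/null 2>&1; then
    git_version=$(git --version | awk '{print $3}')
    if version_ge "$git_version" 1.8; then
      echo "PASS Git: $git_version (minimum 1.8)"
    else
      echo "FAIL Git: $git_version (minimum 1.8)"
    fi
  else
    echo "FAIL Git: not installed"
  fi
}
```

Calling `preflight` on the server prints one line per check. Treat a FAIL on storage as a prompt to look closer rather than a verdict, since filesystem overhead makes a nominal 100 GB disk report slightly less.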

Prerequisite 2: SSL Certificate

An SSL certificate ensures that data transmission between your appliance and users is encrypted and secure. Here’s how to obtain an SSL certificate:

SSL Purchase: Purchase a public SSL/TLS v1.2+ CA Certificate (.crt and .key) from a reputable Certificate Authority (CA) like GoDaddy, DigiCert, or Comodo. (Avoid using free solutions like 'Let's Encrypt.')
CSR generation: Follow the CA's instructions to generate a Certificate Signing Request (CSR) and submit it. Once approved, you'll receive the SSL certificate files, including the public key, private key, and intermediate certificates. Keep these secure.
Certificate Note: When you get your SSL certificate, remember to coordinate with your administrator to extract the private key.
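
If your CA accepts a CSR generated with OpenSSL, the steps above might look like the sketch below. The function names, file names, and the subject are placeholders, and your CA may require additional CSR fields, so treat this as one possible approach and follow the CA's own instructions where they differ.

```shell
#!/usr/bin/env bash
# Sketch only: function names, file names, and the subject are placeholders.

# Create a new 2048-bit RSA private key and a CSR for the given host name.
# Usage: make_csr host.example.com private_keyfile.key host.csr
make_csr() {
  openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$2" -out "$3" -subj "/CN=$1"
}

# Succeed if the certificate and the private key share the same public key.
# With a full-chain file, openssl reads the first certificate, which is
# assumed to be your server certificate.
# Usage: cert_matches_key fullchain_certfile.crt private_keyfile.key
cert_matches_key() {
  local cert_pub key_pub
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Print the certificate's expiry date.
# Usage: cert_expiry fullchain_certfile.crt
cert_expiry() {
  openssl x509 -in "$1" -noout -enddate
}
```

Running `cert_matches_key` on the certificate and key you plan to install is a quick way to catch a mismatched pair before the appliance is started.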

Prerequisite 3: Fully Qualified Domain Name and Public IP Address

A Fully Qualified Domain Name (FQDN) is necessary to access FDSA over the internet. Here's how to set up an FQDN:

DNS Selection: Choose a domain name registrar (e.g., GoDaddy, Namecheap) and register a domain name.
DNS Configuration: Configure the DNS records for your domain to point to your server's public IP address.
DNS Resolution: Ensure that your chosen FQDN resolves correctly to your server.
Email the FQDN to fdsa.support@alzheimersdata.org for ADWB FAIR whitelisting. If you are using a load balancer, send the public-facing FQDN.
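
Before emailing the FQDN, you may want to confirm that it resolves to the address you expect. A minimal sketch, assuming a Linux host with `getent` available; the function names are hypothetical, and `host.example.com` and `203.0.113.10` are placeholders for your own values.

```shell
#!/usr/bin/env bash
# Sketch: compares the addresses an FQDN resolves to with an expected IP.

# Print the IPv4 addresses the name resolves to, one per line.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk '{print $1}' | sort -u
}

# Succeed if the address in $1 appears among the remaining arguments.
ip_in_list() {
  local wanted="$1" ip
  shift
  for ip in "$@"; do
    [ "$ip" = "$wanted" ] && return 0
  done
  return 1
}

# Succeed if the FQDN in $1 resolves to the IPv4 address in $2.
# Usage: fqdn_points_to host.example.com 203.0.113.10
fqdn_points_to() {
  local addresses
  addresses=$(resolve_ipv4 "$1")
  # Intentionally unquoted so each resolved address becomes its own argument.
  ip_in_list "$2" $addresses
}
```

If a load balancer sits in front of the server, the expected address is the load balancer's public IP rather than the server's.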

Prerequisite 4: SSH Key

An SSH key is required to clone the FDSA release repository. Here's how to create one:

In the server where FDSA is going to be installed, start the root user mode: sudo su
Generate an SSH Key using your email (the passphrase is optional and can be left empty): ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
View the generated SSH Key: cat /root/.ssh/id_rsa.pub
Copy and send the Public Key to fdsa.support@alzheimersdata.org. Our support team will add your key to the GitHub FDSA-Release repository and notify you when it's ready.

Installation Steps

Installing FDSA

Start the root user mode:

sudo su

Ensure the installation directory exists:

mkdir -p /var/www

Navigate to the installation directory:

cd /var/www/

Clone the release repository:

git clone git@github.com:alzheimersdata-org/federated-data-sharing-appliance-releases.git

Enter the 'federated-data-sharing-appliance-releases' directory:

cd federated-data-sharing-appliance-releases

Copy the '.env-template' file to '.env':

cp .env-template .env

Copy your SSL certificate and private key to the 'ssl-certs' folder:

cp -f <PATH_TO_YOUR_CERT>/fullchain_certfile.crt ./ssl-certs/fullchain_certfile.crt

cp -f <PATH_TO_YOUR_PRIVATE_KEY>/private_keyfile.key ./ssl-certs/private_keyfile.key

Start the appliance, replacing host.example.com with your own host name, and complete the required fields:

bash ubuntu-startup.sh host.example.com

After the installation is completed, your Super Admin credentials are printed to the console.

Reboot the server:

reboot

Upgrading FDSA to the latest version

Ubuntu OS

Log in as the root user: sudo su
Go to the installation directory: cd /var/www/federated-data-sharing-appliance-releases
Run the Ubuntu upgrade script: bash change-version.sh

Rocky OS

Log in as the root user: sudo su
Go to the installation directory: cd /var/www/federated-data-sharing-appliance-releases
Run the Rocky upgrade script: bash rocky-update-version.sh
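
Because the two procedures differ only in the script name, the choice can be automated from the distribution ID in /etc/os-release. The wrapper below is a hypothetical convenience, not part of the FDSA release; only the script names and the installation path come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: choose the upgrade script from the distribution ID. The wrapper
# functions are hypothetical; the script names come from the steps above.

# Map an os-release ID to the matching upgrade script name.
upgrade_script_for() {
  case "$1" in
    ubuntu) echo "change-version.sh" ;;
    rocky)  echo "rocky-update-version.sh" ;;
    *) echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Read the ID field from an os-release file (defaults to /etc/os-release).
detect_os_id() {
  ( . "${1:-/etc/os-release}" && echo "$ID" )
}

# Run the matching upgrade script from the installation directory (as root).
run_upgrade() {
  local script
  script=$(upgrade_script_for "$(detect_os_id)") || return 1
  # Subshell so the caller's working directory is left unchanged.
  ( cd /var/www/federated-data-sharing-appliance-releases && bash "$script" )
}
```

On an unsupported distribution, `run_upgrade` stops with an error message instead of guessing which script to run.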

Data Security and Compliance

We use a suite of tools, integrated into our pipelines, to test a live service or codebase for security vulnerabilities. These tools include Bandit, Semgrep, SonarQube Scanner, SonarQube OSS Index, Whispers, Dependency Checker, and SQLmap.

Penetration Testing: Before releasing each new version, we conduct a full scan using OWASP ZAP (Zed Attack Proxy), an open-source web application security scanner widely recognized for its ability to identify vulnerabilities in web applications. OWASP ZAP’s comprehensive capabilities include:

Automated Scanning: Detects common vulnerabilities such as SQL injection and cross-site scripting (XSS).
Passive and Active Scanning: Analyzes server responses and sends crafted requests to test security.
Intercepting Proxy: Allows for the inspection and modification of traffic between the browser and the web application.
Spidering: Crawls the web application to discover all pages and endpoints.
Fuzzing: Tests how the application handles unexpected inputs.
Reporting: Generates detailed reports on identified vulnerabilities.

Resources

Access user guides, release notes, additional product specs, and more.

Log in to the Workbench to view these resources.

Contact and Support

For support and more information, email fdsa.support@alzheimersdata.org.
