FDSA Information for Data Contributors
Remotely share your data while maintaining full control over it.
Seamlessly Integrated Data Sharing
The federated data sharing appliance (FDSA) is engineered to facilitate secure, remote data sharing. FDSA operates as a plug-in attached directly to data contributors’ data stores, allowing them to make their datasets discoverable and analyzable through the AD Workbench. The data remains securely stored within the contributor's environment while permissioned researchers access it remotely.
How It Works
Data contributors install the FDSA plug-in, which is designed to work with a wide range of infrastructure setups, and then publish their metadata in the AD Workbench FAIR Catalog. When a researcher requests access to a dataset, the data contributor approves or rejects the request using existing data governance procedures. When an analysis is complete, the data contributor approves or rejects the release of the results to the researcher.
Two Levels of Data Sharing
FDSA supports two distinct levels of data sharing, with data contributors deciding which level meets their requirements:
- Distributed Sharing (Level 1): At this level, researchers can analyze record-level data that is made available in an airlocked workspace, allowing for more detailed analysis.
- Federated Sharing (Level 2): At this level, researchers see only derived data and results in the workspace. Record-level data is analyzed at the data source, and the results pass through a rigorous quarantine process and are aggregated and anonymized to ensure privacy.
Versatile Deployment Options
FDSA is optimized for flexibility, offering seamless deployment in both on-premise and cloud environments, intranet or internet accessibility, and easy integration into various infrastructure setups.
- Cloud or on-premise deployment: FDSA is designed to be cloud-agnostic, providing organizations the freedom to choose the cloud provider that best suits their needs. However, for organizations that require direct oversight of their infrastructure due to regulatory compliance, security policies, or operational preferences, FDSA can also be deployed on-premise.
- Intranet or internet accessibility: FDSA can be deployed within an intranet environment, restricting access to internal networks for heightened security. It can also be configured with a fully qualified domain name and a public IP address, making it accessible over the internet.
- Tailored to your infrastructure: Regardless of your existing infrastructure, FDSA can be tailored to fit. It can be deployed and managed according to your specific operational needs based on your setup, without tying you down to any particular provider or environment.
Key Features and Functionality
FDSA ensures that all shared data is governed by the originating organization, maintaining compliance with data privacy and security regulations. Organizations have the ability to set and enforce governance policies over their data. Granular access control mechanisms allow organizations to define who can access specific datasets. All data shared between entities is encrypted and transmitted securely. The appliance includes a built-in mobile authenticator that provides two-factor authentication (2FA) to enhance the security of data access.
FDSA offers a simplified, streamlined system for reviewing and managing data access requests. This system is integrated with the AD Workbench FAIR data access request framework, enabling efficient approval or denial of requests. Fine-tuned controls allow administrators to base access on detailed parameters.
FDSA manages and tracks the status of queries submitted by researchers, providing visibility into the entire query lifecycle. The system tracks queries through various states: Queued, Initializing, Running, Quarantined, Approved/Rejected, Complete/Rejected. It also logs and tracks the history of each query, enabling auditing and review.
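The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the allowed transitions and the function names `next_states` and `can_transition` are assumptions, not FDSA's actual implementation.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: state names are taken from the text; the
# transition table is an assumption about how the states follow each other.

# Print the states that may follow the given state (space-separated).
next_states() {
  case "$1" in
    Queued)            echo "Initializing" ;;
    Initializing)      echo "Running" ;;
    Running)           echo "Quarantined" ;;
    Quarantined)       echo "Approved Rejected" ;;  # administrator review decision
    Approved)          echo "Complete" ;;
    Complete|Rejected) echo "" ;;                   # terminal states
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Return 0 if moving from state $1 to state $2 is allowed by the table above.
can_transition() {
  local candidate
  for candidate in $(next_states "$1"); do
    [ "$candidate" = "$2" ] && return 0
  done
  return 1
}

# Walk the approval path from Queued to a terminal state, printing each step.
state="Queued"
while [ -n "$(next_states "$state")" ]; do
  following="$(next_states "$state")"
  following="${following%% *}"   # take the first allowed successor
  echo "$state -> $following"
  state="$following"
done
```

Keeping the transitions in one table makes it easy to see that results reach a researcher only by way of the Quarantined and Approved states.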
FDSA provides a straightforward UI for uploading and managing datasets and for handling operations, access, and maintenance. A centralized dashboard covers all user operations and actions, including data access approvals and audits of processed queries. Administrators can add new users and admins, have end-user accounts created automatically when data access is requested, and manage user actions (e.g., role changes, disabling accounts, resetting MFA).
FDSA integrates with the data contributor's infrastructure and internal systems, including data access request decision tools and applications. Webhooks for managing data access decisions make it extensible to external systems and internal applications. It is also compatible with existing server environments, data management tools, and traffic restrictions such as IP whitelisting and limited IP traffic.
FDSA provides a quarantine process for data results, granting administrators the authority to review and audit processed data before release. They can either approve or reject it for release and provide feedback on their decision.
During data processing, FDSA can pull researchers' published models from a secure Azure Docker registry and run them inside a container.
FDSA includes database connectors that enable easy connections to remote data sources served as federated data. It features built-in connectors for PostgreSQL, supporting structured datasets and facilitating complex data queries across multiple databases.
Get the FDSA
Onboarding
- Email fdsa.support@alzheimersdata.org to start the process of signing a contract agreement. Include your contact information, organization information, reasons for using the FDSA product, and a signing authority contact.
- AD Data Initiative will review your request and, if approved, send an electronic contract agreement for signature.
- As soon as you sign the license, you will receive an email with installation instructions and a license code.
Note: Data contributors are expected to complete the prerequisites listed below before installation.
Installation Guide
Prerequisites for Installation
- OS: Linux (Ubuntu 20.04 LTS or later, or Rocky OS version 8 or later)
- CPU: 4 cores or more
- Memory: 8 GB or more
- Storage: 100 GB minimum
- Privileged access (root access)
- Libraries and Tools: Git 1.8+
- Open Port 443 accessible from the internet
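The list above can be checked on the target server before installation. The following is a minimal sketch, not an official FDSA tool: the thresholds come from the list, while the helper names `version_ge` and `check_min` are hypothetical. It assumes a Linux host with GNU coreutils, treats "Storage" as free space on the filesystem that holds /var, and does not test port 443.

```shell
#!/usr/bin/env bash
# Illustrative pre-install check for the prerequisites listed above.
# Thresholds come from the list; the helper names are hypothetical.

# Return 0 if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print PASS or FAIL for a numeric minimum: check_min <label> <actual> <minimum>
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "PASS  $1: $2 (minimum $3)"
  else
    echo "FAIL  $1: $2 (minimum $3)"
  fi
}

cores="$(nproc 2>/dev/null || echo 0)"
# MemTotal is reported in kB; convert to whole GB, rounding up, because a
# nominal 8 GB machine reports slightly less than 8 GiB.
mem_gb="$(awk '/^MemTotal:/ {printf "%d", ($2 + 1048575) / 1048576}' /proc/meminfo 2>/dev/null || true)"
# Free space (in GB) on the filesystem that holds /var.
disk_gb="$(df -Pk /var 2>/dev/null | awk 'NR==2 {printf "%d", $4 / 1048576}')"
# `git --version` prints e.g. "git version 2.34.1"; keep the third field.
git_version="$(git --version 2>/dev/null | awk '{print $3}')"

check_min "CPU cores" "${cores:-0}" 4
check_min "Memory (GB)" "${mem_gb:-0}" 8
check_min "Free storage (GB)" "${disk_gb:-0}" 100

if [ "$(id -u)" -eq 0 ]; then
  echo "PASS  running as root"
else
  echo "FAIL  not running as root"
fi

if version_ge "${git_version:-0}" "1.8"; then
  echo "PASS  Git: $git_version (minimum 1.8)"
else
  echo "FAIL  Git: ${git_version:-not found} (minimum 1.8)"
fi
```

The script only reports; it never changes the system, so it is safe to rerun after each fix.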
An SSL certificate ensures that data transmission between your appliance and users is encrypted and secure. Here’s how to obtain an SSL certificate:
- SSL Purchase: Purchase a public SSL/TLS v1.2+ CA Certificate (.crt and .key) from a reputable Certificate Authority (CA) like GoDaddy, DigiCert, or Comodo. (Avoid using free solutions like 'Let's Encrypt.')
- CSR generation: Follow the CA's instructions to generate a Certificate Signing Request (CSR) and submit it. The private key is created alongside the CSR on your side; once the request is approved, the CA issues the SSL certificate and any intermediate certificates. Keep all of these files secure.
- Certificate Note: When you get your SSL certificate, remember to coordinate with your administrator to extract the private key.
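Before installing, it can be worth confirming that the certificate and private key actually belong together. The sketch below compares the public key embedded in the certificate with the one derived from the private key, using standard `openssl` subcommands. The function name `cert_matches_key` is hypothetical, and the default file names are the ones used later in the installation steps; pass your own paths as arguments.

```shell
#!/usr/bin/env bash
# Illustrative check that a certificate and a private key form a pair.
# cert_matches_key is a hypothetical helper, not part of the FDSA release.

# Return 0 if the public key in certificate $1 matches private key $2.
# For a full-chain file, `openssl x509` reads the first certificate only.
cert_matches_key() {
  local cert_pub key_pub
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Paths default to the file names used in the installation steps.
cert="${1:-fullchain_certfile.crt}"
key="${2:-private_keyfile.key}"

if [ -f "$cert" ] && [ -f "$key" ]; then
  if cert_matches_key "$cert" "$key"; then
    echo "certificate and private key match"
  else
    echo "certificate and private key do NOT match"
  fi
  # Print the expiry date of the certificate.
  openssl x509 -in "$cert" -noout -enddate
fi
```

If the key is passphrase-protected, `openssl pkey` will prompt for the passphrase.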
A Fully Qualified Domain Name (FQDN) is necessary to access FDSA over the internet. Here's how to set up an FQDN:
- DNS Selection: Choose a domain name registrar (e.g., GoDaddy, Namecheap) and register a domain name.
- DNS Configuration: Configure the DNS records for your domain to point to your server's public IP address.
- DNS Resolution: Ensure that your chosen FQDN resolves correctly to your server.
- Email the FQDN to fdsa.support@alzheimersdata.org for ADWB FAIR whitelisting. If you are using a load balancer, send the public-facing FQDN.
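The DNS resolution step above can be verified from any Linux machine. This is a minimal sketch, assuming `getent` (glibc) is available; `resolve_name` and `ip_in_list` are hypothetical helper names, and `host.example.com` and `203.0.113.10` in the usage line are placeholders for your FQDN and public IP address.

```shell
#!/usr/bin/env bash
# Illustrative DNS check. Helper names are hypothetical.

# Read IP addresses from stdin, one per line; return 0 if $1 is among them.
ip_in_list() {
  grep -Fxq -- "$1"
}

# Print the unique addresses a name resolves to, using the system resolver.
resolve_name() {
  getent ahosts "$1" | awk '{print $1}' | sort -u
}

# Run the check only when both an FQDN and an expected IP are supplied,
# e.g.: bash check-dns.sh host.example.com 203.0.113.10
if [ "$#" -eq 2 ]; then
  if resolve_name "$1" | ip_in_list "$2"; then
    echo "$1 resolves to $2"
  else
    echo "$1 does not resolve to $2"
  fi
fi
```

Newly created DNS records can take some time to propagate, so a failed check shortly after a change is not necessarily an error.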
To clone the release repository, you need an SSH key that has been granted access to it. Here's how to create one:
- On the server where FDSA will be installed, switch to the root user: sudo su
- Generate an SSH key using your email (the passphrase is optional and can be left empty): ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
- View the generated SSH Key: cat /root/.ssh/id_rsa.pub
- Copy and send the Public Key to fdsa.support@alzheimersdata.org. Our support team will add your key to the GitHub FDSA-Release repository and notify you when it's ready.
Installation Steps
- Switch to the root user:
sudo su
- Ensure the installation directory exists:
mkdir -p /var/www
- Navigate to the installation directory:
cd /var/www/
- Clone the release repository:
git clone git@github.com:alzheimersdata-org/federated-data-sharing-appliance-releases.git
- Enter the 'federated-data-sharing-appliance-releases' directory:
cd federated-data-sharing-appliance-releases
- Copy the '.env-template' and rename it to '.env':
cp .env-template .env
- Copy your SSL certificates to the 'ssl-certs' folder:
cp -f <PATH_TO_YOUR_CERT>/fullchain_certfile.crt ./ssl-certs/fullchain_certfile.crt
cp -f <PATH_TO_YOUR_PRIVATE_KEY>/private_keyfile.key ./ssl-certs/private_keyfile.key
- Start the appliance and complete the required fields:
bash ubuntu-startup.sh host.example.com
- When the installation completes, your Super Admin credentials are printed to the console. Record them before you continue.
- Reboot the server:
reboot
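Before running the startup script in the steps above, it can help to confirm that the environment file and both certificate files are where the script expects them. The sketch below is illustrative: `check_install_dir` is a hypothetical helper, not part of the FDSA release, and the file list is taken from the copy commands in the steps above.

```shell
#!/usr/bin/env bash
# Illustrative check of the files the installation steps put in place.
# check_install_dir is a hypothetical helper, not part of the FDSA release.

# Report missing or empty files under directory $1.
# Returns 0 only if every expected file exists and is non-empty.
check_install_dir() {
  local dir="$1" missing=0 path
  for path in .env ssl-certs/fullchain_certfile.crt ssl-certs/private_keyfile.key; do
    if [ ! -s "$dir/$path" ]; then
      echo "missing or empty: $dir/$path"
      missing=1
    fi
  done
  return "$missing"
}

install_dir="/var/www/federated-data-sharing-appliance-releases"
if check_install_dir "$install_dir"; then
  echo "all expected files are present in $install_dir"
fi
```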
Upgrading FDSA
Ubuntu OS
- Log in as the root user: sudo su
- Go to the installation directory: cd /var/www/federated-data-sharing-appliance-releases
- Run the Ubuntu upgrade script: bash change-version.sh
Rocky OS
- Log in as the root user: sudo su
- Go to the installation directory: cd /var/www/federated-data-sharing-appliance-releases
- Run the Rocky upgrade script: bash rocky-update-version.sh
Data Security and Compliance
We use a suite of tools, integrated into our pipelines, to test the live service and the codebase for security vulnerabilities: Bandit, Semgrep, SonarQube Scanner, SonarQube OSS Index, Whispers, Dependency Checker, and SQLmap.
Penetration Testing: Before releasing each new version, we conduct a full scan using OWASP ZAP (Zed Attack Proxy), an open-source web application security scanner widely recognized for its ability to identify vulnerabilities in web applications. OWASP ZAP’s comprehensive capabilities include:
- Automated Scanning: Detects common vulnerabilities such as SQL injection and cross-site scripting (XSS).
- Passive and Active Scanning: Analyzes server responses and sends crafted requests to test security.
- Intercepting Proxy: Allows for the inspection and modification of traffic between the browser and the web application.
- Spidering: Crawls the web application to discover all pages and endpoints.
- Fuzzing: Tests how the application handles unexpected inputs.
- Reporting: Generates detailed reports on identified vulnerabilities.
Contact and Support
For support and more information, email fdsa.support@alzheimersdata.org.