ftm-datalake
ftm-datalake provides a data standard and archive storage for leaked data and for private and public document collections. The concepts and implementations were originally inspired by mmmeta and Aleph's servicelayer archive.
ftm-datalake acts as a multi-tenant storage and retrieval mechanism for structured entity data, documents, and their metadata. It provides a high-level interface for generating and sharing document collections and importing them into various search and analysis platforms, such as ICIJ Datashare, Liquid Investigations, and OpenAleph.
Installation
Requires Python 3.11 or later.
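To confirm that the local interpreter meets this requirement before installing, a small check like the following can help. This is only a sketch: the helper name `meets_requirement` and the constant `MIN_PYTHON` are made up for illustration and are not part of ftm-datalake.

```python
import sys

# Minimum version taken from the requirement stated above.
# MIN_PYTHON and meets_requirement are hypothetical names, not part of ftm-datalake.
MIN_PYTHON = (3, 11)


def meets_requirement(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} satisfies the requirement.")
    else:
        print(f"Python {current} is too old; 3.11 or later is required.")
```

Run it with the same interpreter that will be used for the installation, since several Python versions may be present on one machine.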
Quickstart
Development
This package uses poetry for packaging and dependency management, so install poetry first.
Clone this repository to a local destination.
Within the repo directory, run:
poetry install --with dev
This installs a few development dependencies, including pre-commit, which needs to be registered:
poetry run pre-commit install
Before each commit, pre-commit checks for correct code formatting (isort, black) and runs some other useful checks (see .pre-commit-config.yaml).
Testing
ftm-datalake uses pytest as the testing framework.
make test
License and Copyright
ftm-datalake, (c) 2024 investigativedata.io
ftm-datalake, (c) 2025 Data and Research Center – DARC
ftm-datalake is licensed under the AGPLv3 or later license.