

ftm-datalake

ftm-datalake provides a data standard and archive storage for leaked data and for private and public document collections. Its concepts and implementation were originally inspired by mmmeta and Aleph's servicelayer archive.

ftm-datalake acts as a multi-tenant storage and retrieval mechanism for structured entity data, documents and their metadata. It provides a high-level interface for generating and sharing document collections and importing them into various search and analysis platforms, such as ICIJ Datashare, Liquid Investigations, and OpenAleph.

Read the specification

Installation

Requires Python 3.11 or later.

pip install ftm-datalake
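
To confirm that the installation worked, a minimal sketch like the following can help. It relies only on the Python standard library (`sys` and `importlib.metadata`) and does not call any ftm-datalake API. The helper name `check_environment` is hypothetical and not part of the package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(
    package: str = "ftm-datalake", minimum: tuple[int, int] = (3, 11)
) -> dict:
    """Hypothetical helper: report interpreter and package status."""
    try:
        # Reads the version from the installed distribution's metadata.
        installed = version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        installed = None
    return {
        "python_ok": sys.version_info >= minimum,
        "package": package,
        "installed_version": installed,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `installed_version` is `None`, the package is not visible to the interpreter that ran the script, which usually means a different virtual environment is active.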

Quickstart

>> get started here

Development

This package uses poetry for packaging and dependency management, so install poetry first.

Clone this repository to a local destination.

Within the repo directory, run

poetry install --with dev

This installs a few development dependencies, including pre-commit, which needs to be registered:

poetry run pre-commit install

Before each commit, pre-commit checks code formatting (isort, black) and runs a few other checks (see .pre-commit-config.yaml).

Testing

ftm-datalake uses pytest as the testing framework.

make test

ftm-datalake, (c) 2024 investigativedata.io

ftm-datalake, (c) 2025 Data and Research Center – DARC

ftm-datalake is licensed under the AGPLv3 or later license.