# REST API
ftm-lakehouse includes a FastAPI-based REST API for remote access to the lakehouse. It exposes journal operations and dataset job execution over HTTP with JWT-based authentication.
## Running the API

The interactive API docs (ReDoc) are served at `/`.
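As a minimal sketch, the app can be served with uvicorn; the `ftm_lakehouse.api:app` import path is an assumption and may differ in the installed package:

```python
# Minimal sketch: serve the API locally with uvicorn.
# The "ftm_lakehouse.api:app" import path is an assumption.
import uvicorn

uvicorn.run("ftm_lakehouse.api:app", host="0.0.0.0", port=8000)
```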
## Configuration

API settings are read from environment variables with the `LAKEHOUSE_API_` prefix:
| Variable | Description | Default |
|---|---|---|
| `LAKEHOUSE_API_SECRET_KEY` | JWT signing key | `change-for-production` |
| `LAKEHOUSE_API_ACCESS_TOKEN_EXPIRE` | Token expiry in minutes | `5` |
| `LAKEHOUSE_API_ACCESS_TOKEN_ALGORITHM` | JWT algorithm | `HS256` |
| `LAKEHOUSE_API_AUTH_REQUIRED` | Require authentication | `true` |
| `LAKEHOUSE_API_TITLE` | OpenAPI title | `FollowTheMoney Data Lakehouse Api` |
When `auth_required` is false, read-only requests (`GET`, `HEAD`, `OPTIONS`) are allowed without a token. Write requests are always rejected in public mode.
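A sketch of what this looks like from a client's perspective, using `requests`; host, port, and dataset name are placeholders:

```python
import requests

BASE = "http://localhost:8000"  # placeholder host/port

# In public mode, read-only requests need no token
requests.get(f"{BASE}/my_dataset/journal/count")

# Write requests are always rejected in public mode (expect a 4xx response)
requests.post(f"{BASE}/my_dataset/journal/bulk", data=b"")
```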
## Authentication
The API uses JWT bearer tokens with a method + path prefix authorization model. Tokens encode a list of allowed HTTP methods and path prefixes, keeping auth logic external to the API itself.
### Token structure
Tokens carry two claims:
- `methods`: List of allowed HTTP methods (e.g. `["GET", "POST"]`) or `["*"]` for all
- `prefixes`: List of allowed path prefixes or glob patterns
### Examples
Allow all access:
```python
from ftm_lakehouse.api.auth import create_access_token

token = create_access_token(methods=["*"], prefixes=["/"])
```
Read-only access:
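A sketch using the same `create_access_token` call; the exact method list is an assumption:

```python
# Safe methods only, across all paths
token = create_access_token(methods=["GET", "HEAD", "OPTIONS"], prefixes=["/"])
```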
Scoped to a dataset's archive:
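For instance, with a placeholder dataset name:

```python
# Full access, but only under one dataset's archive routes
token = create_access_token(methods=["*"], prefixes=["/my_dataset/archive"])
```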
Glob pattern matching:
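An illustrative pattern; the exact glob semantics depend on the implementation:

```python
# Read access to the journal routes of every dataset
token = create_access_token(methods=["GET"], prefixes=["/*/journal/*"])
```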
## Routes

### Storage
The base storage routes are provided by anystore and expose raw key-value access to the lakehouse store.
### Journal
| Method | Path | Description |
|---|---|---|
| `POST` | `/{dataset}/journal/bulk` | Write TSV rows into the journal |
| `GET` | `/{dataset}/journal/iterate` | Stream all journal rows as TSV |
| `POST` | `/{dataset}/journal/flush` | Stream and delete journal rows |
| `GET` | `/{dataset}/journal/count` | Get journal row count |
| `DELETE` | `/{dataset}/journal/clear` | Delete all journal rows |
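A sketch of a journal round-trip with `requests`; host, dataset name, and token are placeholders, and the TSV row format itself is not covered here:

```python
import requests

BASE = "http://localhost:8000"  # placeholder host/port
HEADERS = {"Authorization": f"Bearer {token}"}  # token from create_access_token

# Append raw TSV rows to the journal
with open("rows.tsv", "rb") as fh:
    requests.post(f"{BASE}/my_dataset/journal/bulk", headers=HEADERS, data=fh)

# Check how many rows are queued
print(requests.get(f"{BASE}/my_dataset/journal/count", headers=HEADERS).text)

# Stream all rows back as TSV without holding them in memory at once
with requests.get(f"{BASE}/my_dataset/journal/iterate", headers=HEADERS, stream=True) as res:
    for row in res.iter_lines():
        ...
```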
### Operations
| Method | Path | Description |
|---|---|---|
| `POST` | `/{dataset}/_operation` | Run a job operation on a dataset |
The request body must be a serialized `DatasetJobModel` with a `name` field identifying the operation:
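For example, triggering a statements export; any field beyond `name` is an assumption about the `DatasetJobModel` shape:

```python
import requests

# "name" selects the operation; the "dataset" field is an assumption
payload = {"name": "ExportStatementsJob", "dataset": "my_dataset"}
requests.post(
    "http://localhost:8000/my_dataset/_operation",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
```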
Available operations:
| Job name | Description |
|---|---|
| `CrawlJob` | Batch file ingestion from a source URI |
| `OptimizeJob` | Compact parquet files, optional vacuum and sidecar apply |
| `ExportStatementsJob` | Export to `statements.csv` |
| `ExportEntitiesJob` | Export to `entities.ftm.json` |
| `ExportStatisticsJob` | Export to `statistics.json` |
| `ExportDocumentsJob` | Export to `documents.csv` |
| `ExportIndexJob` | Export `index.json` with resources |
| `MappingJob` | Process a CSV mapping configuration |
| `RecreateJob` | Rebuild parquet store from exports |
| `DownloadArchiveJob` | Export archive files to original paths |
| `MakeJob` | Full workflow: flush + all exports |
Pass `?force=true` to skip freshness checks.
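Sketched on the operation call from above:

```python
import requests

requests.post(
    "http://localhost:8000/my_dataset/_operation",
    headers={"Authorization": f"Bearer {token}"},
    json={"name": "MakeJob"},   # payload shape as assumed above
    params={"force": "true"},   # skip freshness checks
)
```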