ftm-datalake

No description available

Usage

ftm-datalake [OPTIONS] COMMAND [ARGS]...

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `--version / --no-version` | Show version [default: no-version] | No | - |
| `-d TEXT` | Dataset foreign_id | No | - |
| `--install-completion` | Install completion for the current shell. | No | - |
| `--show-completion` | Show completion for the current shell, to copy it or customize the installation. | No | - |
| `--help` | Show this message and exit. | No | - |
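
Because `-d` is listed among the top-level options and the usage line places `[OPTIONS]` before `COMMAND`, the dataset is presumably selected before the subcommand name. The following is a minimal Python sketch that only assembles such a command line and does not execute it; the helper `datalake_argv` and the dataset name `my_dataset` are hypothetical and not part of ftm-datalake.

```python
import shlex


def datalake_argv(command, *args, dataset=None):
    """Assemble an ftm-datalake argument list (nothing is executed)."""
    argv = ["ftm-datalake"]
    if dataset is not None:
        # Top-level option: placed before the subcommand, per the usage line.
        argv += ["-d", dataset]
    argv.append(command)
    argv.extend(args)
    return argv


# Hypothetical dataset name, for illustration only.
argv = datalake_argv("versions", dataset="my_dataset")
print(shlex.join(argv))  # prints: ftm-datalake -d my_dataset versions
```

The resulting list can be passed to `subprocess.run` once the CLI is installed.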

Commands

| Name | Description |
| --- | --- |
| `config` | Print current runtime configuration for base archive or given dataset |
| `catalog` | Show catalog for all existing datasets |
| `versions` | Show versions of dataset |
| `diff` | Show documents diff for given version |
| `make` | Make or update a ftm_datalake dataset and check integrity |
| `get` | Retrieve a file from dataset archive and write to out uri (default: stdout) |
| `head` | Retrieve a file info from dataset archive and write to out uri (default: stdout) |
| `ls` | List all files in dataset archive |
| `crawl` | Crawl documents from local or remote sources |
| `export` | Export a complete dataset in LeakRFC format |
| `memorious` | Memorious related operations |
| `aleph` | Aleph related operations |

Sub Commands

ftm-datalake config

Print current runtime configuration for base archive or given dataset

Usage

ftm-datalake config [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `--help` | Show this message and exit. | No | - |

ftm-datalake catalog

Show catalog for all existing datasets

Usage

ftm-datalake catalog [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | [default: -] | No | - |
| `--collect-stats / --no-collect-stats` | Collect document statistics [default: no-collect-stats] | No | - |
| `--names-only / --no-names-only` | Only show dataset names (foreign_id) [default: no-names-only] | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake versions

Show versions of dataset

Usage

ftm-datalake versions [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `--help` | Show this message and exit. | No | - |

ftm-datalake diff

Show documents diff for given version

Usage

ftm-datalake diff [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-v TEXT` | Version | Yes | - |
| `-o TEXT` | [default: -] | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake make

Make or update a ftm_datalake dataset and check integrity

Usage

ftm-datalake make [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | [default: -] | No | - |
| `--check-integrity / --no-check-integrity` | Check checksums [default: check-integrity] | No | - |
| `--cleanup / --no-cleanup` | Cleanup (delete) unreferenced metadata [default: cleanup] | No | - |
| `--metadata-only / --no-metadata-only` | Check document metadata only [default: no-metadata-only] | No | - |
| `--dataset-metadata-only / --no-dataset-metadata-only` | Compute dataset metadata only [default: no-dataset-metadata-only] | No | - |
| `--help` | Show this message and exit. | No | - |
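
Each on/off option above comes as a `--name / --no-name` pair with a stated default. As an illustration of how those pairs combine, here is a small Python sketch that maps booleans to the matching flag; the helpers `flag` and `make_argv` and the dataset name are hypothetical, and the assumption that `make` takes its dataset from the top-level `-d` option is inferred from the main usage line rather than stated here.

```python
def flag(name, enabled):
    """Return '--name' when enabled, otherwise '--no-name'."""
    return f"--{name}" if enabled else f"--no-{name}"


def make_argv(dataset, check_integrity=True, cleanup=True, metadata_only=False):
    """Assemble a `make` command line; keyword defaults mirror the table."""
    return [
        "ftm-datalake", "-d", dataset, "make",
        flag("check-integrity", check_integrity),
        flag("cleanup", cleanup),
        flag("metadata-only", metadata_only),
    ]


# Hypothetical dataset: skip checksum verification and keep unreferenced metadata.
print(make_argv("my_dataset", check_integrity=False, cleanup=False))
```

Spelling out every flag is redundant when the defaults are wanted, but it keeps the invocation explicit in scripts.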

ftm-datalake get

Retrieve a file from dataset archive and write to out uri (default: stdout)

Usage

ftm-datalake get [OPTIONS] KEY

Arguments

| Name | Description | Required |
| --- | --- | --- |
| `KEY` | [required] | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | [default: -] | No | - |
| `--help` | Show this message and exit. | No | - |
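
Since `get` writes to stdout by default (`-o` defaults to `-`), a script can either capture the output or pass an explicit destination. The Python sketch below shows both under stated assumptions: `get_argv`, `fetch_bytes`, the dataset name, and the key are hypothetical, and selecting the dataset through the top-level `-d` option is inferred from the main usage line.

```python
import subprocess


def get_argv(dataset, key, out="-"):
    """Build a `get` command; "-" is the documented default for -o."""
    # Options come before KEY, following the usage line `get [OPTIONS] KEY`.
    return ["ftm-datalake", "-d", dataset, "get", "-o", out, key]


def fetch_bytes(dataset, key):
    """Run `get` with the default output and return what it wrote to stdout."""
    result = subprocess.run(get_argv(dataset, key), check=True, capture_output=True)
    return result.stdout


# Hypothetical dataset and key; only builds the command, nothing is executed.
print(get_argv("my_dataset", "reports/2024.pdf", out="./2024.pdf"))
```

`fetch_bytes` requires the `ftm-datalake` executable on `PATH` and raises `subprocess.CalledProcessError` if the command exits with a non-zero status.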

ftm-datalake head

Retrieve a file info from dataset archive and write to out uri (default: stdout)

Usage

ftm-datalake head [OPTIONS] KEY

Arguments

| Name | Description | Required |
| --- | --- | --- |
| `KEY` | [required] | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | [default: -] | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake ls

List all files in dataset archive

Usage

ftm-datalake ls [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | [default: -] | No | - |
| `--keys / --no-keys` | Show only keys [default: no-keys] | No | - |
| `--checksums / --no-checksums` | Show only checksums [default: no-checksums] | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake crawl

Crawl documents from local or remote sources

Usage

ftm-datalake crawl [OPTIONS] URI

Arguments

| Name | Description | Required |
| --- | --- | --- |
| `URI` | [required] | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-o TEXT` | Write results to this destination [default: -] | No | - |
| `--skip-existing / --no-skip-existing` | Skip already existing files (doesn't check actual similarity) [default: skip-existing] | No | - |
| `--extract / --no-extract` | Extract archives via patool [default: no-extract] | No | - |
| `--extract-keep-source / --no-extract-keep-source` | Keep the source archive when extracting [default: no-extract-keep-source] | No | - |
| `--extract-ensure-subdir / --no-extract-ensure-subdir` | Ensure a subdirectory with the package filename when extracting [default: no-extract-ensure-subdir] | No | - |
| `--exclude TEXT` | Exclude paths glob pattern | No | - |
| `--include TEXT` | Include paths glob pattern | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake export

Export a complete dataset in LeakRFC format

Usage

ftm-datalake export [OPTIONS] OUT

Arguments

| Name | Description | Required |
| --- | --- | --- |
| `OUT` | [required] | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `--help` | Show this message and exit. | No | - |

ftm-datalake memorious

Memorious related operations

Usage

ftm-datalake memorious sync [OPTIONS]

Arguments

No arguments available

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `-i TEXT` | [required] | Yes | - |
| `--name-only / --no-name-only` | Use only file name as key [default: no-name-only] | No | - |
| `--strip-prefix TEXT` | Strip from file key prefix | No | - |
| `--key-template TEXT` | Template to generate key | No | - |
| `--help` | Show this message and exit. | No | - |

ftm-datalake aleph

Aleph related operations

Usage

ftm-datalake aleph load-catalog [OPTIONS] URI

Arguments

| Name | Description | Required |
| --- | --- | --- |
| `URI` | Dataset index.json uri | Yes |
| `URI` | Catalog index.json uri | Yes |

Options

| Name | Description | Required | Default |
| --- | --- | --- | --- |
| `--help` | Show this message and exit. | No | - |
| `--host TEXT` | Aleph host | No | - |
| `--api-key TEXT` | Aleph api key | No | - |
| `--folder TEXT` | Base folder path | No | - |
| `--foreign-id TEXT` | Aleph foreign_id (if different from dataset) | No | - |
| `--metadata / --no-metadata` | Update collection metadata [default: metadata] | No | - |
| `--help` | Show this message and exit. | No | - |
| `--host TEXT` | Aleph host | No | - |
| `--api-key TEXT` | Aleph api key | No | - |
| `--foreign-id TEXT` | Aleph foreign_id (if different from dataset) | No | - |
| `--metadata / --no-metadata` | Update collection metadata [default: metadata] | No | - |
| `--help` | Show this message and exit. | No | - |
| `--host TEXT` | Aleph host | No | - |
| `--api-key TEXT` | Aleph api key | No | - |
| `--metadata / --no-metadata` | Update collection metadata [default: metadata] | No | - |
| `--help` | Show this message and exit. | No | - |