ftm-datalake
No description available
Usage
ftm-datalake [OPTIONS] COMMAND [ARGS]...
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
--version / --no-version | Show version [default: no-version] | No | - |
-d TEXT | Dataset foreign_id | No | - |
--install-completion | Install completion for the current shell. | No | - |
--show-completion | Show completion for the current shell, to copy it or customize the installation. | No | - |
--help | Show this message and exit. | No | - |
Commands
Name | Description |
---|---|
config | Print current runtime configuration for... |
catalog | Show catalog for all existing datasets |
versions | Show versions of dataset |
diff | Show documents diff for given version |
make | Make or update a ftm_datalake dataset and... |
get | Retrieve a file from dataset archive and... |
head | Retrieve a file info from dataset archive... |
ls | List all files in dataset archive |
crawl | Crawl documents from local or remote sources |
export | Export a complete dataset in LeakRFC format |
memorious | Memorious related operations |
aleph | Aleph related operations |
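
The usage line `ftm-datalake [OPTIONS] COMMAND [ARGS]...` implies that global options such as `-d` come before the subcommand name. The following is a minimal sketch of that ordering, not part of ftm-datalake itself: `datalake_argv` is a hypothetical helper, `my_dataset` is a made-up foreign_id, and it is an assumption that `versions` acts on the dataset selected with `-d`. Nothing is executed; the helper only builds the argument list.

```python
import shlex
from typing import List, Optional


def datalake_argv(command: str, *args: str, dataset: Optional[str] = None) -> List[str]:
    """Build an argument list for the ftm-datalake CLI without running it."""
    argv = ["ftm-datalake"]
    if dataset is not None:
        # Global options such as -d go before the subcommand name.
        argv += ["-d", dataset]
    argv.append(command)
    argv.extend(args)
    return argv


# "my_dataset" is a hypothetical foreign_id used only for illustration.
print(shlex.join(datalake_argv("versions", dataset="my_dataset")))
print(shlex.join(datalake_argv("catalog", "--names-only")))
```

If the CLI is installed, the resulting list can be passed to `subprocess.run(argv, check=True)`.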
Sub Commands
ftm-datalake config
Print current runtime configuration for base archive or given dataset
Usage
ftm-datalake config [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
--help | Show this message and exit. | No | - |
ftm-datalake catalog
Show catalog for all existing datasets
Usage
ftm-datalake catalog [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | [default: -] | No | - |
--collect-stats / --no-collect-stats | Collect document statistics [default: no-collect-stats] | No | - |
--names-only / --no-names-only | Only show dataset names (foreign_id) [default: no-names-only] | No | - |
--help | Show this message and exit. | No | - |
ftm-datalake versions
Show versions of dataset
Usage
ftm-datalake versions [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
--help | Show this message and exit. | No | - |
ftm-datalake diff
Show documents diff for given version
Usage
ftm-datalake diff [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
-v TEXT | Version | Yes | - |
-o TEXT | [default: -] | No | - |
--help | Show this message and exit. | No | - |
ftm-datalake make
Make or update a ftm_datalake dataset and check integrity
Usage
ftm-datalake make [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | [default: -] | No | - |
--check-integrity / --no-check-integrity | Check checksums [default: check-integrity] | No | - |
--cleanup / --no-cleanup | Cleanup (delete) unreferenced metadata [default: cleanup] | No | - |
--metadata-only / --no-metadata-only | Check document metadata only [default: no-metadata-only] | No | - |
--dataset-metadata-only / --no-dataset-metadata-only | Compute dataset metadata only [default: no-dataset-metadata-only] | No | - |
--help | Show this message and exit. | No | - |
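
Each boolean option of `make` comes as an on/off pair (`--cleanup` / `--no-cleanup`). As a rough sketch of how a script might map keyword arguments onto those pairs: `paired_flag` and `make_argv` are hypothetical helpers, `my_dataset` is a made-up foreign_id, and selecting the dataset through the global `-d` option is an assumption based on the top-level options table. The keyword defaults mirror the documented defaults.

```python
from typing import List


def paired_flag(name: str, enabled: bool) -> str:
    """Render a paired boolean option, e.g. --cleanup or --no-cleanup."""
    return f"--{name}" if enabled else f"--no-{name}"


def make_argv(
    dataset: str,
    check_integrity: bool = True,
    cleanup: bool = True,
    metadata_only: bool = False,
    dataset_metadata_only: bool = False,
) -> List[str]:
    """Build the argv for `ftm-datalake make` with every flag spelled out."""
    return [
        "ftm-datalake", "-d", dataset, "make",
        paired_flag("check-integrity", check_integrity),
        paired_flag("cleanup", cleanup),
        paired_flag("metadata-only", metadata_only),
        paired_flag("dataset-metadata-only", dataset_metadata_only),
    ]


# Hypothetical dataset name: skip checksum checks and keep unreferenced metadata.
print(" ".join(make_argv("my_dataset", check_integrity=False, cleanup=False)))
```

Spelling out every pair makes the call independent of the CLI defaults, at the cost of a longer command line.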
ftm-datalake get
Retrieve a file from dataset archive and write to out uri (default: stdout)
Usage
ftm-datalake get [OPTIONS] KEY
Arguments
Name | Description | Required |
---|---|---|
KEY | [required] | Yes |
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | [default: -] | No | - |
--help | Show this message and exit. | No | - |
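
Because `get` writes to stdout when `-o` is left at its default `-`, a script can capture the file content directly. This is a sketch under assumptions rather than a documented API: `get_argv` and `fetch_bytes` are hypothetical helpers, the dataset name and key are invented, and passing the dataset via the global `-d` option is assumed from the top-level options table.

```python
import subprocess
from typing import List


def get_argv(dataset: str, key: str, out: str = "-") -> List[str]:
    """Build the argv for `ftm-datalake get`; "-" keeps the default stdout target."""
    return ["ftm-datalake", "-d", dataset, "get", "-o", out, key]


def fetch_bytes(dataset: str, key: str) -> bytes:
    """Run `ftm-datalake get` and return the bytes it wrote to stdout.

    Requires the ftm-datalake executable on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(get_argv(dataset, key), capture_output=True, check=True)
    return result.stdout


# Hypothetical dataset and key; this only prints the command, it does not run it.
print(get_argv("my_dataset", "reports/summary.pdf"))
```

To write to a file or another destination instead, pass a different `out` value, which is handed to `-o` unchanged.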
ftm-datalake head
Retrieve a file info from dataset archive and write to out uri (default: stdout)
Usage
ftm-datalake head [OPTIONS] KEY
Arguments
Name | Description | Required |
---|---|---|
KEY | [required] | Yes |
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | [default: -] | No | - |
--help | Show this message and exit. | No | - |
ftm-datalake ls
List all files in dataset archive
Usage
ftm-datalake ls [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | [default: -] | No | - |
--keys / --no-keys | Show only keys [default: no-keys] | No | - |
--checksums / --no-checksums | Show only checksums [default: no-checksums] | No | - |
--help | Show this message and exit. | No | - |
ftm-datalake crawl
Crawl documents from local or remote sources
Usage
ftm-datalake crawl [OPTIONS] URI
Arguments
Name | Description | Required |
---|---|---|
URI | [required] | Yes |
Options
Name | Description | Required | Default |
---|---|---|---|
-o TEXT | Write results to this destination [default: -] | No | - |
--skip-existing / --no-skip-existing | Skip already existing files (doesn't check actual similarity) [default: skip-existing] | No | - |
--extract / --no-extract | Extract archives via patool [default: no-extract] | No | - |
--extract-keep-source / --no-extract-keep-source | Keep the source archive when extracting [default: no-extract-keep-source] | No | - |
--extract-ensure-subdir / --no-extract-ensure-subdir | Ensure a subdirectory with the package filename when extracting [default: no-extract-ensure-subdir] | No | - |
--exclude TEXT | Exclude paths glob pattern | No | - |
--include TEXT | Include paths glob pattern | No | - |
--help | Show this message and exit. | No | - |
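
The `crawl` options combine naturally: a glob filter, optional archive extraction, and the skip-existing switch. The sketch below only assembles such a command line. `crawl_argv` is a hypothetical helper, `./documents`, `*.pdf` and `my_dataset` are illustrative values, and it is an assumption that the target dataset is chosen with the global `-d` option. Flags left at their documented defaults are omitted.

```python
from typing import List, Optional


def crawl_argv(
    uri: str,
    dataset: str,
    include: Optional[str] = None,
    exclude: Optional[str] = None,
    extract: bool = False,
    skip_existing: bool = True,
) -> List[str]:
    """Build the argv for `ftm-datalake crawl`, adding only non-default options."""
    argv = ["ftm-datalake", "-d", dataset, "crawl"]
    if include is not None:
        argv += ["--include", include]
    if exclude is not None:
        argv += ["--exclude", exclude]
    if extract:
        argv.append("--extract")
    if not skip_existing:
        argv.append("--no-skip-existing")
    # The source URI is the single positional argument and goes last.
    argv.append(uri)
    return argv


# Hypothetical source folder and dataset: crawl only PDFs and unpack archives.
print(crawl_argv("./documents", "my_dataset", include="*.pdf", extract=True))
```

Passing the arguments as a list (rather than a shell string) keeps glob patterns such as `*.pdf` from being expanded by the shell before the CLI sees them.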
ftm-datalake export
Export a complete dataset in LeakRFC format
Usage
ftm-datalake export [OPTIONS] OUT
Arguments
Name | Description | Required |
---|---|---|
OUT | [required] | Yes |
Options
Name | Description | Required | Default |
---|---|---|---|
--help | Show this message and exit. | No | - |
ftm-datalake memorious
Memorious related operations
Usage
ftm-datalake memorious sync [OPTIONS]
Arguments
No arguments available
Options
Name | Description | Required | Default |
---|---|---|---|
--help | Show this message and exit. | No | - |
-i TEXT | [required] | Yes | - |
--name-only / --no-name-only | Use only file name as key [default: no-name-only] | No | - |
--strip-prefix TEXT | Strip from file key prefix | No | - |
--key-template TEXT | Template to generate key | No | - |
--help | Show this message and exit. | No | - |
ftm-datalake aleph
Aleph related operations
Usage
ftm-datalake aleph load-catalog [OPTIONS] URI
Arguments
Name | Description | Required |
---|---|---|
URI | Dataset index.json uri | Yes |
URI | Catalog index.json uri | Yes |
Options
Name | Description | Required | Default |
---|---|---|---|
--help | Show this message and exit. | No | - |
--host TEXT | Aleph host | No | - |
--api-key TEXT | Aleph api key | No | - |
--folder TEXT | Base folder path | No | - |
--foreign-id TEXT | Aleph foreign_id (if different from dataset) | No | - |
--metadata / --no-metadata | Update collection metadata [default: metadata] | No | - |
--help | Show this message and exit. | No | - |
--host TEXT | Aleph host | No | - |
--api-key TEXT | Aleph api key | No | - |
--foreign-id TEXT | Aleph foreign_id (if different from dataset) | No | - |
--metadata / --no-metadata | Update collection metadata [default: metadata] | No | - |
--help | Show this message and exit. | No | - |
--host TEXT | Aleph host | No | - |
--api-key TEXT | Aleph api key | No | - |
--metadata / --no-metadata | Update collection metadata [default: metadata] | No | - |
--help | Show this message and exit. | No | - |