Facets & Aggregations
Analyze search results through aggregations to understand data distribution and patterns.
See also significant terms aggregations.
Basic usage
# Single facet
openaleph-search search query-string "corruption" --args "facet=dataset"
# Multiple facets
openaleph-search search query-string "investigation" \
--args "facet=dataset&facet=schema&facet=countries"
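The --args value is a URL-style query string in which a key such as facet may repeat. If you script the CLI from Python, a minimal sketch like the one below can assemble that string with the standard library's urllib.parse.urlencode. The helper name build_args is hypothetical and not part of openaleph-search. safe=":" keeps colons unescaped, so keys like facet_total:countries look the same as in the shell examples.

```python
from urllib.parse import urlencode


def build_args(pairs):
    """Join (key, value) pairs into an --args string.

    A list of pairs (rather than a dict) lets a key such as "facet" repeat.
    safe=":" leaves colons in keys like "facet_total:countries" unescaped.
    """
    return urlencode(pairs, safe=":")


args = build_args([
    ("facet", "dataset"),
    ("facet", "schema"),
    ("facet", "countries"),
])
print(args)  # facet=dataset&facet=schema&facet=countries
```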
Types
Terms aggregations
Return the most frequent values of a keyword field, with a document count for each value.
Cardinality aggregations
Return the total number of distinct values in a field.
openaleph-search search query-string "investigation" \
--args "facet=countries&facet_total:countries=true"
Date histogram aggregations
Group results by time intervals.
openaleph-search search query-string "transaction" \
--args "facet=created_at&facet_interval:created_at=month"
Parameters
facet
Field name to facet on. Repeat the parameter to facet on several fields.
facet_size:FIELD
Maximum number of values to return (default: 20).
facet_total:FIELD
Include the total number of distinct values (a cardinality aggregation).
facet_values:FIELD
Return the individual values and their counts (default: true).
# Only the distinct-value total, no individual values
--args "facet=entities&facet_values:entities=false&facet_total:entities=true"
facet_interval:FIELD
Time interval for date fields.
Intervals: year, quarter, month, week, day, hour, minute
facet_type:FIELD
Aggregation type for special fields.
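The per-field parameters all follow the pattern name:FIELD. A hypothetical helper (facet_params is an illustration, not an openaleph-search function) can generate them for one field, omitting any option left as None so the server-side defaults apply. Only the parameter names listed above are assumed.

```python
from urllib.parse import urlencode


def facet_params(field, size=None, values=None, total=None, interval=None):
    """Return (key, value) pairs for one facet and its per-field options.

    Options left as None are omitted so the defaults apply.
    Booleans are lower-cased to match the "true"/"false" used above.
    """
    pairs = [("facet", field)]
    if size is not None:
        pairs.append((f"facet_size:{field}", str(size)))
    if values is not None:
        pairs.append((f"facet_values:{field}", str(values).lower()))
    if total is not None:
        pairs.append((f"facet_total:{field}", str(total).lower()))
    if interval is not None:
        pairs.append((f"facet_interval:{field}", interval))
    return pairs


# Rebuilds the "counts only" example for the entities field.
pairs = facet_params("entities", values=False, total=True)
print(urlencode(pairs, safe=":"))
# facet=entities&facet_values:entities=false&facet_total:entities=true
```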
Response format
Aggregation results appear under the aggregations key of the response, one entry per field and aggregation type:
{
  "hits": {...},
  "aggregations": {
    "dataset.values": {
      "buckets": [
        {"key": "panama_papers", "doc_count": 1250},
        {"key": "paradise_papers", "doc_count": 890}
      ]
    },
    "schema.cardinality": {
      "value": 12
    },
    "created_at.intervals": {
      "buckets": [
        {
          "key": 1609459200000,
          "key_as_string": "2021-01-01",
          "doc_count": 145
        }
      ]
    },
    "names.significant_terms": {
      "buckets": [
        {
          "key": "mossack fonseca",
          "doc_count": 25,
          "score": 0.8745,
          "bg_count": 100
        }
      ]
    }
  }
}
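The keys in the example follow a <field>.<type> pattern (values, cardinality, intervals, significant_terms). Assuming that pattern holds generally, a minimal client-side sketch can regroup the results by field. split_aggregations is a hypothetical helper. Splitting on the last dot keeps field names that contain dots themselves, such as properties.startDate, intact.

```python
import json

# Trimmed copy of the response example ("hits" omitted).
RESPONSE = json.loads("""
{
  "aggregations": {
    "dataset.values": {"buckets": [
      {"key": "panama_papers", "doc_count": 1250},
      {"key": "paradise_papers", "doc_count": 890}
    ]},
    "schema.cardinality": {"value": 12}
  }
}
""")


def split_aggregations(response):
    """Group aggregation results as {field: {agg_type: result}}.

    rsplit on the last dot so "properties.startDate.intervals"
    yields field "properties.startDate" and type "intervals".
    """
    grouped = {}
    for key, result in response.get("aggregations", {}).items():
        field, agg_type = key.rsplit(".", 1)
        grouped.setdefault(field, {})[agg_type] = result
    return grouped


aggs = split_aggregations(RESPONSE)
counts = {b["key"]: b["doc_count"] for b in aggs["dataset"]["values"]["buckets"]}
print(counts)                                  # {'panama_papers': 1250, 'paradise_papers': 890}
print(aggs["schema"]["cardinality"]["value"])  # 12
```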
Common fields
Apart from the common fields below, individual FollowTheMoney properties can be faceted via properties.<prop> (for example properties.mimeType).
Entity fields
- schema: Entity schema type
- schemata: Schema inheritance (e.g. schemata=LegalEntity includes all its descendants)
- dataset: Dataset identifier
Group fields
These groups are part of the index as keyword fields:
addresses, checksums, countries, dates, emails, entities, genders, identifiers, ips, languages, mimetypes, names, phones, topics, urls
Name fields
- names: Normalized entity names (includes the NER mentions from Analyzable entities)
- name_symbols: Name symbols (extracted from names)
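Facets on non-existent fields return empty results rather than an error, so a misspelled field name is easy to miss. A hypothetical client-side check (is_known_field and the sets below are illustrations, not part of openaleph-search) can compare requested fields against the names listed on this page, plus created_at, which the examples use. The index may contain further fields, so treat a False result as a warning rather than a rule.

```python
ENTITY_FIELDS = {"schema", "schemata", "dataset"}
GROUP_FIELDS = {
    "addresses", "checksums", "countries", "dates", "emails", "entities",
    "genders", "identifiers", "ips", "languages", "mimetypes", "names",
    "phones", "topics", "urls",
}
NAME_FIELDS = {"names", "name_symbols"}
EXAMPLE_FIELDS = {"created_at"}  # used in the examples on this page
KNOWN_FIELDS = ENTITY_FIELDS | GROUP_FIELDS | NAME_FIELDS | EXAMPLE_FIELDS


def is_known_field(field):
    """Return True for a field listed on this page or a properties.<prop> field."""
    prefix = "properties."
    if field.startswith(prefix):
        # Require a non-empty property name after the prefix.
        return len(field) > len(prefix)
    return field in KNOWN_FIELDS


requested = ["dataset", "countrys", "properties.mimeType"]
print([f for f in requested if not is_known_field(f)])  # ['countrys']
```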
Date histograms
Calendar intervals
openaleph-search search query-string "transaction" \
--args "facet=dates&facet_interval:dates=month"
Example values: year, quarter, month, week, day
Fixed intervals
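Examples: 1h, 15m, 7d. Elasticsearch fixed intervals only support units up to days (ms, s, m, h, d); for month-long buckets use the calendar interval month instead of 1M.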
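Elasticsearch's date histogram treats named calendar intervals and fixed durations differently, and its fixed intervals accept only the units ms, s, m, h and d, so 1M (one month) is valid only as a calendar interval. A minimal sketch of a hypothetical validator (classify_interval is an illustration, not openaleph-search's own logic) that sorts a facet_interval value into one of the two kinds before sending a query:

```python
import re
from datetime import timedelta

# Calendar interval names listed on this page.
CALENDAR = {"year", "quarter", "month", "week", "day", "hour", "minute"}
# Units accepted by Elasticsearch fixed intervals (no weeks, months or years).
FIXED_UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
}
FIXED_RE = re.compile(r"(\d+)(ms|s|m|h|d)")


def classify_interval(value):
    """Return ("calendar", name) or ("fixed", timedelta); raise ValueError otherwise."""
    if value in CALENDAR:
        return ("calendar", value)
    match = FIXED_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a calendar name or fixed interval: {value!r}")
    count, unit = match.groups()
    return ("fixed", int(count) * FIXED_UNITS[unit])


print(classify_interval("quarter"))
print(classify_interval("15m"))
```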
Date range with histogram
openaleph-search search query-string "event" \
--args "filter:gte:properties.startDate=2020-01-01&filter:lte:properties.startDate=2023-12-31&facet=properties.startDate&facet_interval:properties.startDate=quarter"
The histogram includes empty buckets within the filtered range.
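Date-histogram bucket keys are epoch milliseconds, with key_as_string carrying the formatted date (see the response format example). A minimal sketch, assuming buckets are aligned to UTC, that converts keys to dates and lists the empty periods a range query like this can return. histogram_rows and empty_periods are hypothetical helpers, and the second bucket is made-up sample data.

```python
from datetime import datetime, timezone


def histogram_rows(buckets):
    """Convert date-histogram buckets to (UTC start date, doc_count) rows."""
    rows = []
    for bucket in buckets:
        # Keys are milliseconds since the Unix epoch.
        start = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        rows.append((start.date().isoformat(), bucket["doc_count"]))
    return rows


def empty_periods(buckets):
    """Return the start dates of buckets with no matching documents."""
    return [day for day, count in histogram_rows(buckets) if count == 0]


buckets = [
    {"key": 1609459200000, "doc_count": 145},  # 2021-01-01
    {"key": 1617235200000, "doc_count": 0},    # 2021-04-01 (sample data)
]
print(histogram_rows(buckets))  # [('2021-01-01', 145), ('2021-04-01', 0)]
print(empty_periods(buckets))   # ['2021-04-01']
```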
Post-filters
Filters restrict the hits as usual, but each facet ignores the filters on its own field, so it still shows the alternative values a user could select:
# The dataset facet lists every dataset matching the query, not just the two filtered ones
openaleph-search search query-string "company" \
--args "filter:dataset=collection1&filter:dataset=collection2&facet=dataset"
The hits themselves are still limited to the filtered datasets.
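The post-filter behaviour can be illustrated in plain Python. This toy model is not how openaleph-search executes queries; matches and search are hypothetical helpers, the documents are illustrative data, and OR-ing repeated values for one field is an assumption. Hits must pass every filter, while the facet for a field skips that field's filter and keeps all the others.

```python
from collections import Counter

# Toy documents standing in for indexed entities (illustrative data).
DOCS = [
    {"dataset": "collection1", "schema": "Company"},
    {"dataset": "collection2", "schema": "Company"},
    {"dataset": "collection3", "schema": "Company"},
    {"dataset": "collection3", "schema": "Person"},
]


def matches(doc, filters, skip=None):
    """True if doc passes every filter except the one on field `skip`.

    Assumption: repeated values for one field are OR-ed together.
    """
    return all(
        doc[field] in allowed
        for field, allowed in filters.items()
        if field != skip
    )


def search(docs, filters, facet):
    """Return (hits, facet counts) with post-filter semantics."""
    hits = [doc for doc in docs if matches(doc, filters)]
    # The facet skips the filter on its own field but keeps all the others.
    counts = Counter(doc[facet] for doc in docs if matches(doc, filters, skip=facet))
    return hits, counts


filters = {"dataset": {"collection1", "collection2"}, "schema": {"Company"}}
hits, counts = search(DOCS, filters, "dataset")
print(len(hits))     # 2
print(dict(counts))  # {'collection1': 1, 'collection2': 1, 'collection3': 1}
```

The Person document in collection3 is excluded from the dataset facet because the schema filter still applies to it.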
Performance
Execution strategy
Terms facets on keyword fields use execution_hint: map.
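In the Elasticsearch query DSL, execution_hint: map tells a terms aggregation to build buckets directly from field values instead of global ordinals. The sketch below only illustrates the shape of such an aggregation. It is not the request openaleph-search actually generates. terms_agg is a hypothetical helper, the aggregation name is borrowed from the response example, and size=20 mirrors the documented facet_size default.

```python
import json


def terms_agg(field, size=20):
    """Build an Elasticsearch terms aggregation using the map execution hint."""
    return {"terms": {"field": field, "size": size, "execution_hint": "map"}}


# Aggregation name borrowed from the response example ("<field>.values").
body = {"aggs": {"dataset.values": terms_agg("dataset")}}
print(json.dumps(body, indent=2))
```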
High cardinality
Fields with many unique values are more expensive to aggregate:
- Keep facet_size low
- Monitor query performance
- Consider sampling for large datasets
Examples
Multi-facet analysis
openaleph-search search query-string "investigation" \
--args "facet=dataset&facet=schema&facet=countries&facet=created_at&facet_interval:created_at=month"
Document classification
openaleph-search search query-string "*" \
--args "filter:schemata=Document&facet=properties.mimeType&facet=languages&facet_size:properties.mimeType=100"
Entity network
openaleph-search search query-string "person" \
--args "filter:schema=Person&facet=dataset&facet=countries"
Temporal trends
openaleph-search search query-string "company" \
--args "facet=schema&facet=created_at&facet_interval:created_at=year&facet_size:schema=50"
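To run these examples from a Python script, one option is to build the command as an argument list. A list passed to subprocess.run needs no shell quoting, whereas in the shell the & characters force the double quotes around --args. cli_command is a hypothetical helper; only the command shape shown on this page is assumed, and nothing is claimed about the CLI's output format.

```python
import shlex
from urllib.parse import urlencode


def cli_command(query, pairs):
    """Build the argv list for an openaleph-search query-string search.

    Run it with subprocess.run(argv, capture_output=True, text=True)
    when the CLI is installed.
    """
    return [
        "openaleph-search", "search", "query-string", query,
        "--args", urlencode(pairs, safe=":"),
    ]


# Same parameters as the temporal trends example.
argv = cli_command("company", [
    ("facet", "schema"),
    ("facet", "created_at"),
    ("facet_interval:created_at", "year"),
    ("facet_size:schema", "50"),
])
print(shlex.join(argv))  # shell-quoted form, for logging
```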
Error handling
Invalid fields
Facets on non-existent fields return empty results.
Type mismatches
Requesting a histogram on a non-date field falls back to a terms aggregation.
Authorization failures
Facets on restricted fields return empty results; the query itself still runs normally.
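Because unknown and restricted fields yield empty facets instead of errors, client code should read facets defensively. A minimal sketch, assuming the <field>.values key naming seen in the response example: facet_counts is a hypothetical helper that treats a missing key and an empty bucket list the same way.

```python
def facet_counts(response, field):
    """Return {value: doc_count} for a terms facet, or {} if it is absent or empty."""
    agg = response.get("aggregations", {}).get(f"{field}.values", {})
    return {b["key"]: b["doc_count"] for b in agg.get("buckets", [])}


response = {"aggregations": {"dataset.values": {"buckets": [
    {"key": "panama_papers", "doc_count": 1250},
]}}}
print(facet_counts(response, "dataset"))    # {'panama_papers': 1250}
print(facet_counts(response, "countries"))  # {}
```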