Highlighting
Show where search terms appear in results with highlighted text snippets.
See also the Elasticsearch documentation on highlighters.
Basic usage
Enable highlighting via query parameters:
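As a minimal sketch, the request below passes all three parameters documented in the next section, each at its stated default. It reuses the CLI form from the Examples section; the query term and the ARGS variable are illustrative choices, not part of the tool.

```shell
# Illustrative query term and variable name; the values are the documented defaults
# except highlight, which is switched on.
ARGS="highlight=true&highlight_count=3&max_highlight_analyzed_offset=999999"

# Needs the openaleph-search CLI and a reachable index.
openaleph-search search query-string "corruption" --args "$ARGS" \
  || echo "openaleph-search failed or is not installed" >&2
```

Only highlight=true is needed to turn highlighting on; the other two parameters can be dropped when the defaults are acceptable.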
Parameters
highlight
Enable highlighting.
- Type: bool
- Default: false
highlight_count
Number of snippets per document.
- Type: int
- Default: 3
- Use 0 to return full highlighted text
max_highlight_analyzed_offset
Maximum characters to analyze per document.
- Type: int
- Default: 999999
Reduce this value for better performance on large documents:
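A sketch of such a request follows. The value 250000 and the query term are illustrative, not recommendations; pick a limit that fits your typical document length.

```shell
# Illustrative limit: analyze at most the first 250000 characters of each document.
ARGS="highlight=true&max_highlight_analyzed_offset=250000"

# Needs the openaleph-search CLI and a reachable index.
openaleph-search search query-string "report" --args "$ARGS" \
  || echo "openaleph-search failed or is not installed" >&2
```

Matches beyond the limit are not highlighted, so a lower value trades completeness for speed.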
Highlighter types
The system uses different Elasticsearch highlighters optimized for each field type:
Unified Highlighter
Used for: content (document text), text (secondary text), translation (translated text), name (entity names)
The default highlighter for all fields. Balanced performance with good support for mixed content.
Fast Vector Highlighter (FVH)
Optionally used for the content field. Provides more accurate phrase highlighting (wraps entire phrases in a single <em> tag) but requires term vectors. Disabled by default because it is incompatible with copy_to fields excluded from _source — for entities where multiple properties copy into content (e.g. HyperText with both bodyHtml and indexText), FVH causes term vector offset mismatches that drop hits from results.
Configuration (via environment):
- OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false (default)
- Requires OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true (default) when enabled
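If FVH is wanted despite the caveat above, both settings have to agree. The sketch below uses only the two variables named in this section. It assumes they are exported in the environment the service and indexer start from, and that content was indexed with term vectors present.

```shell
# Opt in to the Fast Vector Highlighter for the content field (default: false).
export OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=true

# FVH needs term vectors on content; true is already the default and is set
# here only to make the dependency explicit.
export OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true
```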
Plain Highlighter
Used for: names (keywords)
Fast highlighting for simple keyword matches.
Configuration
Control highlighting behavior via environment variables:
highlighter_fvh_enabled
Use Fast Vector Highlighter for content field.
Default: false. When false, uses Unified Highlighter instead. See Highlighter types for trade-offs.
highlighter_fragment_size
Characters per snippet.
Default: 200
highlighter_number_of_fragments
Snippets per document.
Default: 3
highlighter_phrase_limit
Maximum phrases to analyze per document.
Default: 64
Lower values improve performance but may miss some matches.
highlighter_boundary_max_scan
Characters to scan for sentence boundaries.
Default: 100
highlighter_no_match_size
Fragment size when no match found.
Default: 300
highlighter_max_analyzed_offset
Maximum characters to analyze.
Default: 100000
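As an illustration, a deployment that prefers shorter snippets and cheaper highlighting might export the following. The variable names are an assumption: they apply the OPENALEPH_SEARCH_ prefix used elsewhere on this page to the setting names above, so verify them against your installed version. The values are illustrative, not recommendations.

```shell
# Assumed naming scheme: OPENALEPH_SEARCH_ prefix + upper-cased setting name.
export OPENALEPH_SEARCH_HIGHLIGHTER_FRAGMENT_SIZE=150          # default 200
export OPENALEPH_SEARCH_HIGHLIGHTER_NUMBER_OF_FRAGMENTS=2      # default 3
export OPENALEPH_SEARCH_HIGHLIGHTER_PHRASE_LIMIT=32            # default 64; lower may miss matches
export OPENALEPH_SEARCH_HIGHLIGHTER_MAX_ANALYZED_OFFSET=50000  # default 100000
```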
Response format
Highlighted results appear in the highlight field:
{
  "hits": {
    "hits": [
      {
        "_id": "doc-123",
        "_source": {...},
        "highlight": {
          "content": [
            "Evidence of <em>corruption</em> was found...",
            "The <em>investigation</em> revealed..."
          ],
          "name": [
            "<em>John Smith</em>"
          ]
        }
      }
    ]
  }
}
Matched terms are wrapped in <em> tags.
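Because the snippets carry literal <em> markup, a consumer that prints to a terminal or writes a plain-text report has to strip or convert it. A minimal sketch using sed follows; the sample snippet is copied from the response above, and the asterisk marker is only one possible convention.

```shell
# Sample snippet as it appears in the "highlight" field of the response.
snippet='Evidence of <em>corruption</em> was found...'

# Strip the markup for plain-text output.
plain=$(printf '%s\n' "$snippet" | sed -e 's/<em>//g' -e 's/<\/em>//g')
echo "$plain"    # Evidence of corruption was found...

# Or keep the matches visible by replacing the tags with asterisks.
marked=$(printf '%s\n' "$snippet" | sed -e 's/<em>/*/g' -e 's/<\/em>/*/g')
echo "$marked"   # Evidence of *corruption* was found...
```

When rendering snippets as HTML instead, the tags can be kept as they are, provided the rest of the snippet text is escaped appropriately.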
Fields highlighted
Multiple fields are highlighted automatically:
- content - Main document text (primary highlight field for entities)
- names - Normalized name keywords
- text - Secondary text content (catch-all copy_to target)
- translation - Translated text content
The text and translation highlight fields can be disabled via settings:
- OPENALEPH_SEARCH_HIGHLIGHTER_TEXT_FIELD=false
- OPENALEPH_SEARCH_HIGHLIGHTER_TRANSLATION_FIELD=false
Examples
Basic highlighting
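A minimal sketch: highlighting switched on, every other setting left at its default. The query term and the ARGS variable are illustrative.

```shell
# Only highlight=true is set; snippet count and analyzed offset use their defaults.
ARGS="highlight=true"

# Needs the openaleph-search CLI and a reachable index.
openaleph-search search query-string "corruption" --args "$ARGS" \
  || echo "openaleph-search failed or is not installed" >&2
```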
More snippets
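A sketch that raises the number of snippets per document above the default of 3. The value 5 and the query term are illustrative.

```shell
# Ask for up to 5 snippets per document instead of the default 3.
ARGS="highlight=true&highlight_count=5"

# Needs the openaleph-search CLI and a reachable index.
openaleph-search search query-string "corruption" --args "$ARGS" \
  || echo "openaleph-search failed or is not installed" >&2
```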
Full text highlighting
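A sketch relying on the documented behavior of highlight_count=0, which returns the full highlighted text instead of snippets. The query term is illustrative.

```shell
# highlight_count=0 returns the whole highlighted text rather than fragments.
ARGS="highlight=true&highlight_count=0"

# Needs the openaleph-search CLI and a reachable index.
openaleph-search search query-string "corruption" --args "$ARGS" \
  || echo "openaleph-search failed or is not installed" >&2
```

Responses can be large for long documents, so this is best combined with filters that narrow the result set.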
Limited document size
openaleph-search search query-string "report" \
  --args "highlight=true&max_highlight_analyzed_offset=100000"
With filters
openaleph-search search query-string "corruption" \
  --args "filter:schema=Document&filter:countries=us&highlight=true"
Performance considerations
Index size
Fast Vector Highlighter requires term vectors, which increase index size by approximately 20-30%.
Disable term vectors if index storage size is a serious concern:
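A sketch of that setting, using the variables named under Highlighter types. FVH is kept off explicitly because it depends on term vectors.

```shell
# Do not store term vectors for the content field (default: true).
export OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=false

# FVH requires term vectors, so leave it disabled (this is already the default).
export OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false
```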
Requires reindexing to take effect.
Query performance
- More snippets (highlight_count) = slower queries
- Larger documents = slower highlighting
- Lower phrase_limit = faster but less accurate
- Reduce max_highlight_analyzed_offset for large documents
Optimization tips
For better performance:
# Reduce snippets
--args "highlight=true&highlight_count=2"
# Limit analyzed text
--args "highlight=true&max_highlight_analyzed_offset=500000"
# Use dehydration with highlighting
--args "highlight=true&dehydrate=true"
Troubleshooting
No highlights returned
Check that:
- highlight=true parameter is set
- Query matches terms in highlightable fields
- Terms exist in analyzed fields (not just keyword fields)
Incomplete highlights
- Document may exceed max_highlight_analyzed_offset
- Query may exceed phrase_limit
- Field may not have term vectors enabled
Slow highlighting
- Reduce highlight_count
- Lower max_highlight_analyzed_offset
- Decrease phrase_limit
- The unified highlighter (default) generally performs well; FVH is not recommended due to compatibility issues with copy_to fields