Highlighting

Show where search terms appear in results with highlighted text snippets.

Reference: Elasticsearch documentation on highlighters.

Basic usage

Enable highlighting via query parameters:

openaleph-search search query-string "corruption" --args "highlight=true"

Parameters

`highlight`

Enable highlighting.

Type: bool
Default: false

`highlight_count`

Number of snippets per document.

Type: int
Default: 3
Use 0 to return the full highlighted text instead of snippets.

--args "highlight=true&highlight_count=5"

`max_highlight_analyzed_offset`

Maximum characters to analyze per document.

Type: int
Default: 999999

Reduce this value for better performance on large documents:

--args "highlight=true&max_highlight_analyzed_offset=500000"
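When several of these parameters are combined, writing the --args string by hand becomes error-prone. The following is a minimal sketch of a shell helper that joins them. The function name highlight_args is hypothetical and not part of openaleph-search; it uses only the three parameters documented above.

```shell
# Hypothetical helper (not part of openaleph-search): build the value passed
# to --args from the highlight parameters documented above.
# Usage: highlight_args [highlight_count] [max_highlight_analyzed_offset]
highlight_args() {
  args="highlight=true"
  # Append highlight_count only when given; otherwise the default applies.
  if [ -n "${1:-}" ]; then
    args="${args}&highlight_count=$1"
  fi
  # Append the analyzed-offset cap only when given.
  if [ -n "${2:-}" ]; then
    args="${args}&max_highlight_analyzed_offset=$2"
  fi
  printf '%s\n' "$args"
}
```

It could then be used as: openaleph-search search query-string "corruption" --args "$(highlight_args 5 500000)"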

Highlighter types

The system uses different Elasticsearch highlighters optimized for each field type:

Unified Highlighter

Used for: content (document text), text (secondary text), translation (translated text), name (entity names)

The default highlighter for the fields listed above. It balances performance with good support for mixed content.

Fast Vector Highlighter (FVH)

Optionally used for the content field. It highlights phrases more accurately (the entire phrase is wrapped in a single <em> tag) but requires term vectors. It is disabled by default because it is incompatible with copy_to fields that are excluded from _source. For entities where multiple properties copy into content (e.g. HyperText with both bodyHtml and indexText), FVH causes term vector offset mismatches that drop hits from results.

Configuration (via environment):

OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false (default)
Requires OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true (default) when enabled
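Because FVH depends on term vectors, the two settings have to agree. The sketch below shows one way to catch the inconsistent combination before starting a service. The function check_fvh_config is hypothetical, and it assumes the booleans are spelled exactly true or false; the actual settings loader may accept other spellings.

```shell
# Hypothetical pre-flight check (not part of openaleph-search): FVH needs
# term vectors on the content field, so flag the inconsistent combination.
# Assumption: boolean values are written exactly as "true" / "false".
check_fvh_config() {
  # Defaults mirror the documented ones: FVH off, term vectors on.
  fvh="${OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED:-false}"
  vectors="${OPENALEPH_SEARCH_CONTENT_TERM_VECTORS:-true}"
  if [ "$fvh" = "true" ] && [ "$vectors" != "true" ]; then
    echo "FVH is enabled but content term vectors are disabled" >&2
    return 1
  fi
  return 0
}
```

Calling check_fvh_config returns a non-zero status only when FVH is switched on while term vectors are switched off.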

Plain Highlighter

Used for: names (keywords)

Fast highlighting for simple keyword matches.

Configuration

Control highlighting behavior via environment variables:

`highlighter_fvh_enabled`

Use Fast Vector Highlighter for content field.

export OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false

Default: false. When false, uses Unified Highlighter instead. See Highlighter types for trade-offs.

`highlighter_fragment_size`

Characters per snippet.

export OPENALEPH_SEARCH_HIGHLIGHTER_FRAGMENT_SIZE=200

Default: 200

`highlighter_number_of_fragments`

Snippets per document.

export OPENALEPH_SEARCH_HIGHLIGHTER_NUMBER_OF_FRAGMENTS=3

Default: 3

`highlighter_phrase_limit`

Maximum phrases to analyze per document.

export OPENALEPH_SEARCH_HIGHLIGHTER_PHRASE_LIMIT=64

Default: 64

Lower values improve performance but may miss some matches.

`highlighter_boundary_max_scan`

Characters to scan for sentence boundaries.

export OPENALEPH_SEARCH_HIGHLIGHTER_BOUNDARY_MAX_SCAN=100

Default: 100

`highlighter_no_match_size`

Number of characters returned from the start of the field when no match is found.

export OPENALEPH_SEARCH_HIGHLIGHTER_NO_MATCH_SIZE=300

Default: 300

`highlighter_max_analyzed_offset`

Maximum characters to analyze.

export OPENALEPH_SEARCH_HIGHLIGHTER_MAX_ANALYZED_OFFSET=100000

Default: 100000
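When debugging highlighting behavior, it can help to see which values are in effect in the current shell. The following is a rough sketch: show_highlighter_settings is a hypothetical helper, the fallback values are the defaults documented in this section, and it only inspects shell variables, so settings loaded from other sources (for example an env file) would not show up.

```shell
# Hypothetical helper (not part of openaleph-search): print the highlighter
# settings visible in the current shell, falling back to the defaults
# documented above when a variable is unset.
show_highlighter_settings() {
  p="OPENALEPH_SEARCH_HIGHLIGHTER"
  echo "${p}_FVH_ENABLED=${OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED:-false}"
  echo "${p}_FRAGMENT_SIZE=${OPENALEPH_SEARCH_HIGHLIGHTER_FRAGMENT_SIZE:-200}"
  echo "${p}_NUMBER_OF_FRAGMENTS=${OPENALEPH_SEARCH_HIGHLIGHTER_NUMBER_OF_FRAGMENTS:-3}"
  echo "${p}_PHRASE_LIMIT=${OPENALEPH_SEARCH_HIGHLIGHTER_PHRASE_LIMIT:-64}"
  echo "${p}_BOUNDARY_MAX_SCAN=${OPENALEPH_SEARCH_HIGHLIGHTER_BOUNDARY_MAX_SCAN:-100}"
  echo "${p}_NO_MATCH_SIZE=${OPENALEPH_SEARCH_HIGHLIGHTER_NO_MATCH_SIZE:-300}"
  echo "${p}_MAX_ANALYZED_OFFSET=${OPENALEPH_SEARCH_HIGHLIGHTER_MAX_ANALYZED_OFFSET:-100000}"
}
```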

Response format

Highlighted results appear in the highlight field:

{
  "hits": {
    "hits": [
      {
        "_id": "doc-123",
        "_source": {...},
        "highlight": {
          "content": [
            "Evidence of <em>corruption</em> was found...",
            "The <em>investigation</em> revealed..."
          ],
          "name": [
            "<em>John Smith</em>"
          ]
        }
      }
    ]
  }
}

Matched terms are wrapped in <em> tags.
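Consumers that print snippets outside a browser usually need to do something with the <em> markers. The sketch below shows two hypothetical filters, strip_em and ansi_em, that read snippet lines from stdin. Extracting the snippets from the JSON response is left to a JSON tool of your choice and is not shown here.

```shell
# Hypothetical filters for highlight snippets read from stdin, one per line.

# Remove the <em> markers and keep only the plain text.
strip_em() {
  sed -e 's#<em>##g' -e 's#</em>##g'
}

# Render matches in bold on an ANSI-capable terminal.
ansi_em() {
  esc=$(printf '\033')
  sed -e "s#<em>#${esc}[1m#g" -e "s#</em>#${esc}[0m#g"
}
```

For example, piping the line Evidence of <em>corruption</em> was found... through strip_em yields the same sentence without the tags.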

Fields highlighted

Multiple fields are highlighted automatically:

content - Main document text (primary highlight field for entities)
names - Normalized name keywords
text - Secondary text content (catch-all copy_to target)
translation - Translated text content

The text and translation highlight fields can be disabled via settings:

OPENALEPH_SEARCH_HIGHLIGHTER_TEXT_FIELD=false
OPENALEPH_SEARCH_HIGHLIGHTER_TRANSLATION_FIELD=false

Examples

Basic highlighting

openaleph-search search query-string "money laundering" --args "highlight=true"

More snippets

openaleph-search search query-string "investigation" \
  --args "highlight=true&highlight_count=5"

Full text highlighting

openaleph-search search query-string "evidence" \
  --args "highlight=true&highlight_count=0"

Limited document size

openaleph-search search query-string "report" \
  --args "highlight=true&max_highlight_analyzed_offset=100000"

With filters

openaleph-search search query-string "corruption" \
  --args "filter:schema=Document&filter:countries=us&highlight=true"

Performance considerations

Index size

Fast Vector Highlighter requires term vectors, which increase index size by approximately 20-30%.

Disable term vectors if index storage size is a serious concern:

export OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=false

Requires reindexing to take effect.

Query performance

More snippets (highlight_count) = slower queries
Larger documents = slower highlighting
Lower phrase_limit = faster but less accurate
Reduce max_highlight_analyzed_offset for large documents

Optimization tips

For better performance:

# Reduce snippets
--args "highlight=true&highlight_count=2"

# Limit analyzed text
--args "highlight=true&max_highlight_analyzed_offset=500000"

# Use dehydration with highlighting
--args "highlight=true&dehydrate=true"

Troubleshooting

No highlights returned

Check that:

highlight=true parameter is set
Query matches terms in highlightable fields
Terms exist in analyzed fields (not just keyword fields)

Incomplete highlights

Document may exceed max_highlight_analyzed_offset
Query may exceed phrase_limit
Field may not have term vectors enabled (relevant only when FVH is enabled)

Slow highlighting

Reduce highlight_count
Lower max_highlight_analyzed_offset
Decrease phrase_limit
The unified highlighter (default) generally performs well; FVH is not recommended due to compatibility issues with copy_to fields