
Highlighting

Show where search terms appear in results with highlighted text snippets.

See also: Elasticsearch documentation on highlighters.

Basic usage

Enable highlighting via query parameters:

openaleph-search search query-string "corruption" --args "highlight=true"

Parameters

highlight

Enable highlighting.

  • Type: bool
  • Default: false

highlight_count

Number of snippets per document.

  • Type: int
  • Default: 3
  • Use 0 to return the whole field with highlighting instead of snippets

For example, to return five snippets per document:

--args "highlight=true&highlight_count=5"

max_highlight_analyzed_offset

Maximum characters to analyze per document.

  • Type: int
  • Default: 999999

Reducing this value speeds up highlighting on large documents, but text beyond the limit is not highlighted:

--args "highlight=true&max_highlight_analyzed_offset=500000"
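The value passed to --args is a URL-style query string, so scripts can build it from a dictionary instead of concatenating strings by hand. The sketch below uses Python's standard urllib.parse.urlencode. build_search_args is a hypothetical helper, not part of openaleph-search, and it assumes booleans should be written as lowercase true/false, as in the examples on this page.

```python
from urllib.parse import urlencode


def build_search_args(params: dict[str, object]) -> str:
    """Hypothetical helper: turn a dict into a string for --args."""
    # Write booleans as "true"/"false" to match the documented examples.
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    # Keep ":" unescaped so filter keys such as "filter:schema" stay readable.
    return urlencode(normalized, safe=":")


if __name__ == "__main__":
    print(build_search_args({"highlight": True, "highlight_count": 5}))
    # highlight=true&highlight_count=5
```

Values are percent-encoded where needed, so query-string-unsafe characters in parameter values do not break the argument.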

Highlighter types

The system uses different Elasticsearch highlighters optimized for each field type:

Unified Highlighter

Used for: content (document text), text (secondary text), translation (translated text), name (entity names)

The default highlighter for most fields. It offers balanced performance and handles mixed content well.

Fast Vector Highlighter (FVH)

Optionally used for the content field. FVH highlights phrases more accurately (it wraps an entire phrase in a single <em> tag) but requires term vectors. It is disabled by default because it is incompatible with copy_to fields that are excluded from _source: for entities where several properties copy into content (e.g. HyperText with both bodyHtml and indexText), FVH produces term vector offset mismatches that drop hits from the results.

Configuration (via environment):

  • OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false (default)
  • Requires OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true (default) when enabled
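Because FVH depends on term vectors, a deployment script might reject an inconsistent combination before the service starts. This is a minimal sketch, assuming the variables hold strings such as "true"/"false" and that an unset variable takes the default listed above. check_fvh_settings and env_flag are hypothetical names, not part of openaleph-search, and the accepted truthy spellings are an assumption.

```python
import os
from typing import Mapping

# Assumed set of strings treated as "on".
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret an environment variable as a boolean (assumed parsing rules)."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def check_fvh_settings(env: Mapping[str, str] = os.environ) -> None:
    """Raise if FVH is enabled while content term vectors are disabled."""
    fvh = env_flag(env, "OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED", False)
    vectors = env_flag(env, "OPENALEPH_SEARCH_CONTENT_TERM_VECTORS", True)
    if fvh and not vectors:
        raise ValueError(
            "OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=true requires "
            "OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true"
        )
```

Calling check_fvh_settings() with no argument checks the current process environment.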

Plain Highlighter

Used for: names (keywords)

Fast highlighting for simple keyword matches.

Configuration

Control highlighting behavior via environment variables:

highlighter_fvh_enabled

Use the Fast Vector Highlighter for the content field.

export OPENALEPH_SEARCH_HIGHLIGHTER_FVH_ENABLED=false

Default: false. When false, the Unified Highlighter is used instead. See Highlighter types for the trade-offs.

highlighter_fragment_size

Characters per snippet.

export OPENALEPH_SEARCH_HIGHLIGHTER_FRAGMENT_SIZE=200

Default: 200

highlighter_number_of_fragments

Snippets per document.

export OPENALEPH_SEARCH_HIGHLIGHTER_NUMBER_OF_FRAGMENTS=3

Default: 3

highlighter_phrase_limit

Maximum phrases to analyze per document.

export OPENALEPH_SEARCH_HIGHLIGHTER_PHRASE_LIMIT=64

Default: 64

Lower values improve performance but may miss some matches.

highlighter_boundary_max_scan

Characters to scan for sentence boundaries.

export OPENALEPH_SEARCH_HIGHLIGHTER_BOUNDARY_MAX_SCAN=100

Default: 100

highlighter_no_match_size

Amount of text (in characters) returned from the start of the field when no match is found.

export OPENALEPH_SEARCH_HIGHLIGHTER_NO_MATCH_SIZE=300

Default: 300

highlighter_max_analyzed_offset

Maximum characters to analyze.

export OPENALEPH_SEARCH_HIGHLIGHTER_MAX_ANALYZED_OFFSET=100000

Default: 100000
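To illustrate how these settings relate to Elasticsearch, the sketch below turns them into highlight options for the content field, using Elasticsearch's own option names (type, fragment_size, number_of_fragments, boundary_max_scan, no_match_size, max_analyzed_offset, phrase_limit). The mapping is an assumption for illustration, not openaleph-search's actual code. Per the Elasticsearch highlighting reference, phrase_limit is only supported by the fvh highlighter, so the sketch sends it only when FVH is enabled.

```python
import os
from typing import Any, Mapping

PREFIX = "OPENALEPH_SEARCH_HIGHLIGHTER_"
# Defaults as documented on this page.
DEFAULTS = {
    "FRAGMENT_SIZE": 200,
    "NUMBER_OF_FRAGMENTS": 3,
    "PHRASE_LIMIT": 64,
    "BOUNDARY_MAX_SCAN": 100,
    "NO_MATCH_SIZE": 300,
    "MAX_ANALYZED_OFFSET": 100000,
}


def content_highlight_options(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Hypothetical mapping of highlighter settings to Elasticsearch options."""

    def setting(name: str) -> int:
        # Environment values are strings; fall back to the documented default.
        return int(env.get(PREFIX + name, DEFAULTS[name]))

    fvh = env.get(PREFIX + "FVH_ENABLED", "false").strip().lower() == "true"
    options: dict[str, Any] = {
        "type": "fvh" if fvh else "unified",
        "fragment_size": setting("FRAGMENT_SIZE"),
        "number_of_fragments": setting("NUMBER_OF_FRAGMENTS"),
        "boundary_max_scan": setting("BOUNDARY_MAX_SCAN"),
        "no_match_size": setting("NO_MATCH_SIZE"),
        "max_analyzed_offset": setting("MAX_ANALYZED_OFFSET"),
    }
    if fvh:
        # Elasticsearch only honours phrase_limit for the fvh highlighter.
        options["phrase_limit"] = setting("PHRASE_LIMIT")
    return options
```

In a search request body, the returned dict would sit under highlight.fields.content.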

Response format

Highlighted snippets appear in the highlight object of each hit:

{
  "hits": {
    "hits": [
      {
        "_id": "doc-123",
        "_source": {...},
        "highlight": {
          "content": [
            "Evidence of <em>corruption</em> was found...",
            "The <em>investigation</em> revealed..."
          ],
          "name": [
            "<em>John Smith</em>"
          ]
        }
      }
    ]
  }
}

Matched terms are wrapped in <em> tags.
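When results are consumed outside a browser, the <em> markers usually need converting. A minimal sketch, assuming the response has already been parsed from JSON into a Python dict shaped like the example above: snippets_by_hit collects every snippet per hit, and to_terminal rewrites <em> and </em> as ANSI bold codes. Both function names are illustrative only.

```python
import re
from typing import Any

EM_TAG = re.compile(r"<em>(.*?)</em>", re.DOTALL)
BOLD, RESET = "\033[1m", "\033[0m"


def to_terminal(snippet: str) -> str:
    """Replace <em>...</em> with ANSI bold escape codes."""
    return EM_TAG.sub(lambda m: f"{BOLD}{m.group(1)}{RESET}", snippet)


def snippets_by_hit(response: dict[str, Any]) -> dict[str, list[str]]:
    """Map each hit's _id to a flat list of its highlighted snippets."""
    result: dict[str, list[str]] = {}
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {})
        # Flatten snippets from all highlighted fields, in field order.
        result[hit["_id"]] = [s for field in fragments.values() for s in field]
    return result
```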

Fields highlighted

The following fields are highlighted automatically:

  • content - Main document text (primary highlight field for entities)
  • names - Normalized name keywords
  • text - Secondary text content (catch-all copy_to target)
  • translation - Translated text content

The text and translation highlight fields can be disabled with environment variables:

  • OPENALEPH_SEARCH_HIGHLIGHTER_TEXT_FIELD=false
  • OPENALEPH_SEARCH_HIGHLIGHTER_TRANSLATION_FIELD=false
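The field list and the two toggles might translate into the fields section of an Elasticsearch highlight request roughly as sketched below. The highlighter types follow the descriptions under Highlighter types. Treating an unset toggle as enabled, the function name, and the exact request shape are all assumptions, not openaleph-search's code.

```python
import os
from typing import Any, Mapping


def _enabled(env: Mapping[str, str], name: str) -> bool:
    # Assumption: a toggle is on unless explicitly set to "false".
    return env.get(name, "true").strip().lower() != "false"


def highlight_fields(env: Mapping[str, str] = os.environ) -> dict[str, dict[str, Any]]:
    """Build the "fields" part of a highlight request (hypothetical)."""
    fields: dict[str, dict[str, Any]] = {
        # Unified unless FVH is enabled for content.
        "content": {"type": "unified"},
        # Plain highlighter for the normalized name keywords.
        "names": {"type": "plain"},
    }
    if _enabled(env, "OPENALEPH_SEARCH_HIGHLIGHTER_TEXT_FIELD"):
        fields["text"] = {"type": "unified"}
    if _enabled(env, "OPENALEPH_SEARCH_HIGHLIGHTER_TRANSLATION_FIELD"):
        fields["translation"] = {"type": "unified"}
    return fields
```

The resulting dict would be placed under highlight.fields in the request body.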

Examples

Basic highlighting

openaleph-search search query-string "money laundering" --args "highlight=true"

More snippets

openaleph-search search query-string "investigation" \
  --args "highlight=true&highlight_count=5"

Full text highlighting

openaleph-search search query-string "evidence" \
  --args "highlight=true&highlight_count=0"

Limited document size

openaleph-search search query-string "report" \
  --args "highlight=true&max_highlight_analyzed_offset=100000"

With filters

openaleph-search search query-string "corruption" \
  --args "filter:schema=Document&filter:countries=us&highlight=true"

Performance considerations

Index size

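The Fast Vector Highlighter requires term vectors, which increase index size by approximately 20-30%.

Content term vectors are enabled by default (OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=true) even though FVH is disabled by default. If FVH stays off and index storage size is a serious concern, disable them: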

export OPENALEPH_SEARCH_CONTENT_TERM_VECTORS=false

Requires reindexing to take effect.

Query performance

  • More snippets (highlight_count) = slower queries
  • Larger documents = slower highlighting
  • Lower phrase_limit = faster but less accurate
  • Reduce max_highlight_analyzed_offset for large documents

Optimization tips

For better performance:

# Reduce snippets
--args "highlight=true&highlight_count=2"

# Limit analyzed text
--args "highlight=true&max_highlight_analyzed_offset=500000"

# Use dehydration with highlighting
--args "highlight=true&dehydrate=true"

Troubleshooting

No highlights returned

Check that:

  • the highlight=true parameter is set
  • the query matches terms in highlightable fields
  • the terms exist in analyzed fields (not just keyword fields)
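When only some results lack snippets, listing those hits helps narrow down which of the causes above applies. A minimal sketch, assuming the response has been parsed from JSON into a dict shaped like the one under Response format. hits_without_highlights is an illustrative name, not part of openaleph-search.

```python
from typing import Any


def hits_without_highlights(response: dict[str, Any]) -> list[str]:
    """Return the _id of every hit whose highlight section is missing or empty."""
    return [
        hit["_id"]
        for hit in response.get("hits", {}).get("hits", [])
        # any() is False when there is no highlight key or every field list is empty.
        if not any(hit.get("highlight", {}).values())
    ]
```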

Incomplete highlights

  • Document may exceed max_highlight_analyzed_offset
  • Query may exceed phrase_limit
  • Field may not have term vectors enabled

Slow highlighting

  • Reduce highlight_count
  • Lower max_highlight_analyzed_offset
  • Decrease phrase_limit
  • The unified highlighter (default) generally performs well; FVH is not recommended due to compatibility issues with copy_to fields