Annotation
Warning
This is an experimental feature and is likely to change or break without notice.
After extracting patterns, names and mentions from an Entity's text fields, ftm-analyze can store annotations for the extracted tokens in the indexText field. The annotations follow the markdown-like syntax of the Elasticsearch annotated text plugin.
Example bodyText of an entity:
During analysis, the email address will be detected and extracted as a pattern. Then, the resulting indexText of this Entity will contain the annotation for the emailMentioned property.
To mark this indexText as annotated, an __annotated__ prefix is added.
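A consuming application could check for this marker before attempting to parse annotations. The following is a minimal sketch, not part of ftm-analyze: the helper name split_annotated is hypothetical, and it assumes the marker sits at the very start of the indexText value, optionally followed by whitespace.

```python
ANNOTATED_PREFIX = "__annotated__"


def split_annotated(index_text: str) -> tuple[bool, str]:
    """Return (is_annotated, text without the marker).

    Assumption: the marker is the very first token of the value and any
    whitespace directly after it is not part of the content.
    """
    if index_text.startswith(ANNOTATED_PREFIX):
        return True, index_text[len(ANNOTATED_PREFIX):].lstrip()
    return False, index_text
```

Text without the marker is returned unchanged, so the same code path can handle both annotated and plain values.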
Parsing
Applications can parse the annotated text knowing these conventions:
- Schema annotation: s_<schema> (parent schemata are included)
  - Example: [Jane Doe](s_Person&s_LegalEntity)
- Fingerprints annotation (via rigour.names): f_<value>
  - Example: [Jane Doe](f_doe+jane)
- Pattern annotation (available properties): p_<prop>
  - Example: [Jane Doe](p_namesMentioned)
If extracted as a mentioned Person, Mrs. Jane Doe would actually look like this:
[Mrs. Jane Doe](f_doe+jane&f_mrs+jane+doe&s_Person&s_LegalEntity&p_namesMentioned&p_peopleMentioned)
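The conventions above can be sketched as a small parser. This is an illustrative example rather than code shipped with ftm-analyze: the function names, the regular expression and the output keys are assumptions, and the prefixes s_, f_ and p_ are taken from the examples above. Values are kept as written, so a fingerprint such as doe+jane is not decoded.

```python
import re
from collections import defaultdict

# Matches "[surface text](value&value&...)" as used in the examples above.
ANNOTATION_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# Prefix to (hypothetical) output key.
KINDS = {"s": "schemata", "f": "fingerprints", "p": "properties"}


def parse_annotations(text: str) -> list[dict]:
    """Collect every annotated token with its values grouped by kind."""
    results = []
    for match in ANNOTATION_RE.finditer(text):
        grouped = defaultdict(list)
        for value in match.group(2).split("&"):
            prefix, _, rest = value.partition("_")
            if prefix in KINDS:
                grouped[KINDS[prefix]].append(rest)
            else:
                # Unknown prefix: keep the raw value instead of dropping it.
                grouped["other"].append(value)
        results.append({"text": match.group(1), **grouped})
    return results


def strip_annotations(text: str) -> str:
    """Replace each annotation with its plain surface text."""
    return ANNOTATION_RE.sub(r"\1", text)
```

For the Mrs. Jane Doe example, parse_annotations would yield one entry with the fingerprints doe+jane and mrs+jane+doe, the schemata Person and LegalEntity, and the properties namesMentioned and peopleMentioned. strip_annotations can be used when an application only needs the readable text.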
Disable
Annotating into indexText is the default behaviour.
To disable this feature, set the environment variable FTM_ANALYZE_ANNOTATE=0 or use the command-line flag (see reference).