
“totalTermFreq must be at least docFreq” error after upgrading to 8.4.2

elastic/elasticsearch

Issue

Elasticsearch Version

8.4.2 (in Docker)

Installed Plugins

Whatever comes in the Elasticsearch Docker image

Java Version

bundled (java -version doesn't work inside the Docker image)

OS Version

Whatever's in the Elasticsearch Docker image:

$ uname -a
Linux c8dae8d214e8 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

We're seeing the following error from queries in Elasticsearch 8.4.2:

totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959

These are queries that worked successfully in Elasticsearch 8.4.1.

The errors are reproducible, but don't affect all queries.

The numbers in the error message seem to be stable.
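For context, the message reflects a basic property of term statistics: docFreq counts the documents that contain a term, while totalTermFreq counts every occurrence of the term across those documents, so each matching document contributes at least 1 to totalTermFreq. The sketch below is plain Python, not Elasticsearch or Lucene code. It computes both numbers for a toy corpus and applies the same comparison. The helpers `term_stats` and `check_stats` are hypothetical names, and whitespace tokenisation is a simplifying assumption.

```python
def term_stats(docs, term):
    """Return (doc_freq, total_term_freq) for `term` over whitespace-tokenised docs."""
    doc_freq = 0
    total_term_freq = 0
    for text in docs:
        count = text.lower().split().count(term)
        if count:
            doc_freq += 1             # one more document contains the term
            total_term_freq += count  # add every occurrence in that document
    return doc_freq, total_term_freq


def check_stats(doc_freq, total_term_freq):
    """Raise if the statistics violate the invariant named in the error message."""
    if total_term_freq < doc_freq:
        raise ValueError(
            "totalTermFreq must be at least docFreq, "
            f"totalTermFreq: {total_term_freq}, docFreq: {doc_freq}"
        )


docs = ["1 volume ; 19 cm", "1 map and 1 plate", "xv, 276 pages"]
doc_freq, total_term_freq = term_stats(docs, "1")
check_stats(doc_freq, total_term_freq)  # passes: counting directly cannot break the invariant
```

In the reported error, totalTermFreq is exactly one less than docFreq. Direct counting as above cannot produce that, which may point to the statistics being adjusted or combined somewhere before the check; that is an inference, not something the error message itself confirms.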

Steps to Reproduce

The index mapping

```json
{
  "dynamic": "strict",
  "properties": {
    "edition": { "type": "text" },
    "notes": { "type": "text" },
    "physicalDescription": { "type": "text" }
  }
}
```

An example document

```json
{
  "notes": [
    "This material has been provided by University of Bristol Library. The original may be consulted at University of Bristol Library."
  ],
  "physicalDescription": "xv, 276 pages ; 19 cm",
  "edition": "New [3rd] ed."
}
```

The query

```json
{
  "bool": {
    "should": [
      {
        "multi_match": {
          "query": "1",
          "fields": [
            "physicalDescription",
            "edition",
            "notes"
          ],
          "type": "cross_fields",
          "operator": "And",
          "_name": "data"
        }
      }
    ]
  }
}
```

The error

```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "works-indexed-2022-08-24",
        "node": "ZTz1Ump1THGC5lMd9VL3XQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959"
      }
    }
  },
  "status": 400
}
```

Steps to reproduce

  1. Start an instance of the Elasticsearch Docker container:

    docker run \
      --env xpack.security.enabled=false \
      --env discovery.type=single-node \
      --publish 9200:9200 \
      -it docker.elastic.co/elasticsearch/elasticsearch:8.4.2

    (These settings may not all be required, but they're how I usually run the local Docker image, and they avoid having to do the password/CA certs dance. I’m running Docker on macOS, although I don’t think it makes a difference.)

  2. Download the following data set from S3: https://wellcomecollection-data-public-delta.s3.eu-west-1.amazonaws.com/elasticsearch-issue-files/works.json.gz (40MB)

    This contains ~900k documents with a minimal set of fields to cause this error – if I reduce the number of documents or fields, the error goes away.

    (Corpus is CC-BY 4.0 licensed Wellcome Collection, similar to https://developers.wellcomecollection.org/docs/datasets)

  3. Run the attached Python script, which will:

    • Create a new index with our index mapping
    • Load the contents of works.json.gz into the index
    • Make a minimal query that induces the error


    repro.py

```python
import gzip
import json
import random

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

client = Elasticsearch("http://localhost:9200")

index = f"example-{random.randint(0, 10000)}"

resp = client.indices.create(
    index=index,
    mappings={
        "dynamic": "strict",
        "properties": {
            "edition": {"type": "text"},
            "notes": {"type": "text"},
            "physicalDescription": {"type": "text"},
        },
    },
)

print("Create index resp:")
print(resp)
print("")


def get_actions():
    for line in gzip.open("works.json.gz"):
        yield {"_index": index, "_op_type": "index", "_source": json.loads(line)}


bulk_resp = bulk(client, get_actions())

print("Bulk resp:")
print(bulk_resp)
print("")

query_resp = client.search(
    index=index,
    query={
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": "1",
                        "fields": [
                            "physicalDescription",
                            "edition",
                            "notes",
                        ],
                        "type": "cross_fields",
                        "operator": "And",
                        "_name": "data",
                    }
                }
            ]
        }
    },
)

print("Query resp:")
print(json.dumps(query_resp.raw, indent=2, sort_keys=True))
```
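The observation that the error disappears when fields are removed can be checked systematically by running the same query over every subset of the three fields. The following is a minimal sketch under stated assumptions: `build_query` and `field_subsets` are hypothetical helpers, the query shape is copied from repro.py, and the commented-out loop assumes the `client` and `index` objects from that script.

```python
from itertools import combinations

FIELDS = ["physicalDescription", "edition", "notes"]


def build_query(fields, text="1"):
    """Build the cross_fields query from the report for a given list of fields."""
    return {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": text,
                        "fields": list(fields),
                        "type": "cross_fields",
                        "operator": "And",
                        "_name": "data",
                    }
                }
            ]
        }
    }


def field_subsets(fields):
    """Yield every non-empty subset of `fields`, smallest first."""
    for size in range(1, len(fields) + 1):
        for subset in combinations(fields, size):
            yield list(subset)


# Usage against a live cluster (assumes `client` and `index` from repro.py):
#
# from elasticsearch import BadRequestError
# for subset in field_subsets(FIELDS):
#     try:
#         client.search(index=index, query=build_query(subset))
#         print(subset, "ok")
#     except BadRequestError as exc:
#         print(subset, "failed:", exc)
```

Three fields give seven non-empty subsets, so the loop issues seven searches. Going by the notes in this report, only the full three-field query would be expected to fail on 8.4.2.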

Expected behaviour

The search request returns a list of results.

Actual behaviour

We get an error from the Elasticsearch Python library:

elasticsearch.BadRequestError: BadRequestError(400, 'search_phase_execution_exception', 'totalTermFreq must be at least docFreq, totalTermFreq: 287039, docFreq: 287040')

Notes

  • If I run my Python script against the 8.4.1 Docker image, the error doesn't reproduce.

  • I found an issue with a similar error message from 7.0.1: https://github.com/elastic/elasticsearch/issues/41934

    Notably, the fix for that issue (https://github.com/elastic/elasticsearch/pull/41938) mentions cross_fields, which we're using in our query.

  • If I remove any of the fields from the query, the error goes away.

  • I’m using the Elasticsearch Docker image for the sake of an easy reproduction case, but we're seeing this issue in our managed Elastic Cloud clusters also. We actually see the error in two different clusters:

    • a cluster that was created and populated on 8.4.1, then updated to 8.4.2
    • a cluster that was created and populated on 8.4.2
  • Our actual documents are quite a bit larger, and the query more complicated. We can share more details if it would be useful, but I figured you'd prefer the minimal version.

  • The numbers in the error message seem to vary depending on which Elastic node handles the request, but each node returns a consistent set of numbers. e.g. if I run this query in our Elastic Cloud cluster, it gets handled by one of two nodes:

    • error.failed_shards[0].node = ZTz1Ump1THGC5lMd9VL3XQ gets the error "totalTermFreq must be at least docFreq, totalTermFreq: 413958, docFreq: 413959"
    • error.failed_shards[0].node = qhmqpnpCQhe9KKSKYWQfNw gets the error "totalTermFreq must be at least docFreq, totalTermFreq: 437539, docFreq: 437540"

    If I run the query repeatedly, each node returns consistent numbers.
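To compare the numbers across nodes without reading raw JSON, the node ID and both statistics can be pulled out of the error body. This is a small sketch, assuming the body has the same shape as the error response shown earlier; `failed_shard_stats` is a hypothetical helper, and the regular expression only matches this specific message.

```python
import re

REASON_RE = re.compile(
    r"totalTermFreq must be at least docFreq, "
    r"totalTermFreq: (?P<ttf>\d+), docFreq: (?P<df>\d+)"
)


def failed_shard_stats(error_body):
    """Return (node, total_term_freq, doc_freq) tuples from a search error body."""
    results = []
    for shard in error_body.get("error", {}).get("failed_shards", []):
        reason_text = shard.get("reason", {}).get("reason", "")
        match = REASON_RE.search(reason_text)
        if match:
            results.append((shard["node"], int(match["ttf"]), int(match["df"])))
    return results


# Trimmed copy of the error response from this report.
example = {
    "error": {
        "failed_shards": [
            {
                "shard": 0,
                "index": "works-indexed-2022-08-24",
                "node": "ZTz1Ump1THGC5lMd9VL3XQ",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "totalTermFreq must be at least docFreq, "
                    "totalTermFreq: 413958, docFreq: 413959",
                },
            }
        ]
    },
    "status": 400,
}

for node, ttf, df in failed_shard_stats(example):
    # Print the node, both statistics, and how far docFreq exceeds totalTermFreq.
    print(node, ttf, df, df - ttf)
```

With the 8.x Python client, the response body is expected to be available on the raised exception as its `body` attribute. That is an assumption worth verifying against the installed client version.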

Logs (if relevant)

No response

2022-09-22 18:55:07




Top 3 Comments

  jtibshirani answered on 2022-09-22 22:29:51

I uploaded a fix here: https://github.com/elastic/elasticsearch/pull/90278. I tried running the reproduction script, and the query now completes successfully.

The fix will be available in 8.4.3. The bug was also present in 7.17.7 and 8.5.0, but we caught it early enough that it'll be fixed before those are released.

3 positive reactions.
  nik9000 answered on 2022-09-22 19:26:09

"type": "cross_fields",

I believe the bug is with cross_fields. @jtibshirani might be able to confirm. I'll see if I can reproduce and get a stack trace.

0 positive reactions.
  elasticsearchmachine answered on 2022-09-22 19:25:36

Pinging @elastic/es-search (Team:Search)

0 positive reactions.
