
With operator 0.3.7, nodes never get to vote (loop restarts) (similar to #222) #228

@OdyX

Hello there,

I have now had time to upgrade our operator to 0.3.7, and I still experience the issue described in #222. Again, reverting to 0.3.5 fixes it.

This is on a somewhat standard Kubernetes, single-replica setup:

---
apiVersion: ts.opentelekomcloud.com/v1alpha1
kind: TypesenseCluster
metadata:
  name: zebra-preprod
spec:
  image: mirror.gcr.io/typesense/typesense:29.0
  replicas: 1
  storage:
    size: "128Mi"
    storageClassName: csi-cinder-sc-delete-wait
  adminApiKey:
    name: typesense-zebra-preprod-bootstrap-key
  healthProbeTimeoutInMilliseconds: 10000
  incrementalQuorumRecovery: true
  resources:
    limits:
      cpu: 1
      memory: 768Mi
    requests:
      memory: 768Mi
  metrics:
    release: 'undefined'
    resources:
      limits:
        cpu: 100m
        memory: 64Mi
      requests:
        memory: 64Mi
  healthcheck:
    resources:
      limits:
        cpu: 100m
        memory: 32Mi
      requests:
        memory: 32Mi

With 0.3.7:

I20260129 09:56:20.883572   291 raft_server.cpp:605] Finished loading collections from disk.
I20260129 09:56:20.883666   291 raft_server.cpp:616] Loaded 0conversation model(s).
I20260129 09:56:20.883678   291 raft_server.cpp:620] Initializing batched indexer from snapshot state...
I20260129 09:56:20.883739   291 batched_indexer.cpp:635] Restored 0 in-flight requests from snapshot.
I20260129 09:56:20.883786   291 raft_server.cpp:633] Loaded 0 personalization model(s).
I20260129 09:56:20.883822   291 raft_server.h:294] Configuration of this group is 10.64.95.250:8107:8108
I20260129 09:56:20.883937   291 snapshot_executor.cpp:264] node default_group:10.64.67.24:8107:8108 snapshot_load_done, last_included_index: 6367974 last_included_term: 605 peers: "10.64.95.250:8107:8108"
I20260129 09:56:20.885480   263 raft_meta.cpp:521] Loaded single stable meta, path /usr/share/typesense/data/state/meta term 607 votedfor 0.0.0.0:0:0 time: 1235
I20260129 09:56:20.885545   263 node.cpp:608] node default_group:10.64.67.24:8107:8108 init, term: 607 last_log_id: (index=6367975,term=605) conf: 10.64.95.250:8107:8108 old_conf:
I20260129 09:56:20.885615   263 raft_server.cpp:141] Node last_index: 6367975
I20260129 09:56:20.885628   263 typesense_server_utils.cpp:309] Typesense peering service is running on 10.64.67.24:8107
I20260129 09:56:20.885643   263 typesense_server_utils.cpp:310] Snapshot interval configured as: 3600s
I20260129 09:56:20.885654   263 typesense_server_utils.cpp:311] Snapshot max byte count configured as: 4194304
W20260129 09:56:20.885668   263 controller.cpp:1550] SIGINT was installed with 1
I20260129 09:56:20.885769   263 raft_server.cpp:692] Term: 607, pending_queue: 0, last_index: 6367975, committed: 0, known_applied: 6367974, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 31837413
W20260129 09:56:20.885792   263 raft_server.cpp:717] Node with no leader. Resetting peers of size: 1
W20260129 09:56:20.885810   263 node.cpp:926] node default_group:10.64.67.24:8107:8108 set_peer from 10.64.95.250:8107:8108 to 10.64.67.47:8107:8108
I20260129 09:56:20.891244   263 raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 608 votedfor 0.0.0.0:0:0 time: 5383
I20260129 09:56:26.165112   285 node.cpp:1579] node default_group:10.64.67.24:8107:8108 term 608 start pre_vote
W20260129 09:56:26.165217   285 node.cpp:1589] node default_group:10.64.67.24:8107:8108 can't do pre_vote as it is not in 10.64.67.47:8107:8108
> … more log lines omitted (note the timestamps)
I20260129 09:57:54.755228   287 node.cpp:1579] node default_group:10.64.67.24:8107:8108 term 608 start pre_vote
W20260129 09:57:54.755301   287 node.cpp:1589] node default_group:10.64.67.24:8107:8108 can't do pre_vote as it is not in 10.64.67.47:8107:8108
I20260129 09:57:59.898531   287 node.cpp:1579] node default_group:10.64.67.24:8107:8108 term 608 start pre_vote
W20260129 09:57:59.898599   287 node.cpp:1589] node default_group:10.64.67.24:8107:8108 can't do pre_vote as it is not in 10.64.67.47:8107:8108
I20260129 09:58:00.904165   263 raft_server.cpp:692] Term: 608, pending_queue: 0, last_index: 6367975, committed: 0, known_applied: 6367974, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 31837413
W20260129 09:58:00.904224   263 raft_server.cpp:717] Node with no leader. Resetting peers of size: 1
I20260129 09:58:03.886561     1 typesense_server_utils.cpp:60] Stopping Typesense server...
> … until it dies
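One way to read the 0.3.7 excerpt: right after `Resetting peers of size: 1`, the peer list is set to `10.64.67.47:8107:8108`, which is not the node's own address (`10.64.67.24:8107:8108`), and every later `pre_vote` is refused with `can't do pre_vote as it is not in ...`. The sketch below checks exactly that from the logs. It only assumes the `set_peer` line format shown above and that multiple peers would be comma-separated (only single peers appear in this issue); `check_set_peer` is a hypothetical helper, not part of Typesense or the operator.

```python
import re

# Matches lines such as:
# ... node.cpp:926] node default_group:<self> set_peer from <old peers> to <new peers>
SET_PEER_RE = re.compile(
    r"node default_group:(?P<self>\S+) set_peer from (?P<old>\S+) to (?P<new>\S+)"
)


def check_set_peer(log_text):
    """Return (self_addr, new_peers, self_in_new_conf) for every set_peer line.

    Assumption: peers are 'ip:peering_port:api_port' entries joined by commas.
    """
    results = []
    for line in log_text.splitlines():
        m = SET_PEER_RE.search(line)
        if m:
            self_addr = m.group("self")
            new_peers = m.group("new").split(",")
            results.append((self_addr, new_peers, self_addr in new_peers))
    return results


log_037 = (
    "W20260129 09:56:20.885810   263 node.cpp:926] node default_group:10.64.67.24:8107:8108 "
    "set_peer from 10.64.95.250:8107:8108 to 10.64.67.47:8107:8108"
)
for self_addr, peers, ok in check_set_peer(log_037):
    print(self_addr, peers, "OK" if ok else "node is NOT in its own configuration")
```

A node that is absent from its own single-member configuration cannot start a pre-vote, which is consistent with the repeated warnings until the server is stopped; this is a reading of the logs, not a confirmed root cause.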

The same single-replica cluster, on 0.3.5:

I20260129 10:02:17.028793   287 raft_server.cpp:605] Finished loading collections from disk.
I20260129 10:02:17.028932   287 raft_server.cpp:616] Loaded 0conversation model(s).
I20260129 10:02:17.028945   287 raft_server.cpp:620] Initializing batched indexer from snapshot state...
I20260129 10:02:17.028995   287 batched_indexer.cpp:635] Restored 0 in-flight requests from snapshot.
I20260129 10:02:17.029036   287 raft_server.cpp:633] Loaded 0 personalization model(s).
I20260129 10:02:17.029070   287 raft_server.h:294] Configuration of this group is 10.64.95.250:8107:8108
I20260129 10:02:17.029173   287 snapshot_executor.cpp:264] node default_group:10.64.67.131:8107:8108 snapshot_load_done, last_included_index: 6367974 last_included_term: 605 peers: "10.64.95.250:8107:8108"
I20260129 10:02:17.030550   263 raft_meta.cpp:521] Loaded single stable meta, path /usr/share/typesense/data/state/meta term 610 votedfor 0.0.0.0:0:0 time: 1179
I20260129 10:02:17.030592   263 node.cpp:608] node default_group:10.64.67.131:8107:8108 init, term: 610 last_log_id: (index=6367975,term=605) conf: 10.64.95.250:8107:8108 old_conf:
I20260129 10:02:17.030647   263 raft_server.cpp:141] Node last_index: 6367975
I20260129 10:02:17.030664   263 typesense_server_utils.cpp:309] Typesense peering service is running on 10.64.67.131:8107
I20260129 10:02:17.030674   263 typesense_server_utils.cpp:310] Snapshot interval configured as: 3600s
I20260129 10:02:17.030683   263 typesense_server_utils.cpp:311] Snapshot max byte count configured as: 4194304
W20260129 10:02:17.030730   263 controller.cpp:1550] SIGINT was installed with 1
I20260129 10:02:17.030818   263 raft_server.cpp:692] Term: 610, pending_queue: 0, last_index: 6367975, committed: 0, known_applied: 6367974, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 31837413
W20260129 10:02:17.030834   263 raft_server.cpp:717] Node with no leader. Resetting peers of size: 1
W20260129 10:02:17.030848   263 node.cpp:926] node default_group:10.64.67.131:8107:8108 set_peer from 10.64.95.250:8107:8108 to 10.64.67.197:8107:8108
I20260129 10:02:17.036015   263 raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 611 votedfor 0.0.0.0:0:0 time: 5138
I20260129 10:02:22.091174   278 node.cpp:1579] node default_group:10.64.67.131:8107:8108 term 611 start pre_vote
W20260129 10:02:22.091253   278 node.cpp:1589] node default_group:10.64.67.131:8107:8108 can't do pre_vote as it is not in 10.64.67.197:8107:8108
> … it tries for some time, then finally:
W20260129 10:03:37.046319   263 raft_server.cpp:717] Node with no leader. Resetting peers of size: 1
W20260129 10:03:37.046335   263 node.cpp:926] node default_group:10.64.67.131:8107:8108 set_peer from 10.64.67.197:8107:8108 to 10.64.67.131:8107:8108
I20260129 10:03:37.054266   263 raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 612 votedfor 0.0.0.0:0:0 time: 7861
I20260129 10:03:39.442730   278 node.cpp:1579] node default_group:10.64.67.131:8107:8108 term 612 start pre_vote
I20260129 10:03:39.442939   278 node.cpp:1645] node default_group:10.64.67.131:8107:8108 term 612 start vote and grant vote self
I20260129 10:03:39.448323   287 raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 613 votedfor 10.64.67.131:8107:8108 time: 5227
I20260129 10:03:39.448383   287 node.cpp:1899] node default_group:10.64.67.131:8107:8108 term 613 become leader of group 10.64.67.131:8107:8108
I20260129 10:03:39.453203   287 raft_server.h:294] Configuration of this group is 10.64.67.131:8107:8108
I20260129 10:03:39.453270   287 node.cpp:3298] node default_group:10.64.67.131:8107:8108 reset ConfigurationCtx, new_peers: 10.64.67.131:8107:8108, old_peers: 10.64.67.131:8107:8108
I20260129 10:03:39.453289   287 raft_server.h:277] Node becomes leader, term: 613
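In the 0.3.5 excerpt, the peer list is eventually reset to the node's own address (`10.64.67.131:8107:8108`), after which pre-vote, vote and leader election succeed. The small sketch below measures how long that took, parsing the glog-style timestamps (`<severity>YYYYMMDD HH:MM:SS.ffffff`) seen in these logs; `glog_time` is a hypothetical helper name.

```python
from datetime import datetime


def glog_time(line):
    """Parse the timestamp at the start of a glog-style line, e.g. 'I20260129 10:02:17.028793 ...'."""
    # Drop the one-letter severity (I/W/E/F), then keep 'YYYYMMDD HH:MM:SS.ffffff'.
    stamp = " ".join(line[1:].split()[:2])
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


start = glog_time(
    "I20260129 10:02:17.028793   287 raft_server.cpp:605] Finished loading collections from disk."
)
leader = glog_time(
    "I20260129 10:03:39.453289   287 raft_server.h:277] Node becomes leader, term: 613"
)
print(f"{(leader - start).total_seconds():.1f} s from loading collections to becoming leader")
```

Applied to the lines above, this gives roughly 82 seconds of leaderless startup on 0.3.5, even though it does recover.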

Happy to provide more input or logs. I also manage a second, 3-node cluster that exhibits the same issue.
