batched cosine / euclidean procedure for more efficient computation of vector similarities #4447

@jexp

Possible name: apoc.ml.cosine or apoc.algo.cosine (or cosineSimilarity), or something along those lines.

Parameters:

  • list of nodes
  • property name
  • target embedding
  • top-k
  • threshold

Efficient implementation without multiple conversions of data

  • Kernel API for property access?
  • SIMD / Java Vector API?
  • Filter early: skip candidates that do not pass the threshold, and stop once top-k is reached
  • Stream results out
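
To make the idea concrete, here is a minimal sketch of the scoring core in plain Java (16+ for records). It is not the proposed procedure itself: the names BatchedCosine, Scored and topK are hypothetical, and it assumes the embeddings have already been read from the nodes as float[] arrays, so property access through the Kernel API is left out. It computes the target norm once, drops candidates below the threshold, and keeps only the best k in a min-heap, so that the heap root acts as a rising cut-off once k results are held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of APOC: scores a batch of embeddings
// against one target vector and keeps the top-k results above a threshold.
final class BatchedCosine {

    /** One result: position of the candidate in the input batch and its similarity. */
    record Scored(int index, double score) {}

    /** Cosine similarity, reusing a target norm that was computed once per batch. */
    static double cosine(float[] candidate, float[] target, double targetNorm) {
        double dot = 0.0;
        double normSq = 0.0;
        for (int i = 0; i < target.length; i++) {
            dot += (double) candidate[i] * target[i];
            normSq += (double) candidate[i] * candidate[i];
        }
        double denom = Math.sqrt(normSq) * targetNorm;
        return denom == 0.0 ? 0.0 : dot / denom; // zero vectors score 0
    }

    /** Returns at most k candidates with score >= threshold, best first. */
    static List<Scored> topK(List<float[]> embeddings, float[] target, int k, double threshold) {
        if (k <= 0) {
            return List.of();
        }
        double targetNormSq = 0.0;
        for (float v : target) {
            targetNormSq += (double) v * v;
        }
        double targetNorm = Math.sqrt(targetNormSq);

        // Min-heap on score: the root is the weakest of the current top-k.
        PriorityQueue<Scored> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.score(), b.score()));
        for (int i = 0; i < embeddings.size(); i++) {
            float[] e = embeddings.get(i);
            if (e == null || e.length != target.length) {
                continue; // skip missing or mismatched embeddings
            }
            double s = cosine(e, target, targetNorm);
            if (s < threshold) {
                continue; // early filter on the user-supplied threshold
            }
            if (heap.size() < k) {
                heap.add(new Scored(i, s));
            } else if (s > heap.peek().score()) {
                heap.poll(); // replace the weakest kept result
                heap.add(new Scored(i, s));
            }
        }
        List<Scored> result = new ArrayList<>(heap);
        result.sort((a, b) -> Double.compare(b.score(), a.score()));
        return result;
    }
}
```

For example, BatchedCosine.topK(batch, target, 10, 0.8) would return up to ten candidates with similarity of at least 0.8. A real procedure would presumably stream these rows out instead of building a list, and the inner loop is the part that SIMD or the Java Vector API could replace.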

Compare performance at scale, say with 1,000 and 10,000 nodes, against the genai.vector.cosine function.

Project status: Blocked