similarity

Similarity operations enable fuzzy text searching using PostgreSQL's pg_trgm extension. Dream provides three similarity operators with configurable scoring thresholds for different matching strategies.

Setup

Before using similarity operations, you need to set up the pg_trgm extension and optionally create GIN indexes for better performance:

// In a migration file
import { DreamMigrationHelpers } from '@rvoh/dream'
import { Kysely } from 'kysely'

export async function up(db: Kysely<any>): Promise<void> {
  // Create the pg_trgm extension (only needed once per database)
  await DreamMigrationHelpers.createExtension(db, 'pg_trgm')

  // Create GIN index for better performance (optional but recommended)
  await DreamMigrationHelpers.createGinIndex(db, 'users_name_gin_index', {
    table: 'users',
    column: 'name',
  })
}

export async function down(db: Kysely<any>): Promise<void> {
  await db.schema.dropIndex('users_name_gin_index').execute()
}

Similarity Operators

PostgreSQL's pg_trgm extension provides three distinct similarity functions: similarity() compares the overall similarity between two strings using trigrams, word_similarity() finds the best matching word within a string by comparing each word individually, and strict_word_similarity() performs a more rigorous word-by-word comparison that requires stronger matches. Each function uses different algorithms and scoring methods, making them suitable for different types of fuzzy search scenarios.

ops.similarity()

The similarity() operator uses trigram similarity with a default score threshold of 0.3. It's the most flexible similarity operator and works well for general fuzzy matching:

import { ops } from '@rvoh/dream'

const titleSearchedRecipes = await Recipe.passthrough({ locale: 'en-US' })
  .innerJoin('currentRecipeText', {
    and: {
      title: ops.strictWordSimilarity(this.query),
    },
  })
  .cursorPaginate({ cursor })

ops.wordSimilarity()

The wordSimilarity() operator uses word-based similarity with a default score threshold of 0.5. It's better at matching individual words within longer text:

import { ops } from '@rvoh/dream'

const fullTextSearchedRecipes = await Recipe.passthrough({ locale: 'en-US' })
  .innerJoin('currentAggregateRecipeText', {
    and: {
      aggregate: ops.wordSimilarity(this.query),
    },
  })
  .cursorPaginate({ cursor })

ops.strictWordSimilarity()

The strictWordSimilarity() operator uses strict word-based similarity with a default score threshold of 0.6. It provides the most precise matching:

import { ops } from '@rvoh/dream'

const fullTextSearchedRecipes = await Recipe.passthrough({ locale: 'en-US' })
  .innerJoin('currentAggregateRecipeText', {
    and: {
      aggregate: ops.strictWordSimilarity(this.query),
    },
  })
  .cursorPaginate({ cursor })

Score thresholds

All similarity operators accept an optional score parameter that controls the matching threshold:

0.0 - Very loose matching (matches almost anything)
0.3 - Default for similarity() - good balance of recall and precision
0.5 - Default for wordSimilarity() - moderate precision
0.6 - Default for strictWordSimilarity() - high precision
1.0 - Exact match only

// Low threshold - more results, less precise
const looseResults = await User.where({
  name: ops.similarity('john', { score: 0.1 }),
}).all()

// High threshold - fewer results, more precise
const preciseResults = await User.where({
  name: ops.similarity('john', { score: 0.8 }),
}).all()

GIN indexes

For optimal performance on large datasets, create GIN indexes on columns you'll search frequently:

// In your migration
await DreamMigrationHelpers.createGinIndex(db, 'posts_title_gin_index', {
  table: 'posts',
  column: 'title',
})

await DreamMigrationHelpers.createGinIndex(db, 'posts_content_gin_index', {
  table: 'posts',
  column: 'content',
})

Setup​

Similarity Operators​

ops.similarity()​

ops.wordSimilarity()​

ops.strictWordSimilarity()​

Score thresholds​

GIN indexes​