Transforms

Transforms normalise a plaintext value before it is passed to the HMAC function. Normalisation ensures that logically equivalent inputs — differing only in case or whitespace — produce the same blind index fingerprint.

Without transforms, searching for "jane@example.com" would not match a record saved with "Jane@Example.com", even though they refer to the same address.
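The effect can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes HMAC-SHA256 with a hex-encoded output, and `BlindIndexSketch` is a hypothetical helper name. The actual HMAC variant, key handling, and encoding used by the library are not stated on this page.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not part of the library's API.
public static class BlindIndexSketch
{
    // Optionally normalise the value (trim + invariant lowercase),
    // then fingerprint it with HMAC-SHA256 (an assumed algorithm).
    public static string Fingerprint(string value, byte[] key, bool normalise)
    {
        string input = normalise ? value.Trim().ToLowerInvariant() : value;
        using var hmac = new HMACSHA256(key);
        byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(input));
        return Convert.ToHexString(hash);
    }
}
```

With `normalise` set to `true`, "Jane@Example.com" and "jane@example.com" yield the same fingerprint under the same key; with `false`, the fingerprints differ.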

Configuring Transforms

Transforms are declared on the [BlindIndex] attribute as an ordered array of strings:

```csharp
[BlindIndex(
    CompanionProperty = nameof(EmailHash),
    Transforms = ["lowercase", "trim"])]
public string Email { get; set; } = "";
```

Transforms are applied left to right. The example above first converts to lowercase, then strips surrounding whitespace.
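The left-to-right rule can be pictured as a loop that feeds each transform's output into the next. The sketch below is illustrative only: `TransformPipelineSketch` and its lookup table are hypothetical names, not the library's internals, and only two of the built-in names are mapped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of ordered transform application.
public static class TransformPipelineSketch
{
    // Assumed name-to-function table covering two built-in names.
    private static readonly Dictionary<string, Func<string, string>> Known = new()
    {
        ["lowercase"] = s => s.ToLowerInvariant(),
        ["trim"] = s => s.Trim(),
    };

    // Applies the named transforms in declaration order.
    public static string Apply(string value, params string[] transforms)
    {
        foreach (string name in transforms)
        {
            value = Known[name](value);
        }
        return value;
    }
}
```

For example, `TransformPipelineSketch.Apply("  Jane@Example.COM ", "lowercase", "trim")` returns "jane@example.com".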

Built-In Transforms

lowercase

Converts the entire value to lowercase using the invariant culture.

| Input | Output |
| --- | --- |
| "Jane@Example.COM" | "jane@example.com" |
| "ACME Corp" | "acme corp" |
| "already lower" | "already lower" |

Use on: email addresses, usernames, domain names.
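Invariant-culture lowercasing matters because culture-sensitive lowercasing can map the same input to different strings on different machines, which would change the fingerprint. In .NET, `ToLowerInvariant()` is culture-independent, whereas `ToLower(CultureInfo)` under a Turkish culture typically maps "I" to the dotless "ı" when culture data is available. The sketch below contrasts the two calls; `LowercaseSketch` is a hypothetical name, and whether the library calls `ToLowerInvariant()` internally is an assumption.

```csharp
using System.Globalization;

// Hypothetical illustration; not the library's implementation.
public static class LowercaseSketch
{
    // Culture-independent: same result regardless of machine locale.
    public static string Invariant(string value) => value.ToLowerInvariant();

    // Culture-sensitive, shown for contrast only. The result depends on
    // the named culture (for example "tr-TR") and on the culture data
    // available to the runtime.
    public static string WithCulture(string value, string cultureName) =>
        value.ToLower(new CultureInfo(cultureName));
}
```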


trim

Removes leading and trailing whitespace (spaces, tabs, newlines).

| Input | Output |
| --- | --- |
| " jane@example.com " | "jane@example.com" |
| "\tjane\n" | "jane" |
| "no change" | "no change" |

Use on: any field where leading or trailing whitespace might appear from user input or data imports.


alphanumeric

Removes all characters that are not ASCII letters (a–z, A–Z) or digits (0–9). Useful for normalising names or identifiers that might contain punctuation.

| Input | Output |
| --- | --- |
| "O'Brien" | "OBrien" |
| "Smith-Jones" | "SmithJones" |
| "+1 (555) 867-5309" | "15558675309" |

Combine with lowercase for case-insensitive matching

alphanumeric alone does not change case. Use ["lowercase", "alphanumeric"] if you want case-insensitive matching.
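As a rough sketch of the documented behaviour, an ASCII-only filter might look like the following. `AlphanumericSketch` is a hypothetical name and the code is not taken from the library. One consequence of the ASCII-only rule, assuming the library follows it strictly, is that accented letters are dropped too, so "José" would become "Jos".

```csharp
using System.Text;

// Hypothetical illustration of an ASCII-only alphanumeric filter.
public static class AlphanumericSketch
{
    public static string Apply(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            bool isAsciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            bool isAsciiDigit = c >= '0' && c <= '9';
            if (isAsciiLetter || isAsciiDigit)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Equivalent of ["lowercase", "alphanumeric"] for case-insensitive matching.
    public static string ApplyCaseInsensitive(string value) =>
        Apply(value.ToLowerInvariant());
}
```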


digits

Retains only ASCII digit characters (0–9). All other characters are removed. Designed for phone numbers, tax IDs, and other numeric identifiers.

| Input | Output |
| --- | --- |
| "+1 (555) 867-5309" | "15558675309" |
| "SSN: 123-45-6789" | "123456789" |
| "GB VAT 123 456 789" | "123456789" |
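A filter matching this description needs an explicit ASCII range check, because .NET's `char.IsDigit` also returns true for non-ASCII decimal digits such as Arabic-Indic numerals. The sketch below shows both for contrast; `DigitsSketch` is a hypothetical name and not the library's code.

```csharp
using System.Linq;

// Hypothetical illustration of an ASCII-only digit filter.
public static class DigitsSketch
{
    // Keeps only '0'..'9', matching the documented behaviour.
    public static string Apply(string value) =>
        new string(value.Where(c => c >= '0' && c <= '9').ToArray());

    // For contrast only: char.IsDigit accepts any Unicode decimal digit.
    public static string ApplyUnicode(string value) =>
        new string(value.Where(c => char.IsDigit(c)).ToArray());
}
```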

last4

Retains only the last 4 characters of the value it receives, that is, the value after any earlier transforms in the list have run. Values shorter than 4 characters are returned unchanged. Commonly used for partial credit card or SSN matching.

| Input | Output |
| --- | --- |
| "4111111111111111" | "1111" |
| "123-45-6789" | "6789" |
| "AB12" | "AB12" |
| "AB" | "AB" (shorter than 4, returned as-is) |

Combine last4 with digits for card numbers

Use ["digits", "last4"] to strip formatting characters before taking the last four digits. This ensures "4111-1111-1111-1111" and "4111111111111111" produce the same result.
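A minimal sketch of the last-four rule, including the short-value case, is shown below. `Last4Sketch` is a hypothetical name. Note that the function counts characters, not digits, so on a raw input such as "12-34" it would keep "2-34"; this is why stripping formatting first is recommended.

```csharp
// Hypothetical illustration; not the library's implementation.
public static class Last4Sketch
{
    // Returns the final four characters, or the whole value if it is
    // four characters or shorter (including the empty string).
    public static string Apply(string value) =>
        value.Length <= 4 ? value : value.Substring(value.Length - 4);
}
```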


first_char

Retains only the first character of the value. Useful for bucketed or initial-based lookups.

| Input | Output |
| --- | --- |
| "Jane" | "J" |
| "jane" | "j" |
| "" | "" (empty string is preserved) |

Low cardinality warning

first_char produces very few distinct values: roughly 26 when inputs are lowercased letters, and somewhat more when case is preserved or values begin with digits or symbols. Such a low-cardinality blind index is susceptible to frequency analysis. See Security Considerations.
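To get a feel for how coarse these buckets are, one can group sample values by their first character and count the bucket sizes. The sketch below is illustrative only; `FirstCharSketch` and `BucketSizes` are hypothetical names, not library functions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of first-character bucketing.
public static class FirstCharSketch
{
    // Keeps the first character; the empty string is preserved.
    public static string Apply(string value) =>
        value.Length == 0 ? value : value.Substring(0, 1);

    // Counts how many values fall into each first-character bucket.
    public static Dictionary<string, int> BucketSizes(IEnumerable<string> values) =>
        values.GroupBy(v => Apply(v))
              .ToDictionary(g => g.Key, g => g.Count());
}
```

Uneven bucket sizes are what an attacker would compare against known name-frequency distributions, which is why this transform suits coarse lookups rather than anything resembling unique matching.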


Transform Ordering

Transforms are applied in the order they are declared. Order matters.

Example: ["trim", "lowercase", "digits"]

Input:  "  +1 (555) 867-5309  "
  trim →  "+1 (555) 867-5309"
  lowercase → "+1 (555) 867-5309"  (no letters, no change)
  digits → "15558675309"

Example: ["digits", "last4"]

Input:  "4111-1111-1111-1111"
  digits → "4111111111111111"
  last4 → "1111"

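Reversing the order would hand last4 the formatted string first, so the result depends on whatever the last four characters happen to be. For "4111-1111-1111-1234 " (note the trailing space), ["last4", "digits"] keeps "234 " and then reduces it to "234", whereas ["digits", "last4"] yields "1234".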
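The difference between the two orderings can be checked with a small sketch. The helpers below are hypothetical stand-ins for the built-in digits and last4 transforms, written only to show how composition order changes the result.

```csharp
using System.Linq;

// Hypothetical stand-ins for the built-in transforms.
public static class OrderingSketch
{
    public static string Digits(string value) =>
        new string(value.Where(c => c >= '0' && c <= '9').ToArray());

    public static string Last4(string value) =>
        value.Length <= 4 ? value : value.Substring(value.Length - 4);

    // Equivalent of ["digits", "last4"].
    public static string DigitsThenLast4(string value) => Last4(Digits(value));

    // Equivalent of ["last4", "digits"].
    public static string Last4ThenDigits(string value) => Digits(Last4(value));
}
```

For "4111-1111-1111-1111" both orderings return "1111", which can hide the problem in testing. For "4111 1111 1111 1234." (with a trailing full stop), `DigitsThenLast4` returns "1234" while `Last4ThenDigits` returns "234".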

Custom Transforms

Use WithTransform() to add inline custom transforms in the fluent API:

```csharp
using Microsoft.Extensions.DependencyInjection;

// Inline custom transforms: no class or registration needed.
// licenseKey is assumed to be defined elsewhere.
var transformServices = new ServiceCollection();
var transformBuilder = transformServices.AddTayra(opts => opts.LicenseKey = licenseKey);
transformBuilder.Entity<IndexedCustomer>(e =>
{
    e.DataSubjectId(c => c.CustomerId);
    e.PersonalData(c => c.Email);
    e.BlindIndex(c => c.Email)
        .WithTransform(value => value.Split('@')[0]) // extract local part
        .WithLowercase()                             // then lowercase it
        .StoredIn(c => c.EmailIndex);
});
```

Custom transforms are just functions — no class or registration needed. They compose naturally with built-in transforms in the pipeline.

Custom Transform Rules

  • The function must be a pure function — same input always produces the same output.
  • The function must not throw on an empty string.
  • Transforms should be fast (no I/O, no allocations if avoidable).
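The first two rules can be smoke-tested before a transform is wired in. The sketch below is an assumption-laden illustration: `CustomTransformSketch`, `LocalPart`, and `LooksSafe` are hypothetical names, and the check only samples a few inputs, so it can reveal a violation but cannot prove purity.

```csharp
using System;

// Hypothetical helpers for illustration; not part of the library.
public static class CustomTransformSketch
{
    // A pure local-part extractor that never throws on an empty string:
    // values without '@' are returned unchanged.
    public static string LocalPart(string value)
    {
        int at = value.IndexOf('@');
        return at < 0 ? value : value.Substring(0, at);
    }

    // Returns false if the transform throws on the empty string or on a
    // sample, or gives two different answers for the same sample.
    public static bool LooksSafe(Func<string, string> transform, params string[] samples)
    {
        try
        {
            transform(string.Empty);
            foreach (string sample in samples)
            {
                if (transform(sample) != transform(sample))
                {
                    return false;
                }
            }
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

A transform such as `s => s.Substring(1)` would fail this check because it throws on the empty string, whereas `LocalPart` passes.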

Transform Reference Summary

| Name | Effect | Typical Use |
| --- | --- | --- |
| lowercase | Converts to invariant lowercase | Email, username |
| trim | Removes leading/trailing whitespace | Any user-input field |
| alphanumeric | Keeps only [a-zA-Z0-9] | Names, identifiers |
| digits | Keeps only [0-9] | Phone numbers, tax IDs |
| last4 | Keeps last 4 characters | Card numbers, SSN suffix |
| first_char | Keeps first character only | Bucketed lookups |

See Also