
Security Considerations

Blind indexes are a practical tool for querying encrypted data, but they introduce trade-offs that you must understand before deploying them in production.

HMAC vs Deterministic Encryption

Two common approaches to searchable encryption are HMAC blind indexes and deterministic encryption (e.g., AES-SIV or AES-ECB). Tayra uses HMAC:

| Property | HMAC blind index | Deterministic encryption |
|---|---|---|
| Searchable by exact match | Yes | Yes |
| Separate companion column required | Yes | No (field is directly searchable) |
| Invertible with the key | No | Yes |
| Key compromise reveals values | No (only guessing attacks are possible) | Yes (decrypts directly) |
| Nonce reuse vulnerability | N/A | Present in some modes |
| Frequency analysis possible | Yes | Yes |
| Compatible with standard AES-GCM field encryption | Yes | Requires a separate cipher mode |
| Key rotation requires rewriting stored values | Yes (hashes recomputed) | Yes (re-encryption required) |

HMAC is preferable because it is one-way even when the HMAC key is known. An attacker who recovers the HMAC key can re-compute hashes for guessed values, but cannot reverse a hash directly to plaintext.
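To make the one-way property concrete, here is a minimal sketch of an HMAC blind index using .NET's built-in `HMACSHA256` (requires .NET 6 or later). It is not Tayra's implementation: the randomly generated key, the UTF-8 encoding and the hex output format are assumptions chosen for illustration.

```cs
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical HMAC key; in production it comes from the key store, never from code.
byte[] hmacKey = RandomNumberGenerator.GetBytes(32);

// Blind index = HMAC-SHA256(key, UTF-8 bytes of the value), hex-encoded (64 characters).
static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

string stored = BlindIndex(hmacKey, "alice@example.com");

// Deterministic: the same value and key always produce the same index,
// which is what makes exact-match queries possible.
string query = BlindIndex(hmacKey, "alice@example.com");
Console.WriteLine(stored == query);   // True

// One-way: there is no "decrypt" call. A key holder can only test guesses.
bool guessMatches = BlindIndex(hmacKey, "bob@example.com") == stored;
Console.WriteLine(guessMatches);      // False
```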

Key Separation

Blind index HMAC keys are distinct from field encryption keys. They are stored in the same IKeyStore but under a different prefix (bi: by default). This has two important properties:

  1. Crypto-shredding the encryption key does not destroy the HMAC fingerprints. The companion column continues to hold the hash. This is intentional, because the hash is not plaintext. Whether a hash still counts as personal data varies by jurisdiction, so review this with your legal team if you are subject to strict interpretations of GDPR Article 17 (right to erasure).

  2. Rotating the encryption key does not invalidate blind indexes. You must separately decide whether to rotate the HMAC key. See Key Management.
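The first property can be sketched with a plain dictionary standing in for the key store. This is a hypothetical model, not the `IKeyStore` API: only the `bi:` prefix comes from the description above, and the key names `subject:customer-42` and `bi:Customer.Email` are invented for the example.

```cs
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory stand-in for a key store.
var keys = new Dictionary<string, byte[]>
{
    ["subject:customer-42"] = RandomNumberGenerator.GetBytes(32), // per-subject encryption key
    ["bi:Customer.Email"]   = RandomNumberGenerator.GetBytes(32), // per-field HMAC key
};

static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

string before = BlindIndex(keys["bi:Customer.Email"], "alice@example.com");

// Crypto-shred the subject: only the encryption key is removed.
keys.Remove("subject:customer-42");

// The HMAC key is untouched, so the stored fingerprint is unchanged
// and still matches a query for the same email.
string after = BlindIndex(keys["bi:Customer.Email"], "alice@example.com");
Console.WriteLine(before == after);                         // True
Console.WriteLine(keys.ContainsKey("subject:customer-42")); // False
```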

Are HMAC hashes personal data?

Regulators disagree. A common position is that an HMAC hash is not personal data when the HMAC key is secret and the field has sufficient cardinality (many distinct values). If the key is compromised, the hashes can be linked back to individuals through guessing attacks and should be treated as personal data. Apply appropriate access controls to the key store.

Frequency Analysis

Because the same plaintext always produces the same HMAC (given the same key and transforms), an attacker who can observe the companion column can perform frequency analysis:

  • The most common HMAC value corresponds to the most common plaintext value.
  • For low-cardinality fields (status, country code, gender marker), the distribution may directly reveal the plaintext distribution.
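A short simulation shows that this attack does not need the key at all. The sketch below (hypothetical data, .NET 6 or later) hashes a status column and then, using only the hashes, counts how often each one occurs; the resulting histogram is exactly the plaintext distribution.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hmacKey = RandomNumberGenerator.GetBytes(32);

static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

// Hypothetical low-cardinality column: 70 "active", 25 "suspended", 5 "closed".
var plaintexts = Enumerable.Repeat("active", 70)
    .Concat(Enumerable.Repeat("suspended", 25))
    .Concat(Enumerable.Repeat("closed", 5))
    .ToList();

// What an observer of the companion column sees: hashes only.
var companionColumn = plaintexts.Select(p => BlindIndex(hmacKey, p)).ToList();

// Frequency analysis on the hashes alone: count each distinct value.
var histogram = companionColumn
    .GroupBy(h => h)
    .Select(g => g.Count())
    .OrderByDescending(c => c)
    .ToList();

Console.WriteLine(string.Join(", ", histogram)); // 70, 25, 5
```

An observer who knows, or can reasonably guess, that most accounts are active can label the most frequent hash immediately.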

Mitigation strategies:

  1. Avoid blind indexes on low-cardinality fields. If a field has fewer than ~10,000 distinct values, a blind index may reveal meaningful statistical information.
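  2. Use truncation. Truncating the HMAC output to a short prefix creates intentional hash collisions, so several plaintexts share each index value and frequency analysis becomes less informative. Collisions only become common when the truncated output space is comparable to or smaller than the number of distinct values; an 8-byte (64-bit) prefix is still effectively collision-free for realistic data sets. The trade-off is a higher false-positive rate: your query will return multiple rows that must be decrypted and post-filtered.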

Truncation is not supported in the current release

HMAC truncation is planned for a future release. Until then, avoid blind indexes on low-cardinality fields.
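Because truncation is not available yet, the following is only a standalone sketch of the idea using `HMACSHA256` directly; `TruncatedIndex` is a hypothetical helper, not a Tayra API. A 1-byte prefix is deliberately extreme so that collisions are visible with just 1,000 values.

```cs
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hmacKey = RandomNumberGenerator.GetBytes(32);

// Hypothetical helper: keep only the first `bytes` bytes of the HMAC-SHA256 output.
static string TruncatedIndex(byte[] key, string value, int bytes) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)), 0, bytes);

// 1,000 distinct hypothetical values.
var values = Enumerable.Range(0, 1000).Select(i => $"user{i}@example.com").ToList();

int fullBuckets = values.Select(v => TruncatedIndex(hmacKey, v, 32)).Distinct().Count();
int oneByteBuckets = values.Select(v => TruncatedIndex(hmacKey, v, 1)).Distinct().Count();

Console.WriteLine(fullBuckets);    // 1000: every value has its own index
Console.WriteLine(oneByteBuckets); // at most 256: many values share an index

// A lookup on the truncated index returns every value in the bucket (false positives)...
string target = values[0];
string bucket = TruncatedIndex(hmacKey, target, 1);
var candidates = values.Where(v => TruncatedIndex(hmacKey, v, 1) == bucket).ToList();

// ...so the application must decrypt the candidates and compare plaintexts.
var matches = candidates.Where(v => v == target).ToList();
Console.WriteLine($"{candidates.Count} candidate rows, {matches.Count} true match");
```

The final `Where` stands in for the decrypt-and-compare step; in a real query the candidates would be rows fetched by the truncated index.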

Brute-Force and Dictionary Attacks

An attacker who recovers the HMAC key and has access to the companion column can attempt a brute-force or dictionary attack:

  1. Generate candidate values (e.g., all email addresses from a known list).
  2. Apply the same transforms.
  3. Compute HMAC for each candidate.
  4. Compare against values in the companion column.
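The four steps map onto a few lines of code. This sketch assumes the attacker holds the HMAC key and a copy of the companion column, and that the application's transforms were trim plus lowercase; the candidate list and stored addresses are invented.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Assumed: the attacker has obtained the HMAC key.
byte[] stolenKey = RandomNumberGenerator.GetBytes(32);

// Step 2: the same transforms the application applies (assumed trim + lowercase).
static string Normalize(string s) => s.Trim().ToLowerInvariant();

// Step 3: the same HMAC computation.
static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(Normalize(value))));

// Hypothetical dump of the companion column.
var companionColumn = new[] { "Alice@Example.com", "carol@example.org" }
    .Select(e => BlindIndex(stolenKey, e))
    .ToHashSet();

// Step 1: candidate values, e.g. from a leaked address list.
string[] candidates = { "alice@example.com", "bob@example.com", "dave@example.net" };

// Step 4: keep every candidate whose hash appears in the column.
var recovered = candidates
    .Where(c => companionColumn.Contains(BlindIndex(stolenKey, c)))
    .ToList();

Console.WriteLine(string.Join(", ", recovered)); // alice@example.com
```

`carol@example.org` stays hidden only because it is not in the candidate list, which is why a large, unpredictable value space matters.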

Risk by field type:

| Field | Cardinality | Brute-force risk |
|---|---|---|
| Email address | Very high (billions) | Low |
| Full name | High (millions) | Low to medium |
| Phone number | High (trillions) | Low |
| Country code (ISO 3166) | 249 values | High; do not use a blind index |
| Gender marker | 2–10 values | Very high; do not use a blind index |
| Last 4 digits of card | 10,000 values | Medium; acceptable if key is protected |
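The Medium rating for card suffixes follows from how small the domain is. Once the key is known, an attacker does not even need a candidate list: a sketch like the one below (hypothetical, .NET 6 or later) hashes all 10,000 four-digit suffixes and turns the companion column into a reverse lookup table.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hmacKey = RandomNumberGenerator.GetBytes(32);

static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

// Hash the entire domain "0000".."9999" once: 10,000 HMAC computations.
var reverse = Enumerable.Range(0, 10_000)
    .Select(i => i.ToString("D4"))
    .ToDictionary(d => BlindIndex(hmacKey, d), d => d);

// Any value read from the companion column now maps straight back to plaintext.
string columnValue = BlindIndex(hmacKey, "4242");
Console.WriteLine(reverse[columnValue]); // 4242
Console.WriteLine(reverse.Count);        // 10000
```

Without the key the same column is opaque, which is why the rating depends entirely on key protection.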

Mitigation:

  • Protect the HMAC key with the same rigour as the encryption key.
  • Use a managed or hardware-backed key store (e.g., HashiCorp Vault, Azure Key Vault, AWS KMS) in production.
  • Do not log or expose HMAC keys in diagnostics.

When to Use Blind Indexes

Good candidates:

  • Email addresses (high cardinality, normalized with lowercase + trim)
  • Full names (medium-high cardinality, use with lowercase)
  • Phone numbers (high cardinality, use with digits)
  • National ID numbers (high cardinality)
  • Last 4 digits of payment card (10,000 combinations — acceptable with a protected key)
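The transforms listed above matter because HMAC is exact: any difference in case, whitespace or punctuation yields an unrelated hash. The following sketch uses hypothetical stand-ins for the lowercase, trim and digits transforms; Tayra's own implementations (for example, the culture used for lowercasing) may differ.

```cs
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

byte[] hmacKey = RandomNumberGenerator.GetBytes(32);

// Hypothetical equivalents of the lowercase + trim and digits transforms.
static string LowerTrim(string s) => s.Trim().ToLowerInvariant();
// char.IsDigit also accepts non-ASCII decimal digits.
static string DigitsOnly(string s) => new string(s.Where(char.IsDigit).ToArray());

static string BlindIndex(byte[] key, string value) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value)));

// Without normalization, formatting differences produce different indexes.
string raw = BlindIndex(hmacKey, "  Alice@Example.COM ");

// With normalization, differently formatted inputs collapse to one index.
string e1 = BlindIndex(hmacKey, LowerTrim("  Alice@Example.COM "));
string e2 = BlindIndex(hmacKey, LowerTrim("alice@example.com"));
string p1 = BlindIndex(hmacKey, DigitsOnly("+1 (555) 010-0199"));
string p2 = BlindIndex(hmacKey, DigitsOnly("15550100199"));

Console.WriteLine(raw == e2); // False
Console.WriteLine(e1 == e2);  // True
Console.WriteLine(p1 == p2);  // True
```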

Poor candidates:

  • Country, state, or region codes (very low cardinality)
  • Boolean fields (2 values)
  • Status or category fields (typically 2–50 values)
  • Date of birth alone (about 36,500 possible values over 100 years, and real birth dates cluster in a much narrower range, so the effective cardinality is low)

Do not use blind indexes as a substitute for access control

Field encryption with blind indexes protects data at rest. It does not prevent a user with query access to the database from reading companion column values and performing frequency analysis. Restrict database access to the application service account.

Crypto-Shredding Compatibility

Crypto-shredding deletes the encryption key for a data subject. It does not delete the HMAC key, because HMAC keys are scoped to a field definition, not to an individual data subject.

After crypto-shredding:

  • Email (encrypted column) — returns the replacement value.
  • EmailHash (companion column) — still contains the HMAC fingerprint.

If your data retention requirements demand removal of all fingerprints for a shredded subject, you must explicitly delete or overwrite the companion column value as part of your erasure workflow.

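```cs
// After crypto-shredding, the HMAC fingerprint remains in the companion column.
// If your retention policy requires removal, clear it explicitly.
// Assumed setup: `biTayra` is a Tayra instance configured with blind indexes,
// `indexed` is an entity with an `EmailHash` companion property, and
// `dbContext` is an EF Core DbContext (hypothetical name).

// 1. Crypto-shred the encryption key for this data subject
await biTayra.ShredAsync(indexed.Id.ToString());

// 2. Clear the companion column to remove the fingerprint
indexed.EmailHash = null;

// 3. Persist the cleared companion value
// (Update is only needed if `indexed` is not already tracked by the context)
dbContext.Update(indexed);
await dbContext.SaveChangesAsync();

Console.WriteLine($"  EmailHash after erasure: {indexed.EmailHash ?? "(null)"}");
```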

See Also