Data Type Selection and Schema Design Problems
Proactive Health (warning)
Identifying and fixing problematic data type choices in PostgreSQL schemas, including timestamp without time zone, char(n) padding issues, and the money type's locale problems.
Prompt: “I'm reviewing our PostgreSQL schema and I see we're using timestamp without time zone in many places, char(20) for some text fields, and the money type for currency. Are these bad choices and what problems might they cause in production?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When reviewing a PostgreSQL schema for problematic data types, start with the critical data corruption risks from timestamp without time zone and money, then address the space-padding and arbitrary-limit issues from char(n) and varchar(n), and finally audit for less common but equally problematic types like timetz and serial.
1. Audit all timestamp columns for timezone handling
Query information_schema.columns for timestamp without time zone usage: SELECT table_name, column_name FROM information_schema.columns WHERE data_type = 'timestamp without time zone'. This is your highest priority because `timestamp-no-timezone-corruption` causes silent data corruption when application servers are in different timezones or timezone settings change. Any column storing a point in time (created_at, updated_at, scheduled_for) should be timestamptz, not timestamp. If you have distributed systems or multiple environments with different timezone configs, this is actively corrupting your data right now.
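The audit query and a conversion can be sketched as follows; the events table and created_at column are hypothetical, and the USING clause assumes the naive timestamps were written as UTC (adjust the zone if your application wrote local time):

```sql
-- Find every column declared timestamp without time zone,
-- skipping system catalogs
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE data_type = 'timestamp without time zone'
  AND table_schema NOT IN ('pg_catalog', 'information_schema');

-- Convert a column to timestamptz, interpreting existing values as UTC
-- (table and column names here are hypothetical)
ALTER TABLE events
  ALTER COLUMN created_at TYPE timestamptz
  USING created_at AT TIME ZONE 'UTC';
```

Note that the ALTER rewrites the table, so schedule it for a maintenance window on large tables.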
2. Identify money type columns and check locale settings
Search for money type usage with: SELECT table_name, column_name FROM information_schema.columns WHERE data_type = 'money'. The `money-type-locale-corruption` insight tells us that money precision depends on lc_monetary locale setting, which can differ between servers or during restore operations, silently changing values. Check your cluster's locale with SHOW lc_monetary; if you have multiple databases or plan to replicate/migrate data, this is a ticking time bomb. Plan migration to numeric(19,4) with separate currency columns.
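A sketch of the locale check and the migration, using a hypothetical payments table; money casts cleanly to numeric, and the explicit currency column replaces the implicit locale dependency:

```sql
-- Show the locale that currently governs money formatting on this server
SHOW lc_monetary;

-- Migrate a hypothetical payments.amount column off the money type;
-- numeric(19,4) preserves four decimal places without locale dependence
ALTER TABLE payments
  ALTER COLUMN amount TYPE numeric(19,4)
  USING amount::numeric;

-- Track the currency explicitly with an ISO 4217 code
ALTER TABLE payments
  ADD COLUMN currency text NOT NULL DEFAULT 'USD'
  CHECK (char_length(currency) = 3);
```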
3. Find char(n) columns and measure storage waste
Query for char(n) usage: SELECT table_name, column_name, character_maximum_length FROM information_schema.columns WHERE data_type = 'character'. The `char-n-space-padding-issues` insight explains that char(n) pads with trailing spaces, wasting storage and causing confusing comparison behavior. PostgreSQL stores char(n) and text identically internally, so there's zero performance benefit. For a table with millions of rows and char(20) fields storing 5-character values, you're wasting 15 bytes per row. Calculate actual storage waste and prioritize high-volume tables for migration to text.
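The waste estimate and migration might look like this for a hypothetical products.code column declared char(20); the 20 in the query is that assumed declared length:

```sql
-- Average padding waste per row: declared length minus trimmed length
-- (products/code and the declared length 20 are hypothetical)
SELECT 20 - avg(char_length(trim(trailing ' ' from code))) AS avg_wasted_bytes
FROM products;

-- Migrate to text; the explicit trim discards the padding that
-- char(n) added on insert
ALTER TABLE products
  ALTER COLUMN code TYPE text
  USING trim(trailing ' ' from code);
```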
4. Review varchar(n) columns for arbitrary limits
Find all varchar columns with: SELECT table_name, column_name, character_maximum_length FROM information_schema.columns WHERE data_type = 'character varying' ORDER BY character_maximum_length. Look for the classic varchar(255) or other arbitrary limits — the `varchar-n-arbitrary-limits` insight notes these provide no performance advantage over text in PostgreSQL but create constraints that require table rewrites to change. If you see varchar(255) on email addresses or names, that's a code smell. Replace with text and CHECK constraints for actual business logic validation.
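A minimal sketch for a hypothetical users.email column; the 320-character ceiling is a commonly cited upper bound for email addresses and stands in for whatever your actual business rule is:

```sql
-- Drop the arbitrary varchar(255) limit; varchar -> text needs no rewrite
ALTER TABLE users
  ALTER COLUMN email TYPE text;

-- Encode the real business rule as a named CHECK constraint,
-- which can later be dropped or replaced without a table rewrite
ALTER TABLE users
  ADD CONSTRAINT users_email_length CHECK (char_length(email) <= 320);
```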
5. Check for timetz usage and meaningless offsets
Search for time with time zone: SELECT table_name, column_name FROM information_schema.columns WHERE data_type = 'time with time zone'. The `timetz-meaningless-offset` insight explains that timetz stores time-of-day with timezone offset, but that's meaningless because DST rules depend on the date — '3:00 PM PDT' is ambiguous without knowing which day. PostgreSQL docs say this type exists only for SQL compliance. If you need points in time, use timestamptz; for time-of-day values like 'store opens at 9:00 AM', use plain time and handle timezone conversion in your app.
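The two recommended patterns can be sketched as follows; both table definitions are hypothetical:

```sql
-- Points in time: always timestamptz
CREATE TABLE appointments (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  starts_at timestamptz NOT NULL
);

-- Recurring time-of-day values: plain time plus the location's IANA zone
-- name, so the correct offset can be resolved per date in the application
CREATE TABLE store_hours (
  store_id bigint NOT NULL,
  opens_at time NOT NULL,
  closes_at time NOT NULL,
  tz text NOT NULL DEFAULT 'America/Los_Angeles'
);
```

Storing the zone name rather than an offset is what makes DST transitions resolvable: 'America/Los_Angeles' at 9:00 AM means UTC-7 in summer and UTC-8 in winter, which no fixed offset can express.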
6. Audit serial columns for modern identity alternatives
Find serial/bigserial columns: SELECT table_name, column_name FROM information_schema.columns WHERE column_default LIKE 'nextval%'. While not causing data corruption, the `serial-lacks-ownership-enforcement` insight shows that serial pseudo-types don't truly own their sequences and allow manual inserts that can cause sequence conflicts. If you're on PostgreSQL 10+, new tables should use GENERATED ALWAYS AS IDENTITY instead — it ties the sequence tightly to the column and prevents the foot-gun of manual ID insertions. This is a lower priority migration but worth planning for your next schema version.
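The identity-column replacement looks like this (the orders table is hypothetical):

```sql
-- Modern replacement for bigserial on PostgreSQL 10+
CREATE TABLE orders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  placed_at timestamptz NOT NULL DEFAULT now()
);

-- GENERATED ALWAYS rejects manual ids unless OVERRIDING SYSTEM VALUE
-- is given, closing the sequence-conflict foot-gun:
-- INSERT INTO orders (id, placed_at) VALUES (42, now());  -- ERROR
```

Unlike serial, the sequence behind an identity column is owned by the column: it is dropped with the column and cannot be manipulated independently without explicit privileges.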
Related Insights
timetz stores meaningless time-only values with timezone offset (warning)
Storing UTC in timestamp columns breaks timezone-aware functions (warning)
timestamp without time zone causes silent data corruption across timezones (diagnostic step, critical)
char(n) pads with spaces causing comparison and storage issues (diagnostic step, warning)
money type precision depends on locale causing data corruption (diagnostic step, critical)
serial pseudo-type lacks sequence ownership and permissions enforcement (supporting evidence, warning)
varchar(n) creates arbitrary length limits requiring table rewrites (supporting evidence, info)
Monitoring Interfaces
PostgreSQL Native