log file sync - Diagnose Oracle Commit Performance Issues

log file sync

Overview

Wait Event Class: Commit

Parameters: buffer# (the log buffer position that must be flushed)

log file sync is the wait a foreground session incurs from the moment it issues a COMMIT (or ROLLBACK) until the Log Writer (LGWR) background process has written all redo generated by that transaction to the online redo log files on disk.

The Commit Sequence

Session calls COMMIT
Session posts LGWR to write redo
Session waits on log file sync
LGWR writes redo to disk (itself waiting on log file parallel write)
LGWR signals the session that the write is complete
Session’s log file sync wait ends; the commit returns to the application

The critical insight: log file sync duration equals LGWR’s write latency plus any LGWR scheduling delay. If LGWR is slow — because redo logs are on slow storage, because the OS is not scheduling LGWR promptly, or because there are too many concurrent commits — every committed transaction feels that latency.

Why It Matters

Every single committed transaction waits on this event
In high-frequency OLTP systems, log file sync is commonly in the top 3 wait events
A 5 ms average wait means every commit takes at least 5 ms — capping throughput at ~200 commits/second per session
High log file sync waits directly translate to poor application response time for write-heavy workloads

When This Wait Is a Problem

Latency Thresholds

Average Wait Time	Assessment
< 1 ms	Excellent — redo logs on NVMe or very fast SAN with write cache
1–5 ms	Good — typical well-tuned SAN or SSD
5–10 ms	Investigate — may be storage contention, slow disk, or misconfiguration
10–20 ms	Problem — application performance is significantly impacted
> 20 ms	Critical — redo log I/O is severely degraded

Normal vs. Problematic

This event is expected in any database that commits. It becomes problematic when:

Average latency is consistently above 5–10 ms
The total time in log file sync represents > 10% of DB time in an OLTP system
Application response times spike in correlation with commit-heavy operations
log file sync average latency diverges significantly from log file parallel write average latency (indicating LGWR scheduling delays)

Diagnostic Queries

1. Current log file sync Statistics

-- System-level commit wait overview
SELECT
    event,
    total_waits,
    total_timeouts,
    ROUND(time_waited / 100, 2)           AS total_secs,
    ROUND(average_wait * 10, 2)           AS avg_wait_ms,
    ROUND(time_waited * 100.0 /
          NULLIF((SELECT SUM(time_waited)
                  FROM v$system_event
                  WHERE wait_class NOT IN ('Idle')), 0), 2) AS pct_db_time
FROM v$system_event
WHERE event IN ('log file sync', 'log file parallel write')
ORDER BY time_waited DESC;

-- Compare sync vs parallel write to detect LGWR scheduling gaps
-- If log file sync avg >> log file parallel write avg, LGWR is being starved of CPU
SELECT
    e1.event                          AS sync_event,
    ROUND(e1.average_wait * 10, 2)    AS sync_avg_ms,
    e2.event                          AS write_event,
    ROUND(e2.average_wait * 10, 2)    AS write_avg_ms,
    ROUND((e1.average_wait - e2.average_wait) * 10, 2) AS lgwr_sched_delay_ms
FROM v$system_event e1
CROSS JOIN v$system_event e2
WHERE e1.event = 'log file sync'
  AND e2.event = 'log file parallel write';

2. Sessions Currently Waiting on Commits

-- Active sessions waiting for log file sync right now
SELECT
    sw.sid,
    sw.serial#,
    s.username,
    s.program,
    s.machine,
    s.sql_id,
    sw.p1                       AS log_buffer_position,
    sw.seconds_in_wait          AS secs_waiting,
    s.last_call_et              AS session_last_active_secs
FROM v$session_wait  sw
JOIN v$session       s ON sw.sid = s.sid
WHERE sw.event = 'log file sync'
ORDER BY sw.seconds_in_wait DESC;

3. Redo Log Configuration and Status

-- Online redo log groups, sizes, and current status
SELECT
    l.group#,
    l.members,
    ROUND(l.bytes / 1024 / 1024, 0)  AS size_mb,
    l.status,
    l.archived,
    l.sequence#,
    TO_CHAR(l.first_time, 'YYYY-MM-DD HH24:MI:SS') AS first_time
FROM v$log l
ORDER BY l.group#;

-- Redo log member file locations
SELECT
    lf.group#,
    lf.member,
    lf.status,
    lf.type
FROM v$logfile lf
ORDER BY lf.group#, lf.member;

-- Log switch frequency (should be every 15-30+ minutes in OLTP)
SELECT
    TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
    COUNT(*)                                AS switches_per_hour,
    ROUND(60.0 / COUNT(*), 1)              AS avg_minutes_between_switches
FROM v$log_history
WHERE first_time > SYSDATE - 7
GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER BY 1 DESC
FETCH FIRST 48 ROWS ONLY;

4. Commit Rate and ASH Analysis

-- Commits per second from V$SYSSTAT
SELECT
    name,
    value                                                   AS total_commits,
    ROUND(value / (SELECT (SYSDATE - startup_time) * 86400
                   FROM v$instance), 2)                     AS commits_per_sec
FROM v$sysstat
WHERE name IN ('user commits', 'user rollbacks');

-- ASH: Which programs/users are causing the most log file sync waits?
SELECT
    s.program,
    s.username,
    COUNT(*)                               AS ash_samples,
    COUNT(*) * 10                          AS estimated_wait_secs,
    ROUND(COUNT(*) * 100.0 /
          SUM(COUNT(*)) OVER (), 2)        AS pct_of_sync_waits
FROM v$active_session_history ash
JOIN v$session s ON ash.session_id = s.sid
WHERE ash.event       = 'log file sync'
  AND ash.sample_time > SYSDATE - 1/24
GROUP BY s.program, s.username
ORDER BY ash_samples DESC
FETCH FIRST 15 ROWS ONLY;

Root Causes

1. Redo Logs on Slow Storage

Redo log members on spinning disks, NFS mounts, or shared SAN volumes with high contention exhibit high per-write latency. Since every commit blocks on this write, storage speed is the primary determinant of commit throughput.

Key point: Redo logs must be written sequentially and synchronously before the commit returns. This is non-negotiable for durability. The only solution is faster storage or reducing the frequency of commits.

Best practice: Place redo log members on the fastest available storage — dedicated SSDs, NVMe, or a high-tier SAN LUN that is not shared with data files. In ASM, use a separate disk group for redo logs.

2. Excessive Commit Frequency — Committing After Every Row

Applications that commit inside a row-by-row loop generate one log file sync wait per row. For a million-row insert, this means 1 million individual commits, each paying the full latency penalty.

Example pattern:

-- BAD: Commit per row
FOR each_record IN source_cursor LOOP
    INSERT INTO target_table VALUES (each_record);
    COMMIT;  -- 1 commit per row = 1 log file sync per row
END LOOP;

This is the single most common cause of high aggregate log file sync time. The fix is trivial: batch commits.

3. Redo Log Files Too Small — Frequent Log Switches

Small redo log files cause frequent log switches. During a log switch, LGWR must wait for ARCH to finish archiving the current log before it can begin writing to the next one (if all groups are in ACTIVE state). This causes checkpoint-related stalls which manifest as elevated log file sync and log file switch waits.

Target: redo log switches no more frequently than every 15–30 minutes during peak load.

4. LGWR CPU Starvation

In heavily loaded systems, LGWR may not get CPU promptly after being posted by a committing session. The gap between when LGWR is woken and when it actually runs shows up as the difference between log file sync average and log file parallel write average. On virtualized environments, vCPU scheduling can introduce significant delays.

5. Redo Log Multiplexing to Slow Members

Each log group must have all its members written before LGWR considers the write complete. If one member is on fast storage and another is on slow storage, every write waits for the slowest member.

Detection: Check V$LOGFILE for member locations; compare performance of different mount points.

6. Asynchronous I/O Disabled

When asynchronous I/O is not available (e.g., due to OS configuration or file system type), LGWR blocks during writes. With async I/O enabled, LGWR can issue multiple writes concurrently across log members. Check V$IOSTAT_FILE and the disk_asynch_io parameter.

Resolution Steps

Move Redo Logs to Faster Storage

-- Step 1: Identify current log file locations
SELECT group#, member FROM v$logfile ORDER BY group#, member;

-- Step 2: Add new members on faster storage to each group
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo03b.log' TO GROUP 3;

-- Step 3: Perform log switches to move away from old members
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Step 4: Drop old slow members (only after group is not CURRENT/ACTIVE)
ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo01a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo02a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo03a.log';

Batch Commits to Reduce Commit Frequency

-- BAD pattern (commit per row):
BEGIN
    FOR i IN 1..1000000 LOOP
        INSERT INTO target_table VALUES (i, 'data');
        COMMIT;
    END LOOP;
END;
/

-- GOOD pattern (commit every N rows):
DECLARE
    l_commit_interval NUMBER := 10000;
BEGIN
    FOR i IN 1..1000000 LOOP
        INSERT INTO target_table VALUES (i, 'data');
        IF MOD(i, l_commit_interval) = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
    COMMIT; -- Final commit for remainder
END;
/

-- BEST pattern for bulk loads (single commit):
INSERT /*+ APPEND */ INTO target_table
SELECT * FROM source_table;
COMMIT;

Resize Redo Log Files to Reduce Switch Frequency

-- Step 1: Check current sizes and switch frequency
SELECT group#, ROUND(bytes/1024/1024) AS size_mb FROM v$log;

-- Calculate optimal size: target 20-minute switches
-- optimal_size_mb = redo_generated_per_min * 20
-- Check redo generated per minute:
SELECT
    ROUND(AVG(redo_size_mb), 2) AS avg_redo_per_interval_mb
FROM (
    SELECT
        snap_id,
        (e_redo - b_redo) / 1024 / 1024 AS redo_size_mb
    FROM (
        SELECT
            snap_id,
            SUM(CASE WHEN stat_name = 'redo size' THEN value END) AS e_redo,
            LAG(SUM(CASE WHEN stat_name = 'redo size' THEN value END))
                OVER (ORDER BY snap_id)                             AS b_redo
        FROM dba_hist_sysstat
        WHERE stat_name = 'redo size'
          AND snap_id BETWEEN &begin_snap AND &end_snap
        GROUP BY snap_id
    )
);

-- Step 2: Add new larger log groups
ALTER DATABASE ADD LOGFILE GROUP 4
    ('/fast_disk/redo/redo04a.log', '/mirror/redo/redo04b.log')
    SIZE 1G;

ALTER DATABASE ADD LOGFILE GROUP 5
    ('/fast_disk/redo/redo05a.log', '/mirror/redo/redo05b.log')
    SIZE 1G;

-- Step 3: Drop old groups (must not be CURRENT or ACTIVE)
-- Force switches until old groups are INACTIVE
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Check status before dropping
SELECT group#, status FROM v$log;
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;

Enable Asynchronous I/O

-- Check current async I/O setting
SHOW PARAMETER disk_asynch_io;

-- Enable if disabled (requires OS support)
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;
-- Requires database restart

-- Verify LGWR is using async I/O
SELECT * FROM v$iostat_file
WHERE file_type = 'Log File';

Prevention & Tuning

1. Place Redo Logs on Dedicated Fast Storage

Never share redo log LUNs with data files, temp files, or archive logs. Use a dedicated high-priority storage pool. In virtualized environments, use thick-provisioned disks with write-back cache enabled at the array level.

2. Monitor Commit Rate and log file sync Latency Continuously

-- Create a baseline metric to track over time
SELECT
    TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24:MI') AS snap_time,
    ROUND((e2.time_waited_micro - e1.time_waited_micro) /
          NULLIF((e2.total_waits - e1.total_waits), 0) / 1000, 3) AS avg_ms_per_commit
FROM dba_hist_system_event  e1
JOIN dba_hist_system_event  e2 ON  e2.snap_id        = e1.snap_id + 1
                                AND e2.event_name     = e1.event_name
                                AND e2.dbid           = e1.dbid
                                AND e2.instance_number = e1.instance_number
JOIN dba_hist_snapshot      s  ON  s.snap_id          = e2.snap_id
                                AND s.dbid             = e2.dbid
                                AND s.instance_number  = e2.instance_number
WHERE e1.event_name = 'log file sync'
ORDER BY s.begin_interval_time DESC
FETCH FIRST 48 ROWS ONLY;

3. Consider COMMIT_WRITE for Async Commits (Non-Critical Data)

For workloads where absolute durability is not required for every commit (e.g., session state data, non-financial logging), Oracle supports asynchronous commit — the commit returns immediately without waiting for LGWR. Use with care.

-- Session-level async commit (use cautiously)
ALTER SESSION SET commit_write = 'BATCH,NOWAIT';

-- Or via hint:
COMMIT /*+ WRITE BATCH NOWAIT */;

-- Reset to default (synchronous) when done
ALTER SESSION SET commit_write = 'IMMEDIATE,WAIT';

4. Use NOLOGGING for Bulk Operations

For bulk loads where data can be reloaded from source if needed, use NOLOGGING to dramatically reduce redo generation (and thus log file sync waits):

-- Direct path insert with NOLOGGING
ALTER TABLE target_table NOLOGGING;
INSERT /*+ APPEND */ INTO target_table SELECT * FROM source_table;
COMMIT;
ALTER TABLE target_table LOGGING;
-- Note: Take a backup after NOLOGGING operations

log file parallel write — The LGWR background process’s own I/O wait; this is what log file sync is waiting for
db file sequential read — Single-block I/O; compare latency to redo write latency for storage baseline
enq: TX - row lock contention — Lock contention also accumulates during long transactions before commit