log file sync - Diagnose Oracle Commit Performance Issues
log file sync
Section titled “log file sync”Overview
Section titled “Overview”Wait Event Class: Commit
Parameters: buffer# (the log buffer position that must be flushed)
log file sync is the wait a foreground session incurs from the moment it issues a COMMIT (or ROLLBACK) until the Log Writer (LGWR) background process has written all redo generated by that transaction to the online redo log files on disk.
The Commit Sequence
Section titled “The Commit Sequence”- Session calls
COMMIT - Session posts LGWR to write redo
- Session waits on
log file sync - LGWR writes redo to disk (itself waiting on
log file parallel write) - LGWR signals the session that the write is complete
- Session’s
log file syncwait ends; the commit returns to the application
The critical insight: log file sync duration equals LGWR’s write latency plus any LGWR scheduling delay. If LGWR is slow — because redo logs are on slow storage, because the OS is not scheduling LGWR promptly, or because there are too many concurrent commits — every committed transaction feels that latency.
Why It Matters
Section titled “Why It Matters”- Every single committed transaction waits on this event
- In high-frequency OLTP systems,
log file syncis commonly in the top 3 wait events - A 5 ms average wait means every commit takes at least 5 ms — capping throughput at ~200 commits/second per session
- High
log file syncwaits directly translate to poor application response time for write-heavy workloads
When This Wait Is a Problem
Section titled “When This Wait Is a Problem”Latency Thresholds
Section titled “Latency Thresholds”| Average Wait Time | Assessment |
|---|---|
| < 1 ms | Excellent — redo logs on NVMe or very fast SAN with write cache |
| 1–5 ms | Good — typical well-tuned SAN or SSD |
| 5–10 ms | Investigate — may be storage contention, slow disk, or misconfiguration |
| 10–20 ms | Problem — application performance is significantly impacted |
| > 20 ms | Critical — redo log I/O is severely degraded |
Normal vs. Problematic
Section titled “Normal vs. Problematic”This event is expected in any database that commits. It becomes problematic when:
- Average latency is consistently above 5–10 ms
- The total time in
log file syncrepresents > 10% of DB time in an OLTP system - Application response times spike in correlation with commit-heavy operations
log file syncaverage latency diverges significantly fromlog file parallel writeaverage latency (indicating LGWR scheduling delays)
Diagnostic Queries
Section titled “Diagnostic Queries”1. Current log file sync Statistics
Section titled “1. Current log file sync Statistics”-- System-level commit wait overviewSELECT event, total_waits, total_timeouts, ROUND(time_waited / 100, 2) AS total_secs, ROUND(average_wait * 10, 2) AS avg_wait_ms, ROUND(time_waited * 100.0 / NULLIF((SELECT SUM(time_waited) FROM v$system_event WHERE wait_class NOT IN ('Idle')), 0), 2) AS pct_db_timeFROM v$system_eventWHERE event IN ('log file sync', 'log file parallel write')ORDER BY time_waited DESC;
-- Compare sync vs parallel write to detect LGWR scheduling gaps-- If log file sync avg >> log file parallel write avg, LGWR is being starved of CPUSELECT e1.event AS sync_event, ROUND(e1.average_wait * 10, 2) AS sync_avg_ms, e2.event AS write_event, ROUND(e2.average_wait * 10, 2) AS write_avg_ms, ROUND((e1.average_wait - e2.average_wait) * 10, 2) AS lgwr_sched_delay_msFROM v$system_event e1CROSS JOIN v$system_event e2WHERE e1.event = 'log file sync' AND e2.event = 'log file parallel write';2. Sessions Currently Waiting on Commits
Section titled “2. Sessions Currently Waiting on Commits”-- Active sessions waiting for log file sync right nowSELECT sw.sid, sw.serial#, s.username, s.program, s.machine, s.sql_id, sw.p1 AS log_buffer_position, sw.seconds_in_wait AS secs_waiting, s.last_call_et AS session_last_active_secsFROM v$session_wait swJOIN v$session s ON sw.sid = s.sidWHERE sw.event = 'log file sync'ORDER BY sw.seconds_in_wait DESC;3. Redo Log Configuration and Status
Section titled “3. Redo Log Configuration and Status”-- Online redo log groups, sizes, and current statusSELECT l.group#, l.members, ROUND(l.bytes / 1024 / 1024, 0) AS size_mb, l.status, l.archived, l.sequence#, TO_CHAR(l.first_time, 'YYYY-MM-DD HH24:MI:SS') AS first_timeFROM v$log lORDER BY l.group#;
-- Redo log member file locationsSELECT lf.group#, lf.member, lf.status, lf.typeFROM v$logfile lfORDER BY lf.group#, lf.member;
-- Log switch frequency (should be every 15-30+ minutes in OLTP)SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour, COUNT(*) AS switches_per_hour, ROUND(60.0 / COUNT(*), 1) AS avg_minutes_between_switchesFROM v$log_historyWHERE first_time > SYSDATE - 7GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')ORDER BY 1 DESCFETCH FIRST 48 ROWS ONLY;4. Commit Rate and ASH Analysis
Section titled “4. Commit Rate and ASH Analysis”-- Commits per second from V$SYSSTATSELECT name, value AS total_commits, ROUND(value / (SELECT (SYSDATE - startup_time) * 86400 FROM v$instance), 2) AS commits_per_secFROM v$sysstatWHERE name IN ('user commits', 'user rollbacks');
-- ASH: Which programs/users are causing the most log file sync waits?SELECT s.program, s.username, COUNT(*) AS ash_samples, COUNT(*) * 10 AS estimated_wait_secs, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct_of_sync_waitsFROM v$active_session_history ashJOIN v$session s ON ash.session_id = s.sidWHERE ash.event = 'log file sync' AND ash.sample_time > SYSDATE - 1/24GROUP BY s.program, s.usernameORDER BY ash_samples DESCFETCH FIRST 15 ROWS ONLY;Root Causes
Section titled “Root Causes”1. Redo Logs on Slow Storage
Section titled “1. Redo Logs on Slow Storage”Redo log members on spinning disks, NFS mounts, or shared SAN volumes with high contention exhibit high per-write latency. Since every commit blocks on this write, storage speed is the primary determinant of commit throughput.
Key point: Redo logs must be written sequentially and synchronously before the commit returns. This is non-negotiable for durability. The only solution is faster storage or reducing the frequency of commits.
Best practice: Place redo log members on the fastest available storage — dedicated SSDs, NVMe, or a high-tier SAN LUN that is not shared with data files. In ASM, use a separate disk group for redo logs.
2. Excessive Commit Frequency — Committing After Every Row
Section titled “2. Excessive Commit Frequency — Committing After Every Row”Applications that commit inside a row-by-row loop generate one log file sync wait per row. For a million-row insert, this means 1 million individual commits, each paying the full latency penalty.
Example pattern:
-- BAD: Commit per rowFOR each_record IN source_cursor LOOP INSERT INTO target_table VALUES (each_record); COMMIT; -- 1 commit per row = 1 log file sync per rowEND LOOP;This is the single most common cause of high aggregate log file sync time. The fix is trivial: batch commits.
3. Redo Log Files Too Small — Frequent Log Switches
Section titled “3. Redo Log Files Too Small — Frequent Log Switches”Small redo log files cause frequent log switches. During a log switch, LGWR must wait for ARCH to finish archiving the current log before it can begin writing to the next one (if all groups are in ACTIVE state). This causes checkpoint-related stalls which manifest as elevated log file sync and log file switch waits.
Target: redo log switches no more frequently than every 15–30 minutes during peak load.
4. LGWR CPU Starvation
Section titled “4. LGWR CPU Starvation”In heavily loaded systems, LGWR may not get CPU promptly after being posted by a committing session. The gap between when LGWR is woken and when it actually runs shows up as the difference between log file sync average and log file parallel write average. On virtualized environments, vCPU scheduling can introduce significant delays.
5. Redo Log Multiplexing to Slow Members
Section titled “5. Redo Log Multiplexing to Slow Members”Each log group must have all its members written before LGWR considers the write complete. If one member is on fast storage and another is on slow storage, every write waits for the slowest member.
Detection: Check V$LOGFILE for member locations; compare performance of different mount points.
6. Asynchronous I/O Disabled
Section titled “6. Asynchronous I/O Disabled”When asynchronous I/O is not available (e.g., due to OS configuration or file system type), LGWR blocks during writes. With async I/O enabled, LGWR can issue multiple writes concurrently across log members. Check V$IOSTAT_FILE and the disk_asynch_io parameter.
Resolution Steps
Section titled “Resolution Steps”Move Redo Logs to Faster Storage
Section titled “Move Redo Logs to Faster Storage”-- Step 1: Identify current log file locationsSELECT group#, member FROM v$logfile ORDER BY group#, member;
-- Step 2: Add new members on faster storage to each groupALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo01b.log' TO GROUP 1;ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo02b.log' TO GROUP 2;ALTER DATABASE ADD LOGFILE MEMBER '/fast_disk/redo/redo03b.log' TO GROUP 3;
-- Step 3: Perform log switches to move away from old membersALTER SYSTEM SWITCH LOGFILE;ALTER SYSTEM CHECKPOINT;
-- Step 4: Drop old slow members (only after group is not CURRENT/ACTIVE)ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo01a.log';ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo02a.log';ALTER DATABASE DROP LOGFILE MEMBER '/slow_disk/redo/redo03a.log';Batch Commits to Reduce Commit Frequency
Section titled “Batch Commits to Reduce Commit Frequency”-- BAD pattern (commit per row):BEGIN FOR i IN 1..1000000 LOOP INSERT INTO target_table VALUES (i, 'data'); COMMIT; END LOOP;END;/
-- GOOD pattern (commit every N rows):DECLARE l_commit_interval NUMBER := 10000;BEGIN FOR i IN 1..1000000 LOOP INSERT INTO target_table VALUES (i, 'data'); IF MOD(i, l_commit_interval) = 0 THEN COMMIT; END IF; END LOOP; COMMIT; -- Final commit for remainderEND;/
-- BEST pattern for bulk loads (single commit):INSERT /*+ APPEND */ INTO target_tableSELECT * FROM source_table;COMMIT;Resize Redo Log Files to Reduce Switch Frequency
Section titled “Resize Redo Log Files to Reduce Switch Frequency”-- Step 1: Check current sizes and switch frequencySELECT group#, ROUND(bytes/1024/1024) AS size_mb FROM v$log;
-- Calculate optimal size: target 20-minute switches-- optimal_size_mb = redo_generated_per_min * 20-- Check redo generated per minute:SELECT ROUND(AVG(redo_size_mb), 2) AS avg_redo_per_interval_mbFROM ( SELECT snap_id, (e_redo - b_redo) / 1024 / 1024 AS redo_size_mb FROM ( SELECT snap_id, SUM(CASE WHEN stat_name = 'redo size' THEN value END) AS e_redo, LAG(SUM(CASE WHEN stat_name = 'redo size' THEN value END)) OVER (ORDER BY snap_id) AS b_redo FROM dba_hist_sysstat WHERE stat_name = 'redo size' AND snap_id BETWEEN &begin_snap AND &end_snap GROUP BY snap_id ));
-- Step 2: Add new larger log groupsALTER DATABASE ADD LOGFILE GROUP 4 ('/fast_disk/redo/redo04a.log', '/mirror/redo/redo04b.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/fast_disk/redo/redo05a.log', '/mirror/redo/redo05b.log') SIZE 1G;
-- Step 3: Drop old groups (must not be CURRENT or ACTIVE)-- Force switches until old groups are INACTIVEALTER SYSTEM SWITCH LOGFILE;ALTER SYSTEM CHECKPOINT;
-- Check status before droppingSELECT group#, status FROM v$log;ALTER DATABASE DROP LOGFILE GROUP 1;ALTER DATABASE DROP LOGFILE GROUP 2;Enable Asynchronous I/O
Section titled “Enable Asynchronous I/O”-- Check current async I/O settingSHOW PARAMETER disk_asynch_io;
-- Enable if disabled (requires OS support)ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;-- Requires database restart
-- Verify LGWR is using async I/OSELECT * FROM v$iostat_fileWHERE file_type = 'Log File';Prevention & Tuning
Section titled “Prevention & Tuning”1. Place Redo Logs on Dedicated Fast Storage
Section titled “1. Place Redo Logs on Dedicated Fast Storage”Never share redo log LUNs with data files, temp files, or archive logs. Use a dedicated high-priority storage pool. In virtualized environments, use thick-provisioned disks with write-back cache enabled at the array level.
2. Monitor Commit Rate and log file sync Latency Continuously
Section titled “2. Monitor Commit Rate and log file sync Latency Continuously”-- Create a baseline metric to track over timeSELECT TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24:MI') AS snap_time, ROUND((e2.time_waited_micro - e1.time_waited_micro) / NULLIF((e2.total_waits - e1.total_waits), 0) / 1000, 3) AS avg_ms_per_commitFROM dba_hist_system_event e1JOIN dba_hist_system_event e2 ON e2.snap_id = e1.snap_id + 1 AND e2.event_name = e1.event_name AND e2.dbid = e1.dbid AND e2.instance_number = e1.instance_numberJOIN dba_hist_snapshot s ON s.snap_id = e2.snap_id AND s.dbid = e2.dbid AND s.instance_number = e2.instance_numberWHERE e1.event_name = 'log file sync'ORDER BY s.begin_interval_time DESCFETCH FIRST 48 ROWS ONLY;3. Consider COMMIT_WRITE for Async Commits (Non-Critical Data)
Section titled “3. Consider COMMIT_WRITE for Async Commits (Non-Critical Data)”For workloads where absolute durability is not required for every commit (e.g., session state data, non-financial logging), Oracle supports asynchronous commit — the commit returns immediately without waiting for LGWR. Use with care.
-- Session-level async commit (use cautiously)ALTER SESSION SET commit_write = 'BATCH,NOWAIT';
-- Or via hint:COMMIT /*+ WRITE BATCH NOWAIT */;
-- Reset to default (synchronous) when doneALTER SESSION SET commit_write = 'IMMEDIATE,WAIT';4. Use NOLOGGING for Bulk Operations
Section titled “4. Use NOLOGGING for Bulk Operations”For bulk loads where data can be reloaded from source if needed, use NOLOGGING to dramatically reduce redo generation (and thus log file sync waits):
-- Direct path insert with NOLOGGINGALTER TABLE target_table NOLOGGING;INSERT /*+ APPEND */ INTO target_table SELECT * FROM source_table;COMMIT;ALTER TABLE target_table LOGGING;-- Note: Take a backup after NOLOGGING operationsRelated Wait Events
Section titled “Related Wait Events”- log file parallel write — The LGWR background process’s own I/O wait; this is what log file sync is waiting for
- db file sequential read — Single-block I/O; compare latency to redo write latency for storage baseline
- enq: TX - row lock contention — Lock contention also accumulates during long transactions before commit