log file parallel write - Tune Oracle Redo Log I/O
log file parallel write
Overview
Wait Event Class: System I/O
Parameters: files (number of log files being written), blocks (number of redo blocks written), requests (number of I/O requests issued)
log file parallel write is the wait recorded by the Log Writer (LGWR) background process itself as it writes redo entries from the log buffer to the online redo log files on disk. This is a System I/O class event — meaning it is a background process wait — as opposed to log file sync, which is the foreground session wait.
The Relationship Between log file sync and log file parallel write
These two events are tightly coupled:
- A committing session waits on log file sync
- While the session waits, LGWR is performing log file parallel write
- When log file parallel write completes, LGWR signals all waiting sessions, ending their log file sync waits
The “parallel” in the name refers to LGWR writing to all members of a log group simultaneously (in parallel I/O operations). For a log group with 2 members, LGWR issues 2 parallel writes and waits for both to complete before signaling sessions.
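As a rough mental model, the group-write behavior can be sketched in a few lines (illustrative Python, not Oracle internals; the function name and latency values are made up):

```python
# Illustrative sketch (not Oracle code): a group write is acknowledged
# only when every member's I/O has completed, so the latency LGWR sees
# is the maximum across members, not the average.
def group_write_latency_ms(member_latencies_ms):
    return max(member_latencies_ms)

# Two members on comparable storage: the group tracks the slower disk.
print(group_write_latency_ms([0.8, 1.1]))  # 1.1
```

This is why one slow member degrades the whole group, a point revisited under Root Causes below.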
Why This Event Appears in AWR
Although log file parallel write is a background wait, it appears in AWR background wait event reports. Its average latency is the most direct measure of redo log I/O performance. A well-tuned system has:
- log file parallel write average ≈ log file sync average (LGWR scheduled promptly)
- Both values below 2–5 ms
If log file sync is significantly higher than log file parallel write, the excess is LGWR CPU scheduling latency — often seen in overcommitted virtual machines.
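The decomposition is simple subtraction; a sketch with hypothetical AWR averages (both numbers below are example values, not measurements):

```python
# Hypothetical AWR averages showing how to isolate LGWR scheduling
# latency from storage latency.
log_file_sync_ms = 9.0            # foreground commit wait (example value)
log_file_parallel_write_ms = 1.5  # LGWR storage write time (example value)

# The excess of sync over parallel write is time spent waiting for LGWR
# to be scheduled and to post the waiting sessions, not storage time.
scheduling_overhead_ms = log_file_sync_ms - log_file_parallel_write_ms
print(scheduling_overhead_ms)  # 7.5
```

A large gap like this points at CPU scheduling pressure on the host, not at the redo log storage.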
When This Wait Is a Problem
Thresholds for log file parallel write
| Average Wait Time | Assessment |
|---|---|
| < 1 ms | Excellent — NVMe or write-cached SAN |
| 1–3 ms | Good — well-configured storage |
| 3–10 ms | Investigate — storage contention or suboptimal placement |
| 10–20 ms | Poor — redo log storage is a bottleneck |
| > 20 ms | Critical — redo I/O severely degraded |
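When scripting AWR health checks, the table's bands can be expressed as a small helper; the cutoffs below simply mirror the table above and are not part of any Oracle view or API:

```python
# Helper mirroring the assessment bands in the table above.
def assess_redo_write_latency(avg_wait_ms):
    if avg_wait_ms < 1:
        return "Excellent"
    if avg_wait_ms <= 3:
        return "Good"
    if avg_wait_ms <= 10:
        return "Investigate"
    if avg_wait_ms <= 20:
        return "Poor"
    return "Critical"

print(assess_redo_write_latency(4.2))  # Investigate
```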
When to Investigate
- log file parallel write average latency has increased over time without hardware changes
- A specific redo log group member is on slower storage than others
- Log switch rate is very high (many small redo logs)
- The database is running on a virtualized host with shared I/O resources
- log file sync is elevated and log file parallel write confirms storage is the cause
Diagnostic Queries
1. Background Wait Statistics for LGWR

```sql
-- log file parallel write in background wait events
SELECT event,
       total_waits,
       total_timeouts,
       ROUND(time_waited / 100, 2) AS total_secs,
       ROUND(average_wait * 10, 2) AS avg_wait_ms,
       ROUND(max_wait / 100, 2) AS max_wait_secs
FROM v$system_event
WHERE event IN ('log file parallel write',
                'log file sync',
                'log file switch completion',
                'log file switch (checkpoint incomplete)')
ORDER BY time_waited DESC;
```

```sql
-- Compare LGWR write time vs session commit wait time
SELECT se.event,
       se.total_waits,
       ROUND(se.average_wait * 10, 2) AS avg_ms
FROM v$system_event se
WHERE se.event IN ('log file sync', 'log file parallel write')
ORDER BY se.event;
```

2. Redo Log Configuration — Size, Members, and Current Status
```sql
-- Full redo log group status with file locations
SELECT l.group#,
       lf.member,
       ROUND(l.bytes / 1024 / 1024, 0) AS size_mb,
       l.members AS member_count,
       l.status,
       l.archived,
       l.sequence#,
       TO_CHAR(l.first_time, 'YYYY-MM-DD HH24:MI:SS') AS first_switch_time
FROM v$log l
JOIN v$logfile lf ON l.group# = lf.group#
ORDER BY l.group#, lf.member;
```

3. Log Switch History — Frequency Analysis
```sql
-- Log switch frequency by hour (7-day history)
-- (the analytic LEAD is computed in an inline view so it can be aggregated)
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS log_switches,
       ROUND(60.0 / COUNT(*), 1) AS avg_mins_between_switches,
       MIN(mins_to_next_switch) AS min_mins_between_switches
FROM (
    SELECT first_time,
           ROUND((LEAD(first_time) OVER (ORDER BY sequence#) - first_time) * 1440, 1)
               AS mins_to_next_switch
    FROM v$log_history
    WHERE first_time > SYSDATE - 7
)
GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER BY 1 DESC
FETCH FIRST 168 ROWS ONLY;
```
```sql
-- Identify the fastest and slowest log switches
SELECT sequence#,
       TO_CHAR(first_time, 'YYYY-MM-DD HH24:MI:SS') AS switch_time,
       ROUND((LEAD(first_time) OVER (ORDER BY sequence#) - first_time) * 1440, 2)
           AS mins_for_this_log
FROM v$log_history
WHERE first_time > SYSDATE - 1
ORDER BY mins_for_this_log ASC
FETCH FIRST 20 ROWS ONLY;
```

4. Redo Generation Rate — Size Redo Logs Appropriately
```sql
-- Average redo generated per minute from AWR (requires Diagnostics Pack)
-- (snapshot times are TIMESTAMPs, so cast to DATE before date arithmetic)
SELECT TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24:MI') AS snap_time,
       ROUND((e_stat.value - b_stat.value) / 1024 / 1024, 2) AS redo_mb_in_interval,
       ROUND((CAST(s.end_interval_time AS DATE)
              - CAST(s.begin_interval_time AS DATE)) * 1440, 1) AS interval_mins,
       ROUND(
           (e_stat.value - b_stat.value) / 1024 / 1024
           / NULLIF((CAST(s.end_interval_time AS DATE)
                     - CAST(s.begin_interval_time AS DATE)) * 1440, 0), 2
       ) AS redo_mb_per_minute
FROM dba_hist_sysstat b_stat
JOIN dba_hist_sysstat e_stat
     ON e_stat.snap_id = b_stat.snap_id + 1
    AND e_stat.stat_name = b_stat.stat_name
    AND e_stat.dbid = b_stat.dbid
    AND e_stat.instance_number = b_stat.instance_number
JOIN dba_hist_snapshot s
     ON s.snap_id = e_stat.snap_id
    AND s.dbid = e_stat.dbid
    AND s.instance_number = e_stat.instance_number
WHERE b_stat.stat_name = 'redo size'
  AND s.begin_interval_time > SYSDATE - 3
ORDER BY s.begin_interval_time DESC
FETCH FIRST 72 ROWS ONLY;
```

5. I/O Statistics for Log Files
```sql
-- File-level I/O performance for redo log files
-- (service-time columns in v$iostat_file are cumulative milliseconds)
SELECT filetype_name,
       small_read_reqs,
       small_write_reqs,
       ROUND(small_write_servicetime / NULLIF(small_write_reqs, 0), 4) AS avg_write_ms,
       large_write_reqs,
       ROUND(large_write_servicetime / NULLIF(large_write_reqs, 0), 4) AS avg_large_write_ms
FROM v$iostat_file
WHERE filetype_name = 'Log File';
```

Root Causes
1. Redo Log Members on Slow Storage
The most common and impactful cause. Every write to a redo log group must be confirmed on all members before LGWR signals completion. If any member resides on slow storage — spinning HDD, heavily loaded SAN LUN, NFS — every write pays that storage’s latency, regardless of how fast the other members are.
Critical placement rules:
- Never place redo log members on the same spindles as data files (causes head contention)
- Never place redo logs on NFS mounts (network latency + no OS write cache guarantees)
- Avoid filesystem- or volume-level compression and encryption on the volumes holding redo logs (redo logs do not live in tablespaces, so this is a storage-layer concern)
- In ASM, use a separate dedicated disk group for redo logs
2. Log Groups with Unequal Member Performance
If a group has two members — one on fast SSD and one on slow HDD — LGWR always waits for the slow member. The fast member provides zero benefit for latency, only for redundancy.
Detection: Monitor per-file I/O stats with V$IOSTAT_FILE or V$FILESTAT to identify members with different response times.
3. Too Many Log Group Members
While multiplexing redo logs is essential for redundancy, having 3+ members per group on separate slow storage paths multiplies write I/O. Each additional member is an additional write that must complete. Two members on fast storage is typically the right balance for most environments.
4. Log Files Too Small — Excessive Log Switch Rate
Small redo logs (e.g., 50 MB default in older databases) generate rapid log switches. During each switch, LGWR must wait for the next group to be available (not ACTIVE/CURRENT). If the next group is still being archived or a checkpoint is incomplete, LGWR stalls. This shows up as log file switch (checkpoint incomplete) or log file switch (archiving needed) waits alongside elevated log file parallel write.
Target: Redo log switch frequency of 15–30 minutes at peak load. Log size = redo MB/minute × 20 minutes.
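The sizing rule is plain arithmetic; a sketch (the function name and 20-minute default are illustrative, not an Oracle-provided utility):

```python
# Sizing arithmetic from the rule above: log size = redo rate x target
# minutes between switches.
def redo_log_size_mb(redo_mb_per_min, target_switch_mins=20):
    """Size each log so switches occur roughly every target_switch_mins."""
    return redo_mb_per_min * target_switch_mins

# Worked example: 50 MB/min of redo at peak
print(redo_log_size_mb(50))  # 1000 (MB), i.e. about 1 GB per log file
```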
5. Asynchronous I/O Not Available
Without OS-level async I/O, LGWR uses synchronous writes to each member sequentially. With async I/O, LGWR issues all member writes in parallel and waits for completion — dramatically improving throughput for multi-member groups. Unavailable on some file systems (e.g., older ext3, certain NFS mounts).
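The difference can be modeled as sum versus max of per-member latencies (an illustrative model only; the latency values are hypothetical):

```python
# Illustrative model: with synchronous I/O, LGWR completes each member
# write before starting the next, so per-member latencies add up. With
# async I/O, all member writes are in flight at once and LGWR waits only
# for the slowest one.
def sync_group_write_ms(member_latencies_ms):
    return sum(member_latencies_ms)   # writes issued one after another

def async_group_write_ms(member_latencies_ms):
    return max(member_latencies_ms)   # writes issued in parallel

members = [1.2, 1.2]                  # hypothetical two-member group
print(sync_group_write_ms(members))   # 2.4
print(async_group_write_ms(members))  # 1.2
```

With two equal members, async I/O halves the group write time; the gap widens as members are added.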
6. OS/Hypervisor I/O Throttling
In virtualized environments (VMware, KVM, cloud VMs), the hypervisor may throttle I/O throughput or impose latency during storage overcommit conditions. LGWR, being a latency-sensitive sequential writer, is particularly impacted. I/O latency spikes from host-level contention directly increase log file parallel write.
Resolution Steps
Add Redo Log Groups with Correct Sizing

```sql
-- Calculate target log size (redo_mb_per_min * target_minutes_between_switches)
-- Example: 50 MB/min * 20 min = 1000 MB = 1 GB per log file

-- Add new properly-sized groups on fast storage
ALTER DATABASE ADD LOGFILE GROUP 4 (
    '/fast_storage/redo/redo04a.log',
    '/mirror_storage/redo/redo04b.log'
) SIZE 1G;

ALTER DATABASE ADD LOGFILE GROUP 5 (
    '/fast_storage/redo/redo05a.log',
    '/mirror_storage/redo/redo05b.log'
) SIZE 1G;

ALTER DATABASE ADD LOGFILE GROUP 6 (
    '/fast_storage/redo/redo06a.log',
    '/mirror_storage/redo/redo06b.log'
) SIZE 1G;

-- Rotate through log groups to make old groups INACTIVE
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Verify old groups are now INACTIVE before dropping
SELECT group#, status, archived FROM v$log;

-- Drop old undersized groups
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
ALTER DATABASE DROP LOGFILE GROUP 3;
```

Move Log Members to Faster Storage
```sql
-- Can't move a CURRENT log — must drop and re-add members
-- First, add new members on fast storage to all groups:
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo03b.log' TO GROUP 3;

-- Force log switches to populate new members
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Wait for groups to be INACTIVE, then drop slow members
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo01a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo02a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo03a.log';
```

Enable LGWR I/O Slaves (For High-Write Systems)
```sql
-- In 12c+, LGWR can use additional worker processes for very high redo
-- generation rates
-- Check if LGWR workers are active
-- Note: hidden (underscore) parameters are not visible in v$parameter
-- by default, so this may return no row for _use_single_log_writer
SELECT name, value FROM v$parameter
WHERE name IN ('_use_single_log_writer', 'lgwr_io_slaves');
```

```sql
-- For very high redo systems, allow LGWR to spawn I/O slaves
-- (This is usually automatic in 12c+ based on workload)
-- In modern releases this is a hidden parameter; quote the underscore
-- name, and change it only if directed by Oracle Support
ALTER SYSTEM SET "_lgwr_io_slaves" = 4 SCOPE=SPFILE;
-- Requires restart
```

Validate Async I/O is Active
```sql
-- Confirm async I/O is enabled
SELECT name, value FROM v$parameter
WHERE name IN ('disk_asynch_io', 'filesystemio_options');

-- Enable async I/O if disabled (requires restart)
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;

-- For file-system redo logs (not ASM), set filesystem async I/O:
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE=SPFILE;
-- Options: NONE, DIRECTIO, ASYNCH, SETALL
```

Prevention & Tuning
Section titled “Prevention & Tuning”1. Standardize on Correct Log Sizing During Database Creation
Establish redo log sizing based on expected peak redo generation from the start. Increasing log sizes later requires dropping and recreating log groups, which is an outage risk during the transition.
```sql
-- Recommended log group count and size formula:
--   Minimum 3 log groups
--   Each log sized for 20-30 minutes of peak redo generation
-- Example sizing check (timestamps cast to DATE for date arithmetic):
SELECT ROUND(MAX(redo_mb_per_min) * 25, 0) AS recommended_log_size_mb
FROM (
    SELECT (e_stat.value - b_stat.value) / 1024 / 1024
           / NULLIF((CAST(s.end_interval_time AS DATE)
                     - CAST(s.begin_interval_time AS DATE)) * 1440, 0)
               AS redo_mb_per_min
    FROM dba_hist_sysstat b_stat
    JOIN dba_hist_sysstat e_stat
         ON e_stat.snap_id = b_stat.snap_id + 1
        AND e_stat.stat_name = b_stat.stat_name
        AND e_stat.dbid = b_stat.dbid
        AND e_stat.instance_number = b_stat.instance_number
    JOIN dba_hist_snapshot s
         ON s.snap_id = e_stat.snap_id
        AND s.dbid = e_stat.dbid
        AND s.instance_number = e_stat.instance_number
    WHERE b_stat.stat_name = 'redo size'
      AND s.begin_interval_time > SYSDATE - 14
);
```

2. Periodic Redo Health Check
Include redo log performance in weekly DBA checks:
- Review average log file parallel write from AWR
- Check log switch frequency from V$LOG_HISTORY
- Confirm all log members are VALID and on appropriate storage
3. Monitor in Data Guard Environments
In Data Guard configurations, redo must also be transmitted to standby databases. High log file parallel write on the primary combined with high log file sync may indicate standby redo apply lag is feeding back to the primary (in Maximum Protection mode). Review V$DATAGUARD_STATS and consider switching to Maximum Performance or Availability mode.
```sql
-- Check Data Guard redo transport performance
SELECT dest_id, dest_name, status, target, archiver,
       schedule, transmit_mode, async_blocks
FROM v$archive_dest
WHERE status = 'VALID'
  AND target = 'STANDBY';
```

Related Wait Events
Section titled “Related Wait Events”- log file sync — The foreground session wait for LGWR completion; the user-facing manifestation of this wait
- db file sequential read — Compare latency profiles to understand relative storage performance
- direct path read — Direct I/O patterns; useful baseline for storage capability