log file parallel write - Tune Oracle Redo Log I/O
log file parallel write
Overview
Wait Event Class: System I/O
Parameters: files (number of log files being written), blocks (number of redo blocks written), requests (number of I/O requests issued)
log file parallel write is the wait recorded by the Log Writer (LGWR) background process itself as it writes redo entries from the log buffer to the online redo log files on disk. This is a System I/O class event — meaning it is a background process wait — as opposed to log file sync, which is the foreground session wait.
The Relationship Between log file sync and log file parallel write
These two events are tightly coupled:
- A committing session waits on log file sync
- While the session waits, LGWR is performing log file parallel write
- When log file parallel write completes, LGWR signals all waiting sessions, ending their log file sync waits
The “parallel” in the name refers to LGWR writing to all members of a log group simultaneously (in parallel I/O operations). For a log group with 2 members, LGWR issues 2 parallel writes and waits for both to complete before signaling sessions.
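As a rough mental model, the group-write behavior can be sketched in a few lines (illustrative Python, not Oracle internals; the function name and latency values are made up):

```python
# Illustrative sketch (not Oracle code): a group write is acknowledged
# only when every member's I/O has completed, so the latency LGWR sees
# is the maximum across members, not the average.
def group_write_latency_ms(member_latencies_ms):
    return max(member_latencies_ms)

# Two members on comparable storage: the group tracks the slower disk.
print(group_write_latency_ms([0.8, 1.1]))  # 1.1
```

This is why one slow member degrades the whole group, a point revisited under Root Causes below.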
Why This Event Appears in AWR
Although log file parallel write is a background wait, it appears in AWR background wait event reports. Its average latency is the most direct measure of redo log I/O performance. A well-tuned system has:
- log file parallel write average ≈ log file sync average (LGWR scheduled promptly)
- Both values below 2–5 ms
If log file sync is significantly higher than log file parallel write, the excess is LGWR CPU scheduling latency — often seen in overcommitted virtual machines.
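The decomposition is simple subtraction; a sketch with hypothetical AWR averages (both numbers below are example values, not measurements):

```python
# Hypothetical AWR averages showing how to isolate LGWR scheduling
# latency from storage latency.
log_file_sync_ms = 9.0            # foreground commit wait (example value)
log_file_parallel_write_ms = 1.5  # LGWR storage write time (example value)

# The excess of sync over parallel write is time spent waiting for LGWR
# to be scheduled and to post the waiting sessions, not storage time.
scheduling_overhead_ms = log_file_sync_ms - log_file_parallel_write_ms
print(scheduling_overhead_ms)  # 7.5
```

A large gap like this points at CPU scheduling pressure on the host, not at the redo log storage.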
When This Wait Is a Problem
Thresholds for log file parallel write
| Average Wait Time | Assessment |
|---|---|
| < 1 ms | Excellent — NVMe or write-cached SAN |
| 1–3 ms | Good — well-configured storage |
| 3–10 ms | Investigate — storage contention or suboptimal placement |
| 10–20 ms | Poor — redo log storage is a bottleneck |
| > 20 ms | Critical — redo I/O severely degraded |
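When scripting AWR health checks, the table's bands can be expressed as a small helper; the cutoffs below simply mirror the table above and are not part of any Oracle view or API:

```python
# Helper mirroring the assessment bands in the table above.
def assess_redo_write_latency(avg_wait_ms):
    if avg_wait_ms < 1:
        return "Excellent"
    if avg_wait_ms <= 3:
        return "Good"
    if avg_wait_ms <= 10:
        return "Investigate"
    if avg_wait_ms <= 20:
        return "Poor"
    return "Critical"

print(assess_redo_write_latency(4.2))  # Investigate
```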
When to Investigate
- log file parallel write average latency has increased over time without hardware changes
- A specific redo log group member is on slower storage than others
- Log switch rate is very high (many small redo logs)
- The database is running on a virtualized host with shared I/O resources
- log file sync is elevated and log file parallel write confirms storage is the cause
Diagnostic Queries
1. Background Wait Statistics for LGWR

```sql
-- log file parallel write in background wait events
SELECT event,
       total_waits,
       total_timeouts,
       ROUND(time_waited / 100, 2) AS total_secs,
       ROUND(average_wait * 10, 2) AS avg_wait_ms,
       ROUND(max_wait / 100, 2) AS max_wait_secs
FROM v$system_event
WHERE event IN ('log file parallel write',
                'log file sync',
                'log file switch completion',
                'log file switch (checkpoint incomplete)')
ORDER BY time_waited DESC;
```

```sql
-- Compare LGWR write time vs session commit wait time
SELECT se.event,
       se.total_waits,
       ROUND(se.average_wait * 10, 2) AS avg_ms
FROM v$system_event se
WHERE se.event IN ('log file sync', 'log file parallel write')
ORDER BY se.event;
```

2. Redo Log Configuration — Size, Members, and Current Status
```sql
-- Full redo log group status with file locations
SELECT l.group#,
       lf.member,
       ROUND(l.bytes / 1024 / 1024, 0) AS size_mb,
       l.members AS member_count,
       l.status,
       l.archived,
       l.sequence#,
       TO_CHAR(l.first_time, 'YYYY-MM-DD HH24:MI:SS') AS first_switch_time
FROM v$log l
JOIN v$logfile lf ON l.group# = lf.group#
ORDER BY l.group#, lf.member;
```

3. Log Switch History — Frequency Analysis
```sql
-- Log switch frequency by hour (7-day history)
-- (the analytic LEAD is computed in an inline view so it can be aggregated)
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*) AS log_switches,
       ROUND(60.0 / COUNT(*), 1) AS avg_mins_between_switches,
       MIN(mins_to_next_switch) AS min_mins_between_switches
FROM (
    SELECT first_time,
           ROUND((LEAD(first_time) OVER (ORDER BY sequence#) - first_time) * 1440, 1)
               AS mins_to_next_switch
    FROM v$log_history
    WHERE first_time > SYSDATE - 7
)
GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER BY 1 DESC
FETCH FIRST 168 ROWS ONLY;
```
```sql
-- Identify the fastest and slowest log switches
SELECT sequence#,
       TO_CHAR(first_time, 'YYYY-MM-DD HH24:MI:SS') AS switch_time,
       ROUND((LEAD(first_time) OVER (ORDER BY sequence#) - first_time) * 1440, 2)
           AS mins_for_this_log
FROM v$log_history
WHERE first_time > SYSDATE - 1
ORDER BY mins_for_this_log ASC
FETCH FIRST 20 ROWS ONLY;
```

4. Redo Generation Rate — Size Redo Logs Appropriately
```sql
-- Average redo generated per minute from AWR (requires Diagnostics Pack)
-- (snapshot times are TIMESTAMPs, so cast to DATE before date arithmetic)
SELECT TO_CHAR(s.begin_interval_time, 'YYYY-MM-DD HH24:MI') AS snap_time,
       ROUND((e_stat.value - b_stat.value) / 1024 / 1024, 2) AS redo_mb_in_interval,
       ROUND((CAST(s.end_interval_time AS DATE)
              - CAST(s.begin_interval_time AS DATE)) * 1440, 1) AS interval_mins,
       ROUND(
           (e_stat.value - b_stat.value) / 1024 / 1024
           / NULLIF((CAST(s.end_interval_time AS DATE)
                     - CAST(s.begin_interval_time AS DATE)) * 1440, 0), 2
       ) AS redo_mb_per_minute
FROM dba_hist_sysstat b_stat
JOIN dba_hist_sysstat e_stat
     ON e_stat.snap_id = b_stat.snap_id + 1
    AND e_stat.stat_name = b_stat.stat_name
    AND e_stat.dbid = b_stat.dbid
    AND e_stat.instance_number = b_stat.instance_number
JOIN dba_hist_snapshot s
     ON s.snap_id = e_stat.snap_id
    AND s.dbid = e_stat.dbid
    AND s.instance_number = e_stat.instance_number
WHERE b_stat.stat_name = 'redo size'
  AND s.begin_interval_time > SYSDATE - 3
ORDER BY s.begin_interval_time DESC
FETCH FIRST 72 ROWS ONLY;
```

5. I/O Statistics for Log Files
```sql
-- File-level I/O performance for redo log files
-- (service-time columns in v$iostat_file are cumulative milliseconds)
SELECT filetype_name,
       small_read_reqs,
       small_write_reqs,
       ROUND(small_write_servicetime / NULLIF(small_write_reqs, 0), 4) AS avg_write_ms,
       large_write_reqs,
       ROUND(large_write_servicetime / NULLIF(large_write_reqs, 0), 4) AS avg_large_write_ms
FROM v$iostat_file
WHERE filetype_name = 'Log File';
```

Root Causes
1. Redo Log Members on Slow Storage
The most common and impactful cause. Every write to a redo log group must be confirmed on all members before LGWR signals completion. If any member resides on slow storage — spinning HDD, heavily loaded SAN LUN, NFS — every write pays that storage’s latency, regardless of how fast the other members are.
Critical placement rules:
- Never place redo log members on the same spindles as data files (causes head contention)
- Never place redo logs on NFS mounts (network latency + no OS write cache guarantees)
- Avoid filesystem- or volume-level compression and encryption on the volumes holding redo logs (redo logs do not live in tablespaces, so this is a storage-layer concern)
- In ASM, use a separate dedicated disk group for redo logs
2. Log Groups with Unequal Member Performance
If a group has two members — one on fast SSD and one on slow HDD — LGWR always waits for the slow member. The fast member provides zero benefit for latency, only for redundancy.
Detection: Monitor per-file I/O stats with V$IOSTAT_FILE or V$FILESTAT to identify members with different response times.
3. Too Many Log Group Members
While multiplexing redo logs is essential for redundancy, having 3+ members per group on separate slow storage paths multiplies write I/O. Each additional member is an additional write that must complete. Two members on fast storage is typically the right balance for most environments.
4. Log Files Too Small — Excessive Log Switch Rate
Small redo logs (e.g., 50 MB default in older databases) generate rapid log switches. During each switch, LGWR must wait for the next group to be available (not ACTIVE/CURRENT). If the next group is still being archived or a checkpoint is incomplete, LGWR stalls. This shows up as log file switch (checkpoint incomplete) or log file switch (archiving needed) waits alongside elevated log file parallel write.
Target: Redo log switch frequency of 15–30 minutes at peak load. Log size = redo MB/minute × 20 minutes.
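The sizing rule is plain arithmetic; a sketch (the function name and 20-minute default are illustrative, not an Oracle-provided utility):

```python
# Sizing arithmetic from the rule above: log size = redo rate x target
# minutes between switches.
def redo_log_size_mb(redo_mb_per_min, target_switch_mins=20):
    """Size each log so switches occur roughly every target_switch_mins."""
    return redo_mb_per_min * target_switch_mins

# Worked example: 50 MB/min of redo at peak
print(redo_log_size_mb(50))  # 1000 (MB), i.e. about 1 GB per log file
```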
5. Asynchronous I/O Not Available
Without OS-level async I/O, LGWR uses synchronous writes to each member sequentially. With async I/O, LGWR issues all member writes in parallel and waits for completion — dramatically improving throughput for multi-member groups. Unavailable on some file systems (e.g., older ext3, certain NFS mounts).
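The difference can be modeled as sum versus max of per-member latencies (an illustrative model only; the latency values are hypothetical):

```python
# Illustrative model: with synchronous I/O, LGWR completes each member
# write before starting the next, so per-member latencies add up. With
# async I/O, all member writes are in flight at once and LGWR waits only
# for the slowest one.
def sync_group_write_ms(member_latencies_ms):
    return sum(member_latencies_ms)   # writes issued one after another

def async_group_write_ms(member_latencies_ms):
    return max(member_latencies_ms)   # writes issued in parallel

members = [1.2, 1.2]                  # hypothetical two-member group
print(sync_group_write_ms(members))   # 2.4
print(async_group_write_ms(members))  # 1.2
```

With two equal members, async I/O halves the group write time; the gap widens as members are added.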
6. OS/Hypervisor I/O Throttling
In virtualized environments (VMware, KVM, cloud VMs), the hypervisor may throttle I/O throughput or impose latency during storage overcommit conditions. LGWR, being a latency-sensitive sequential writer, is particularly impacted. I/O latency spikes from host-level contention directly increase log file parallel write.
Resolution Steps
Add Redo Log Groups with Correct Sizing

```sql
-- Calculate target log size (redo_mb_per_min * target_minutes_between_switches)
-- Example: 50 MB/min * 20 min = 1000 MB = 1 GB per log file

-- Add new properly-sized groups on fast storage
ALTER DATABASE ADD LOGFILE GROUP 4 (
    '/fast_storage/redo/redo04a.log',
    '/mirror_storage/redo/redo04b.log'
) SIZE 1G;

ALTER DATABASE ADD LOGFILE GROUP 5 (
    '/fast_storage/redo/redo05a.log',
    '/mirror_storage/redo/redo05b.log'
) SIZE 1G;

ALTER DATABASE ADD LOGFILE GROUP 6 (
    '/fast_storage/redo/redo06a.log',
    '/mirror_storage/redo/redo06b.log'
) SIZE 1G;

-- Rotate through log groups to make old groups INACTIVE
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Verify old groups are now INACTIVE before dropping
SELECT group#, status, archived FROM v$log;

-- Drop old undersized groups
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
ALTER DATABASE DROP LOGFILE GROUP 3;
```

Move Log Members to Faster Storage
```sql
-- Can't move a CURRENT log — must drop and re-add members
-- First, add new members on fast storage to all groups:
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/fast/redo/redo03b.log' TO GROUP 3;

-- Force log switches to populate new members
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Wait for groups to be INACTIVE, then drop slow members
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo01a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo02a.log';
ALTER DATABASE DROP LOGFILE MEMBER '/slow_nfs/redo/redo03a.log';
```

Enable LGWR I/O Slaves (For High-Write Systems)
```sql
-- In 12c+, LGWR can use additional worker processes for very high redo
-- generation rates
-- Check if LGWR workers are active
-- Note: hidden (underscore) parameters are not visible in v$parameter
-- by default, so this may return no row for _use_single_log_writer
SELECT name, value FROM v$parameter
WHERE name IN ('_use_single_log_writer', 'lgwr_io_slaves');
```

```sql
-- For very high redo systems, allow LGWR to spawn I/O slaves
-- (This is usually automatic in 12c+ based on workload)
-- In modern releases this is a hidden parameter; quote the underscore
-- name, and change it only if directed by Oracle Support
ALTER SYSTEM SET "_lgwr_io_slaves" = 4 SCOPE=SPFILE;
-- Requires restart
```

Validate Async I/O is Active
```sql
-- Confirm async I/O is enabled
SELECT name, value FROM v$parameter
WHERE name IN ('disk_asynch_io', 'filesystemio_options');

-- Enable async I/O if disabled (requires restart)
ALTER SYSTEM SET disk_asynch_io = TRUE SCOPE=SPFILE;

-- For file-system redo logs (not ASM), set filesystem async I/O:
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE=SPFILE;
-- Options: NONE, DIRECTIO, ASYNCH, SETALL
```

Prevention & Tuning
Section titled “Prevention & Tuning”1. Standardize on Correct Log Sizing During Database Creation
Establish redo log sizing based on expected peak redo generation from the start. Increasing log sizes later requires dropping and recreating log groups, which is an outage risk during the transition.
```sql
-- Recommended log group count and size formula:
--   Minimum 3 log groups
--   Each log sized for 20-30 minutes of peak redo generation
-- Example sizing check (timestamps cast to DATE for date arithmetic):
SELECT ROUND(MAX(redo_mb_per_min) * 25, 0) AS recommended_log_size_mb
FROM (
    SELECT (e_stat.value - b_stat.value) / 1024 / 1024
           / NULLIF((CAST(s.end_interval_time AS DATE)
                     - CAST(s.begin_interval_time AS DATE)) * 1440, 0)
               AS redo_mb_per_min
    FROM dba_hist_sysstat b_stat
    JOIN dba_hist_sysstat e_stat
         ON e_stat.snap_id = b_stat.snap_id + 1
        AND e_stat.stat_name = b_stat.stat_name
        AND e_stat.dbid = b_stat.dbid
        AND e_stat.instance_number = b_stat.instance_number
    JOIN dba_hist_snapshot s
         ON s.snap_id = e_stat.snap_id
        AND s.dbid = e_stat.dbid
        AND s.instance_number = e_stat.instance_number
    WHERE b_stat.stat_name = 'redo size'
      AND s.begin_interval_time > SYSDATE - 14
);
```

2. Periodic Redo Health Check
Include redo log performance in weekly DBA checks:
- Review average log file parallel write from AWR
- Check log switch frequency from V$LOG_HISTORY
- Confirm all log members are VALID and on appropriate storage
3. Monitor in Data Guard Environments
In Data Guard configurations, redo must also be transmitted to standby databases. High log file parallel write on the primary combined with high log file sync may indicate standby redo apply lag is feeding back to the primary (in Maximum Protection mode). Review V$DATAGUARD_STATS and consider switching to Maximum Performance or Availability mode.
```sql
-- Check Data Guard redo transport performance
SELECT dest_id, dest_name, status, target, archiver,
       schedule, transmit_mode, async_blocks
FROM v$archive_dest
WHERE status = 'VALID'
  AND target = 'STANDBY';
```

Related Wait Events
Section titled “Related Wait Events”- log file sync — The foreground session wait for LGWR completion; the user-facing manifestation of this wait
- db file sequential read — Compare latency profiles to understand relative storage performance
- direct path read — Direct I/O patterns; useful baseline for storage capability