Troubleshooting memory issues
Sun/Oracle SPARC Enterprise T5220 Memory Fault Recovery Summary
Initial symptoms
The ILOM service processor reported:
Unsupported memory configuration
Later, after changes/reseating, the system reported:
/SYS/MB/CMP0/MCU0 Forced fail (IBIST)
Operating with a degraded memory configuration.
At OpenBoot, POST eventually showed:
POST Passed all devices.
ERROR: The following devices are disabled:
MB/CMP0/MCU0
Aborting auto-boot sequence.
This showed that the memory controller was no longer failing POST, but was still disabled by ASR/ILOM because of the earlier fault.
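The disabled-device list in the POST output above can be pulled out mechanically when checking captured console logs. The following is a minimal sketch, not an official tool: the function name disabled_devices is hypothetical, and it assumes the console text has the same shape as the output quoted above (the "devices are disabled" line followed by one device path per line).

```python
def disabled_devices(post_output: str) -> list[str]:
    """Return device paths listed after the 'devices are disabled' POST error.

    Assumption: device paths look like MB/CMP0/MCU0 (no spaces, at least
    one slash), as in the console output quoted in this summary.
    """
    marker = "The following devices are disabled:"
    devices = []
    collecting = False
    for raw in post_output.splitlines():
        line = raw.strip()
        if marker in line:
            collecting = True
            continue
        if collecting:
            if line and " " not in line and "/" in line:
                devices.append(line)
            elif line:
                # First non-path line (e.g. "Aborting auto-boot sequence.")
                # ends the device list.
                break
    return devices


sample = """POST Passed all devices.
ERROR: The following devices are disabled:
MB/CMP0/MCU0
Aborting auto-boot sequence."""

print(disabled_devices(sample))
```

A non-empty result alongside "POST Passed all devices." is the pattern seen here: nothing is failing now, but a component is still held disabled.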
Root causes
There were two separate issues:
- Unsupported DIMM population.
  DIMMs were not in a supported T5220 population pattern. The T5220 requires DIMMs to be installed in specific slots. For example, a valid 4-DIMM configuration is:
    J1001
    J1401
    J2001
    J2401
  Valid population counts are 4, 8, or 16 DIMMs, installed in the correct slot order with matching, compatible FB-DIMMs.
- DIMM slot/contact issue.
  Dirty or marginal DIMM contacts caused a memory-controller path fault during testing, which led ILOM/ASR to disable:
    /SYS/MB/CMP0/MCU0
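The population rule described above can be written down as a quick sanity check before powering the host back on. This is only a partial sketch: check_population is a hypothetical helper, it encodes just the two facts stated in this summary (valid counts of 4, 8, or 16, and the 4-DIMM slot set), and it does not know the 8- or 16-DIMM slot order or whether the FB-DIMMs match. The service manual remains the authority for those.

```python
# Facts taken from this summary; everything else is out of scope here.
VALID_COUNTS = {4, 8, 16}
FOUR_DIMM_SLOTS = {"J1001", "J1401", "J2001", "J2401"}


def check_population(populated_slots):
    """Return a list of problems found; an empty list means this
    partial check found nothing wrong."""
    populated = set(populated_slots)
    problems = []
    if len(populated) not in VALID_COUNTS:
        problems.append(
            f"{len(populated)} DIMMs installed; expected 4, 8, or 16"
        )
    if len(populated) == 4 and populated != FOUR_DIMM_SLOTS:
        wrong = sorted(populated - FOUR_DIMM_SLOTS)
        problems.append(
            "4-DIMM set uses unexpected slots: " + ", ".join(wrong)
        )
    return problems


print(check_population(["J1001", "J1401", "J2001", "J2401"]))
print(check_population(["J1001", "J1401"]))
```

The first call uses the valid 4-DIMM set from above and reports no problems; the second reports an unsupported DIMM count.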
Successful recovery steps
- Powered down the host.
- Corrected the DIMM placement according to the supported T5220 slot population rules.
- Cleaned the DIMM slots and reseated the memory modules carefully.
- Started the host and observed that POST passed.
- Checked the disabled component state:
  -> show /SYS/MB/CMP0/MCU0
  It showed:
  component_state = Disabled
- Re-enabled the memory controller manually:
  -> set /SYS/MB/CMP0/MCU0 component_state=Enabled
- Restarted the system.
- Verified the result:
  -> show faulty
  Final result:
  No faults found
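The check-then-re-enable step above can be sketched as a small helper that reads captured show output and prints the matching set command. This is an illustrative sketch only: component_state and reenable_command are hypothetical names, the code does not talk to ILOM itself, and it assumes the output contains a line of the form "component_state = Disabled" as quoted above.

```python
import re


def component_state(show_output):
    """Extract the component_state value from captured 'show' output,
    or return None if the property is not present."""
    match = re.search(r"component_state\s*=\s*(\w+)", show_output)
    return match.group(1) if match else None


def reenable_command(target, show_output):
    """Return the ILOM 'set' command used in this summary if the
    component is reported as Disabled, otherwise None."""
    if component_state(show_output) == "Disabled":
        return f"set {target} component_state=Enabled"
    return None


captured = "component_state = Disabled"
print(reenable_command("/SYS/MB/CMP0/MCU0", captured))
```

Re-enabling only makes sense once the underlying cause is fixed and POST passes; otherwise the component is likely to be disabled again on the next test.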
Final conclusion
Clearing the ILOM fault alone was not enough. The full recovery required:
- correcting the DIMM positions
- cleaning the DIMM slots and reseating the modules
- re-enabling the ASR-disabled MCU0 component
- rebooting and retesting
After this, the show faulty command reported no faults on all machines, confirming the hardware state was clean.