# Intel<sup>®</sup> N440BX Server System Event Log (SEL) Error Messages

Revision 1.00

5/11/98



#### **DISCLAIMERS**

#### Copyright ©1998 Intel Corporation

Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

I<sup>2</sup>C is a two-wire communications bus/protocol developed by Philips. SMBus is a subset of the I<sup>2</sup>C bus/protocol and was developed by Intel. Implementations of the I<sup>2</sup>C bus/protocol or the SMBus bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation.

The SEL Viewer may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

\*Third-party brands and names are the property of their respective owners.

## **Contents**

- 1 Introduction
  - 1.1 Document Organization
  - 1.2 Conventions and Terminology
  - 1.3 Overview
    - 1.3.1 Sensors
    - 1.3.2 Event Generator
    - 1.3.3 Event Receiver
    - 1.3.4 BIOS
    - 1.3.5 System Event Log (SEL)
  - 1.4 SEL Viewer Screen
- 2 Sensor Type Codes
  - 2.1 Using The Sensor Table
- 3 BIOS Error Messages
  - 3.1.1 Using The System Event Logging Format Table
  - 3.2 POST Error
    - 3.2.1 Using the POST Error Table

**Tables**

- Table 1-1: Glossary
- Table 2-1: Sensor Type Codes
- Table 3-1: System Event Logging Format
- Table 3-2: POST Error Messages

**Figures**

- Figure 1-1: Event Message Flow
- Figure 1-2: SEL Viewer Screen
- Figure 2-1: SEL Viewer in verbose mode

## 1 Introduction

This document is provided as a reference to the information displayed by the System Event Log (SEL) Viewer on the Intel® N440BX server boards and server systems. It provides a tabular description of the events that can be recorded in the SEL.

## 1.1 Document Organization

This document is primarily composed of tables containing possible conditions that occur in the SEL along with a definition of the SEL data.

**Section 1** contains a brief introduction to the SEL and the SEL Viewer, along with a glossary of terms and acronyms used in this document.

**Section 2** contains a table of SEL information generated from the sensors on the Intel<sup>®</sup> N440BX server platform.

**Section 3** contains tables of SEL information generated from the N440BX BIOS System Events and POST Errors.

## 1.2 Conventions and Terminology

This document uses the following terms and abbreviations:

Table 1-1: Glossary

| Term   | Definition |
|--------|------------|
| BIOS   | Basic Input Output System.                                                                                                                                                                         |
| BSP    | Boot Strap Processor.                                                                                                                                                                              |
| BMC    | Baseboard Management Controller.                                                                                                                                                                   |
| Byte   | An 8-bit quantity.                                                                                                                                                                                 |
| CMOS   | In terms of this specification, this describes the PC/AT* compatible region of battery-backed 128 bytes of memory, which normally resides on the baseboard.                                        |
| DIMM   | Dual-inline Memory Module. Name for the plug-in modules used to hold the system's DRAM (Dynamic Random Access Memory). |
| ECC    | Error-correcting Code. Refers to a set of additional bits on system RAM that are used to provide a check code that is used to verify memory data integrity.                                        |
| EMP    | Emergency Management Port.                                                                                                                                                                         |
| EvMRev | Event Message Revision                                                                                                                                                                             |
| FRB    | Fault Resilient Booting. A term used to describe system features and algorithms that improve the likelihood of the detection of, and recovery from, processor failures in a multiprocessor system. |
| FRU    | Field Replaceable Unit.                                                                                                                                                                            |
| HSC    | Hot-Swap Controller. Name for the microcontroller that implements the SAF-TE command set and controls the fault lights and drive power on an N440BX Backplane. |
| IERR   | Internal Error. A signal from the Pentium® II Xeon™ processor indicating an internal error condition. |
| IPMB   | Intelligent Platform Management Bus. Name for the architecture, protocol, and implementation of a special bus that interconnects the baseboard and chassis electronics and provides a communications media for system platform management information.                                                                                                                                                                          |
| IPMI   | Intelligent Platform Management Interface. This protocol is used for communication between microcontrollers, System Management Software, and other 'intelligent' devices on the IPMB.                                                                                                                                                                                                                                           |
| NMI    | Non-maskable Interrupt. The highest priority interrupt in the system, after SMI. This interrupt has traditionally been used to notify the operating system of fatal system hardware error conditions, such as parity errors and unrecoverable bus errors. |
| NVRAM  | Non-Volatile RAM.                                                                                                                                                                                                                                                                                                                                                                                                               |
| PERR   | Parity Error. A signal on the PCI bus that indicates a parity error on the bus.                                                                                                                                                                                                                                                                                                                                                 |
| POST   | Power On Self Test.                                                                                                                                                                                                                                                                                                                                                                                                             |
| SAF-TE | SCSI Accessed Fault-tolerant Enclosure specification. Describes a set of SCSI commands whereby drive fault status can be sent to an enclosure for the purpose of presenting that fault information with external indicators, such as fault lights. Other commands are provided so certain management information about the enclosure, such as temperature, voltage, number of drive bays, power status, etc., can be retrieved. |
| SCU    | System Configuration Utility.                                                                                                                                                                                                                                                                                                                                                                                                   |
| SDR    | Sensor Data Record. A data record that provides platform management sensor type, locations, event generation, and access information.                                                                                                                                                                                                                                                                                           |
| SEL    | System Event Log. A non-volatile storage area and associated interfaces for storing system platform event information for later retrieval.                                                                                                                                                                                                                                                                                      |
| SERR   | System Error. A signal on the PCI bus that indicates a 'fatal' error on the bus.                                                                                                                                                                                                                                                                                                                                                |
| SM     | Server Management.                                                                                                                                                                                                                                                                                                                                                                                                              |
| SMI    | System Management Interrupt.                                                                                                                                                                                                                                                                                                                                                                                                    |
| SMS    | Server Management Software.                                                                                                                                                                                                                                                                                                                                                                                                     |
| SSU    | System Setup Utility.                                                                                                                                                                                                                                                                                                                                                                                                           |

#### 1.3 Overview

The System Event Log (SEL) is a non-volatile repository for system events. The SEL Viewer provides an interface for the server administrator to view the SEL. The administrator can use this information to:

- Monitor the server for both warnings, such as when the chassis door on a server has been opened, and potentially critical problems, such as when a processor has failed or a temperature threshold has been crossed.
- Examine all system event log entries recorded by the Baseboard Management Controller (BMC). These events can be generated by the BMC, the Hot-Swap Controller (HSC), and the BIOS, and include all the events generated by the firmware.
- Examine SEL records by sensor Type and Number in hex or verbose mode.
- Examine SEL records by event Type in hex or verbose mode.
- Examine SEL records from a previously stored binary file in hex or verbose mode.

Information moves from the Intel® N440BX server system to the SEL Viewer as demonstrated in the following diagram:

Figure 1-1: Event Message Flow



#### 1.3.1 Sensors

The Intel® N440BX Server System contains sensors that monitor System Health. For example, a management controller that scans temperatures and voltages provides an interface to this information as 'temperature sensors' and 'voltage sensors'.

In the event that a sensor reading exceeds the predefined range, an Event Message is generated. This Event Message is passed to the Baseboard Management Controller (BMC). The BMC passes the event message to the System Event Log (SEL) where it becomes available for querying by the System Event Log Viewer.
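
The flow described above (reading outside its range, Event Message generated, message stored in the SEL) can be sketched roughly as follows. This is an illustrative model only, not BMC firmware: the `EventMessage` class, the `check_reading` function, the limit values, and the use of a plain list in place of the SEL are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EventMessage:
    """Hypothetical stand-in for an Event Message; not the real record layout."""
    sensor_name: str
    reading: float
    description: str


def check_reading(sensor_name: str, reading: float,
                  low: float, high: float) -> Optional[EventMessage]:
    """Return an EventMessage if the reading is outside [low, high], else None."""
    if reading < low:
        return EventMessage(sensor_name, reading, "below lower limit")
    if reading > high:
        return EventMessage(sensor_name, reading, "above upper limit")
    return None


# The list stands in for the SEL; appending stands in for the BMC logging step.
sel: List[EventMessage] = []
for name, value in [("Baseboard Temp1", 41.0), ("Baseboard Temp1", 78.5)]:
    event = check_reading(name, value, low=5.0, high=60.0)  # limits are made up
    if event is not None:
        sel.append(event)
```

Only the second reading falls outside the assumed limits, so only one entry ends up in the stand-in log.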

### 1.3.2 Event Generator

The BMC itself will typically be responsible for monitoring and managing the system board, for example by monitoring baseboard temperatures and voltages. As such, the BMC will also be an Event Generator, sending the Event Messages that it generates internally to its Event Receiver functionality.

#### 1.3.3 Event Receiver

The Event Receiver is the device that receives Event Messages. In the main computing unit, a particular management controller, referred to as the Baseboard Management Controller (BMC), is normally the Event Receiver for the system.

#### 1.3.4 BIOS

The system BIOS (Basic I/O System) firmware serves many important roles in platform management. It loads and initializes the system management hardware interfaces so they can be accessed later by System Management Software and the Operating System. The BIOS works with the system hardware and management controllers during POST to implement checks of the system and management hardware.

### 1.3.5 System Event Log (SEL)

The System Event Log (SEL) is the repository for the Event Messages. The System Event Log is implemented as non-volatile storage to ensure that Critical Events entered into the SEL can be retrieved for 'post-mortem analysis' should a system failure occur. The platform's System Event Log is typically of limited size (~3 to ~8 KB, depending on the implementation). On Intel® N440BX server systems, the SEL can hold up to 206 entries or about 3 KB. Therefore, it is important to clear the SEL periodically.

If the SEL becomes full it will not delete previous entries. This prevents the first event record, which may provide the most important information on a critical event, from being deleted before it can be viewed. If, however, management software such as Intel<sup>®</sup> Server Control (ISC) has been installed on the server system, it will manage the SEL. ISC will automatically delete older SEL entries and prevent the Log from completely filling. Since ISC provides a variety of alerting capabilities for critical events, it eliminates the need to retain older SEL entries until they are viewed.
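
The two behaviors described above (a full log keeps its existing entries, while management software may delete older entries) can be modeled with a small sketch. The class name `BoundedEventLog` and its methods are hypothetical; the actual SEL is managed by the BMC, and the trimming policy used by ISC is not specified in this document. Only the 206-entry capacity comes from the text.

```python
from collections import deque

SEL_CAPACITY = 206  # entry count stated above for N440BX systems


class BoundedEventLog:
    """Illustrative model of a fixed-size log; not the BMC implementation."""

    def __init__(self, capacity: int = SEL_CAPACITY) -> None:
        self.capacity = capacity
        self.entries = deque()

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity

    def log(self, entry: str) -> bool:
        """Store an entry; when full, refuse it and keep the existing entries."""
        if self.is_full():
            return False
        self.entries.append(entry)
        return True

    def trim_oldest(self, keep: int) -> int:
        """Assumed management-software policy: drop oldest entries down to `keep`."""
        removed = 0
        while len(self.entries) > keep:
            self.entries.popleft()
            removed += 1
        return removed

    def clear(self) -> None:
        """Model of clearing the log."""
        self.entries.clear()


# Small capacity so the "full" case is easy to see.
log = BoundedEventLog(capacity=3)
results = [log.log(f"event {i}") for i in range(4)]
first_kept = log.entries[0]
removed = log.trim_oldest(keep=1)
```

In this model the fourth `log` call is refused and the first entry is still present, which mirrors the reason given above for not overwriting older records.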

#### 1.4 SEL Viewer Screen

The SEL Viewer is the user interface to the SEL. This interface can be run from both the Emergency Management Port (EMP) and the System Setup Utility (SSU). It extracts information from the SEL and presents it to the user in either a hex or verbose format. It also allows the user to save the current SEL data to a file for later use or to clear the current SEL records at the server. If using the SSU, the records can be cleared directly from the interface. If using the EMP, the log has to be cleared through BIOS Setup.

Figure 1-2 gives a better idea of the type of information that can be gathered by looking at the SEL Viewer. For documentation purposes, the EMP interface is used here.



Figure 1-2: SEL Viewer Screen

**Rec. ID** - A unique ID that is generated for each event in the SEL.

**Event Type** - Indicates what the event pertains to. Currently holds the value of "System Event". This field is for future use.

**Time Stamp** - The time and date at which the event was generated. "Pre\_Init Timestamp" means that no timestamp is available; this should not appear on a post-production system.

**Generator ID** - This field identifies the device that has generated the Event Message.

**EvMRev.** - This field is used to identify different revisions of the Event Message format. Currently holds the value of "#02". This field is for future use.

**Sensor Type** - Indicates the event class or type of sensor that generated the Event Message. See Table 2-1 for the list of Sensor Types.

**Sensor Number** - A unique number (within a given sensor device) representing the 'sensor' within the management controller that generated the Event Message. Sensor numbers are used for both identification and access of sensor information, such as getting and setting sensor thresholds. See Table 2-1 for the list of Sensor Types.

**Event Description** - Short description of the event that generated the entry in the SEL Viewer.

## 2 Sensor Type Codes

The Sensor Type Code Table provides information regarding:

- The type of sensor generating the SEL entry
- The name of the sensor
- The microcontroller which initiated the SEL entry
- The warning or error which initiated the SEL entry

## 2.1 Using The Sensor Table

Compare the **Sensor Type and Number** message from the SEL Viewer to the **Sensor Type and Number in Verbose** column in the table below. Use the **Sensor Name** column in the table to determine the physical component which has generated the SEL message and the **Generator ID** column to locate the microcontroller which reported the event to the SEL. More detailed information regarding the event, if any, is available from the **Event Description** column in the SEL Viewer.

For example, if there is a "Fan #10" entry in the **Sensor Type and Number** column of the SEL Viewer, by looking at the Sensor Type Codes table (Table 2-1 below) it can be determined that Baseboard Fan1 caused this entry. The entry was reported through the Baseboard Management Controller on the Intel<sup>®</sup> N440BX server board.



Figure 2-1: SEL Viewer in verbose mode

Table 2-1: Sensor Type Codes

| Sensor Type and Number in<br>Verbose | Sensor Name                                                    | Generator ID |
|--------------------------------------|----------------------------------------------------------------|--------------|
| Reserved                             | 00h                                                            |              |
| Temperature #13                      | Primary Processor Socket Temp (disabled by default)            | BMC          |
| Temperature #14                      | Secondary Processor Socket Temp (disabled by default)          | BMC          |
| Temperature #15                      | Baseboard Temp1 (disabled by default)                          | BMC          |
| Temperature #16                      | Baseboard Temp2 (disabled by default)                          | BMC          |
| Temperature #17                      | Processor1 Temp                                                | BMC          |
| Temperature #18                      | Processor2 Temp                                                | BMC          |
| Temperature #19                      | Baseboard Temp1                                                | BMC          |
| Temperature #1A                      | Baseboard Temp2                                                | BMC          |
| Temperature #01                      | Backplane Temperature                                          | HSC          |
| Voltage #01                          | Baseboard 5V                                                   | BMC          |
| Voltage #02                          | Baseboard 3.3V                                                 | BMC          |
| Voltage #03                          | Primary Processor                                              | BMC          |
| Voltage #04                          | Secondary Processor                                            | BMC          |
| Voltage #05                          | Processor 2.5V                                                 | BMC          |
| Voltage #06                          | 5v Standby                                                     | BMC          |
| Voltage #07                          | Baseboard SCSI-A Term1                                         | BMC          |
| Voltage #08                          | Baseboard SCSI-A Term2                                         | BMC          |
| Voltage #09                          | Baseboard SCSI Term3                                           | BMC          |
| Voltage #0A                          | Baseboard –12V                                                 | BMC          |
| Voltage #0B                          | Baseboard SCSI-B Term1                                         | BMC          |
| Voltage #0C                          | Processor 1.5V                                                 | BMC          |
| Voltage #0D                          | Baseboard -5V                                                  | BMC          |
| Voltage #0E                          | Baseboard 12                                                   | BMC          |
| Fan #0F                              | Baseboard Fan0                                                 | BMC          |
| Fan #10                              | Baseboard Fan1                                                 | BMC          |
| Fan #11                              | Processor Fan0                                                 | BMC          |
| Fan #12                              | Processor Fan1                                                 | BMC          |
| Fan #0C                              | Backplane Fan1 speed                                           | HSC          |
| Fan #0D                              | Backplane Fan2 speed                                           | HSC          |
| Physical Security #26                | Chassis Intrusion                                              | BMC          |
| Secure Mode Violation Attempt #27    | EMP Password (at the time of connecting to the server)         | BMC          |
| Secure Mode Violation Attempt #28    | Secure Mode (violation while the system is in the secure mode) | BMC          |
| Processor #1B                        | Processor1 Status                                              | BMC          |
| Processor #1C                        | Processor2 Status                                              | BMC          |
| Memory #1F                           | DIMM1 Presence                                                 | BMC          |
| Memory #20                           | DIMM2 Presence                                                 | BMC          |
| Memory #21                           | DIMM3 Presence                                                 | BMC          |
| Memory #22                           | DIMM4 Presence                                                 | BMC          |
| Drive Bay #02                        | Drive Slot 0 Status                                            | HSC          |
| Drive Bay #03                        | Drive Slot 1 Status                                            | HSC          |
| Drive Bay #04                        | Drive Slot 2 Status                                            | HSC          |
| Drive Bay #05                        | Drive Slot 3 Status                                            | HSC          |
| Drive Bay #06                        | Drive Slot 4 Status                                            | HSC          |
| Drive Bay #07                        | Drive Slot 0 Presence                                          | HSC          |
| Drive Bay #08                        | Drive Slot 1 Presence | HSC          |
| Drive Bay #09                        | Drive Slot 2 Presence | HSC          |
| Drive Bay #0A                        | Drive Slot 3 Presence | HSC          |
| Drive Bay #0B                        | Drive Slot 4 Presence | HSC          |
| POST Error #25                       | See Section 3.2       | BIOS         |
| Watchdog #25                         | Watchdog Event        | BMC          |
| System Event #EF                     | See Table 3-1         | BIOS         |
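
The lookup described in Section 2.1 can be sketched as a simple dictionary keyed by the verbose "Sensor Type and Number" string. This is a minimal sketch, assuming the string is copied from the SEL Viewer as text; the names `SENSOR_TABLE`, `normalise`, and `lookup_sensor` are hypothetical, and only a few rows of Table 2-1 are included.

```python
from typing import Dict, Optional, Tuple

# Subset of Table 2-1: verbose "Sensor Type and Number" -> (Sensor Name, Generator ID)
SENSOR_TABLE: Dict[str, Tuple[str, str]] = {
    "Temperature #17": ("Processor1 Temp", "BMC"),
    "Temperature #01": ("Backplane Temperature", "HSC"),
    "Voltage #01": ("Baseboard 5V", "BMC"),
    "Fan #0F": ("Baseboard Fan0", "BMC"),
    "Fan #10": ("Baseboard Fan1", "BMC"),
    "Fan #0C": ("Backplane Fan1 speed", "HSC"),
    "Physical Security #26": ("Chassis Intrusion", "BMC"),
    "Drive Bay #02": ("Drive Slot 0 Status", "HSC"),
}


def normalise(label: str) -> str:
    """Collapse extra spaces and upper-case the hex sensor number."""
    sensor_type, _, number = label.strip().rpartition("#")
    return f"{' '.join(sensor_type.split())} #{number.strip().upper()}"


def lookup_sensor(label: str) -> Optional[Tuple[str, str]]:
    """Return (Sensor Name, Generator ID) for a verbose label, or None if unknown."""
    return SENSOR_TABLE.get(normalise(label))


name_and_generator = lookup_sensor("Fan #10")
```

The key includes the sensor type as well as the number because the same number appears under more than one type in the table (for example, #01 and #0C).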

## 3 BIOS Error Messages

The BIOS is responsible for monitoring and logging certain System Events, Memory Errors and Critical Interrupts. The BIOS sends an event request message to the BMC to log the event. Some errors, such as a processor failure, are logged during early POST.

### 3.1.1 Using The System Event Logging Format Table

For each BIOS-generated error, the table below lists the **Sensor Type and Number** in both verbose and hex form, the first 2 bytes of the **Event Description** in hex, the **Event Type**, and the **Event Description** in verbose form.

For example, if in verbose mode the **Sensor Type and Number** is "Memory #EF" and the **Event Description** begins with the words "Correctable ECC...", the event type is "Single Bit Memory Error".

Table 3-1: System Event Logging Format

| Sensor Type and Number in verbose | Sensor Type and Number in hex | Event Description in hex | Event Type              | Event Description         |
|-----------------------------------|-------------------------------|--------------------------|-------------------------|---------------------------|
| System Event #EF                  | 12 EF                         | E7 01 -- --              | System Boot Event       | System Boot Event ...     |
| System Event #EF                  | 12 EF                         | E7 00 -- --              | System Reconfiguration  | System Boot Event ...     |
| Memory #EF                        | 0C EF                         | E7 20 -- --              | Single Bit Memory Error | Correctable ECC ...       |
| Memory #EF                        | 0C EF                         | E7 21 -- --              | Double Bit Memory Error | Non-Correctable ECC ...   |
| Memory #EF                        | 0C EF                         | E7 02 -- --              | Memory Parity Error     | Parity ...                |
| Critical Interrupt #EF            | 13 EF                         | E7 01 -- --              | Bus Timeout             | Bus Timeout ...           |
| Critical Interrupt #EF            | 13 EF                         | E7 02 -- --              | I/O CHK                 | I/O Channel check NMI     |
| Critical Interrupt #EF            | 13 EF                         | E7 03 -- --              | Software NMI            | Software NMI ...          |
| Critical Interrupt #EF            | 13 EF                         | E7 04 -- --              | PCI PERR                | PCI PERR ...              |
| Critical Interrupt #EF            | 13 EF                         | E7 05 -- --              | PCI SERR                | PCI SERR ...              |
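
A hex-mode lookup against Table 3-1 might be sketched as below. The function name `decode_bios_event` and the dictionary `BIOS_EVENTS` are hypothetical; the sketch assumes the caller passes the two "Sensor Type and Number" bytes and at least the first two Event Description bytes as hex text, exactly as listed in the table.

```python
from typing import Dict, Optional, Tuple

# (sensor type byte, second Event Description byte) -> Event Type, per Table 3-1
BIOS_EVENTS: Dict[Tuple[int, int], str] = {
    (0x12, 0x01): "System Boot Event",
    (0x12, 0x00): "System Reconfiguration",
    (0x0C, 0x20): "Single Bit Memory Error",
    (0x0C, 0x21): "Double Bit Memory Error",
    (0x0C, 0x02): "Memory Parity Error",
    (0x13, 0x01): "Bus Timeout",
    (0x13, 0x02): "I/O CHK",
    (0x13, 0x03): "Software NMI",
    (0x13, 0x04): "PCI PERR",
    (0x13, 0x05): "PCI SERR",
}


def decode_bios_event(sensor_hex: str, description_hex: str) -> Optional[str]:
    """Map hex-mode fields to an Event Type, or None if not listed in Table 3-1."""
    sensor = bytes.fromhex(sensor_hex)        # e.g. "0C EF"
    description = bytes.fromhex(description_hex)  # e.g. "E7 20"
    if len(sensor) != 2 or len(description) < 2:
        raise ValueError("expected 2 sensor bytes and at least 2 description bytes")
    # Every row in Table 3-1 uses sensor number EF and a first description byte of E7.
    if sensor[1] != 0xEF or description[0] != 0xE7:
        return None
    return BIOS_EVENTS.get((sensor[0], description[1]))


event_type = decode_bios_event("0C EF", "E7 20")
```

Combinations that are not listed in the table return `None` rather than a guess.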

#### 3.2 POST Error

Currently the BIOS events that are generated in the pre-boot period, such as clearing CMOS, are logged during the early stage as POST Error #25 under the Sensor Type and Number column. This section contains a table of the POST Errors, which helps the user interpret the error message.

### 3.2.1 Using the POST Error Table

To use this table, view the SEL Viewer entries in hex mode. The Event Description contains 4 bytes. Take the last two bytes of the description exactly as displayed and look them up in the **Code** column of the table below; the **Code** column already lists the bytes in the order in which the SEL Viewer shows them. Use the **Error Message** column to determine the error that caused the entry in the SEL and the **Pause on Error** column to determine how the BIOS reacts to this error.

For example, if in hex mode the Event Description is "00 a1 52 81", the last two bytes are "52 81", as shown in Figure 3-1. Look up the corresponding error message in Table 3-2. In this example, the error message for the **Code** "52 81" is "NVRAM (CMOS) Data Invalid, NVRAM cleared", meaning that the CMOS was cleared.

Note: In other documents that use the POST Error Message table, the 2 bytes of each code appear in the reverse order. In this document, the **Code** column in Table 3-2 has been modified so that it matches the byte order shown in the SEL Viewer.
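The byte handling described above can be sketched in Python as follows. The function name `post_error_code` and the dictionary `NVRAM_CODES` are hypothetical, the dictionary holds only three entries of Table 3-2 for illustration, and writing the reversed code as a single four-digit value is an assumption about how other documents print it.

```python
# Three entries copied from Table 3-2, keyed by the code as this document
# lists it (SEL Viewer byte order). Names here are hypothetical.
NVRAM_CODES = {
    "50 81": "NVRAM (CMOS) Cleared by Jumper",
    "51 81": "NVRAM (CMOS) Checksum Error, NVRAM cleared",
    "52 81": "NVRAM (CMOS) Data Invalid, NVRAM cleared",
}


def post_error_code(description_hex):
    """Split a 4-byte Event Description into its POST error code.

    Returns a tuple of:
      - the code in SEL Viewer order, as listed in Table 3-2
      - the same two bytes in reverse order (assumed formatting),
        as used by other documents
    """
    data = bytes.fromhex(description_hex)
    if len(data) != 4:
        raise ValueError("expected a 4-byte Event Description")
    third, fourth = data[2], data[3]
    table_code = f"{third:02X} {fourth:02X}"
    reversed_code = f"{fourth:02X}{third:02X}"
    return table_code, reversed_code


if __name__ == "__main__":
    table_code, reversed_code = post_error_code("00 a1 52 81")
    print(table_code, reversed_code, NVRAM_CODES.get(table_code))
```

For the Event Description "00 a1 52 81" from the example, the sketch yields the table code "52 81", which selects the "NVRAM (CMOS) Data Invalid, NVRAM cleared" entry.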



Figure 3-1: SEL Viewer in hex mode

Table 3-2: POST Error Messages

| Code  | Error Message                                                 | Pause on Error |
|-------|---------------------------------------------------------------|----------------|
| 62 01 | BIOS unable to apply BIOS update to processor 1               | Yes            |
| 63 01 | BIOS unable to apply BIOS update to processor 2               | Yes            |
| 64 01 | BIOS does not support current stepping for processor 1        | Yes            |
| 65 01 | BIOS does not support current stepping for processor 2        | Yes            |
| 00 02 | Failure Fixed Disk                                            | No             |
| 10 02 | Stuck Key                                                     | No             |
| 11 02 | Keyboard error                                                | No             |
| 12 02 | Keyboard Controller Failed                                    | Yes            |
| 13 02 | Keyboard locked - Unlock key switch                           | Yes            |
| 20 02 | Monitor type does not match CMOS - Run SETUP                  | No             |
| 30 02 | System RAM Failed at offset:                                  | No             |
| 31 02 | Shadow Ram Failed at offset:                                  | No             |
| 32 02 | Extended RAM Failed at offset:                                | No             |
| 50 02 | System battery is dead - Replace and run SETUP                | Yes            |
| 51 02 | System CMOS checksum bad - Default configuration used         | Yes            |
| 60 02 | System timer error                                            | No             |
| 70 02 | Real time clock error                                         | No             |
| 97 02 | ECC Memory error in base (extended) memory test in Bank xx    | Yes            |
| B2 02 | Incorrect Drive A type - run SETUP                            | No             |
| B3 02 | Incorrect Drive B type - run SETUP                            | No             |
| D0 02 | System cache error - Cache disabled                           | No             |
| F5 02 | DMA Test Failed                                               | Yes            |
| F6 02 | Software NMI Failed                                           | No             |
| 01 04 | Invalid System Configuration Data - run configuration utility | No             |
| None  | System Configuration Data Read Error                          | No             |
| 03 04 | Resource Conflict                                             | No             |
| 04 04 | Resource Conflict                                             | No             |
| 05 04 | Expansion ROM not initialized                                 | No             |
| 06 04 | Warning: IRQ not configured                                   | No             |
| 04 05 | Resource Conflict                                             | Error          |
| 05 05 | Expansion ROM not initialized                                 | No             |
| 06 05 | Warning: IRQ not configured                                   | No             |
| 01 06 | Device configuration changed                                  | No             |
| 02 06 | Configuration error - device disabled                         | No             |
| 00 81 | processor 0 failed BIST                                       | Yes            |
| 01 81 | processor 1 failed BIST                                       | Yes            |
| 04 81 | processor 0 Internal Error (IERR) failure                     | Yes            |
| 05 81 | processor 1 Internal Error (IERR) failure                     | Yes            |
| 06 81 | processor 0 Thermal Trip failure                              | Yes            |
| 07 81 | processor 1 Thermal Trip failure                              | Yes            |
| 08 81 | Watchdog Timer failed on last boot, BSP switched.             | Yes            |
| 0A 81 | processor 1 failed initialization on last boot.               | Yes            |
| 0B 81 | processor 0 failed initialization on last boot.               | Yes            |
| 0C 81 | processor 0 disabled, system in Uni-processor mode            | Yes            |
| 0D 81 | processor 1 disabled, system in Uni-processor mode            | Yes            |
| 0E 81 | processor 0 failed FRB Level 3 timer                          | Yes            |
| 0F 81 | processor 1 failed FRB Level 3 timer                          | Yes            |
| 10 81 | Server Management Interface failed to function                | Yes            |
| 20 81 | IOP sub-system is not functional                              | Yes            |
| 50 81 | NVRAM (CMOS) Cleared by Jumper             | Yes            |
| 51 81 | NVRAM (CMOS) Checksum Error, NVRAM cleared | Yes            |
| 52 81 | NVRAM (CMOS) Data Invalid, NVRAM cleared   | Yes            |