Walkthrough: Two Steps to Agent Health

The mini Admin Dashboard provides an immediate indication of Agent health. When you notice errors or problems, click the mini Admin Dashboard to jump directly to the full Admin Dashboard and examine the details. For more information, see mini Admin Dashboard.

This topic describes how to assess/restore Agent health in "two steps" using the Admin Dashboard:

  1. Quickly identify any Agents in the system with problematic statuses and/or events.

  2. Assess the problem and perform the required corrective action.

You can also monitor ITM On-Prem (ObserveIT) health by receiving digest summaries of system events via email notifications. For details, see Configuring Email Notification Settings for Events.

To assess/restore Agent health using the Admin Dashboard

  1. In the Agents portal of the Admin Dashboard, you can see a list of Agent groups with their error statuses and the number of Agents that have errors.

    For example:

  2. To view popup details about the statuses of the Agents in a group, hover the mouse over the group's colored status bar.

    For example:

    Clicking on a status in the popup links directly to the Servers page, displaying the Servers status filtered according to the Group type and status criteria. For descriptions of the different Agent status types, see Assessing Agent Statuses and Details.

  3. If any of the Agents in the group were tampered with or incurred data loss in the past 7 days, place the mouse over the Tampered With icon or Data Loss icon to view the date of the last occurrence.

    For example:

  4. To view details of the Agent group member with errors on the endpoint, click the error number (1 in the above example).

    The Endpoints page opens, displaying the endpoint that has the error. By expanding the details, you can see that the Status Details field displays "Tampered With". The colored severity bars indicate the event severity level (in this example, red, which indicates high severity).

  5. Click the Error link (or the System Events link) in the Endpoints list to view the event in the System Events list where you can view expanded details.

    You can view additional information, such as the name of the file that was tampered with, in the HealthEventsLog file located under C:\Program Files\ObserveIT\ObserveITAgent\Trace. On Unix systems, the health_monitor.log file is located under /opt/observeit/agent/run.

  6. Assess the problem and perform the required corrective action. You can look in the HealthEventsLog (or health_monitor.log) file to verify whether the file is missing or was changed. If the file is missing, it is recommended to reinstall the Agent with the latest software version used (or copy the file from another location). If the file was modified, correct it as needed.

  7. When you have resolved the event, the Agent group's status will be displayed in the Admin Dashboard as OK (green). The mini Admin Dashboard will also appear "error-free".

    It may take a few minutes for the dashboard to be updated with the resolved status. If the event was not resolved properly, the status will not be updated.

    Following is an example of how the Admin Dashboard looks with no errors (i.e., all Agents' statuses are OK):

    The Tampered With icon stays on the Admin Dashboard for up to one week after the tampering event, as a reminder that tampering occurred on this Agent group within the last week. The row also remains shaded orange so that you can easily identify which Agent group was tampered with. To remove the Tampered With icon, an Admin user can refresh the dashboard by clicking the Reset icon that appears next to Agents.
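The checks described in steps 5 and 6 can also be scripted. The following Python sketch is illustrative only and is not part of the product. It assumes the health log is a plain-text file that can be searched by keyword (the log format is not documented in this topic), and that you have a known-good copy of the affected file, for example from another Agent installation of the same version. The function names find_log_lines and check_file are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List


def find_log_lines(log_path: Path, keyword: str) -> List[str]:
    """Return the lines of a text log that contain keyword (case-insensitive).

    Assumption: the health log is plain text. Undecodable bytes are
    replaced rather than raising an error.
    """
    needle = keyword.lower()
    with log_path.open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line.lower()]


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(installed: Path, reference: Path) -> str:
    """Classify an installed file against a known-good reference copy.

    Returns "missing" if the installed file does not exist, "modified" if
    its contents differ from the reference, and "ok" otherwise.
    """
    if not installed.exists():
        return "missing"
    if sha256_of(installed) != sha256_of(reference):
        return "modified"
    return "ok"
```

For example, you could call find_log_lines on the health log with a keyword such as "tamper" to find the name of the affected file, then call check_file with the path of that file on the endpoint and the path of your reference copy. A result of "missing" corresponds to reinstalling the Agent or copying the file from another location, and "modified" corresponds to correcting the file as needed.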

For further details about Agents, Agent statuses, and system events see: