Introduction

Archiving Server: Archiving Server enables you to archive IM communications and meeting content for compliance reasons. If you do not have legal compliance concerns, you do not need to deploy Archiving Server. A SQL Server Back End Server is required to implement archiving.

Monitoring Server: Monitoring Server collects data about the quality of your network media in both Enterprise Voice calls and A/V conferences. It also collects call error records (CERs), which you can use to troubleshoot failed calls. Additionally, it collects usage information in the form of call detail records (CDRs) about various Lync Server features, such as IM, conferencing, and Enterprise Voice, so that you can calculate the return on investment of your deployment and plan its future growth. Monitoring Server is typically collocated with the Microsoft Lync Server 2010 Archiving Server. A SQL Server Back End Server is required to implement a Monitoring Server.

Discovery with the agent

Collector Type: Agent

Category: Application Monitors

Application Name: Microsoft Lync Front End Servers

Global Template Name: Microsoft Lync Front End Servers DotNet v4

Prerequisites: Lync monitors require Microsoft .NET Framework 4.

Collected Metrics

Metric Name | Display Name | Description
DBStoreQueueLatency | DBStoreQueueLatency | This component monitor returns the average time, in milliseconds, that a request is held in the queue of the Back End Database Server. If the topology is healthy, this counter averages less than 100 ms.
DBStoreQueueDepth | DBStoreQueueDepth | The average number of database requests waiting to be executed. A high value suggests that the back end is busy and unable to respond to requests quickly. This might be a temporary condition. If the problem persists, ensure that the hardware and software requirements are met.
MSMQ_TotalMessagesInAllQueues | MSMQ_TotalMessagesInAllQueues | The total number of messages in all active Message Queuing (MSMQ) queues on the computer.
SIP_503ResponseRate | SIP_503ResponseRate | This component monitor returns the rate of 503 responses generated by the server, per second. The 503 code corresponds to the server being unavailable. On a healthy server, you should not receive this code at a steady rate.
SIP_504ResponseRate | SIP_504ResponseRate | This component monitor returns the rate of 504 responses generated by the server, per second. A few 504 responses to clients (for clients disconnecting abruptly) are to be expected, but this counter mainly indicates connectivity issues with other servers.
SIP_ConnectionsActive | SIP_ConnectionsActive | This component monitor returns the number of established connections that are currently active. A connection is considered established when peer credentials are verified (e.g. via MTLS), or the peer receives a 2xx response.
SIP_TLSConnectionsActive | SIP_TLSConnectionsActive | This component monitor returns the number of established TLS connections that are currently active. A TLS connection is considered established when the peer certificate, and possibly the host name, are verified for a trust relationship.
memory.committedbytes | Memory CommittedBytes | The amount of committed virtual memory, in bytes.
memory.pagespersec | Memory PagesPersec | The rate at which pages are read from or written to disk to resolve hard page faults.
SIP_SendsOutstanding | SIP_SendsOutstanding | This component monitor returns the number of messages that are currently present in the outgoing queues. If you receive error message 504, investigate the results from this counter to see which servers are having problems.
SIP_AvgOutgoingQueueDelay | SIP_AvgOutgoingQueueDelay | This component monitor returns the average time, in seconds, that messages have been delayed in outgoing queues.
SIP_FlowControlledConnectionsDropped | SIP_FlowControlledConnectionsDropped | This component monitor returns the total number of connections dropped because of excessive flow control. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible.
SIP_AvgFlowControlDelay | SIP_AvgFlowControlDelay | This component monitor returns the average delay, in seconds, in message processing when the socket is flow-controlled. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible.
SIP_IncomingRequestRate | SIP_IncomingRequestRate | This component monitor returns the rate of received requests, per second. You will need to baseline this counter by testing and monitoring the user load.
SIP_IncomingMessageRate | SIP_IncomingMessageRate | This component monitor returns the rate of received messages, per second. You will need to baseline this counter by testing and monitoring the user load.
SIP_EventsInProcessing | SIP_EventsInProcessing | This component monitor returns the number of SIP transactions, or dialog state change events, that are currently being processed. You will need to baseline this counter by testing and monitoring the user load.
SIP_500ResponseRate | SIP_500ResponseRate | This component monitor returns the rate of 500 responses generated by the server, per second. This can indicate that there is a server component that is not functioning correctly.
SIP_AvgHoldingTimeForIncomingMessage | SIP_AvgHoldingTimeForIncomingMessage | This component monitor returns the average time that the server held the incoming messages currently being processed. If this counter is more than 10 seconds (12 seconds maximum), the server goes into throttling mode.
SIP_AddressSpaceUsage | SIP_AddressSpaceUsage | This component monitor returns the percentage of available address space currently in use by the server process. The returned value should be as low as possible.
SIP_PageFileUsage | SIP_PageFileUsage | This component monitor returns the percentage of available page file space currently in use by the server process. The returned value should be as low as possible.
SIP_IncomingMessagesTimedOut | SIP_IncomingMessagesTimedOut | The number of incoming messages currently being held by the server for processing for more than the maximum tracking interval. A high value indicates that the server is too busy to process user requests in a timely fashion.
IM_NumberOfActiveConferences | IM_NumberOfActiveConferences | This component monitor returns the number of active instant messaging conferences. You will need to baseline this counter by testing and monitoring the user load.
IM_NumberOfConnectedIMUsers | IM_NumberOfConnectedIMUsers | This component monitor returns the number of connected instant messaging users in all conferences. You will need to baseline this counter by testing and monitoring the user load.
IM_WithThrottledSIPConnections | IM_WithThrottledSIPConnections | This component monitor returns the number of throttled SIP connections. A value greater than ten could indicate that a peer is not processing requests in a timely fashion. This can happen if the peer machine is overloaded.
IMMCU_NumberOfConferences | IMMCU_NumberOfConferences | Number of instant messaging conferences. Ideally, conferences should be evenly distributed across all Front End Servers.
IM_MCUHealthState | IM_MCUHealthState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation.
IM_MCUDrainingState | IM_MCUDrainingState | This component monitor returns the current draining status of the MCU. Possible values: 0 = Not requesting to drain. 1 = Requesting to drain. 2 = Draining. When a server is drained, it stops taking new connections and calls.
User_services_DBStoreSprocLatency | User_services_DBStoreSprocLatency | This component monitor returns the average time, in milliseconds, it takes to execute a stored procedure call. A healthy state is considered to be less than 100 ms. Server health decreases as latency increases to 12 seconds, when server throttling begins.
User_services_NumberOfFailedHTTPConnections | User_services_NumberOfFailedHTTPConnections | This component monitor returns the rate of connection attempt failures, per second. You will need to baseline this counter by testing and monitoring the server's health.
Memory_PagesPerSec | Memory_PagesPerSec | If a page has to be retrieved from disk instead of from memory, performance suffers. The rate at which pages in memory are swapped with those on disk should stay below 500 pages per second.
AVMCU_NumberofAudiovideoconferences | AVMCU_NumberofAudiovideoconferences | Number of audio/video conferences. Ideally, conferences should be evenly distributed across all Front End Servers.
ASMCU_NumberOfApplicationSharingConferences | ASMCU_NumberOfApplicationSharingConferences | Number of application sharing conferences. Ideally, conferences should be evenly distributed across all Front End Servers.
DATAMCU_HealthState | DATAMCU_HealthState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the data sharing MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavail.
DATAMCU_DrainingState | DATAMCU_DrainingState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the data sharing MCU. 0 = Not requesting to drain. 1 = Req.
DataMCU_EstimatedConferenceWorkitemsLoad | DataMCU_EstimatedConferenceWorkitemsLoad | The estimated time, in milliseconds, to process all pending items on the session queues.
DataMCU_StateOfSessionQueues | DataMCU_StateOfSessionQueues | The state of the session queues. It indicates whether the Data MCU is overloaded.
DATAMCU_NumberOfDataSharingConferences | DATAMCU_NumberOfDataSharingConferences | Number of data sharing conferences. Ideally, conferences should be evenly distributed across all Front End Servers.
ApplicationSharingMCU_HealthState | ApplicationSharingMCU_HealthState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the application sharing MCU. 0 = Normal. 1 = Loaded. 2 = Full.
ApplicationSharingMCU_DrainingState | ApplicationSharingMCU_DrainingState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the application sharing MCU. 0 = Not requesting to drain.
AudioVideoMCU_HealthState | AudioVideoMCU_HealthState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the audio/video MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavailable.
AudioVideoMCU_DrainingState | AudioVideoMCU_DrainingState | The Multipoint Conferencing Unit (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the audio/video MCU. 0 = Not requesting to drain. 1 = Req.
AddressBook_SearchResponseTime | AddressBook_SearchResponseTime | The average processing time, in milliseconds, for an address book search request. A high value could be due to back-end database performance issues. Verify the CPU load on the back-end database machine and upgrade the hardware if needed.
AddressBook_SearchFailureRate | AddressBook_SearchFailureRate | The per-second rate of failed address book search requests. Failures could be due to back-end database performance issues. Verify that the back-end database is running and accessible.
LS_AV_Auth_Edge_BadRequestsReceivedPerSecond | LS_AV_Auth_Edge_BadRequestsReceivedPerSecond | The number of bad requests received per second. A high value means that the A/V Authentication Service is receiving an unexpectedly high rate of invalid requests. This could be the result of an attempt to misuse the A/V Authentication Service.
PolicyDecisionPoint_ClientConnectionsAuthenticationTimeoutFailuresPerSecond | PolicyDecisionPoint_ClientConnectionsAuthenticationTimeoutFailuresPerSecond | The per-second rate of client connections timing out before receiving an authenticated message. A connection from a client timed out because it was not authenticated within the specified time. Check whether there are any certificate issues between the machines.
PolicyDecisionPoint_ConnectionsTimedOutPerSecond | PolicyDecisionPoint_ConnectionsTimedOutPerSecond | The per-second rate of sessions that have timed out before the first packet arrived. No packets were received on the connection. This can happen when a client is trying to attack the server by creating connections and consuming resources from Bandwidth.
PolicyDecisionPoint_ServerConnectionsAuthenticationTimeoutFailuresPerSecond | PolicyDecisionPoint_ServerConnectionsAuthenticationTimeoutFailuresPerSecond | The per-second rate of server connections timing out before receiving an authenticated message. A connection from a server timed out because it was not authenticated within the specified time. Check whether there are any certificate issues between the machines.
SIP_ConnectionsRefusedDueToServerOverload | SIP_ConnectionsRefusedDueToServerOverload | The per-second rate of connections that were refused with a Service Unavailable response because the server was overloaded. If the problem persists, ensure that the hardware and software requirements for this server meet the user usage characteristics.
ExpandDistributionList_ResponseTimeInms | ExpandDistributionList_ResponseTimeInms | The average processing time, in milliseconds, for a successful request to be completed. It indicates whether there are any Active Directory performance issues.
ExpandDistributionList_SOAPExceptionRate | ExpandDistributionList_SOAPExceptionRate | The per-second rate of SOAP exceptions.
AddressBookFileDownload_FailedRequestsPerSecond | AddressBookFileDownload_FailedRequestsPerSecond | The per-second rate of failed Address Book file requests. A high rate of failure can be caused by authentication issues or network connectivity issues.
LSCommunicatorWebApp_FailedDataCollaborationAuthenticationRequestsPerSecond | LSCommunicatorWebApp_FailedDataCollaborationAuthenticationRequestsPerSecond | The number of failed Data Collaboration authentication requests per second. Attempts to authenticate incoming client connections for data collaboration failed. This may indicate a network attack.
LSCommunicatorWebApp_NumberOfDataCollaborationConnectionFailuresWithDataCollaborationServers | LSCommunicatorWebApp_NumberOfDataCollaborationConnectionFailuresWithDataCollaborationServers | The number of Data Collaboration connection failures with Data Collaboration servers. The connection was closed by the local party or the remote party, or because of network issues. Check the availability of the Web Conferencing Servers.
LSCommunicatorWebApp_ThrottledClientDataCollaborationConnectionsPerSecond | LSCommunicatorWebApp_ThrottledClientDataCollaborationConnectionsPerSecond | The number of Data Collaboration client connections closed due to throttling, per second. A client Data Collaboration connection was closed because the client failed to read data in a timely manner. This may indicate a network failure or an organized attack.
CallPark_FailedCallParkRequests | CallPark_FailedCallParkRequests | The total number of park requests that failed.
CallPark_FailedRequestsBecauseNoOrbitIsAvailable | CallPark_FailedRequestsBecauseNoOrbitIsAvailable | The total number of park requests that failed because no orbit was available. Consider adding more orbits using the management console or the PowerShell commands for managing orbit ranges.
CallPark_FailedTransfersToFallbackURI | CallPark_FailedTransfersToFallbackURI | The total number of failed fallback attempts. The fallback destination might not be reachable.
AudioVideoConferencing_NumberOfOccasionsConferenceProcessingIsDelayed | AudioVideoConferencing_NumberOfOccasionsConferenceProcessingIsDelayed | Number of occasions on which conference processing is delayed. This issue may occur if the Audio Video Conferencing server is overloaded, or is not getting enough CPU resources to process audio in real time.
SIP_MessagesPerSecondDroppedDueToUnknownDomain | SIP_MessagesPerSecondDroppedDueToUnknownDomain | The per-second rate of messages that could not be routed because the message domain is not configured and does not appear to belong to a federated partner. The Access Edge Server received SIP messages with an unknown domain.
IMMCU_ThrottledSIPConnections | IMMCU_ThrottledSIPConnections | The number of throttled SIP connections. A nonzero value suggests that a peer is not processing requests in a timely fashion. This can happen if the peer machine is overloaded.
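
Several descriptions above quote concrete limits (for example, less than 100 ms for DBStoreQueueLatency, below 500 pages per second for Memory_PagesPerSec, and 0 for the MCU health states). The following Python sketch shows one way such limits could be applied to a set of collected values. It is an illustration only: the dictionary layout, the function name check_sample, and the way samples are passed in are assumptions and are not part of the template or the agent. Applying the 0 to 3 health labels to every MCU is likewise an assumption based on the AudioVideoMCU_HealthState row.

```python
# Limits quoted in the metric descriptions. The data layout and names below
# are hypothetical; the agent and template do not expose this interface.

# Flagged when the value is at or above the limit ("less than" / "below").
MUST_STAY_BELOW = {
    "DBStoreQueueLatency": 100,                # milliseconds
    "User_services_DBStoreSprocLatency": 100,  # milliseconds
    "Memory_PagesPerSec": 500,                 # pages per second
}

# Flagged only when the value is strictly greater than the limit.
MUST_NOT_EXCEED = {
    "SIP_AvgHoldingTimeForIncomingMessage": 10,  # seconds, throttling starts
    "IM_WithThrottledSIPConnections": 10,        # connections
}

# State codes as listed in the health and draining state rows.
HEALTH_STATES = {0: "Normal", 1: "Loaded", 2: "Full", 3: "Unavailable"}
DRAINING_STATES = {
    0: "Not requesting to drain",
    1: "Requesting to drain",
    2: "Draining",
}


def check_sample(sample):
    """Return a list of findings for one mapping of metric name to value."""
    findings = []
    for name, value in sample.items():
        if name in MUST_STAY_BELOW and value >= MUST_STAY_BELOW[name]:
            findings.append(f"{name}={value} is not below {MUST_STAY_BELOW[name]}")
        elif name in MUST_NOT_EXCEED and value > MUST_NOT_EXCEED[name]:
            findings.append(f"{name}={value} exceeds {MUST_NOT_EXCEED[name]}")
        elif name.endswith("HealthState") and value != 0:
            label = HEALTH_STATES.get(value, "unknown")
            findings.append(f"{name}={value} ({label})")
        elif name.endswith("DrainingState") and value != 0:
            label = DRAINING_STATES.get(value, "unknown")
            findings.append(f"{name}={value} ({label})")
    return findings


if __name__ == "__main__":
    # Made-up values for illustration, not real measurements.
    example = {
        "DBStoreQueueLatency": 250,
        "Memory_PagesPerSec": 120,
        "AudioVideoMCU_HealthState": 2,
    }
    for finding in check_sample(example):
        print(finding)
```

Counters that the table says must be baselined (such as SIP_IncomingRequestRate) are deliberately left out, because their acceptable range depends on the user load measured in your own deployment.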