eG Monitoring
 

Measures reported by RUMAppTest

User experience with a web site is a key measure of web site performance. Typically, the following factors differentiate a ‘satisfactory’ user experience from a ‘slow’ or ‘frustrating’ experience:

  • Slow page views

  • JavaScript errors in pages

When users complain of a poor experience with a web site, administrators must first determine which of these factors is adversely impacting user experience - slow pages? error pages? or both? But this knowledge alone is not enough to help administrators undo the damage to revenue and reputation that user experience issues cause. For that, administrators also need to know exactly “why” and “where” the slowness occurred and “what” caused the JavaScript error, so that they can quickly initiate measures to eliminate the bottleneck and ensure that user experience improves rapidly and significantly!

The challenge in isolating the root-cause of slow page views, in particular, is the multi-step nature of the page loading process. When a user requests a web page by hitting its URL, the browser first unloads the page (if any) already loaded in it. Then, the browser checks whether the request needs to be redirected to a different URL. If so, it redirects the request to the other URL. Next, the browser searches the App Cache for resources to be loaded to the requested page. If one/more resources are available in the cache, the browser loads the resources and then attempts to connect to the web/web application server that hosts the web site for fetching the requested page. When connecting to the server, the browser first looks up the DNS to resolve the server domain name to its IP address. If domain resolution is successful, the browser then establishes a TCP connection with the server via its IP address and port and transmits the page request to the server via that connection. The server then responds with the requested page. Upon receipt of a response from the server, the client browser builds the contents of the page by loading the document object model, and finally renders the page.

A slowdown in any one event that is part of the page loading process can affect the loading time of the page - for instance, if domain lookup takes longer than usual, page load time will increase. This is why administrators often struggle to precisely pinpoint where the bottleneck is!

This is where the Web Site test helps. This test tracks page view requests to a monitored web site/web application, and measures the time taken by the requested pages to load. Administrators are promptly alerted if the load time exceeds configured thresholds. Under such circumstances, the test supplements the administrator's troubleshooting efforts by additionally capturing and reporting the time taken by every step of the page loading process. This enables administrators to instantly and accurately isolate the exact step at which the slowdown may have occurred - during domain lookup? When connecting to the server via TCP? When processing the request at the server end? When loading the document? Or when rendering the page? Moreover, the detailed diagnosis of this test accurately pinpoints the slow pages (in terms of load time) and also provides the load time breakup per page, using which administrators can quickly and precisely identify which page is slow and why! The test also reports the count of pages with JavaScript errors and provides detailed diagnostics that reveal what these errors are. This way, the test proactively alerts administrators to issues that may potentially affect user experience with a web site, pinpoints the root-cause of such issues, and thus helps administrators take corrective action, well before users notice.

Note:

  • The metrics reported by the RUMAppTest test form the basis for the execution of all other tests mapped to the Real User Monitor component. So, if the RUMAppTest test is excluded or disabled, then none of the other tests will run.

  • The RUMAppTest test tracks requests to only the base pages of a monitored web site/web application; not to the AJAX or iFrame pages that the web application may support. To capture the responsiveness of requests to AJAX and iFrame pages, you need to use the RUMPageTypeTest test instead.

By default, the eG agent can send a maximum of 50 million characters to the eG manager, when reporting detailed diagnostics for a test for a single measurement period. If this limit is exceeded by a test during a measurement period, then the detailed diagnostics reported by that test will be automatically truncated and the additional characters dropped. Moreover, a message to this effect will also be logged in the eG agent's error log. If such errors are logged frequently for a particular test, you may want to seriously consider increasing this character limit of the detailed metrics collected by that test. For this purpose, do the following:

  • Edit the eg_tests.ini file in the <EG_INSTALL_DIR>\manager\config directory of the eG manager installation.

  • Go to the [MAX_DD_UPLOAD_LENGTH] section of the file.

  • In this section, look for the parameter that corresponds to the <Internal_test_name> of the test for which the character limit has to be increased.

  • Once you find the parameter, set the value of that parameter to a number of your choice.

  • Finally, save the file.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Page_Requests Indicates the total number of times pages in this web site/web application were viewed by users. Number This is a good measure of the traffic to your web site/web application, and also reveals how popular your web site is.

An unusually high number of page views could be a cause for concern, as it could be owing to a malicious virus attack or an unscrupulous attempt to hack your web site/web application. Either way, be wary of sudden, but significant spikes in the page view count!
Apdex_Score Indicates the apdex score of this web site/web application. Number Apdex (Application Performance Index) is an open standard developed by an alliance of companies. It defines a standard method for reporting and comparing the performance of software applications in computing. Its purpose is to convert measurements into insights about user satisfaction, by specifying a uniform way to analyze and report on the degree to which measured performance meets user expectations.

The Apdex method converts many measurements into one number on a uniform scale of 0-to-1 (0 = no users satisfied, 1 = all users satisfied). The resulting Apdex score is a numerical measure of user satisfaction with the performance of enterprise applications. This metric can be used to report on any source of end-user performance measurements for which a performance objective has been defined.

The Apdex formula is:

Apdex_t = (Satisfied Count + Tolerating Count / 2) / Total Samples

This is nothing but the number of satisfied samples plus half of the tolerating samples plus none of the frustrated samples, divided by all the samples.

A score of 1.0 means all responses were satisfactory. A score of 0.0 means none of the responses were satisfactory. Tolerating responses half satisfy a user. For example, if all responses are tolerating, then the Apdex score would be 0.50.

Ideally therefore, the value of this measure should be 1.0. A value less than 1.0 indicates that the user experience with the web site/web application has been less than satisfactory.
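As a minimal sketch of the Apdex formula above, the following Python function computes the score from raw sample counts. The function name and the sample counts are illustrative assumptions; they are not part of the eG product.

```python
def apdex_score(satisfied: int, tolerating: int, frustrated: int) -> float:
    """Return the Apdex score (0.0 to 1.0) for the given sample counts."""
    total = satisfied + tolerating + frustrated
    if total == 0:
        raise ValueError("at least one sample is required")
    # Satisfied samples count fully, tolerating samples count half,
    # and frustrated samples add nothing to the numerator.
    return (satisfied + tolerating / 2) / total


# Hypothetical counts: 60 satisfied, 30 tolerating, 10 frustrated page views.
print(apdex_score(60, 30, 10))  # 0.75
# All responses tolerating, as in the example in the text.
print(apdex_score(0, 50, 0))    # 0.5
```

Note that frustrated samples still appear in the denominator, which is why a site with many frustrated page views scores well below 1.0 even when the remaining views are satisfactory.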
Avg_Page_Load_Time Indicates the average time taken by the pages in this web site to load completely. ms This is the average interval between the time that a user initiates a request and the completion of the page load of the response in the user's browser. In the context of an Ajax request, it ends when the response has been completely processed.

If the value of this measure is consistently high for a web site/web application, there is reason to worry. This is because it implies that the web site/web application is slow in responding to requests. If this condition is allowed to persist, it can adversely impact user experience with the web site/web application. You may want to check the Apdex_Score in such circumstances to determine whether or not user experience has already been affected. Regardless, you should investigate the anomaly and quickly determine where the bottleneck lies - is it at the front end? network? or backend? - so that the problem can be fixed before users even notice any slowness! For that, you may want to compare the values of the Avg_Front_End_Time, Avg_Network_Time and Avg_Response_Avail_Time measures of this test.

If the Avg_Front_End_Time is the highest, it indicates that the problem is with the web site/web application front end - this can be attributed to a slowdown in page rendering or in DOM building. If the Avg_Network_Time is the highest, it denotes that the network is the problem source. This in turn can be caused by TCP connection latencies and delays in domain lookup. On the other hand, if the Avg_Response_Avail_Time measure registers the highest value, it indicates that the problem lies with the web site/web application backend - i.e., the web/web application server that is hosting the web site/web application being monitored.

You can also use the detailed diagnosis of this measure to identify the exact pages in the web site/web application that are presently slow. With the help of the detailed metrics provided per page, you can even precisely pinpoint the root-cause of the slowness of that page.
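The comparison described above can be sketched as a small helper that reports which tier has the highest average time. This is only an illustration: the function name and the sample values are hypothetical, and the three arguments stand for the Avg_Front_End_Time, Avg_Network_Time and Avg_Response_Avail_Time values read from the test.

```python
def slowest_tier(front_end_ms: float, network_ms: float, backend_ms: float) -> str:
    """Return the tier with the highest average time, in milliseconds."""
    tiers = {
        "front end": front_end_ms,  # Avg_Front_End_Time
        "network": network_ms,      # Avg_Network_Time
        "backend": backend_ms,      # Avg_Response_Avail_Time
    }
    # max() returns the first key with the largest value, so ties
    # resolve in the order front end, network, backend.
    return max(tiers, key=tiers.get)


# Hypothetical readings: rendering/DOM building dominates the load time.
print(slowest_tier(front_end_ms=1800.0, network_ms=120.0, backend_ms=450.0))  # front end
```

A helper like this only points at the tier to investigate first; the per-page detailed diagnosis is still needed to find the actual root-cause.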
Unique_User_session Indicates the number of distinct users who are currently accessing this web site/web application. Number  
Request_Per_Minute Indicates the number of times the pages in this web site/web application were viewed per minute. Number This is a good indicator of the level of activity on the web site.

An unusually high value for this measure may require investigation.
Percentage_Normal Indicates the percentage of page views with a normal user experience. Percent The value of this measure indicates the percentage of page views in which users have neither experienced any slowness, nor encountered any JavaScript errors.

Ideally, the value of this measure should be 100%. A value less than 100% indicates the existence of one/more slow/error-prone pages in the web site. A value less than 50% is indicative of a serious problem, where most of the page views are either slow or have encountered JavaScript errors. Under such circumstances, to know what exactly is affecting user experience, compare the value of the Percentage_Slow measure with that of the Percentage_Error measure. This will reveal the reason for the poor user experience with the web site/web application - slow pages? or JavaScript errors?
Percentage_Slow Indicates the percentage of page views that are slow in loading. Percent Ideally, the value of this measure should be 0. A value over 50% implies that you are in a spot of bother, with over half of the page views being slow. Use the detailed diagnosis of the Slow_Requests measure to identify the slow pages and isolate the root-cause of the slowness - is it the browser? the network? or the backend?
Percentage_Error Indicates the percentage of page views that have encountered JavaScript errors. Percent Ideally, the value of this measure should be 0. A value over 50% implies that you are in a spot of bother, with over half of the page views experiencing JavaScript errors. Use the detailed diagnosis of this measure to identify the error pages and to know what JavaScript error has occurred in which page. This will greatly aid troubleshooting!
Slow_Requests Indicates the number of times pages in this web site/web application took very long to be viewed. Number A page view is considered to be slow when the average time taken to load that page exceeds the SLOW TRANSACTION CUTOFF configured for this test.

Ideally, a page should load quickly. The value 0 is hence desired for this measure. If the value of this measure is high, it indicates that users frequently experienced slowness when accessing pages in the web site/web application. To know which page views are slow and why, use the detailed diagnosis of this measure.
Error_Requests Indicates the number of times JavaScript errors occurred when viewing the pages in this web site/web application. Number Ideally, the value of this measure should be 0. A high value indicates that many JavaScript errors are occurring when viewing pages in the web site/web application. Use the detailed diagnosis of this measure to identify the error pages and to know what JavaScript error has occurred in which page. This will greatly aid troubleshooting!
Satisfied_Requests Indicates the number of times pages were viewed in the web site without any slowness. Number A page view is considered to be slow when the average time taken to load that page exceeds the SLOW TRANSACTION CUTOFF configured for this test. If this SLOW TRANSACTION CUTOFF is not exceeded, then the page view is deemed to be ‘satisfactory’. To know which page views are satisfactory, use the detailed diagnosis of this measure.

Ideally, the value of this measure should be the same as that of the Page_Requests measure. If not, then it indicates that one/more page views are slow - i.e., have violated the SLOW TRANSACTION CUTOFF.

If the value of this measure is much lower than the values of the Tolerated_Requests and the Frustrated_Requests measures, it is a clear indicator that web site performance is below-par. In such a case, use the detailed diagnosis of the Tolerated_Requests and the Frustrated_Requests measures to know which pages are slow and why.
Tolerated_Requests Indicates the number of tolerating page views to the web site/web application. Number If the Avg_Page_Load_Time of a page exceeds the SLOW TRANSACTION CUTOFF configuration of this test, but is less than 4 times the SLOW TRANSACTION CUTOFF (i.e., < 4 * SLOW TRANSACTION CUTOFF), then such a page view is considered to be a Tolerating page view.

Ideally, the value of this measure should be 0. A value higher than that of the Satisfied_Requests measure is a cause for concern, as it implies that the overall user experience with the pages in the web site is less than satisfactory. To know which pages are contributing to this sub-par experience, use the detailed diagnosis of this measure. The detailed metrics will also enable you to accurately isolate what is causing the Tolerated_Requests - a problem with the frontend? network? or backend?
Frustrated_Requests Indicates the number of frustrated page views to this web site/web application. Number If the Avg_Page_Load_Time of a page is over 4 times the SLOW TRANSACTION CUTOFF configuration of this test (i.e., > 4 * SLOW TRANSACTION CUTOFF), then such a page view is considered to be a Frustrated page view.

Ideally, the value of this measure should be 0. A value higher than that of the Satisfied_Requests measure is a cause for concern, as it implies that the overall user experience with the pages in the web site is less than satisfactory. To know which pages are contributing to this sub-par experience, use the detailed diagnosis of this measure. The detailed metrics will also enable you to accurately isolate what is causing the Frustrated_Requests - a problem with the frontend? network? or backend?
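The satisfied/tolerating/frustrated rules described for the Satisfied_Requests, Tolerated_Requests and Frustrated_Requests measures can be sketched as follows. The function name, the 3000 ms cutoff and the sample load times are hypothetical. The text does not say how a load time of exactly 4 times the cutoff is treated; this sketch assumes it counts as frustrated.

```python
from collections import Counter


def classify_page_view(load_time_ms: float, cutoff_ms: float) -> str:
    """Classify one page view against the SLOW TRANSACTION CUTOFF."""
    if load_time_ms <= cutoff_ms:
        return "satisfied"    # cutoff not exceeded
    if load_time_ms < 4 * cutoff_ms:
        return "tolerating"   # above the cutoff, below 4x the cutoff
    # Assumption: exactly 4x the cutoff is treated as frustrated.
    return "frustrated"


cutoff_ms = 3000.0  # hypothetical SLOW TRANSACTION CUTOFF
load_times_ms = [2500.0, 3000.0, 7000.0, 12000.0, 15000.0]

counts = Counter(classify_page_view(t, cutoff_ms) for t in load_times_ms)
print(dict(counts))  # {'satisfied': 2, 'tolerating': 1, 'frustrated': 2}
```

The resulting counts correspond to the inputs of the Apdex formula, so the same classification also determines the Apdex_Score.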
Desktop_Page_Requests Indicates the number of times this web site/web application was accessed from client desktops. Number To know which pages in the web site were accessed by desktop users and to evaluate the experience of the desktop users with each of these pages, use the detailed diagnosis of this measure. In the process, slow pages can be identified and the reason for the slowness can be pinpointed.
Mobile_Page_Requests Indicates the number of times this web site/web application was accessed from mobile phones. Number To know which pages in the web site were accessed by mobile phone users and to evaluate the experience of these users with each of the pages, use the detailed diagnosis of this measure. In the process, slow pages can be identified and the reason for the slowness can be pinpointed.
Tablet_Page_Requests Indicates the number of times this web site/web application was accessed from tablets. Number To know which pages in the web site were accessed by tablet users and to evaluate the experience of these users with each of the pages, use the detailed diagnosis of this measure. In the process, slow pages can be identified and the reason for the slowness can be pinpointed.
Avg_Front_End_Time Indicates the interval between the arrival of the first byte of text response and the completion of the response page rendering by the browser. ms In a typical page loading process, the Avg_Front_End_Time denotes the time from the responseStart event to the loadEventEnd event. This process includes document downloading, processing, and page rendering. This time is therefore the sum of the Avg_Dom_Ready_Time (i.e., the Avg_Dom_Download_Time plus the Avg_Dom_proc_Time) and the Avg_Page_Rendering_Time.

If the Avg_Page_Load_Time of the web site/web application exceeds its threshold, then you may want to compare the value of this measure with that of the Avg_Network_Time and Avg_Response_Avail_Time to zoom into the source of the slowness - is it the front end? the network? or the backend?

If the Avg_Front_End_Time is the highest, it indicates that the problem is with the web site/web application front end - this can be attributed to a slowdown in page rendering or in DOM building. To nail the precise cause of the slowdown, compare the values of the Avg_Dom_proc_Time, Avg_Dom_Download_Time, and the Avg_Page_Rendering_Time measures. On the basis of this comparison, you will be able to tell whether the slowness occurred when downloading the document, when processing it, or when rendering the response page in the browser.
Avg_Page_Rendering_Time Indicates the time taken to complete the download of remaining resources, including images, and to finish rendering the page. ms A high value of this measure indicates that the web site/web application is taking too long to render the page in the browser. This can adversely impact the Avg_Front_End_Time, which in turn can prolong the Avg_Page_Load_Time. Ideally therefore, the value of this measure should be low.
Avg_Dom_Ready_Time Indicates the time taken to make the complete HTML document (DOM) available for JavaScript to apply rendering logic. ms The value of this measure is the sum of the Avg_Dom_Download_Time and the Avg_Dom_proc_Time measures. If the value of this measure is very high, then you may want to compare the Avg_Dom_Download_Time and the Avg_Dom_proc_Time measures to figure out what is delaying DOM building - downloading? Or processing?

A high value for this measure can adversely impact the Avg_Front_End_Time, which in turn can prolong the Avg_Page_Load_Time. Ideally therefore, the value of this measure should be low.
Avg_Dom_Download_Time Indicates the time taken to download the complete HTML document on the browser. ms The higher the download time of the document, the longer it takes to make the document available for page rendering. As a result, the overall user experience with the web site/web application will be affected! This is why a low value is desired for this measure at all times.
Avg_Dom_proc_Time Indicates the time taken to build the Document Object Model (DOM) and make it available for JavaScript to apply rendering logic. ms An unusually high value for this measure is a clear indicator that DOM building is taking longer than normal. In consequence, page rendering will be delayed, thus adversely impacting user experience with the web site/web application. Ideally therefore, the value of this measure should be low.
Avg_First_Byte_Time Indicates the interval between the time that a user initiates a request and the time that the browser receives the first response byte. In the context of an Ajax request, this is the interval between the Ajax request dispatch and the time that the browser receives the first response byte. ms The Avg_First_Byte_Time is the time that elapsed between navigationStart and responseStart. The value of this measure is also the sum of Avg_Response_Avail_Time, Avg_DNS_Time, and Avg_TCP_Time. This means that an abnormal increase in any of the above-mentioned time values will increase the value of this measure.

If the first response byte from the target web site/web application is itself received slowly, it is bound to have a cascading effect on all events that follow - such as, document downloading, processing, and page rendering. Ultimately, this will impact the page load time as seen by end-users. This is why, if the Avg_First_Byte_Time violates its threshold, administrators need to instantly switch to the troubleshooting mode and rapidly isolate what is causing it - is DNS lookup taking a long time? is the network connection to the web site/web application latent? or is the web server/web application server hosting the web site slow in processing requests? By comparing the values of the Avg_Response_Avail_Time, Avg_DNS_Time, and Avg_TCP_Time measures, administrators can swiftly and accurately figure out the exact reason why there was a delay in receiving the first response byte.

If this comparison reveals that the Avg_DNS_Time is the highest, it implies that domain name resolution by the DNS server is taking a long time and impacting responsiveness. If the Avg_TCP_Time is found to be the culprit, then blame the network connection for delaying the transmission of the response byte. If the Avg_Response_Avail_Time is higher than the rest, you can rest assured that the source of the problem lies with the server hosting the web site/web application.
Avg_Response_Avail_Time Indicates the interval between the start of processing of a request on the browser to when the browser receives the response. ms The Avg_Response_Avail_Time is the time spent between the requestStart event and responseStart event.

Ideally, a low value is desired for this measure, as high values will certainly hurt the Apdex_Score of the web site/web application.

The key factor that can influence the value of this measure is the request processing ability of the web server/web application server that is hosting the web site/web application being monitored.

Any slowdown in the backend web server/web application server - caused by the lack of adequate processing power in or improper configuration of the backend server - can significantly delay request processing by the server. In its aftermath, the Avg_Response_Avail_Time will increase, leaving users with an unsatisfactory experience with the web site/web application.
Avg_Network_Time Indicates the time that elapses between a user initiating a request and the start of fetching the response document from the server or application. ms The time spent between navigationStart and requestStart makes up the Avg_Network_Time. This includes the time to perform DNS lookups and the time to establish a TCP connection with the server. In other words, the value of this measure is nothing but the sum of the Avg_DNS_Time and Avg_TCP_Time measures.

Ideally, the value of this measure should be low. A very high value will often end up delaying page loading and damaging the quality of the web site service. If the value of this measure is high, compare the values of the Avg_DNS_Time and Avg_TCP_Time measures to determine what the delay can be attributed to - a delay in domain name resolution? Or a poor network connection to the server?
Avg_DNS_Time Indicates the time taken by the browser to perform the domain lookup for connecting to this web site/web application. ms A high value for this measure will not only affect DNS lookup, but will also impact the Avg_Network_Time and Avg_Page_Load_Time of the web site/web application. This naturally will have a disastrous effect on user experience.
Avg_TCP_Time Indicates the time taken by the browser to establish a TCP connection with the server. ms A bad network connection between the browser client and the server can delay TCP connections to the server. As a result, the Avg_Network_Time too will increase, thus impacting page load time and overall user experience with the web site/web application.
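To show how the timing measures above relate to one another, the sketch below derives them for a single page view from four timestamps named after the browser timing events mentioned in the table (navigationStart, requestStart, responseStart, loadEventEnd). The class, the function and the sample timestamps are hypothetical. Treating the page load time as the interval from navigationStart to loadEventEnd is an assumption based on the description of Avg_Page_Load_Time, not a statement of how eG computes it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PageTimings:
    """Timestamps in milliseconds, relative to the start of navigation."""
    navigation_start: float
    request_start: float
    response_start: float
    load_event_end: float


def breakdown(t: PageTimings) -> Dict[str, float]:
    """Split one page load into the intervals described in the table."""
    return {
        # navigationStart -> requestStart (DNS lookup + TCP connect)
        "network": t.request_start - t.navigation_start,
        # requestStart -> responseStart (server-side processing)
        "response_avail": t.response_start - t.request_start,
        # navigationStart -> responseStart
        "first_byte": t.response_start - t.navigation_start,
        # responseStart -> loadEventEnd (download, DOM processing, rendering)
        "front_end": t.load_event_end - t.response_start,
        # Assumption: whole load spans navigationStart -> loadEventEnd
        "page_load": t.load_event_end - t.navigation_start,
    }


sample = PageTimings(navigation_start=0.0, request_start=80.0,
                     response_start=330.0, load_event_end=1530.0)
result = breakdown(sample)
print(result["network"], result["response_avail"], result["first_byte"])  # 80.0 250.0 330.0
print(result["front_end"], result["page_load"])  # 1200.0 1530.0
```

In this sample the first byte time equals the network time plus the response availability time, which matches the relationship stated for Avg_First_Byte_Time, and the front end accounts for most of the page load time.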