To troubleshoot slow Azure Virtual Desktop logons, it helps to have a complete understanding of the service and how it operates. Here, we outline the architecture and key components of AVD, the impact of DNS on the connection flow, and how eG Innovations offers time-saving tools to troubleshoot problems with AVD before they hurt your organization’s productivity.
About Azure Virtual Desktop (AVD) Technology
Azure Virtual Desktop (AVD) is a desktop and application virtualization offering from Microsoft, built on Azure cloud infrastructure. AVD is the only virtual desktop infrastructure (VDI) that delivers multi-session Windows 10 optimized for Office 365 ProPlus, along with support for Remote Desktop Services (RDS) environments. The architecture of AVD is shown below: Microsoft manages the Control Plane (middle), whilst you are responsible for managing the resource plane (right).
As you can see, the AVD architecture has many moving parts, and all of them need to perform efficiently for the user to log on and consume the resources available in the resource plane. Monitoring only AVD host pools and sessions is not sufficient: key components such as brokering by the AVD broker and the authentication components also need full coverage, whether that is Azure AD, Azure AD Connect or an on-premises Active Directory synchronized to the AVD environment.
The Connection Flow Sequence in Azure Virtual Desktop Access
Microsoft has documented the connection flow sequence (see “Understanding Azure Virtual Desktop network connectivity”). A representation is shown below:
- Using a supported Azure Virtual Desktop client, the user subscribes to the Azure Virtual Desktop Workspace.
- Azure Active Directory (Azure AD) authenticates the user and returns the token used to enumerate resources available to a user.
- The client passes the token to the Azure Virtual Desktop feed subscription service.
- The Azure Virtual Desktop feed subscription service validates the token.
- The Azure Virtual Desktop feed subscription service passes the list of available desktops and RemoteApps back to the client in the form of a digitally signed connection configuration.
- The client stores the connection configuration for each available resource in a set of .rdp files.
- When the user selects a resource to connect to, the client uses the associated .rdp file, establishes a secure TLS 1.2 connection to the closest Azure Virtual Desktop gateway instance and passes the connection information.
- The Azure Virtual Desktop gateway validates the request and asks the Azure Virtual Desktop broker to orchestrate the connection.
- The Azure Virtual Desktop broker identifies the session host and uses the previously established persistent communication channel to initialize the connection.
- The Remote Desktop stack on the session host initiates a TLS 1.2 connection to the same Azure Virtual Desktop gateway instance used by the client.
- After both the client and the session host have connected to the gateway, the gateway starts relaying raw data between the two endpoints. This establishes the base reverse connect transport for RDP (Remote Desktop Protocol).
- After the base transport is set, the client starts the RDP handshake.
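Because a slow logon can hide in any of these hops, it can help to treat the sequence as a series of timestamped phases and look for the largest gap. The Python sketch below is a minimal illustration under stated assumptions: the `Marker` type and the phase names are hypothetical labels that loosely mirror the steps above (they are not fields AVD emits), and the timings are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Marker:
    """A hypothetical timestamp taken at the start of a connection phase."""
    phase: str
    at_seconds: float  # seconds since the user selected the resource


def phase_durations(markers: list[Marker]) -> dict[str, float]:
    """Return each phase's duration, measured as the gap to the next marker.

    The final marker only closes the previous phase, so it gets no entry.
    """
    ordered = sorted(markers, key=lambda m: m.at_seconds)
    return {
        current.phase: round(nxt.at_seconds - current.at_seconds, 3)
        for current, nxt in zip(ordered, ordered[1:])
    }


def slowest_phase(markers: list[Marker]) -> str:
    """Name of the phase with the largest duration."""
    durations = phase_durations(markers)
    return max(durations, key=durations.get)


# Made-up timings for one connection attempt.
sample = [
    Marker("client_to_gateway_tls", 0.0),
    Marker("broker_orchestration", 0.4),
    Marker("session_host_to_gateway_tls", 1.1),
    Marker("rdp_handshake", 1.5),
    Marker("session_ready", 9.5),
]
print(phase_durations(sample))
print(slowest_phase(sample))  # rdp_handshake
```

Measuring the gap to the next marker keeps the model simple: each phase is "open" until the next one starts, so no separate end timestamps are needed.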
How DNS Plays a Key Role in AVD Connection Flow Sequence
Jack Tracey documents the importance of DNS and its role in the AVD connection flow; see “DNS Importance With WVD – Jack Tracey – Cloud & Networking”. If the user’s DNS settings point to a DNS server that is not geographically local, perhaps in a different region or country, the user will have a poor experience, because the Azure Front Door, gateway and Traffic Manager endpoints used for AVD traffic are selected at the DNS level.
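DNS-based geographic routing typically sees the recursive resolver that sends the query rather than the end user, which is why a distant resolver can steer a nearby user to a distant endpoint. The toy Python model below illustrates that effect. It is a deliberate simplification, not how Azure actually routes: the gateway names and coordinates are hypothetical, and straight-line (haversine) distance stands in for the latency-based decisions the real services make.

```python
from __future__ import annotations

import math

# Hypothetical gateway sites with approximate (latitude, longitude) coordinates.
GATEWAYS = {
    "uk-south": (51.5, -0.1),      # near London
    "west-europe": (52.4, 4.9),    # near Amsterdam
    "us-east": (38.9, -77.0),      # near Washington, D.C.
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pick_gateway(dns_query_source: tuple[float, float]) -> str:
    """Toy geo-DNS: choose the site nearest to whoever sent the DNS query.

    In this model the answer depends on the resolver's location, not the user's.
    """
    return min(GATEWAYS, key=lambda name: haversine_km(dns_query_source, GATEWAYS[name]))


user_in_london = (51.5, -0.1)
resolver_in_us = (38.9, -77.0)

print(pick_gateway(user_in_london))  # uk-south: a local resolver sits near the user
print(pick_gateway(resolver_in_us))  # us-east: same user, distant resolver
```

In the second call the user has not moved; only the resolver has, yet the selected endpoint is thousands of kilometres away from them.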
The AVD Reverse Connect Mechanism
Azure Virtual Desktop uses the Reverse Connect mechanism to establish the AVD session and carry RDP traffic. As I explained in a previous article, whilst this improves security because session hosts do not need to listen on open inbound ports, it also obscures information that admins are used to having on-premises, such as the client’s endpoint IP address. That article also covers how to capture the end-user IP address (public IP address) within AVD for security, compliance, threat-hunting and GDPR requirements.
The Role of FSLogix User Profile and its Implications
When users log in to and out of the desktop, traditional user profiles are copied from and to a centralized storage location, which frequently caused long logon durations. In recent years, Microsoft has adopted the “FSLogix Container” technology in AVD: the profile is captured and stored as a virtual hard disk (VHD or VHDX) on storage such as Azure Files or Azure NetApp Files, and this disk is mounted for the user when they log in to the desktop. The FSLogix container architecture is shown below.
Sometimes the FSLogix container disk fails to attach for a user, leaving the user without their expected profile. In addition, the container disk takes time to mount, which adds to the AVD user logon duration. James Kindon has some insightful articles (see FSLogix Synchronous Processing (jkindon.com)) covering how FSLogix affects the user logon experience.
Proactively monitoring the FSLogix process during logon is essential. eG Enterprise proactively alerts administrators to attachment errors and unacceptable performance problems associated with FSLogix. Out-of-the-box thresholds and alerts ensure that FSLogix events and metrics are continually monitored and that anomalous trends are highlighted.
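The kind of check described above can be sketched as a simple classification of container attach attempts. This is a minimal Python illustration, not eG Enterprise’s implementation: the `ContainerAttach` record, its fields, the failure text and the 10-second threshold are all hypothetical, and real data would come from FSLogix logs or event logs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerAttach:
    """Hypothetical record of one FSLogix profile container attach attempt."""
    user: str
    started_at: float                    # seconds since logon began
    attached_at: Optional[float] = None  # None if the VHD(X) never attached
    error: Optional[str] = None          # recorded failure reason, if any


def classify(attach: ContainerAttach, threshold_seconds: float = 10.0) -> str:
    """Flag failed attaches first, then attaches slower than the threshold."""
    if attach.attached_at is None:
        reason = attach.error or "unknown reason"
        return f"ALERT: attach failed for {attach.user}: {reason}"
    duration = attach.attached_at - attach.started_at
    if duration > threshold_seconds:
        return f"WARN: attach for {attach.user} took {duration:.1f}s"
    return f"OK: attach for {attach.user} took {duration:.1f}s"


for record in [
    ContainerAttach("alice", 2.0, 4.5),
    ContainerAttach("bob", 2.0, 17.0),
    ContainerAttach("carol", 2.0, error="access denied on profile share"),
]:
    print(classify(record))
```

Failures are checked before duration on purpose: a disk that never attached has no meaningful mount time, and it is the more urgent condition.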
The Roles of Azure Active Directory and Azure AD Connect and their Implications
AVD depends heavily on Azure Active Directory (Azure AD) infrastructure, so it is very important to monitor Azure AD. I have explained this in more detail in previous articles; see “Azure AD Monitoring Tips & Strategies” and “Monitor Azure Active Directory (AD) Users”. AVD users are part of the on-premises Active Directory domain, and Azure AD Connect synchronizes them so that they can consume AVD resources from the resource plane. A problem in Azure AD Connect will therefore also impact the AVD user’s logon process, so it is equally important to monitor the Azure AD Connect component, which I have covered in my article “Azure Active Directory (Azure AD) – 101”.
Consider the scenario where AVD user accounts need to exist in both on-premises AD and Azure AD. Before the user session is created on the session host, the user must be authenticated first by Azure AD and then against on-premises AD. This is where a crucial step, “Identity Matching”, is performed by Azure to ensure that the user is the same person in both places. Tom Hickling has a good, detailed overview of this process; see “The WVD Log in process and Active Directory topologies explained”.
As you can see, many activities happen behind the scenes when a user tries to log on, and many things can go wrong. Because Microsoft manages Control Plane elements such as Web Access, brokering, session host load balancing and Azure AD, monitoring user logon duration in AVD is complex. Microsoft exposes diagnostics for the connection sequence in the following Azure Monitor tables:
- Connection (WVDConnections)
- Checkpoint (WVDCheckpoints)
- Error (WVDErrors)
- Management (WVDManagement)
- HostRegistration (WVDHostRegistrations)
You can write Kusto Query Language (KQL) queries against these tables to find logon durations, failures and so on. Alternatively, you can import a workbook from public repositories into your tenant to monitor AVD logon duration and failures. Both approaches are reactive and rely on manual intervention, which is hardly ideal when your users are facing long logon durations or logon failures and are calling the help desk.
How Does eG Enterprise Help Troubleshoot Azure Virtual Desktop Logon Issues?
eG Enterprise monitors the AVD Control Plane, collecting and reporting numerous metrics, events and logs in the “Microsoft AVD Broker” model for complete observability. This data is an essential resource for troubleshooting the many root causes that increase logon duration or cause connection errors. The data is correlated by eG Enterprise’s AIOps (Artificial Intelligence for Operations) engine to automate root-cause diagnosis and distinguish root causes from secondary symptoms.
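As a rough illustration of the manual approach, the Python sketch below computes, per connection, the time from the `Started` state to the `Connected` state. It assumes rows exported from the WVDConnections table (the Log Analytics table behind the Connection category) with `CorrelationId`, `State`, `TimeGenerated` and `UserName` columns and ISO 8601 timestamps; in practice you would usually express the same logic directly in KQL. This approximates connection time rather than a full logon breakdown, and connections with no `Connected` row come back as `None` so they can be investigated as possible failures.

```python
from collections import defaultdict
from datetime import datetime


def connection_times(rows):
    """Map CorrelationId -> (UserName, seconds from 'Started' to 'Connected').

    Seconds is None when either state is missing, e.g. a connection that never
    reached 'Connected'.
    """
    states = defaultdict(dict)
    users = {}
    for row in rows:
        cid = row["CorrelationId"]
        states[cid][row["State"]] = datetime.fromisoformat(row["TimeGenerated"])
        users[cid] = row["UserName"]

    result = {}
    for cid, seen in states.items():
        started, connected = seen.get("Started"), seen.get("Connected")
        if started is not None and connected is not None:
            seconds = (connected - started).total_seconds()
        else:
            seconds = None
        result[cid] = (users[cid], seconds)
    return result


# Illustrative rows shaped like an export of the WVDConnections table.
rows = [
    {"CorrelationId": "c1", "State": "Started", "UserName": "alice@contoso.com",
     "TimeGenerated": "2024-01-01T09:00:00+00:00"},
    {"CorrelationId": "c1", "State": "Connected", "UserName": "alice@contoso.com",
     "TimeGenerated": "2024-01-01T09:00:38+00:00"},
    {"CorrelationId": "c2", "State": "Started", "UserName": "bob@contoso.com",
     "TimeGenerated": "2024-01-01T09:05:00+00:00"},
]

for cid, (user, seconds) in connection_times(rows).items():
    print(cid, user, seconds)
```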
eG Enterprise automatically populates the “User Experience Dashboard” with the active AVD users, as shown below.
Clicking a username provides a detailed breakdown for that individual user, showing the performance metrics collected and the recent alerts generated for them. This view is particularly useful in help desk and service desk workflows when a user has called in or reported an issue. Because this data is collected for every logon, it is available for retrospective investigation even if the user session has ended or the user was unable to report the issue until much later (a common scenario in healthcare, where medics prioritize patients above reporting slow logons!).
Below, you can see the user has the following issues:
- Total logon processing duration is high
- FSLogix user profile container disk has failed to attach for this user
- Group policy processing took considerable time
eG Enterprise shows the following logon metrics for each AVD user session in the “Microsoft AVD Broker” model.
These metrics are computed automatically by the eG Agent for each new AVD user session that logs on to a session host in the host pool. They are available for both Desktop and RemoteApp sessions, whether accessed through the HTML5 client or the AVD Remote Desktop clients. As with all data collected by eG Enterprise, the AIOps engine correlates and filters the alerts triggered by out-of-the-box thresholds on key metrics that are likely to indicate issues or concerning trends. Administrators can tune thresholds manually, or they can be tuned automatically by our machine learning technologies, which auto-baseline your systems. In this case, both “Logon duration (Seconds)” and “Total Logon processing duration (Seconds)” have exceeded acceptable thresholds and triggered an alert.
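To make the threshold discussion concrete, the Python sketch below combines a fixed limit with a simple statistical baseline (mean plus k standard deviations of recent history). This is a generic technique chosen for illustration, not a description of eG Enterprise’s machine-learning baselining; the function name, the default `k=3.0` and the sample history are assumptions.

```python
import statistics


def threshold_breaches(value, history, static_limit=None, k=3.0):
    """Return which checks a new measurement breaches: 'static', 'baseline', both or neither.

    The baseline upper bound is mean + k * sample standard deviation of history.
    """
    reasons = []
    if static_limit is not None and value > static_limit:
        reasons.append("static")
    if len(history) >= 2:  # statistics.stdev needs at least two data points
        upper = statistics.mean(history) + k * statistics.stdev(history)
        if value > upper:
            reasons.append("baseline")
    return reasons


# Illustrative recent logon durations (seconds) for one host pool.
recent = [20, 22, 19, 21, 23, 20, 22]
print(threshold_breaches(40, recent, static_limit=30))  # ['static', 'baseline']
print(threshold_breaches(27, recent, static_limit=30))  # ['baseline']
print(threshold_breaches(22, recent, static_limit=30))  # []
```

The second call shows why a baseline complements a fixed limit: 27 seconds is under the static 30-second limit but is well outside what this host pool normally sees.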
Detailed Diagnosis on “Total sessions” reveals other useful information, such as:
- Azure AD username
- On-Premises username
- Client type
- Client OS
- Client version, and many more
The measure “Total Logon processing duration” shows the overall logon duration of the user session; its breakdown is shown below.
You can see that “Logon duration” accounts for most of the total, and eG Enterprise can break the logon duration down further, as shown below.
From the screen above, you can clearly see that two measures have values above 45 seconds and inflate the logon duration for this user session.
Two measures are of particular interest here:
- Group policy processing duration
- FSLogix service duration
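A breakdown like this boils down to ranking sub-metrics by their share of the total. The Python sketch below shows that arithmetic; the metric names follow the measures listed above, but the numbers are illustrative and do not come from the screenshots, and “Other” is a hypothetical catch-all bucket.

```python
def rank_breakdown(total_seconds, parts):
    """Return (name, seconds, percent of total) tuples, largest contributor first."""
    ranked = sorted(parts.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, round(100.0 * secs / total_seconds, 1)) for name, secs in ranked]


# Illustrative numbers only.
breakdown = {
    "FSLogix service duration": 24.0,
    "Group policy processing duration": 46.0,
    "Other": 10.0,
}
for name, secs, pct in rank_breakdown(80.0, breakdown):
    print(f"{name}: {secs:.1f}s ({pct}%)")
```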
Since Microsoft manages the Control Plane, it is difficult to troubleshoot why Group Policy and FSLogix took so long for this user session. eG Enterprise monitors the session hosts in the host pool and collects all the important data for troubleshooting. With the help of session host monitoring, eG Enterprise shows the breakdown of the “Group policy processing duration” when you click the user in the “User Experience Dashboard”.
In the screen above, you can see that the client-side extension (CSE) named “Group Policy Drive Maps” took 43 seconds, driving up the “Group policy processing duration” metric.
eG Enterprise monitors the AVD session hosts and the impact of Group Policy processing on them. The metric “Total client-side extension processed time” shows 44 seconds for the same user. You can also see the number of client-side extensions applied and the Group Policy processing mode, which is “Synchronize”, meaning all client-side extensions run in the foreground.
eG Enterprise also monitors FSLogix profile containers and Office 365 containers. With this capability, eG Enterprise detects that the profile container disk for this user failed to attach, and it captures the reason, as presented below.
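Group Policy broadly applies client-side extensions one after another, so in synchronous (foreground) mode the user effectively waits for their combined processing time. The Python sketch below summarizes CSE timings under that simplifying assumption. The 43-second Drive Maps figure echoes the example above; the other CSE timings and the `cse_summary` helper are hypothetical.

```python
def cse_summary(cse_seconds, mode):
    """Summarize client-side extension (CSE) timings for one Group Policy run.

    Simplifying assumption: in synchronous (foreground) mode the user waits
    for every CSE, so their combined time counts toward the logon.
    """
    return {
        "count": len(cse_seconds),
        "total_seconds": round(sum(cse_seconds.values()), 1),
        "slowest": max(cse_seconds, key=cse_seconds.get),
        "blocks_logon": mode.strip().lower().startswith("sync"),
    }


# 43 s for Drive Maps echoes the example above; the other timings are made up.
timings = {"Group Policy Drive Maps": 43.0, "Registry": 0.6, "Scripts": 0.4}
print(cse_summary(timings, "Synchronize"))
```

Checking for a "sync" prefix means "Asynchronous" is correctly treated as non-blocking, since it does not start with "sync".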
Reason for user profile container disk failure
FSLogix event log monitoring also reveals the same issue for this user.
eG Enterprise provides an AVD logon report, which can be generated per user or per session. A sample AVD logon report is shown below.
Conclusions
By monitoring the components involved in the AVD logon process end to end, administrators can avert issues proactively, before users encounter problems. Moreover, by capturing comprehensive logs, metrics and events for every single logon, administrators can retrospectively analyze issues, audit their systems and SLAs (Service Level Agreements), and plan capacity.
eG Enterprise is an Observability solution for Modern IT. Monitor digital workspaces, web applications, SaaS services, cloud and containers from a single pane of glass.
Learn More:
- Azure Monitoring
- Azure AVD (Azure Virtual Desktops) Monitoring
- An Overview of Azure Active Directory (Azure AD) – 101
- How to monitor and audit Azure AD Users
- Monitoring and Alerting on Azure AD App Client Secret and Certificate Expirations
- Microsoft Monitoring: One-Stop Shop for Everything Microsoft
- Azure AD Audit Logs
- Azure AD Monitoring
- Other technologies supported by eG Enterprise