Dynatrace Certification - Study Notes
Reference: Welcome to Dynatrace Documentation | Dynatrace Docs

Mission Control (Dynatrace Managed only)
A Dynatrace Managed cluster hosted in the customer environment is remotely monitored by the Dynatrace ONE team through Mission Control. This team helps troubleshoot Dynatrace issues and handles upgrades (once every 4 weeks). Mission Control collects the following information:
· Usage and billing information
License consumption data containing detailed information about hourly license utilization. See Export licensing data.
· Dynatrace cluster health status
Basic Dynatrace Managed deployment statistics are gathered for quick alerting in case of infrastructure issues, and to provide configuration automation. Mission Control gathers information such as the number of nodes, the status of Dynatrace services, and disk partition usage.
· Dynatrace cluster events tracking
Events like server starts/shutdowns, added/removed nodes, and ActiveGate registrations are tracked automatically. The Dynatrace ONE team can remotely analyze and address problems or incompatibilities with your Dynatrace server based on system events. If you ever need to contact Dynatrace ONE, you won't need to collect log files for problem details, because Mission Control gathers this data automatically.
· Software updates
Dynatrace Managed software updates are mandatory and are typically published every four weeks. You can customize the timing of Dynatrace Managed updates (daily or weekly). Updates are communicated to your users automatically at least 24 hours in advance. Updates are fast and allow monitoring to continue seamlessly.

Monitored Environment
A Dynatrace monitoring environment is where all your Dynatrace performance analysis takes place. Dynatrace OneAgent sends all captured monitoring data to your monitoring environment for analysis.
A monitoring environment is analogous to an analysis server that provides all Dynatrace application-performance analysis functionality, including all dashboards, charts, reports, and other tools.
SaaS ME: hosted in the Dynatrace cloud
Managed ME: hosted in the customer data center
Different monitoring environments (MEs) can be created based on
· deployment environment, such as DEV, QA, PROD
· geography (US, APAC, EMEA)

ActiveGate
· Acts as a relay between OneAgent and Dynatrace. Note: ActiveGate is optional
· Reduces firewall changes and lowers network bandwidth
· Load balancing
· Compresses traffic between data centers and Dynatrace
· Runs extensions to monitor components that can't be instrumented directly with OneAgent
· Provides access to sealed networks
EAG (Environment ActiveGate), for SaaS: an EAG is dedicated to one environment and connects to OneAgents in the same environment. E.g. the PROD EAG connects to PROD OneAgents, and the PERF EAG connects to PERF OneAgents.
CAG (Cluster ActiveGate), for Managed: a CAG connects to all OneAgents in all environments and routes traffic.
· An EAG is bound to a specific environment, and thus only handles traffic from OneAgent instances that belong to the same monitoring environment.
Also, it can handle traffic from OneAgents only, not from other ActiveGates
· An EAG connects to a Dynatrace cluster by default; placing a CAG in between needs special setup
· A CAG can connect to multiple OneAgents in multiple environments and converge that traffic to one Dynatrace cluster
· A CAG can receive traffic both directly from OneAgents and from EAGs
· A CAG is installed via the CMC (Cluster Management Console), and an EAG is installed from the monitoring environment
· CAG is not used for SaaS
AG deployment strategy: Managed deployments | Dynatrace Docs
(Diagram labels: indirect connection via EAG; direct connection with OneAgent; indirect connection via EAG)
Additional capabilities of AG:
• Monitors AWS, Azure, vCenter, Kubernetes, and Pivotal Cloud Foundry using APIs (both)
• Stores memory dumps (both)
• Runs synthetic tests from within your own (private) network; synthetic tests can also be run from Dynatrace public locations all over the world (CAG only)
• Receives Real User Monitoring beacons / agentless RUM (CAG only)
• Runs extensions (EAG only)
• Mainframe z/OS monitoring (EAG only)

DT hierarchy
Application > Service > Process > Host > Datacenter

OneAgent
Can be set up at the host, process group, and application level.
OneAgent can communicate directly with the Dynatrace cluster (port 443) or via a Dynatrace ActiveGate (port 9999).
Installation options:
Full stack (DEFAULT): infrastructure only + distributed traces + microservice/code-level visibility + CPU and memory profiler
Infrastructure only: metrics, log files/events, processes (for legacy apps), containers, network, servers

How to stop OneAgent?
UI: Settings > Monitoring overview > select the host, process group, or application and slide the Monitoring switch to Off
CLI:
Windows: net stop "Dynatrace OneAgent"
Linux: service oneagent stop

How to stop Monitoring?
Hosts:
Hosts > Settings > Monitor this host (On/Off)
Hosts > Settings > Full Stack Monitoring (On/Off)
Settings > Monitoring overview > Hosts > Monitor this host (On/Off)
Application:
Settings > Monitoring overview > Application > Monitoring (On/Off)
Process Group:
Settings > Monitoring overview > Process Group > Monitoring (On/Off)
Monitoring overview has 3 tabs: Applications, Hosts, Process Groups

OneAgent update
OneAgent auto-update is enabled by default. This is configurable at the global, host group, and host level, so auto-update can be disabled globally, per host group, or per host. At each level, you can choose one of the following OneAgent update strategies:
· Update automatically, regardless of maintenance windows.
· Update automatically during a selected maintenance window.
· Do not automatically update. With this option, you are not notified when instances of OneAgent are outdated.
Additionally:
· At the host group level, you can choose to inherit the global OneAgent update settings.
· At the host level, you can choose to inherit the host group OneAgent update settings.
Override order: host settings > host group settings > global settings.

Host groups
A host can be assigned to a host group during or after OneAgent installation. Host groups are displayed, for example, on the Monitoring overview page.
Host groups can be used to do the following in bulk:
· OneAgent updates (follow the global update setting, don't update any hosts in this host group, or update all hosts in this host group)
· Anomaly detection: set GLOBAL anomaly thresholds without overriding them.
Host groups can also be used in tagging rules applied at the host, process group, or service level.
· A host group can be changed or reassigned only by reinstalling OneAgent or by using oneagentctl
· A host can belong to only one host group at a time

Process groups
Dynatrace automatically merges related processes into process groups.
A “process group” is a logical cluster of processes that belong to the same application or deployment unit and perform the same function across multiple hosts. E.g. all Java processes are grouped as a Java process group.
Processes are host-centric: each is associated with a single machine in your environment.
Process groups are not host-centric: they can span multiple machines in your environment.
A connection between 2 processes ages out and is no longer shown in Smartscape if it has been inactive for more than 72 hours. It is shown as a dashed line if it has been inactive for more than 2 hours but less than 72 hours.

Process detection
Settings > Processes and containers > Process group detection section, which has the following pages:
· Edit (On/Off) built-in rules: on the Built-in detection rules page, you can enable or disable specific process group detection toggles. Hover over the info icon to the right of each toggle for details.
· Own customized rules: on the Simple detection rules and Advanced detection rules pages, you can add your own process group detection rules, which override the default ones.
· Unsupported tech: on the Declarative process grouping page, you can monitor specific processes of a technology that is unknown to Dynatrace.
Valuable process group types that Dynatrace reports on by default:
• Application processes (for example, Java, .NET, Node.js, PHP, and Go)
• Web server processes (for example, Apache and IIS)
• Database processes (for example, MSSQL, Oracle, MySQL, and Cassandra)
• Processes that have an open TCP listening port
• Processes for which CPU, memory usage, or network traffic exceeds 5% within 3 samples taken within 5 minutes
Process KPIs:
• Process CPU
• JVM metrics
• AppServer metrics
• System performance
• Web server requests

Enable automatic deep monitoring
By default, automatic deep monitoring is set to On so that Dynatrace OneAgent runs deep monitoring on all detected processes.
Exceptions: .NET and Go processes
Enable automatic deep monitoring doesn't take precedence over any individual process monitoring rules you may have set up. If a process monitoring rule indicates that Dynatrace should monitor a certain process, and Enable automatic deep monitoring is Off, the individual rule takes precedence and Dynatrace monitors that process. Therefore, each process monitoring rule is an exception to the general monitoring policy.
Custom rules can also be set (Processes and containers > Custom process monitoring rules) to exclude a process from monitoring. E.g. you can create a rule that OneAgent shouldn't be injected into any process in Cloud Foundry spaces that contain the string "customer".

Services
Services cover
• Web requests (HTTP)
• Web services (HTTP)
• Database calls (JDBC)
• Message queues (JMS)
• Custom services
Define a custom service for an unsupported tech stack (supported stacks: Java, .NET, PHP, Go, and Node.js).
Java and PHP support real-time updates, so no restart is required for a change to take effect: Settings > Server-side service monitoring > Deep monitoring > Enable real-time updates
Service flow (downstream view) provides an overview of all services and queues that a selected service makes requests to, and the time spent within those services
• Understand the call chain sequence of a service
• View all the response time contributors for a service
• View affected tiers during active or resolved problems
Service backtrace (upstream view)
• Understand which services call the selected service
• Analyze the performance of a service from the perspective of the calling clients

Custom service detection
Dynatrace automatically detects and monitors most server-side services in your environment with no configuration required. If your application doesn't rely on standard frameworks:
· Supported technologies (Java, .NET, PHP, Go, and Node.js): you need to define them as custom services.
With a custom service, you tell Dynatrace which class and method to instrument.
· Unsupported tech stacks: not monitored
Opaque service: unsupported technology; code-level visibility is not available, BUT the service can be detected from requests made by a calling service.

Service: Request
Key request: mark specific dynamic URLs/API calls as key. Once done,
· it can be pinned to a dashboard (not all views/charts)
· custom anomaly detection baselines can be set (available ONLY for key requests)
· the Apdex threshold can be updated at the request/action level (available ONLY for key requests)
· request history is retained for 10 days with 10-second granularity
Request attribute: captures key-value attributes from the URL, HTTP request headers, and metadata

Application/RUM
With OneAgent
A small piece of JavaScript code needs to be placed inside the HTML header. OneAgent handles this automatically if it is installed on the web server that serves the application's web pages.
When OneAgent can't be installed (no root access, unsupported tech stack, or third-party hosted sites)
· Agentless RUM: use when OneAgent can't be installed (manual JavaScript injection into the HTML header of every page)
· Browser extension: use when there is no access to the HTML pages (for SaaS applications)

Define conversion goals
You can define conversion goals for specific user actions to understand how successfully you're meeting your conversion milestones, for example, successful checkouts, newsletter signups, or demo signups. You can add goals for reaching specific user actions or destination URLs, and also for session information, for example, sessions with more than 10 user actions. For this example, a user would need to complete at least 10 user actions in a single session to reach this conversion goal. You can define a maximum of 20 conversion goals per application.
· Conversion goals: user action name, number of user actions, destination, session duration
· An action can be set as both a conversion goal and a key user action

User actions
· Load action (full page load in response to a user event)
· XHR action (API calls within a page)
· Custom action (loading of a particular JavaScript function)
KPIs: User action duration, Visually complete, Speed Index, DOM interactive, HTML downloaded, Time to first byte, …

Apdex
The Apdex rating ranges between 0 and 1:
• 0.94 to 1.0 equates to Excellent performance
• 0.85 up to 0.94 equates to Good performance
• 0.7 up to 0.85 equates to Fair performance
• 0.5 up to 0.7 equates to Poor performance
• below 0.5 is considered Unacceptable
Note: user actions with JavaScript errors and HTTP errors are reported as "Frustrated" irrespective of the Apdex score

User sessions
A user session is a group of user actions that are performed in an application during a limited period of time. The geolocation of each user session is detected from the IP address. In web applications, IP addresses are identified automatically from the HTTP headers of web requests.
Session identification:
For mobile apps, Dynatrace identifies individual users based on the specific mobile device they use.
For web applications, user identification is achieved by storing a persistent cookie within each user's browser.
The extended users concept covers use cases where multiple users or user tags share one device.
Live vs Active:
· A live session belongs to a user who was active at least once before and whose session has not yet ended.
· An active session belongs to a user who has been confirmed as still active at a given time, i.e. performing some action in the selected time frame.
Dynatrace stores information in session cookies and local storage. This enables the grouping of user actions into one user session.
This information is erased when any of the following occurs:
· The browser is closed (not just the browser tab)
· The user clears their browser cookies or local storage
· After 35 minutes of browser inactivity (mobile: 10 minutes)
· When the session duration reaches 8 hours (mobile: 6 hours)
· The dtrum.endSession() function of the RUM JavaScript API is called
Once a user session has about 200 user actions, a new session is created and all subsequent user actions are included in the new user session.
In Dynatrace, a user session is visible in search within 4 minutes; however, this can sometimes take up to 10.5 minutes.
Session data includes all user actions and high-level performance data, and it can be retrieved using either the Dynatrace API or the Dynatrace User Sessions Query Language (USQL).
(Screenshot: session details)

Session Replay
You can record your customers' interactions with your web application and replay each click and action in a movie-like experience (the session is recorded as a series of events in the DOM of the page and replayed like a movie). Session Replay also makes it easy for your QA teams to reproduce production issues, which your developers can use to bridge the gap between code and user experience.
All user inputs are masked by default. Content that can be masked:
· Form fields
· Password fields
· Content collected from masked input fields on previous pages
· Hidden attributes
Session Replay doesn't capture
· Frames
· Shadow DOM
· Canvas
· WebGL
· Web components
· Plugins, such as Adobe Flash Player, Java applets, and non-HTML technologies
· Documents, such as PDF files and Word documents
· Pseudo CSS files
Movie replay:
1. Session replay script time-out = 5 min
2. Session replay event time-out = 60 sec

Use of browser cookies in DEM
1. Track user behavior
2. Monitor site performance and usage

Mobile RUM
iOS: CocoaPods, Swift, Carthage, OneAgent SDK
Android: Groovy, Kotlin, OneAgent SDK

Synthetic Monitoring
Browser monitors
• Run from any of the 70+ Dynatrace global locations or from your own private location running on an ActiveGate
• Frequency up to every 5 minutes
• A single-URL browser monitor is the equivalent of a simulated user visiting the application webpage using a modern, updated web browser.
• Browser clickpaths are simulated user visits that monitor the application's business-critical workflows.
For browser clickpaths: install Dynatrace Synthetic Recorder (a Google Chrome browser extension)
HTTP monitor
• Uses simple HTTP requests
• Frequency up to every 1 minute
• Can be used to check website / API endpoint availability
• Executed from an ActiveGate and requires a special ActiveGate configuration (Linux-only)
Recording is forced into incognito mode, which guarantees that no browser data such as cookies or sessionStorage/localStorage data influences clickpath recording. Incognito mode improves the quality of clickpath recordings and ensures that each clickpath is executed in the same way across all synthetic locations.

Baseline/Threshold
Traffic spike/drop detection rules need 1 week
Error and slowdown detection needs 20% of 1 week
Once established, the baseline is evaluated over 5-minute (for fast-changing values) and 15-minute (for slow-changing values) sliding time intervals
An automatic baseline is available within 2 hours of service/application detection
The application baseline calculates values for these dimensions
1. Geolocation
2. Browser
3. User action
4. OS
The service baseline calculates values for these dimensions
1. Service methods
(Figure callouts: Memory 80%, CPU 95%, GC 40%)

DB/Service/Application - Anomaly detection rules
Anomaly detection rules
· response time degradations
· increases in failure rate
· service load drops/spikes
· failed database connects (only for DB)
· KPI degradation (only for Application).
KPIs = Visually complete, Speed Index, DOM interactive, HTML downloaded, etc.
Infra only
· Host
· Host group
· Disk

Problems
Dynatrace monitors certain metrics against auto-created baselines or fixed thresholds and reports problems as soon as a breach is detected. Davis does the RCA and impact analysis.
Problem severity
1. Monitoring unavailable (OneAgents lose connection with the Dynatrace server)
2. Availability
3. Error
4. Slowdown
5. Resource
6. Custom (not analyzed by Davis)
7. Info (no alert), e.g. software updates, config changes, deployments
During its lifespan, a problem might raise its severity level. E.g. a problem might begin at the slowdown level (4) and then be raised automatically to the availability level (2) when an outage is detected.

Alerts
Use alerting profiles to filter problems by
· Severity
· Management zones
· Tags
· Duration of a problem
Push notifications via the Dynatrace mobile app | Dynatrace Docs: the mobile app can push alerts for both SaaS and Managed (Managed must be fronted by an internet-accessible cluster gateway)
Baselining/Thresholding - Problems - Alerts

Reporting
· Service quality reports: based on problems
· Availability reports: based on synthetic monitoring
Weekly subscribers receive the report every Monday at midnight. Monthly subscribers receive the report on the first Monday of the month at midnight. Availability reports are available after the first weekly run (generated on Mondays after 00:00, i.e. 12:00 AM).
Service quality reports
Application score: the average of your application Apdex value and the percentage of user actions that are not affected by problems.
Services score: the percentage of service calls that were successful and unaffected by problems.
Infrastructure score: the percentage of host time during which no problems were encountered.
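The three score definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dynatrace's implementation: the function names and inputs are hypothetical, all scores are expressed as percentages (0 to 100), and the Apdex value (0 to 1) is assumed to be scaled to a percentage before it is averaged with the share of unaffected user actions.

```python
"""Sketch of the three service quality report scores, as percentages (0-100)."""


def _unaffected_percentage(total: float, affected: float) -> float:
    """Return the share of `total` that is not counted in `affected`, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= affected <= total:
        raise ValueError("affected must be between 0 and total")
    return 100.0 * (total - affected) / total


def application_score(apdex: float, total_actions: int, affected_actions: int) -> float:
    """Average of Apdex and the percentage of user actions unaffected by problems.

    Assumption: Apdex (0 to 1) is scaled to 0-100 so both terms share one scale.
    """
    if not 0.0 <= apdex <= 1.0:
        raise ValueError("apdex must be between 0 and 1")
    unaffected = _unaffected_percentage(total_actions, affected_actions)
    return (apdex * 100.0 + unaffected) / 2.0


def services_score(total_calls: int, failed_or_affected_calls: int) -> float:
    """Percentage of service calls that were successful and unaffected by problems."""
    return _unaffected_percentage(total_calls, failed_or_affected_calls)


def infrastructure_score(total_host_hours: float, problem_host_hours: float) -> float:
    """Percentage of host time during which no problems were encountered."""
    return _unaffected_percentage(total_host_hours, problem_host_hours)


if __name__ == "__main__":
    # Hypothetical weekly figures, for illustration only.
    print(application_score(0.9, 1000, 50))
    print(services_score(20000, 400))
    print(infrastructure_score(168.0, 4.2))
```

For example, with an Apdex of 0.9 and 50 of 1,000 user actions affected by problems, this sketch averages 90 and 95 and yields an application score of 92.5.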
Availability reports
· Percent availability
· Performance (in seconds)
· Number of outages
· Total downtime
· Number of synthetic monitors run
· Slowest geolocations
· Slowest day

Network Monitoring
By default, your homepage includes the Network status tile, which shows three key overall network health metrics: Processes, Hosts, and Volume. The Environment detail section of the Network page consists of three tabs: Hosts, Interfaces, and Processes.
Dynatrace monitors the network at both the process and host level. The KPIs are
· Traffic
· Connectivity
· Retransmissions
To switch monitoring on or off: Settings > Monitoring overview, then on the Hosts tab click Monitoring: Off/On.
Retransmission rates should not exceed 0.5% on local area networks and 2% on Internet and cloud-based networks. Retransmission rates above 3% negatively affect user experience in most modern applications.
If network monitoring overhead rises above 5% of available CPU, throttling occurs. The network module is then paused for slightly less than 3 minutes. After this time, the network module is re-enabled.

Detection Rules
A DB is detected by Dynatrace using
· DB IP address and port
· DB schema
· DB vendor
A service is detected by Dynatrace using
· Web server name
· Web app ID
· Context root

Maintenance windows
1. Once (only specify the time range)
2. Daily
3. Weekly (one day during the specified time window)
4. Monthly (one day during the specified time window)
For recurring windows (2/3/4), specify BOTH the time range and the date range (start and end date).
You can opt to disable synthetic monitor execution during a maintenance window. During the maintenance window, HTTP and browser monitors within the scope of the maintenance window are not executed.

Actions: Intervals/Period - Things to remember
114, 528, 1400, 15
Session and RUM data is kept for 35 days
(Figure callout: SaaS cluster is updated every 2 weeks)

Log Monitoring
Log retention: 5-90 days
Log data ingestion is limited by default to 10,000 log events per minute per cluster.
Dynatrace automatically discovers, analyzes, and stores logs every 60 seconds.
Logs can be auto-discovered by OneAgent OR can be tagged to process groups.
Turn off log autodiscovery
If you don't want Dynatrace to automatically discover new log files on a specific monitored host, you can turn off log autodiscovery.
1. On the host, open the log analytics configuration file for editing.
· On Linux: /var/lib/dynatrace/oneagent/agent/config/ruxitagentloganalytics.conf
· On Windows: %PROGRAMDATA%\dynatrace\oneagent\agent\config\ruxitagentloganalytics.conf
2. Set the following:
AppLogAutoDetection = false
A OneAgent restart is not required.
Log Monitoring API - POST ingest logs
Pushes custom logs to Dynatrace. This endpoint requires an ActiveGate with the Log analytics collector module enabled. This module is enabled by default on all of your ActiveGates.

DB Monitoring
Database insights (infrastructure-level monitoring, supported only for Oracle)
Database insights are currently provided for Oracle databases. Database insights provide an infrastructure monitoring perspective into your database.
Extension-based monitoring
Dynatrace monitors a number of database technologies out of the box with your OneAgent deployment, and others through ActiveGate extensions that run on an ActiveGate connecting remotely to your database server.
Supported out of the box by OneAgent: Cassandra, Couchbase, CouchDB, PostgreSQL, Redis
Extension required (via EAG): IBM DB2, Microsoft SQL, MySQL, SAP HANA

Additional Notes
· Mask IP address, GPS location, personal data in URI, user actions, user tracking, opt-in mode, and Do Not Track
· Host health statistics (CPU, memory, disk, or NIC)
· OneAgent auto-update is enabled by default but can be disabled globally, per host group, or per host.
· Conversion goals: user action name, number of user actions, destination, session duration
· Mobile KPI: App usage: user experience, crashes, network performance, called services
Every problem has a Business Impact Analysis section only if

Screenshots
· Network
· OneAgent
· ActiveGate
· Mobile app KPIs: Users/Apdex, Requests/Errors, Errors/Crashes/Called services
· Applications page: Name, Apdex, Load action, XHR action, Errors, 3rd party/CDN
· DB & Service
· Hosts
· Hosts: detailed KPIs (Full Stack)
· SLOs inbuilt

Goutam Sarkar