devdocs/specs/core-ops-support-info-log.txt
SYSOPS HEALTH CHECK / METRICS

OK, considered this: a log is a log, and all logs are relevant to sysops people, so I'm going to treat all logging the same and make an effort to ensure each log entry is tagged with the relevant class name.

CRITICAL ISSUES
- Check for critical issues in a periodic health check job, which also logs and records metrics
- Critical issues should be logged first, then sent via notification to system operators if subscribed
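
A minimal sketch of the "log first, then notify" ordering described above. It is written in Python purely for illustration (the real codebase appears to be .NET), and every name in it (HealthCheckJob, checks, subscribers) is hypothetical rather than taken from the project.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Logger named after the (hypothetical) class so each entry is tagged with it.
logger = logging.getLogger("HealthCheckJob")


@dataclass
class HealthCheckJob:
    # Each check returns a description of a critical issue, or None if all is well.
    checks: List[Callable[[], Optional[str]]]
    # Callbacks for subscribed system operators (assumed notification mechanism).
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def run(self) -> List[str]:
        """Run every check once; log each critical issue, then notify subscribers."""
        issues = []
        for check in self.checks:
            issue = check()
            if issue is None:
                continue
            logger.critical(issue)           # 1. always log first
            for notify in self.subscribers:  # 2. then notify, only if anyone subscribed
                notify(issue)
            issues.append(issue)
        return issues
```

Logging before notifying means a failure in the notification path cannot hide the issue from the log.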

METRICS
- Metrics should be gathered in the DB and reported on via the UI for ops users, and potentially in other formats down the road

TODO LIST OF THINGS CODED THAT NEED TO BE LOGGED
- Items in code tagged with this:
    - //TODO: core-log-sysop
- Generator failures
- Failures in IJobBiz-derived objects

- Configuration changes ???
- Install and uninstall feature changes
- Warnings (low disk space, slowness monitoring, DB issues) (during the health check job??)

"HEALTH CHECK" JOB
- Things that need metrics are marked in code with the comment //OPSMETRIC
- Maybe a "health check" or "checkup" job that periodically assesses things and reports findings
    - Works in conjunction with the metrics gathered, maybe?
- Metrics would be a system that could, for example, get free disk space, get it again a few days later, and project ahead to when it will run low and warn; or simply warn when free space is down to 10%, etc.
- Anything we'd like to see from a support point of view would be useful too
- Go over the research doc to see what was recommended
- Dig up that guy's example project on his blog that he was going to add metrics to
- Brainstorm a list of recent support issues and what would help in dealing with them
    - "Slowness" comes up a lot.
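
The disk space idea above (sample, sample again later, project ahead, or simply warn at 10%) could be sketched roughly as follows. This is Python for illustration only; the function names are hypothetical, and a straight-line projection from two samples is an assumption, not a decided algorithm.

```python
import shutil
from typing import Optional

LOW_SPACE_FRACTION = 0.10  # the "warn when down to 10%" figure from the notes


def free_fraction(path: str = ".") -> float:
    """Fraction of the volume holding `path` that is currently free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low(free_frac: float, threshold: float = LOW_SPACE_FRACTION) -> bool:
    """Simple check: warn as soon as free space is at or below the threshold."""
    return free_frac <= threshold


def days_until_low(earlier_frac: float, later_frac: float, days_between: float,
                   threshold: float = LOW_SPACE_FRACTION) -> Optional[float]:
    """Project two samples forward in a straight line.

    Returns the estimated days until free space reaches the threshold,
    or None if free space is not shrinking.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    loss_per_day = (earlier_frac - later_frac) / days_between
    if loss_per_day <= 0:
        return None
    return max(0.0, (later_frac - threshold) / loss_per_day)
```

For example, going from 50% free to 40% free over 5 days projects to roughly 15 more days before hitting the 10% mark.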

Ops Metrics
CONFIRMED REQUIRED
- Gathering in memory and flushing to the DB on a schedule is best
- CASE 3562 Count of mismatches between attached files in the database and the file system, if any are found
- CASE 3523 Log major ops-related configuration changes (before and after snapshot)
- CASE 3502 Log feature, route, or endpoint usage counts as a snapshot metric so they can be compared month to month
- CASE 3502 Log the record count in each table, or at least the major ones, as a snapshot metric so it can be compared month to month
- CASE 3497 ACTIVE user count: log user login, last login, and logins per X period
- CASE 3499 "Slow": I want to know if anything is slow, based not on what the user says but on what the code determines
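
One possible shape for the "what the code determines" part of CASE 3499: time each operation and record it only when it exceeds a threshold. This is an illustrative Python sketch; the names (timed, slow_operations) and the 1 second threshold are assumptions, not project decisions.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, List

SLOW_THRESHOLD_SECONDS = 1.0  # hypothetical cut-off; the notes do not define "slow"

# In-memory record of operations the code itself judged to be slow.
slow_operations: List[Dict[str, object]] = []


@contextmanager
def timed(name: str, threshold: float = SLOW_THRESHOLD_SECONDS,
          clock: Callable[[], float] = time.perf_counter):
    """Time the wrapped block and record it if it ran longer than `threshold`."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            slow_operations.append({"name": name, "seconds": elapsed})
```

Usage would look like `with timed("workorder report"): ...` around any operation worth watching; the `clock` parameter exists only so the timing can be faked in tests.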

RESEARCH / IDEAS / EXAMPLES
- Metric types:
    - https://www.app-metrics.io/getting-started/metric-types/
- Code example that deals with this issue:
    - https://github.com/AppMetrics/AppMetrics/tree/dev/src/App.Metrics.Core
- Need more than one window into the data; for example, we need a last-few-minutes (5?) view so people can see at a glance what is happening NOW
    - But we also need to know what it was historically. So maybe we need a NOW algorithm but also a HISTORICAL algorithm.
    - Maybe a sliding scale of recency: a 5 minute view, a THIS WEEK view, and then a month-to-month view beyond that??
- LIBRARIES
    - Health checks: Health Checks give you the ability to monitor the health of your application by writing small tests which return either a healthy, degraded or unhealthy result.
        - https://www.app-metrics.io/health-checks/
    - APP METRICS
        - https://github.com/AppMetrics/AppMetrics
        - The different types of metrics are Gauges, Counters, Meters, Histograms, Timers and Application Performance Indexes
- METRICS of a system:
    - Network. Network metrics are related to network bandwidth usage.
    - System. System metrics are related to processor, memory, disk I/O, and network I/O.
    - Platform. Platform metrics are related to ASP.NET and the .NET common language runtime (CLR).
    - Application. Application metrics include custom performance counters ("Application Instrumentation").
    - Service level. Service level metrics are related to your application, such as orders per second and searches per second.
- USEFUL INFO HERE FOR SYSTEM METRICS LIKE MEMORY ETC: This document from Microsoft gives generally accepted limits for things like CPU threshold, memory, etc. in actual percentages
    - Section "System Resources" here: https://msdn.microsoft.com/en-us/library/ff647791.aspx#scalenetchapt15_topic5
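
The "more than one window" idea from this list (a NOW view plus historical views) might be sketched like this. It is Python for illustration only; the function names are hypothetical, "THIS WEEK" is approximated as the trailing 7 days, and averaging is just one assumed way to summarize a window.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

Sample = Tuple[datetime, float]  # (when it was measured, measured value)


def window_average(samples: List[Sample], now: datetime,
                   window: timedelta) -> Optional[float]:
    """Average of the samples taken within `window` before `now`, or None if none."""
    values = [value for taken_at, value in samples if now - window <= taken_at <= now]
    return mean(values) if values else None


def monthly_averages(samples: List[Sample]) -> Dict[str, float]:
    """Historical view: one average per calendar month, keyed as 'YYYY-MM'."""
    buckets = defaultdict(list)
    for taken_at, value in samples:
        buckets[taken_at.strftime("%Y-%m")].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}
```

The same stored samples feed all three views: `window_average(samples, now, timedelta(minutes=5))` for NOW, `timedelta(days=7)` for the week, and `monthly_averages(samples)` for month-to-month comparison.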

- USEFUL EXAMPLE dashboard for web applications:
    - https://sandbox.stackify.com/Stacks/WebApps

- Some kind of internal metrics to track changes over time in operations, with thresholds to trigger logs, maybe?
- Has to be super fast: maybe an internal counter / cache in memory and a periodic job that writes it out to the DB, i.e. don't write metrics to the DB on every get operation etc.
- Average response time?
- Busyness / unique logins or tokens in use? A way to see how many distinct users are connecting over a period of time so we know how utilized it is?
- Utilization?
- Areas / routes used in AyaNova and how frequently they are used (we could use this for feature utilization)
- CPU peak usage snapshot
- Disk space change over time snapshots
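
A rough sketch of the "count in memory, flush to the DB periodically" approach mentioned in this list. It uses Python and SQLite only as stand-ins; the class name, the ops_metric table, and its columns are all hypothetical, and the scheduling of flush() by a periodic job is left out.

```python
import sqlite3
import threading
from collections import Counter
from datetime import datetime, timezone


class MetricsBuffer:
    """Cheap in-memory counters; nothing touches the database until flush()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Counter = Counter()

    def increment(self, name: str, amount: int = 1) -> None:
        # Called on the hot path (e.g. once per request), so it only bumps a counter.
        with self._lock:
            self._counts[name] += amount

    def flush(self, conn: sqlite3.Connection) -> int:
        """Write the buffered counts as one snapshot and reset them.

        Returns the number of rows written. Intended to be called by a
        periodic job rather than on every operation.
        """
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
        if not snapshot:
            return 0
        taken_at = datetime.now(timezone.utc).isoformat()
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS ops_metric "
                "(taken_at TEXT, name TEXT, value INTEGER)"
            )
            conn.executemany(
                "INSERT INTO ops_metric VALUES (?, ?, ?)",
                [(taken_at, name, value) for name, value in snapshot.items()],
            )
        return len(snapshot)
```

Route usage counts fit this directly, e.g. `buffer.increment("GET /workorders")` per request. One trade-off of this sketch: counts swapped out of the buffer are lost if the database write then fails.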