2018-06-28 23:41:48 +00:00
commit 515bd37952
256 changed files with 29890 additions and 0 deletions
--- a/devdocs/specs/core-ops-support-info-log.txt
+++ b/devdocs/specs/core-ops-support-info-log.txt
@@ -0,0 +1,83 @@
+SYSOPS HEALTH CHECK / METRICS
+
+OK, considered this: a log is a log, and all logs are relevant to sysops people, so I'm going to treat all logging the same regardless and make an effort to ensure each log entry
+is tagged with the relevant class name.
+
+CRITICAL ISSUES
+- Check for critical issues in a periodic health check job which also logs and records metrics
+- Critical issues should be logged first, then sent via notification to system operators if subscribed
+
+
+METRICS
+- Metrics should be gathered in the DB and reported on via the UI for ops users, and potentially in other formats down the road
+
+
+
+TODO LIST OF THINGS CODED THAT NEED TO BE LOGGED
+- Items in code tagged with this: 
+    -  //TODO: core-log-sysop
+- Generator failures
+- IJobBiz-derived object failures
+
+- configuration changes ???
+- Install and uninstall feature changes
+- Warnings (low disk space, slowness monitoring, db issues) (during health check JOB??)
+
+
+"HEALTH CHECK" JOB
+- Things that need to be metricized are commented with //OPSMETRIC
+- Maybe a "health check" job or "checkup" job that periodically assesses things and reports findings
+- Works in conjunction with the metrics gathered, maybe?
+   - Metrics would be a system that, for example, could get free disk space, get it again a few days later and project ahead to warn before it runs low, or simply warn when free space is down to 10%, etc.
+- Anything we'd like to see from a support point of view would be useful too
+- Go over the research doc to see what was recommended
+- Dig up that guy's example project on his blog that he was going to add metrics to.
+- Brainstorm a list of recent support issues and what could be of benefit in dealing with them
+- "Slowness" comes up a lot.
+
+
+Ops Metrics
+    CONFIRMED REQUIRED
+    - Gather in memory and flush to db on a schedule is best
+    - CASE 3562 Count of mismatches, if any are found, between attached files in the database vs the file system
+    - CASE 3523 Log major ops related configuration changes (before and after snapshot)
+    - CASE 3502 Log feature or route or endpoint usage count as a snapshot metric so we can compare month to month.
+    - CASE 3502 Log record count in each table, or at least the major ones, as a snapshot metric so we can compare month to month.
+    - CASE 3497 ACTIVE user count - Log user login, last login and login per X period
+    - CASE 3499 "Slow" I want to know if anything is slow, not what the user says but what the code determines
+
+    RESEARCH / IDEAS / EXAMPLES
+    - Metric types:
+        - https://www.app-metrics.io/getting-started/metric-types/
+    - Code example that deals with this issue:
+        - https://github.com/AppMetrics/AppMetrics/tree/dev/src/App.Metrics.Core
+    - Need more than one window into the data, for example we need a last few minutes (5?) view so people can see at a glance what is happening NOW
+        - But we also need to know what it was historically.  So maybe we need a NOW algorithm but also a HISTORICAL algorithm.
+        - Maybe a sliding scale of recency, so a 5 minute view, a THIS WEEK view and then a month to month view beyond that??
+    - LIBRARIES 
+        - HEALTH CHECKS: give you the ability to monitor the health of your application by writing small tests which return either a healthy, degraded or unhealthy result.
+            - https://www.app-metrics.io/health-checks/
+        - APP METRICS
+            - https://github.com/AppMetrics/AppMetrics
+    - Different types of metrics are Gauges, Counters, Meters, Histograms, Timers and Application Performance Indexes
+    - METRICS of a system:
+        - Network. Network metrics are related to network bandwidth usage.
+        - System. System metrics are related to processor, memory, disk I/O, and network I/O.
+        - Platform. Platform metrics are related to ASP.NET, and the .NET common language runtime (CLR).
+        - Application. Application metrics include custom performance counters "Application Instrumentation".
+        - Service level. Service level metrics are related to your application, such as orders per second and searches per second.
+    - USEFUL INFO HERE FOR SYSTEM METRICS LIKE MEMORY ETC: This document from Microsoft gives generally accepted limits for things like CPU threshold, memory, etc. in actual percentages
+        - Section "System Resources" here https://msdn.microsoft.com/en-us/library/ff647791.aspx#scalenetchapt15_topic5
+
+    - USEFUL EXAMPLE dashboard for web applications:
+        - https://sandbox.stackify.com/Stacks/WebApps
+
+
+    - Some kind of internal metrics to track changes over time in operations, with thresholds to trigger logs maybe?
+    - Has to be super fast, maybe an internal counter / cache in memory and a periodic job that writes it out to the DB, i.e. don't write metrics to the db on every get operation etc
+    - Average response time?
+    - Busyness / unique logins or tokens in use?  A way to see how many distinct users are connecting over a period of time so we know how utilized it is?
+    - Utilization?
+    - Areas / routes used in AyaNova and how often / frequently they are used (we could use this for feature utilization)
+    - CPU peak usage snapshot
+    - Disk space change over time snapshots