raven/devdocs/specs/core-reporting.txt
2021-06-07 18:17:06 +00:00

REPORTING SPECS
##################################
2020-09-25 OUTSTANDING REPORTING STUFF
This is stuff that may or may not be relevant in future, keeping here from my notes
REPORTING STUFF THAT IS FUTURE OR ON HOLD
MOVE THIS TO A v.next case or into reporting SPEC doc for future reference if not a coding issue now
todo: bizrule for report scale value at server
and maybe other shit as well
todo: Need a setting that warns people about printing too much data, i.e. "That's a lot of data, are you sure you want to render that, it will be slow" or something to that effect
todo: REPORT BACKEND - report delete throws server exception about db context re-use or something to that effect
No exact steps to repro, but happens for Joyce with win64 release build if you go in and edit the same template a few times from widget list then attempt to delete it
try to repro on debug build and figure it out, could be a big issue if it's not specifically a report issue but a wider biz object issue
COULD NOT REPRO HERE
todo: (On hold pending testing) pdf options UI and passthrough
OUTSTANDING
Docs
DOCS reference pages
https://stackoverflow.com/questions/49943479/puppeteer-header-and-footertemplate-doesnt-work#49996999
https://pptr.dev/#?product=Puppeteer&version=v5.3.0&show=api-pagepdfoptions
todo: document about the troubleshooting section items here if applicable:
https://jsreport.net/learn/chrome-pdf
which may or may not apply in our case
Testing
DONE
basically need to be able to select every option and send it through
options:
http://www.puppeteersharp.com/api/PuppeteerSharp.PdfOptions.html
http://www.puppeteersharp.com/api/PuppeteerSharp.Page.html#PuppeteerSharp_Page_PdfAsync_System_String_PuppeteerSharp_PdfOptions_
https://pptr.dev/#?product=Puppeteer&version=v5.3.0&show=api-pagepdfoptions
page numbers control
Test this, it might do what we need as it has a template for pdf footer and page number is part of it
http://www.puppeteersharp.com/api/PuppeteerSharp.PdfOptions.html
look at jsreport what do they include in their pdf post processing parameters and capabilities
need to add pdfkit or whatever it's called at the front.
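Rough sketch for the page numbers control note above: the footer template option is where page numbers go. This builds a plain options object in the shape Puppeteer's page.pdf() accepts; buildPdfOptions and its parameters are made-up names, the margin values are guesses, and it is assumed the same settings map onto the matching PuppeteerSharp PdfOptions properties. Still needs testing as noted.

```javascript
// Sketch only: buildPdfOptions is a hypothetical helper, not existing RAVEN code.
// The returned object uses option names from Puppeteer's page.pdf().
function buildPdfOptions({ format = "A4", landscape = false, showPageNumbers = true } = {}) {
  const options = {
    format,
    landscape,
    printBackground: true,
    displayHeaderFooter: showPageNumbers,
    // Margins are placeholder values; the footer needs room or page content can cover it.
    margin: { top: "15mm", bottom: "20mm", left: "10mm", right: "10mm" },
  };
  if (showPageNumbers) {
    // An empty header replaces Chrome's default header (date and title).
    options.headerTemplate = "<span></span>";
    // Chrome injects values into elements with the pageNumber / totalPages classes.
    // An explicit font size is set because templates reportedly render invisibly without one.
    options.footerTemplate =
      '<div style="font-size:8px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      "</div>";
  }
  return options;
}

console.log(buildPdfOptions());
```

With Puppeteer itself this would be passed as `await page.pdf(buildPdfOptions())`.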
todo: I have console logging capture code now in backend, but it's doing nothing really, just logs if exception thrown
it might be handy if it returned the log value and any other diagnostic info with render return data when user is in designer
i.e. have a further property Diagnostic bool and if set then returns diagnostic data
and return property with pdf name would be in an object with additional properties for diagnosis etc
But only if it proves helpful or necessary
mainly this would allow a console log or error or trace to flow back to the user from the script being run at the server
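A minimal sketch of the Diagnostic idea above, assuming the render result becomes an object instead of just the pdf name. captureDiagnostics, buildRenderResult and the result shape are hypothetical; the "console" and "pageerror" event names are the ones Puppeteer pages emit, and a plain EventEmitter stands in for the page so this runs without a browser.

```javascript
const { EventEmitter } = require("events");

// Collects console output and script errors from a page-like emitter.
// With Puppeteer, "console" handlers receive a message with type() and text().
function captureDiagnostics(page) {
  const entries = [];
  page.on("console", (msg) => entries.push({ level: msg.type(), text: msg.text() }));
  page.on("pageerror", (err) => entries.push({ level: "error", text: err.message }));
  return entries;
}

// Hypothetical result shape: pdf name always, diagnostics only when asked for.
function buildRenderResult(pdfName, entries, diagnostic) {
  const result = { pdfName };
  if (diagnostic) {
    result.diagnostics = entries.slice();
  }
  return result;
}

// Stand-in for a real page (assumption for the sketch).
const fakePage = new EventEmitter();
const capturedLog = captureDiagnostics(fakePage);
fakePage.emit("console", { type: () => "log", text: () => "customer total: 42" });
fakePage.emit("pageerror", new Error("helper is not defined"));

console.log(buildRenderResult("report-123.pdf", capturedLog, true));
console.log(buildRenderResult("report-123.pdf", capturedLog, false));
```

Keeping diagnostics off by default means normal report runs return the same small payload as now; only the designer would set the flag.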
//before getting into timeouts and shit make sure it's running as well as can be in docker
todo: look at guidance for running puppeteer (js) on alpine docker here: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-on-alpine
how jsreport is launching headless chrome, i.e. which settings and flags etc
https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteerlaunchoptions
https://jsreport.net/learn/chrome-pdf
https://github.com/jsreport/jsreport-chrome-pdf/blob/master/lib/conversion.js
(after looking at it, it's still a bit unclear, maybe not relevant as they seem to do a lot differently for that)
todo: look over this: https://github.com/puppeteer/puppeteer/issues/1834
todo: REPORT BACKEND - more timeouts for report rendering, hard kill after 30 seconds tops but adjustable interval maybe?
each step takes part of the total time, need to see which step takes the most time and whether it can be killed
i.e. run a huge report and see which exact step takes which time for each
we have the pre-render timing out maybe need more for each step where it's vulnerable to crashing / timing out
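To see which step eats the time, something like this step timer could wrap the render pipeline. Sketch only: createStepTimer is a made-up name, the step names are illustrative, and the clock is injectable so the example can use fake timestamps.

```javascript
// Records how long each named render step took, in milliseconds.
// `now` defaults to Date.now and can be replaced with a fake clock.
function createStepTimer(now = Date.now) {
  let last = now();
  const steps = [];
  return {
    steps,
    // Call when a step finishes; stores the time elapsed since the previous mark.
    mark(name) {
      const t = now();
      steps.push({ name, ms: t - last });
      last = t;
    },
    // The slowest step is the first candidate for its own timeout.
    slowest() {
      return steps.reduce((a, b) => (b.ms > a.ms ? b : a), { name: null, ms: -1 });
    },
  };
}

// Example with a fake clock; the timings are invented for illustration.
const fakeTicks = [0, 1200, 1500, 91500];
let tickIndex = 0;
const stepTimer = createStepTimer(() => fakeTicks[tickIndex++]);
stepTimer.mark("fetch data");
stepTimer.mark("pre-render");
stepTimer.mark("pdf generation");
console.log(stepTimer.steps);
console.log(stepTimer.slowest());
```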
Devops droplet is overwhelmed by 10k records of widgets using the customfields example report
symptom is super high cpu usage (100%) pegged and probably virtual memory usage as well
timeout is kind of a hard core way to work around this issue, maybe instead it should be looking at excessive cpu and memory usage?
The metrics don't catch it because it happens too quickly for the metrics lifecycle
I want it to be able to handle it gracefully without crashing
Looks like jsreport has this issue too and they hard kill the process I think if necessary, but they actually re-use the same instance for reporting I think.
If it's not an issue then no need to resolve at the moment
UPDATE: No way to timeout the pdf generation built into puppeteer sharp so far, maybe some workarounds
I think the nuclear option would be to start a timer once we know the process id of the chromium instance, and that timer automatically kills that process ID if it's still found around XX seconds later
(or in the form of a clean up job maybe like the temp file deleter job)
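Sketch of the clean up job variant of the nuclear option, assuming the backend can record the chromium process id when a render starts. activeRenders, trackRender, findExpiredRenders and sweep are hypothetical names; the only real API used is Node's process.kill, and the kill function is injectable so the logic can be tried without real processes.

```javascript
// Hypothetical registry of running render processes: pid -> start time in ms.
const activeRenders = new Map();

function trackRender(pid, startedAtMs = Date.now()) {
  activeRenders.set(pid, startedAtMs);
}

// Pure check: which tracked pids have been running longer than maxAgeMs?
function findExpiredRenders(renders, nowMs, maxAgeMs) {
  const expired = [];
  for (const [pid, startedAtMs] of renders) {
    if (nowMs - startedAtMs > maxAgeMs) {
      expired.push(pid);
    }
  }
  return expired;
}

// Meant to run periodically, like the temp file deleter job.
function sweep(nowMs, maxAgeMs, kill = (pid) => process.kill(pid, "SIGKILL")) {
  for (const pid of findExpiredRenders(activeRenders, nowMs, maxAgeMs)) {
    try {
      kill(pid);
    } catch (err) {
      // Process already gone or not ours to kill; nothing more to do here.
    }
    activeRenders.delete(pid);
  }
}

// Example with fake pids and a fake kill function, using a 30 second limit.
trackRender(111, 0);
trackRender(222, 25000);
const killedPids = [];
sweep(31000, 30000, (pid) => killedPids.push(pid));
console.log(killedPids);
```

A sweep job has the advantage that it still catches a stuck process if the request that started it has already died.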
todo: reporting load test
test locally with 20k widgets, make it crash then determine what limits to set on it and properly return error when it's exceeded
right now it just says something about puppeteer and "crash"
NOT ABLE TO REPRO HERE
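Once limits are known, the check itself could be as simple as the sketch below, which also covers the "that's a lot of data" warning todo. checkRecordCount and the threshold numbers are placeholders, not tested limits.

```javascript
// Placeholder thresholds; real values would come out of the load test.
const DEFAULT_LIMITS = { warnAt: 2500, rejectAt: 20000 };

// Returns what to do for a given record count: "ok", "warn" or "reject".
function checkRecordCount(recordCount, limits = DEFAULT_LIMITS) {
  if (recordCount > limits.rejectAt) {
    return {
      action: "reject",
      message: `Report has ${recordCount} records; the limit is ${limits.rejectAt}.`,
    };
  }
  if (recordCount > limits.warnAt) {
    return {
      action: "warn",
      message: "That's a lot of data, are you sure you want to render that? It will be slow.",
    };
  }
  return { action: "ok", message: "" };
}

console.log(checkRecordCount(100));
console.log(checkRecordCount(5000));
console.log(checkRecordCount(20001));
```

Returning a proper "reject" result gives the client a real error to show in place of the current puppeteer "crash" message.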
todo: MEMORY / PERFORMANCE / CRASHING
BOTTOM LINE:
It *sometimes* can't render 5k records (lots of pages) on devops, but mostly it does, in around 2.5 to 3 minutes on average; the same report on dev ws takes 1.5 minutes on average with no crash
uses a lot of ram, also cpu does get pegged by chrome process at 95% so it's a bit of both I guess
If users have issue with server during reporting they should check ram and cpu
a minimal droplet appears fine for normal workload, it's the big report rendering that's the most inefficient
look at proper settings for rendering in docker on linux and try to make it resilient to crashing even if set too low or too little resources
IDEAS:
check guidelines for running in docker from link below
check command line params for best fit
see about re-using same or whatever
check cpu, memory usage upon render request, reject immediately if the system is overburdened
check for running chromium processes or some way to determine if in the middle of the last render for someone
Try a method to zap all chromium processes as a test to see if they *can* be killed while stuck this way
To test use the "custom date time format helpers" with 5k records, that reliably freezes everything even when others all seem to run ok
(this in itself is curious but whatever)
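For the "reject immediately if the system is overburdened" idea, a minimal sketch using Node's os module. readSystemStats, shouldRejectRender and the limit values are assumptions to be tuned by testing; inside a container the numbers may reflect the host and not the container limit, so that needs checking too.

```javascript
const os = require("os");

// Snapshot of current load. Note os.loadavg() always returns [0, 0, 0] on Windows.
function readSystemStats() {
  const cpuCount = Math.max(1, os.cpus().length);
  return {
    freeMemRatio: os.freemem() / os.totalmem(),
    loadPerCpu: os.loadavg()[0] / cpuCount,
  };
}

// Placeholder limits: reject below 20% free memory or above 0.9 load per cpu.
function shouldRejectRender(stats, limits = { minFreeMemRatio: 0.2, maxLoadPerCpu: 0.9 }) {
  if (stats.freeMemRatio < limits.minFreeMemRatio) return "low memory";
  if (stats.loadPerCpu > limits.maxLoadPerCpu) return "cpu busy";
  return null; // null means the render may proceed
}

console.log(shouldRejectRender(readSystemStats()));
```

Checking memory first follows the observation in these notes that memory looks like the main problem, with the cpu pegging only once swapping starts.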
What I'm seeing is on a long render it returns a 504 gateway timeout to the client but it's still churning away in the background
5k records and "custom date time format helpers" report will take about 3 minutes to render, maybe less; the client gets a 504 returned but the render will still complete at some point
Confirmed it *does* complete because I was able to download the 4.12mb 2144 page pdf manually from the server temp folder via filezilla once I saw the cpu go down again
and chrome process stop
90% memory is the max used in do graph panel, probably docker is not letting it take all or something isn't
maybe swapping out is what's happening, peak memory usage seems to be half a gig or so but c
97% cpu usage is max
I'm thinking it's a memory issue more than a cpu issue because the cpu is hardly pegged at all while it's rendering right up until it appears to run out of memory
then it starts swapping and all hell breaks loose and the cpu pegs on the swap daemon
Need to test by moving up to 2gb of memory with a resize but keep the single vcpu and see what's what
also increase the timeout in nginx for the reverse proxy to wait
A temporary rendering caused memory shortage should not cause the system to come to a standstill
todo: memory usage and timeout is directly related to the amount of space taken up physically on the page
it's NOT related to the helpers as just putting static text on the page causes the same issue
it's memory taken to render to pdf probably and a byte is a byte even if it's blank white page
=-=-=-=-=-
########## ORIGINAL SPECS ###################
CASES
1734 - REPORTS:GRIDS: - grid filter name and summary of filter criteria available as fields to print on report
REQUIREMENTS
- All v7 reports ported to RAVEN
- ALL Fields even the ones that don't show on the report but are available for adding to a report in the editor need to be available
- REPORTS
- Report object has following properties:
- DataList name it's based off of
- Required fields from DataList
- Report template itself with its own code and template requirements TBD
- Report columns returned: When user selects to show a report, client will fixup any missing columns from the datalistview currently in use
- For example they are viewing a table based on a TestWidgetDataList DataListview with only 3 columns in it
- They drop down the reports list which shows all reports based off TestWidgetDataList view
- They select a report to print.
- Report code looks at report's required fields from DatalistView and sees report uses 6 fields listed
- Code compares report fields to in use DataListview fields and appends any report required fields missing from current view to the right of the collection in the current DataListview
- When report is run it will have all fields this way returned but will still be sorted and filtered by table view
- As part of editing process user can select an existing datalistview to prime their report editing view
- A report can be selected from any client table that is based on the same view
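The column fixup described above can be sketched as follows. mergeReportColumns and the field names are invented for illustration; the example mirrors the 3 column view and 6 field report from the requirements.

```javascript
// Appends report-required fields that are missing from the current view to the
// right of the view's columns, leaving the view's own column order untouched.
function mergeReportColumns(viewColumns, reportRequiredFields) {
  const seen = new Set(viewColumns);
  const merged = [...viewColumns];
  for (const field of reportRequiredFields) {
    if (!seen.has(field)) {
      seen.add(field);
      merged.push(field);
    }
  }
  return merged;
}

// Made-up field names: the view has 3 columns, the report needs 6 fields.
const currentViewColumns = ["name", "serial", "active"];
const reportFields = ["name", "serial", "dollarAmount", "startDate", "endDate", "notes"];
console.log(mergeReportColumns(currentViewColumns, reportFields));
// [ 'name', 'serial', 'active', 'dollarAmount', 'startDate', 'endDate', 'notes' ]
```

Only the column list changes here; sort and filter stay whatever the table view already has.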
//=======================
USEFUL REPORTING RELATED LINKS
https://github.com/jsreport/jsreport-core
https://github.com/jsreport/jsreport-core/blob/master/lib/render/engineScript.js
//actual render here
https://github.com/jsreport/jsreport-chrome-pdf/blob/d3fe318aac3628d8cb62f86f8f71314f21745798/lib/conversion.js
//PDF utils
https://github.com/jsreport/jsreport-pdf-utils
They use a Mozilla library called pdfjs and their utils are basically just wrappers around using it
https://github.com/mozilla/pdf.js
hub to docs here:
https://mozilla.github.io/pdf.js/
This is the jsreport designer libs used for reference: https://github.com/jsreport/jsreport-studio/blob/master/package.json
https://jsreport.net/learn/api
Report templates pre-designed and open source: https://github.com/wildbit/postmark-templates
HTML -> PDF
JSREPORT has a comparison table of various html to pdf tools here:
https://jsreport.net/learn/pdf-recipes
Headless Chrome
https://github.com/jsreport/jsreport-chrome-pdf
FAST SPEED (according to jsreport docs)
jsreport uses headless chrome by default which has built in pdf from html ability.
they use a NODE library Puppeteer for it, but there is a c# wrapper for .net core linux windows mac: https://github.com/hardkoded/puppeteer-sharp
some kind of example that may be relevant: https://github.com/kblok/netconfar-puppeteer-sharp-demo/blob/master/hacking-the-browser-api/Controllers/MediumController.cs#L13
Maybe not the only one for c# core, need to dig around
Issues:
Issue with header / footer not being settable apparently which is a big breaking issue for many biz reports usage
someone said that another pdf tool can be used to set those post processing but fuckery abounds
other solutions below apparently don't have this issue.
Even jsreport has listed workarounds and tools to resolve this
Update: apparently there are ways:
https://stackoverflow.com/questions/44575628/alter-the-default-header-footer-when-printing-to-pdf?noredirect=1&lq=1
see last comment seems relevant, also other linked cases all mention various things. Finally, could use a pdf writer tool to post process maybe.
There seem to be many potential issues with missing libraries, rights and sandbox and etc etc etc on linux
These things kind of turned me off this a bit, it's not plug and play and simple
LINKS:
https://github.com/hardkoded/puppeteer-sharp
https://github.com/puppeteer/puppeteer
https://stackoverflow.com/search?q=puppeteer-sharp
https://stackoverflow.com/questions/62042078/puppeteer-sharp-for-server-side-html-to-pdf-conversions
https://www.singlestoneconsulting.com/blog/how-to-generate-server-side-pdf-reports-puppeteer-d3-handlebars/
https://github.com/hardkoded/puppeteer-sharp/issues/1510 - shows being used by someone other than jsreport which buries all the details in endless libs
https://github.com/hardkoded/puppeteer-sharp/issues/1514
WeasyPrint
https://github.com/jsreport/jsreport-weasyprint-pdf
SLOWEST SPEED
https://github.com/Kozea/WeasyPrint based on python, does its own rendering, doesn't rely on a web browser engine like the rest
free, recommended by wkhtmltopdf author as an alternative
May be slow, likely slower than the other options; has some installation steps that are a bit convoluted, but ironically only for windows, as on linux it's included in package managers
Very good support for modern css3 PAGE properties apparently
WRAPPER
https://github.com/balbarak/WeasyPrint-netcore/blob/master/src/Balbarak.WeasyPrint/WeasyPrintClient.cs
wkhtmltopdf
https://github.com/jsreport/jsreport-wkhtmltopdf
MEDIUM SPEED
https://wkhtmltopdf.org/downloads.html
Well used, old, based on an older webkit so doesn't support css3 but likely enough for our purposes
has an easy installer for all platforms
free
Has warnings about how unsanitized html can take down a server or own it somehow
WRAPPERS
https://github.com/carloscds/HtmlToPDFCore/tree/master/HtmlToPDFCore This one looks cool, all platforms supported includes binary possibly?
https://blog.elmah.io/generate-a-pdf-from-asp-net-core-for-free/
HTML -> DOCX
This is a possibility that needs to be researched, instead of pdf go docx which is in theory multi platform and openable on other devices? Not sure
https://github.com/jsreport/jsreport-docx
https://github.com/EricWhiteDev/Open-Xml-PowerTools
HTML -> XLSX
https://github.com/jsreport/jsreport-xlsx
HTML -> TEXT
https://github.com/jsreport/jsreport-html-to-text
https://github.com/jsreport/jsreport-text
TEMPLATE ENGINE
https://github.com/jsreport/jsreport-handlebars
Handlebars by default for jsreport which is easy peasy to work with
PDF META DATA EDITING
https://github.com/jsreport/jsreport-pdf-meta
Render outputs
JSReport renders to different outputs, they call it recipes https://jsreport.net/learn/recipes
HTML - just outputs as html for viewing in the browser / printing from browser
PDF: https://jsreport.net/learn/pdf-recipes
Offers 5 different pdf converters because they all support different feature sets which is ominous
BAR CODE STUFF
Bar codes: https://github.com/metafloor/bwip-js
https://stackoverflow.com/questions/19017512/use-canvas-inside-a-handlebars-template
https://github.com/metafloor/bwip-js#browser-usage
https://www.scandit.com/blog/types-barcodes-choosing-right-barcode/
https://github.com/metafloor/bwip-js/wiki/BWIPP-Barcode-Types
let opt = {
    bcid: "code128",       // Barcode type
    text: "0123456789",    // Text to encode
    scale: 3,              // 3x scaling factor
    height: 10,            // Bar height, in millimeters
    includetext: true,     // Show human-readable text
    textxalign: "center"   // Always good to set this
};