~dricottone/fmg-timesheets: parser/xml.py 8ebab2a421aa9bb58b0f3fd43e517190ba12e752

8ebab2a4 — Dominic Ricottone 2 years ago

Fully functional timesheet parser.

The timesheet parser is a complete success. Some minor issues were
ironed out in the XML parser as well.

Next steps: writing to a time series database and beginning analysis.

ae939a28 — Dominic Ricottone 2 years ago

Goodbye HTML, hello XML

Replaced HTML exporting/parsing with XML exporting/parsing. Also
replaced the 'high-level' function call with 'low-level' pdfminer
usage.

The XML parser now handles validation and suppression of header/footer
content on its own.

From the PDF parser, XML is dumped to a file. From the XML parser, CSV
is dumped to a file. The new timesheet parser should read in that CSV
file.
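
A minimal sketch of the XML-to-CSV step described above, not the actual
contents of parser/xml.py. It assumes the XML follows a pdfminer-style
layout (`page` elements containing `textline` elements with a `bbox`
attribute and per-character `text` children). The function names, the
sample document, and the header/footer thresholds are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like pdfminer-style XML output (assumption).
SAMPLE_XML = """<pages>
<page id="1" bbox="0.000,0.000,612.000,792.000">
<textline bbox="50.000,750.000,300.000,770.000"><text>H</text><text>d</text><text>r</text></textline>
<textline bbox="50.000,400.000,300.000,412.000"><text>8</text><text>.</text><text>0</text></textline>
<textline bbox="50.000,20.000,300.000,30.000"><text>p</text><text>1</text></textline>
</page>
</pages>"""


def parse_bbox(raw):
    """Split an 'x0,y0,x1,y1' string into four floats."""
    x0, y0, x1, y1 = (float(v) for v in raw.split(","))
    return x0, y0, x1, y1


def xml_to_rows(xml_text, header_y=740.0, footer_y=50.0):
    """Collect body text lines, skipping header/footer bands.

    header_y and footer_y are illustrative thresholds in PDF points;
    real values would depend on the timesheet layout.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        for line in page.iter("textline"):
            x0, y0, x1, y1 = parse_bbox(line.get("bbox"))
            # Suppress anything lying entirely in the header or footer band.
            if y0 >= header_y or y1 <= footer_y:
                continue
            # Each <text> child holds one character; join them into a string.
            text = "".join(t.text or "" for t in line.iter("text")).strip()
            if text:
                rows.append((page.get("id"), x0, y0, text))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["page", "x0", "y0", "text"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE_XML)), end="")
```

Keeping the coordinates in the CSV would let a downstream timesheet
parser rebuild columns from position, though whether the real CSV carries
them is not stated here.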