~dricottone/fmg-timesheets

ref: 6899c67fa4f55e9d7a82f6b2cd780bc43185f3fc fmg-timesheets/main.py -rw-r--r-- 1.1 KiB
With data scraping complete, moving on to analysis

Basic summation of projects, as proof of concept
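A minimal sketch of what such a proof-of-concept summation might look like. The entry layout (project, date, hours), the sample values, and the name `sum_hours_by_project` are assumptions for illustration, not taken from the repository.

```python
from collections import defaultdict

# Hypothetical sample entries: (project, date, hours).
entries = [
    ("Project A", "2020-01-06", 4.0),
    ("Project B", "2020-01-06", 3.5),
    ("Project A", "2020-01-07", 8.0),
]


def sum_hours_by_project(entries):
    """Return a dict mapping each project to its total hours."""
    totals = defaultdict(float)
    for project, _date, hours in entries:
        totals[project] += hours
    return dict(totals)


totals = sum_hours_by_project(entries)
```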

Basic SAS program for importing CSV data and storing as time series data

Adding exporters

Wrote and tested the long CSV exporter. Stubbed out the JSON exporter.
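As a rough illustration, a long-format exporter writes one row per (project, date) pair rather than one column per date. The column names, the nested-dict input, and the function name `export_long_csv` below are assumptions, not the repository's actual interface.

```python
import csv
import io


def export_long_csv(timesheet, out):
    """Write one CSV row per (project, date) pair.

    `timesheet` is assumed to map project -> {date string -> hours}.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["project", "date", "hours"])
    for project in sorted(timesheet):
        for day in sorted(timesheet[project]):
            writer.writerow([project, day, timesheet[project][day]])


# Hypothetical sample data, written to an in-memory buffer.
buffer = io.StringIO()
export_long_csv(
    {
        "Project A": {"2020-01-06": 4.0, "2020-01-07": 8.0},
        "Project B": {"2020-01-06": 3.5},
    },
    buffer,
)
long_csv = buffer.getvalue()
```

The long layout tends to be easier to load into a time series store, since each row is already a single observation.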

Fully functional timesheet parser.

The timesheet parser is a complete success. Some minor issues were
ironed out in the XML parser as well.

Next steps: writing to a time series database and beginning analysis.

Goodbye HTML, hello XML

Replaced HTML exporting/parsing with XML exporting/parsing. Also
replaced the 'high-level' function call with 'low-level' pdfminer
usage.

The XML parser handles validation and suppresses header/footer content
on its own.

From the PDF parser, XML is dumped to a file. From the XML parser, CSV
is dumped to a file. The new timesheet parser should read in that CSV
file.
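The XML-to-CSV step described above could be sketched roughly as follows. The sample XML is assumed to resemble pdfminer's XML output (`page`, `textline`, and per-character `text` elements with `bbox` attributes); the CSV columns and the name `xml_to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the assumed XML dump.
SAMPLE_XML = """<pages>
<page id="1" bbox="0,0,612,792">
<textbox id="0" bbox="72.0,700.0,120.0,712.0">
<textline bbox="72.0,700.0,120.0,712.0">
<text bbox="72.0,700.0,80.0,712.0">8</text>
<text bbox="80.0,700.0,84.0,712.0">.</text>
<text bbox="84.0,700.0,92.0,712.0">0</text>
<text>
</text>
</textline>
</textbox>
</page>
</pages>"""


def xml_to_csv(xml_text, out):
    """Write one CSV row per non-empty text line: page, left edge, text."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["page", "left", "text"])
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        for line in page.iter("textline"):
            # Characters are stored one per <text> element; join them back.
            text = "".join(t.text or "" for t in line.iter("text")).strip()
            if not text:
                continue
            # The first bbox coordinate is taken as the line's left edge.
            left = line.get("bbox", "").split(",")[0]
            writer.writerow([page.get("id"), left, text])


buffer = io.StringIO()
xml_to_csv(SAMPLE_XML, buffer)
csv_text = buffer.getvalue()
```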

Significant updates

Wrote a timesheet parser that ingests and validates all semi-structured
data. Next step is to interpret left styles as dates, so that hours can
be parsed into a time entry object.
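One way the left-style-to-date step might work, sketched under the assumption that each day column sits at a fixed left offset and that the first day of the week is known. The `TimeEntry` class, the offsets, and `entries_from_cells` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TimeEntry:
    day: date
    hours: float


def entries_from_cells(cells, column_lefts, week_start):
    """Turn (text, left) cells into time entries.

    `column_lefts` lists the assumed left offset of each day column,
    in calendar order starting at `week_start`.
    """
    left_to_day = {
        left: week_start + timedelta(days=i)
        for i, left in enumerate(column_lefts)
    }
    entries = []
    for text, left in cells:
        day = left_to_day.get(left)
        if day is None:
            # Not in a day column (for example a label or a total).
            continue
        entries.append(TimeEntry(day, float(text)))
    return entries


# Hypothetical cells: two day columns plus a label outside them.
cells = [("8.0", 100), ("7.5", 150), ("Total", 400)]
entries = entries_from_cells(cells, [100, 150, 200, 250, 300], date(2020, 1, 6))
```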

Updated HTML parser to more completely filter out unhelpful data, and to
internally build the array of pairs (data and left style).
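A minimal sketch of collecting (data, left style) pairs with the standard library's `html.parser`. It assumes each text block is a `div` whose inline style carries a `left:<n>px` declaration; the class name `PairParser` and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches a standalone "left:<n>px" declaration, not "border-left:...".
LEFT_RE = re.compile(r"(?:^|;)\s*left:\s*(\d+)px")


class PairParser(HTMLParser):
    """Collect (text, left) pairs from absolutely positioned divs."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._left = None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            style = dict(attrs).get("style") or ""
            match = LEFT_RE.search(style)
            self._left = int(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "div":
            self._left = None

    def handle_data(self, data):
        text = data.strip()
        # Text outside a positioned div is treated as unhelpful and dropped.
        if text and self._left is not None:
            self.pairs.append((text, self._left))


parser = PairParser()
parser.feed(
    '<div style="position:absolute; left:72px; top:100px;">'
    "<span>8.0</span></div>"
    '<div style="top:5px;">Page 1</div>'
)
pairs = parser.pairs
```

This sketch does not handle nested `div` elements; it only tracks the most recently opened one.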