~dricottone/fmg-timesheets: main.py 8ebab2a421aa9bb58b0f3fd43e517190ba12e752

8ebab2a4 — Dominic Ricottone 2 years ago

Fully functional timesheet parser.

The timesheet parser is a complete success. Some minor issues were
ironed out in the XML parser as well.

Next steps: writing to a time series database and beginning analysis.

ae939a28 — Dominic Ricottone 2 years ago

Goodbye HTML, hello XML

Replaced HTML exporting/parsing with XML exporting/parsing. Also
replaced the 'high-level' function call with 'low-level' pdfminer
usage.

The XML parser now handles validation and suppression of header/footer
content on its own.

From the PDF parser, XML is dumped to a file. From the XML parser, CSV
is dumped to a file. The new timesheet parser should read in that CSV
file.
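
A minimal sketch of what the CSV-reading end of that pipeline might look
like, using only the standard library. The column names `text` and `left`
and the `Cell` type are assumptions made for illustration; the commit does
not state the actual CSV layout.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Cell(NamedTuple):
    """One parsed text fragment (hypothetical layout, not from the repo)."""
    text: str
    left: float  # assumed: horizontal offset of the fragment on the page


def read_cells(stream) -> Iterator[Cell]:
    """Yield one Cell per CSV row, stripping whitespace around the text."""
    reader = csv.DictReader(stream)
    for row in reader:
        yield Cell(text=row["text"].strip(), left=float(row["left"]))


# Stand-in for the CSV file dumped by the XML parser.
sample = io.StringIO("text,left\nRegular,36.0\n8.00,212.5\n")
cells = list(read_cells(sample))
```

In real use the `io.StringIO` stand-in would be replaced by an open file
handle on the dumped CSV.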

f441822f — Dominic Ricottone 2 years ago

Significant updates

Wrote a timesheet parser that ingests and validates all semi-structured
data. The next step is to interpret left styles as dates, so that hours
can be parsed into a time entry object.
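
One way the "left style to date" step could work is sketched below,
assuming each day of the week occupies a column at a known horizontal
offset. The offsets in `COLUMN_LEFTS`, the `TimeEntry` class, and the
`to_entry` helper are all hypothetical and not taken from the repository.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TimeEntry:
    day: date
    hours: float


# Hypothetical left offsets of the seven day columns, first day of week first.
COLUMN_LEFTS = [200.0, 250.0, 300.0, 350.0, 400.0, 450.0, 500.0]


def to_entry(week_start: date, left: float, text: str) -> TimeEntry:
    """Map a cell's left offset to the nearest day column and parse its hours."""
    index = min(range(len(COLUMN_LEFTS)),
                key=lambda i: abs(COLUMN_LEFTS[i] - left))
    return TimeEntry(day=week_start + timedelta(days=index), hours=float(text))


entry = to_entry(date(2022, 1, 3), 301.5, "7.50")
```

Matching to the nearest column rather than an exact offset tolerates small
layout drift between pages, at the cost of silently accepting stray cells.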

Updated the HTML parser to filter out unhelpful data more completely, and
to internally build the array of pairs (data and left style).

c041ec57 — Dominic Ricottone 2 years ago

Initial commit