Fully functional timesheet parser.
The timesheet parser is now fully functional. Minor issues in the XML
parser were fixed as well.
Next steps: write to a time-series database and begin analysis.
Goodbye HTML, hello XML
Replaced HTML export and parsing with XML export and parsing. Also
replaced the 'high-level' pdfminer function call with direct use of
the 'low-level' pdfminer API.
The XML parser now handles validation and suppression of header/footer
content on its own.
From the PDF parser, XML is dumped to a file. From the XML parser, CSV
is dumped to a file. The new timesheet parser should read in that CSV
file.
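A minimal sketch of that last hop, assuming the CSV has a header row
with two columns, 'text' and 'left'. Those column names and the
function name read_cells are hypothetical; the actual CSV layout is not
recorded in this log.

```python
import csv
import io


def read_cells(fp):
    """Read (text, left) pairs from the CSV written by the XML parser.

    Assumes hypothetical columns 'text' and 'left'. Rows with empty
    text or a non-numeric 'left' value are skipped.
    """
    cells = []
    for row in csv.DictReader(fp):
        text = (row.get("text") or "").strip()
        try:
            left = float(row.get("left") or "")
        except ValueError:
            continue  # not a usable offset; ignore the row
        if text:
            cells.append((text, left))
    return cells


# Usage with an in-memory file standing in for the dumped CSV.
sample = io.StringIO("text,left\n8.0,120\n,130\n7.5,200.5\n")
print(read_cells(sample))
```

Skipping bad rows rather than raising is a design choice made for the
sketch; a stricter parser might prefer to fail loudly.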
Significant updates
Wrote timesheet parser that ingests and validates all semi-structured
data. Next step is to interpret left styles (the CSS 'left' offsets)
as dates, so that hours can be parsed into a time entry object.
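One way the left-style-to-date step could look, assuming each day's
column sits at a roughly fixed left offset and the first date of the
period is known. The names date_for_left and column_lefts, and the
tolerance value, are hypothetical.

```python
from datetime import date, timedelta


def date_for_left(left, column_lefts, week_start, tolerance=2.0):
    """Map a cell's left offset to the date of the nearest day column.

    column_lefts: left offsets of the day columns, in calendar order.
    Returns None when no column lies within `tolerance` of `left`.
    """
    if not column_lefts:
        return None
    index, nearest = min(
        enumerate(column_lefts), key=lambda pair: abs(pair[1] - left)
    )
    if abs(nearest - left) > tolerance:
        return None
    return week_start + timedelta(days=index)


# Usage: three day columns starting on 2024-01-01.
columns = [100.0, 150.0, 200.0]
print(date_for_left(150.5, columns, date(2024, 1, 1)))
```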
Updated HTML parser to filter out unhelpful data more thoroughly, and
to build the array of pairs (data, left style) internally.
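A rough sketch of building those pairs with the standard library,
assuming the exported HTML places text inside <div> elements that carry
an inline 'left:NNpx' style. The class name CellCollector is
hypothetical, and nested <div> elements are not handled.

```python
import re
from html.parser import HTMLParser

# Matches the left offset inside an inline style attribute.
LEFT_RE = re.compile(r"left:\s*(\d+(?:\.\d+)?)px")


class CellCollector(HTMLParser):
    """Collect (text, left) pairs from positioned <div> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._left = None  # left offset of the currently open <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            match = LEFT_RE.search(dict(attrs).get("style") or "")
            self._left = float(match.group(1)) if match else None

    def handle_endtag(self, tag):
        if tag == "div":
            self._left = None

    def handle_data(self, data):
        text = data.strip()
        # Text outside a positioned <div> is treated as unhelpful data.
        if text and self._left is not None:
            self.pairs.append((text, self._left))


# Usage on a small hand-written fragment.
collector = CellCollector()
collector.feed('<div style="position:absolute; left:72px;">8.0</div>')
print(collector.pairs)
```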