~dricottone/fmg-timesheets

ref: 721dfa4e4fbe8ffcb9b0d1462bb7367022eeac50 fmg-timesheets/parser/timesheet.py -rw-r--r-- 8.8 KiB
Fully functional timesheet parser.

The timesheet parser is a complete success. Some minor issues were
ironed out in the XML parser as well.

Next steps: writing to a time series database and beginning analysis.

Implemented time entry extraction; no assert errors!

There is still a major issue ahead of 'structured' data:
Hours data is leaking between entries. There are entries with no hours
at all. There are almost certainly some entries that have hours out of
order.

It will likely be necessary to re-sort all items ahead of processing
based on top then left style attributes. This is going to have the
consequence of invalidating some of the work I've already put into
parsing the data as-is.
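
A minimal sketch of that re-sort, assuming each item is a (text, style)
pair whose inline style string carries `top` and `left` in pixels. The
names `position` and `sort_items` are hypothetical and are not taken from
timesheet.py.

```python
import re

# Assumption: positions appear as "top:NNpx" / "left:NNpx" in an inline
# style string. The lookbehind avoids matching "margin-top" and similar.
_PX = re.compile(r"(?<![\w-])(top|left)\s*:\s*(-?\d+(?:\.\d+)?)px")


def position(style):
    """Return (top, left) in pixels; a missing attribute counts as 0."""
    found = {name: float(value) for name, value in _PX.findall(style)}
    return (found.get("top", 0.0), found.get("left", 0.0))


def sort_items(items):
    """Sort (text, style) pairs in reading order: top first, then left."""
    return sorted(items, key=lambda item: position(item[1]))
```

If rows are not pixel-aligned, `top` would probably need rounding to a
row height before sorting, or items from one visual row could interleave
with the next.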

Good luck, future me.

Started implementing time entry extraction.

Time entries are now being parsed and validated, though there are
numerous issues still to sort out.

I have a feeling that further development will require passing around
the `top` style attributes in the same way I'm passing around the `left`
style attributes. TBD though.

Significant updates

Wrote a timesheet parser that ingests and validates all semi-structured
data. Next step is to interpret left styles as dates, so that hours can
be parsed into a time entry object.
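
One way this could work, as a sketch only: assume the day columns sit at
evenly spaced `left` offsets, so an offset maps to a day index within the
pay period. The function name and the parameters `period_start`,
`first_left`, and `column_width` are all hypothetical; the real layout
may not be evenly spaced.

```python
from datetime import date, timedelta


def left_to_date(left, period_start, first_left, column_width):
    """Map a `left` offset (px) to a date.

    Assumes the first day column starts at `first_left` and each later
    day is `column_width` pixels further right.
    """
    index = round((left - first_left) / column_width)
    if index < 0:
        raise ValueError(f"left offset {left} is before the first column")
    return period_start + timedelta(days=index)
```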

Updated HTML parser to filter out unhelpful data more completely, and to
internally build the array of pairs (data and left style).
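
For illustration, a sketch of collecting (data, left) pairs with the
standard library's `html.parser`. The class name `PairCollector` and the
assumption that `left` appears as whole pixels in an inline `style`
attribute are mine; this is not the repository's parser.

```python
import re
from html.parser import HTMLParser

# Assumption: the offset appears as "left:NNpx" in an inline style.
_LEFT = re.compile(r"(?<![\w-])left\s*:\s*(-?\d+)px")


class PairCollector(HTMLParser):
    """Collect (text, left) pairs, skipping whitespace-only data."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._left = None  # most recent `left` offset seen

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        match = _LEFT.search(style)
        if match:
            self._left = int(match.group(1))

    def handle_data(self, data):
        text = data.strip()
        if text and self._left is not None:
            self.pairs.append((text, self._left))
```

Feeding it markup with `collector.feed(...)` fills `collector.pairs` in
document order.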