~dricottone/blog

1900b7b7a3fd1d390eb2722aa0bf31f069d61170 — Dominic Ricottone 8 months ago 1a169a0
Content updates
A content/posts/bug_in_spss_excel_writer.md => content/posts/bug_in_spss_excel_writer.md +111 -0
@@ 0,0 1,111 @@
---
title: Bug in SPSS's Excel Writer
date: "2024-01-08T05:39:23+00:00"
draft: false
---

I was unpleasantly surprised to discover a corrupted Excel data file this week.
Luckily I was fully prepared to rebuild it with my pipeline in SPSS, but rather
puzzlingly, rebuilding did *not* cure the problem.
I was returned the exact same corruption error in the new file.

The first, and typically only, investigation for such issues is into the input
data files.
*Surely the corruption came from elsewhere.*
But after a few minutes of poking and prodding, I found nothing wrong there.
(No ASCII control characters, no embedded case delimiters, no unescaped value
delimiters, etc., etc.)

My next step was the excessive insertion of debug commands in my pipeline,
trying to determine where and when the issue first appears.
(Sadly SPSS lacks useful debug commands; LIST and some personal macros are the
best available tools.)
But the frustration continued, as everything seemed perfect up to the very end.

I tried to re-import the corrupted spreadsheet and, strangely, I had no issues
doing so.
Whatever issue Excel identified in the file, SPSS was content to work with.
This prompted me to do some comparisons, to see if any differences existed
before and after the Excel round-trip.
And finally I had a culprit:
my data was magically mutating when written to an Excel file.
Before, a string read like "Phase 2B: \_x0001\_".
After, it was "Phase 2B: □".

Next I exported the mutated data into a text file for closer inspection of the
byte literals.
I found the ASCII control character for start of heading.
Surely it isn't a coincidence that SOH corresponds to `01`, which I suppose
you can creatively write as `0001`?
But I've never heard of an escape scheme like "\_xHHHH\_".
Googling "\_x0001\_" gave me nothing of value.

I used a simple data list to test all of the first 10 codepoints (`00`-`09`).
These were all exported to an Excel file, read back into SPSS, and written to a
text file.
Interestingly the null byte turned into a space character.
Aside from that, I got exactly what I expected: a series of ASCII control
characters.
So "\_x0001\_" clearly isn't a special case.
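For reference, the test values themselves are easy to generate.
This is only a sketch in Python (my choice for illustration; the actual test
was an SPSS data list), and the name `test_values` is made up for the example.

```python
# Build "_x0000_" through "_x0009_", one string per codepoint 0x00-0x09.
test_values = [f"_x{codepoint:04X}_" for codepoint in range(0x00, 0x0A)]

for value in test_values:
    print(value)
```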

The next thing I tested is whether the leading and trailing underscores were
important.
Indeed they are.
Now I am convinced this is a scheme for encoding data.

Finally I tried "\_x0030\_", to be sure that this was a hexadecimal encoding.
The decimal `30` codepoint refers to another control character, while the hex
`30` codepoint refers to the zero character.
And yes, when I saw the "0" upon re-importing, this confirmed that I was
dealing with some sort of hexadecimal escape scheme.
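
The two readings are easy to tell apart outside of SPSS as well.
A minimal sketch in Python (used purely for illustration; it is not part of
the pipeline) that interprets the same four digits both ways:

```python
import unicodedata

# The four digits from "_x0030_", read two different ways.
digits = "0030"

as_hex = chr(int(digits, 16))      # codepoint 0x30
as_decimal = chr(int(digits, 10))  # codepoint 30, i.e. 0x1E

print(repr(as_hex))                      # '0'
print(repr(as_decimal))                  # '\x1e'
print(unicodedata.category(as_hex))      # Nd (decimal digit)
print(unicodedata.category(as_decimal))  # Cc (control character)
```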

----

The issue can perhaps best be demonstrated by trying to reconstruct it
within a first-party, fully supported, WYSIWYG editor.
I of course mean Microsoft Excel.

I create a new Excel file containing just "foo \_x0001\_ bar" in the first cell,
save, and exit.
I can immediately re-open the file, so clearly Excel has not written a corrupt
file.
What did Excel do with that value?

It requires some further digging, because modern Excel writes string values to
a separate `sharedStrings.xml` file in an effort to be more efficient.
But because I kept the reconstruction short and simple, it's a quick detour.
Excel took "foo \_x0001\_ bar" and actually wrote "foo \_x005F\_x0001\_ bar".
In case you don't have your handy ASCII codepage available, `5F` represents
the underscore character.
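
That transformation can be imitated in a few lines.
The following is a sketch of the idea in Python, not Excel's actual
implementation; the function name `protect_literals` is hypothetical, and
accepting lowercase hex digits is an assumption on my part.

```python
import re

# An underscore that begins something shaped like an escape:
# "_x", four hex digits, "_". Only the leading underscore is matched.
ESCAPE_LOOKALIKE = re.compile(r"_(?=x[0-9A-Fa-f]{4}_)")

def protect_literals(text: str) -> str:
    """Rewrite the leading underscore of each look-alike as _x005F_."""
    return ESCAPE_LOOKALIKE.sub("_x005F_", text)

print(protect_literals("foo _x0001_ bar"))  # foo _x005F_x0001_ bar
```

Text that does not resemble an escape passes through unchanged.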

This is a neat parallel to the escaping strategy used on the web,
e.g. `&amp;lt;`.
`&lt;` wants to be read as "<" by any browser.
That behavior is effectively 'deferred' by encoding the leading character
instead,
so that the first pass of the interpreter renders the intended result.

----

It occurred to me much later that SOH is a very rare and unhelpful control
character.
`09`, the tab character, was far more likely to give me a useful Google search.
And in fact "\_x0009\_" was a much more informative search page.
I was led down a rabbit hole of the XML 1.0 spec, Microsoft's documentation
for `DocumentFormat.OpenXml.Spreadsheet.CellValue` of the OpenXml API, and the
`ST_Xstring` type from ECMA-376.

XML 1.0 cannot carry most ASCII control characters, so ECMA-376 has string
values escape them like "\_xHHHH\_".
In other words, when SPSS wrote an unencoded "\_x0001\_" into the spreadsheet's
XML, it was inevitable that any spec-compliant reader would substitute that
literal with SOH.
SPSS should have written "\_x005F\_x0001\_" instead.
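
The reading side can be modelled the same way.
This is a simplified sketch in Python of how a reader might decode such
strings, not the code that Excel or SPSS actually runs; `decode_xstring` is a
hypothetical name.

```python
import re

# One escape: "_x", four hex digits, "_".
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")

def decode_xstring(stored: str) -> str:
    """Replace each _xHHHH_ escape with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), stored)

# What SPSS wrote: the literal text is mistaken for an escape.
print(repr(decode_xstring("Phase 2B: _x0001_")))        # 'Phase 2B: \x01'

# What it should have written: the literal text survives.
print(repr(decode_xstring("Phase 2B: _x005F_x0001_")))  # 'Phase 2B: _x0001_'
```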

----

I had to jump through a variety of hoops to report this bug.
I wasn't surprised by this;
I certainly didn't expect any more from IBM.
But I decided it was worthwhile anyway.
This seems like a highly technical bug that could net me some 'internet cred'.


A content/posts/salazar_slytherin_is_a_druid.md => content/posts/salazar_slytherin_is_a_druid.md +115 -0
@@ 0,0 1,115 @@
---
title: Salazar Slytherin is a Druid
date: "2023-12-29T06:18:06+00:00"
draft: false
---

And now for something completely different.

----

A common method for teaching Dungeons & Dragons mechanics is to place them in
cultural context.
This is especially true for teaching races and classes.
Human fighters are self-explanatory,
but it's not obvious if elves are the *Santa's workshop* sort or the *JRPG*
sort.
Rogues can sound like an evil profile,
but remember Robin Hood who gave to the poor.

I personally believe that the druid class can be particularly challenging to
teach.
There are few druids in pop culture.
I've mostly seen new players advised to think about 'the brown wizard in the
Hobbit movies' (read: Radagast) for circumlocution.
Many people come away thinking that druids are the ugly duckling of magic
users.
Others come away fixating on the 'animal' thing,
with no respect to the intended philosophy of druids.

I have a novel proposal:
Salazar Slytherin from the Harry Potter universe is a prototypical druid.

----

One of the core traits of the druid class,
and indeed the trait that attracts the most interest from new players,
is their connection to animals.
Druids can use magic to speak with animals and shapeshift into animal forms.
They partner with an animal familiar.
Personality-wise, they are pet parents.

Salazar was obviously a snake person.
Snakes are the mascot and motif of his Hogwarts house.

His most well-known ability was Parseltongue,
to the point that the Gaunts
(up to and including Tom Riddle)
only had to demonstrate that ability to make a claim on Slytherin's legacy.
This is the clearest link to druidity for Salazar.
And the ability to talk with animals is rather frequently the *entire reason*
players pick that class.
Salazar will effectively communicate these details,
whereas Radagast's best parallel is having a rabbit-drawn sled.

Salazar's basilisk is also one of the most prominent familiars in the
Harry Potter universe.
I'd argue the only competition is Hedwig and Fawkes.
The basilisk and the chamber that Salazar built for it are the titular topics
of an entire book, after all.

And what a familiar it is!
Basilisks have a range of abilities that are both role-play and combat oriented,
while also being balanced by lore-derived weaknesses (i.e. roosters' crowing).
This could go a long way to preparing new players for the choice of a familiar.

Needless to say,
the fact that Salazar made that labyrinth inside the school further emphasizes
his care for the animal.
There can be little doubt that he was a *bit obsessed*.
While it can be unfortunate (as alluded to above),
*this* is what many new players fixate on when thinking about druids.
But surely Salazar's template could not make the misunderstanding *worse*.

----

Druids are also deeply connected to nature.
In D&D, this largely means elemental magics
(as opposed to god-derived magics;
or unnatural, evil magics).
The Harry Potter universe does not divide magic into elemental disciplines,
instead preferring categories like Transfiguration and Divination.

Let's leave aside the Dark Arts.

Let's instead consider Hogwarts for what it is.
A stone castle of traditional construction.
Located in Middle Of Nowhere, Scotland with just a small commercial town
nearby.
Beset by merpeople-inhabited lakes and forbidden forests.
Isolated *not just* from the trappings of modern urbanism,
but also from the *(unnatural?)* distractions of government and press and the
consumer economy.
Salazar and the co-founders certainly seem to have a disposition towards
*nature*.

----

Druids are also devoted to a philosophy of balance that is colored by their
worldview.
Again, in D&D, this largely has to do with nature as elemental magics.
They are motivated to action in defence of this balance.

Salazar certainly had opinions about the balance of magical society.
Unfortunately, his worldview was colored by racism and bigotry.
He and his basilisk were happy to 'defend' that balance, too.

This is a stretch, I know,
but perhaps it could demonstrate how an evil-aligned druid could be written?

----

In summary, while Radagast is an *option* for relaying the concept of druids in
Dungeons & Dragons,
I believe Salazar Slytherin is the *superior choice*.


M scripts/openring.sh => scripts/openring.sh +2 -0
@@ 13,5 13,7 @@ openring \
  -s https://andreabergia.com/post/index.xml \
  -s https://ludic.mataroa.blog/rss/ \
  -s https://tradediversion.net/feed/ \
  -s https://vincent.bernat.ch/en/blog/atom.xml \
  -s https://blog.cr.yp.to/feed.application=xml \
  < scripts/openring.html