Back in the late 1990s and early 2000s, there existed a piece of hardware called the “TVGuardian,” which would attempt to censor incoming video in real time. As recently covered by the wonderful YouTube channel Technology Connections, the TVGuardian reads closed-captioning data as it arrives, replaces any bad words in the captions with an alternative phrase, and mutes the audio while they are spoken.
Upon learning that the internal dictionary of offensive words is not listed anywhere in the manual, Ben Eater had the idea to extract it himself. After a quick teardown, he discovered a single 93LC86 EEPROM chip configured in 8-bit mode, which gives 2,048 addressable bytes. He then connected an Arduino Uno to the EEPROM’s SPI bus, read the contents in 16-byte chunks, and dumped them to the serial monitor for further investigation.
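To get a feel for the size of that dump, here is a minimal Python sketch of the arithmetic and of a 16-bytes-per-row hex view. It is only an illustration: the `hexdump` helper, its row format, and the all-zero placeholder image are assumptions made for this example, not the output of Eater's actual Arduino sketch.

```python
EEPROM_BITS = 16 * 1024  # the 93LC86 is a 16 Kbit part
WORD_BITS = 8            # 8-bit organisation, as described above
CHUNK = 16               # bytes per row, matching the 16-byte reads


def hexdump(data: bytes, chunk: int = CHUNK) -> list[str]:
    """Format data as rows of: offset, hex bytes, printable ASCII."""
    rows = []
    for offset in range(0, len(data), chunk):
        block = data[offset:offset + chunk]
        hex_part = " ".join(f"{b:02x}" for b in block)
        # Non-printable bytes are shown as '.' so stored words stand out.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in block)
        rows.append(f"{offset:04x}  {hex_part:<{chunk * 3 - 1}}  {text_part}")
    return rows


total_bytes = EEPROM_BITS // WORD_BITS  # number of addressable bytes
image = bytes(total_bytes)              # placeholder: a real dump goes here
rows = hexdump(image)

if __name__ == "__main__":
    print(f"{total_bytes} bytes -> {len(rows)} rows of {CHUNK} bytes")
    print(hexdump(b"darn\x00\x05")[0])
```

Viewing the bytes next to their ASCII rendering is what makes the stored words readable at a glance in a dump like this.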
One of Eater’s most interesting findings was that the words are stored in blocks of 256 bytes separated by a long string of null characters. Every bad word is stored as the bytes of its ASCII characters, followed by a terminating character and one extra byte, whereas the replacement words are listed as simple character arrays indexed elsewhere. That final byte of each censored word contains flag bits that denote whether the word is whitelisted, whether it is allowed in non-strict mode, and which G-rated word should replace it. To see this analysis in more detail, check out Eater’s video!
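The entry layout described above can be sketched as a small Python parser. The article does not give the terminator value or the exact bit positions, so everything concrete here is a hypothetical stand-in chosen for illustration: the `0x00` terminator, the `0x80` and `0x40` flag bits, the 6-bit replacement index, and the sample words and replacement table.

```python
from dataclasses import dataclass

# All of these values are assumptions for illustration only.
TERMINATOR = 0x00        # assumed end-of-word marker
WHITELIST_BIT = 0x80     # assumed "whitelisted" flag
NON_STRICT_BIT = 0x40    # assumed "allowed in non-strict mode" flag
REPLACEMENT_MASK = 0x3F  # assumed index into the replacement table


@dataclass
class Entry:
    word: str
    whitelisted: bool
    allowed_non_strict: bool
    replacement_index: int


def parse_block(block: bytes) -> list[Entry]:
    """Parse entries of the form: ASCII bytes, terminator, flag byte."""
    entries = []
    pos = 0
    while pos < len(block):
        end = block.find(TERMINATOR, pos)
        # Stop at padding (empty word) or if the flag byte is missing.
        if end == -1 or end == pos or end + 1 >= len(block):
            break
        flags = block[end + 1]
        entries.append(Entry(
            word=block[pos:end].decode("ascii", errors="replace"),
            whitelisted=bool(flags & WHITELIST_BIT),
            allowed_non_strict=bool(flags & NON_STRICT_BIT),
            replacement_index=flags & REPLACEMENT_MASK,
        ))
        pos = end + 2  # skip the terminator and the flag byte
    return entries


# Made-up sample data: two entries followed by null padding.
REPLACEMENTS = ["", "shoot", "gosh"]
sample_block = b"darn\x00\x41" + b"heck\x00\x82" + bytes(8)
entries = parse_block(sample_block)

if __name__ == "__main__":
    for e in entries:
        print(e.word, "->", REPLACEMENTS[e.replacement_index], e)
```

Packing the flags and the replacement index into one trailing byte keeps each entry compact, which matters on a chip with only 2,048 bytes of storage.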
Read more about this on: Arduino Blog