OT: Editing out "vulgar" words



I've spent hours searching the web for a solution, but so far I haven't found one. I have found many papers claiming that you can't publish a list of bad words in the UK without getting in trouble for publishing bad words (not helpful, and of questionable accuracy), and that any list of bad words has to include, at a minimum, a compendium of bad words from the 22 most frequently spoken languages (without listing either the languages or the words). Thus, I'm asking HP3000-L for input.

How did I get into this? Since I'm still unemployed, I'm doing a little bit of photography and web site work in addition to job hunting.

In a web site project I'm working on, I need to "censor" posts that users make. Specifically, I need to remove "obscene, vulgar, offensive, abusive, hateful, harassing, profane, sexually oriented, and threatening" words, replacing each occurrence with the very long phrase "{text deleted by moderator}".

Obviously, my first question was "Do you have a list of the words you want removed?" Of course the answer was "no." (LOL, what was I thinking, asking such a question!)

Which (of course) led to my second question: "But you will provide the list, correct?" Of course the answer to that question was also "no."

My next question was "Do you have a list of words that people have complained about?" It turned out they did, but that only served to point out another problem: many of the "words" are only offensive when used in a certain context.

Example.
   "Next, Bob asked Dick about the 69 exception reports.  Dick
   replied that all were related to a robotics problem - a
   hydraulic line feeding a robotic arm blew, shutting down
   production."
became:
   "Next, Bob asked {text deleted by moderator} about the
   {text deleted by moderator} exception reports.  {text deleted
   by moderator} replied that all were related to a robotics
   problem - a hydraulic line feeding a robotic arm {text
   deleted by moderator}, shutting down production."

Okay, so maybe it is easier to get a chuckle out of the "censored" text. Still, the context problem remains.
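For reference, the over-censoring above is what a plain whole-word match produces. The following is only a minimal sketch, not the site's actual code: the word list is an assumption reconstructed from the words that got deleted in the example, and the function name is made up for illustration.

```python
import re

REPLACEMENT = "{text deleted by moderator}"

# Assumed list: just the words that were deleted in the example above.
COMPLAINED_WORDS = ["dick", "69", "blew"]


def naive_censor(text, words=COMPLAINED_WORDS):
    """Replace every whole-word, case-insensitive match, ignoring context."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(REPLACEMENT, text)


print(naive_censor("Next, Bob asked Dick about the 69 exception reports."))
```

Because the match looks at each word in isolation, a person's name and a report count are treated the same as an insult.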

Thus, I'm looking for:
-- a list of bad words
-- some context-sensitive software (that runs on Linux) that:
   -- ALWAYS deletes words on list 1
   -- will ONLY delete words on list 2 if they are in an
      offensive context.
-- any info on similar "projects" (and their solutions) you are
   aware of.
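
To make the two-list behavior concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: "badword" is a stand-in for a real list 1 entry, list 2 is taken from the example above, and the context test (a list 2 word counts as offensive only if a list 1 word appears in the same sentence) is a crude placeholder, not real context analysis. The context test is passed in as a function so something smarter could replace it.

```python
import re

REPLACEMENT = "{text deleted by moderator}"


def build_pattern(words):
    """Compile a case-insensitive whole-word pattern for the given words."""
    if not words:
        return re.compile(r"(?!)")  # an empty list matches nothing
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def same_sentence_as_list1(list1):
    """Placeholder context test: is any list 1 word in the same sentence?"""
    list1_pattern = build_pattern(list1)

    def check(word, sentence):
        return list1_pattern.search(sentence) is not None

    return check


def censor(text, list1, list2, in_offensive_context):
    """Always delete list 1 words; delete list 2 words only in bad context."""
    always = {w.lower() for w in list1}
    pattern = build_pattern(list(list1) + list(list2))
    # Crude sentence split; the capture group keeps the original whitespace.
    pieces = re.split(r"((?<=[.!?])\s+)", text)
    result = []
    for sentence in pieces:
        def replace(match, sentence=sentence):
            word = match.group(0)
            if word.lower() in always or in_offensive_context(word, sentence):
                return REPLACEMENT
            return word
        result.append(pattern.sub(replace, sentence))
    return "".join(result)


LIST1 = ["badword"]             # hypothetical stand-in for list 1
LIST2 = ["dick", "69", "blew"]  # assumed, from the example above

sample = "Bob asked Dick about the 69 exception reports.  You badword dick!"
print(censor(sample, LIST1, LIST2, same_sentence_as_list1(LIST1)))
```

With this placeholder rule the first sentence is left alone and both words in the second are deleted. It obviously misses offensive uses that contain no list 1 word, which is exactly the part I need help with.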

Thanks in advance!

John
*** When replying to this message, please do not delete these ***
*** signature lines. Otakon Katsucon HP3000-L @classiccmp.org ***
*** DigitalCosplay.com    JohnKorbPhoto.com     JohnPKorb.com ***

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
