TREKS IN SCI-FI FORUM

Main Decks => General Topics => Topic started by: psikeyhackr on September 15, 2014, 07:59:14 PM

Title: SF Forensic Semantic Analyzer
Post by: psikeyhackr on September 15, 2014, 07:59:14 PM
For years I have found it annoying that most reviewers say very little about the science in science fiction.  It occurred to me that there can't be much science without using 'science words'.  So I wrote a computer program that searches .txt and .rtf files for science and fantasy words and counts them.
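A minimal sketch of the kind of counting described above, for readers who want to see the idea in code. This is not the uploaded program: the word list (`SF_WORDS`), the function name `count_sf_words`, and the tokenizing rule are all assumptions made for illustration, and it only handles plain text, not .rtf.

```python
import re
from collections import Counter

# Hypothetical word list for illustration only; the real program's
# science and fantasy word lists are not shown in this thread.
SF_WORDS = {"orbit", "vacuum", "radiation", "gravity", "reactor"}

def count_sf_words(text, sf_words=SF_WORDS):
    """Return a Counter of how often each listed word occurs in text."""
    # Lowercase the text and split it into runs of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in sf_words)

sample = "The reactor failed in orbit. Gravity held; the orbit decayed in vacuum."
counts = count_sf_words(sample)

distinct = len(counts)        # how many different SF words appear
total = sum(counts.values())  # how many times they appear in all
print(distinct, total)        # prints: 4 5
```

To try it on a book, read the file into a string (for example `open(path, encoding="utf-8").read()`) and pass that to `count_sf_words`.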

For anyone interested in experimenting with the word counting program, I have uploaded Windows and Linux versions with GUIs.  You must have Python on your computer to use them, however.  I have uploaded, downloaded, and tested the files, so they should work.

Linux - sfforensic_L.pyc
https://www.sendspace.com/file/8awxir

Windows - sfforensic_W.pyc
https://www.sendspace.com/file/pwqz4i

The input file is:  ACC.AFalloMndust.txt with 439541 characters.

   It uses 79 SF words 450 times for an SF density of 1.024

The input file is:  OSC.EndersGame.txt with 582652 characters.

   It uses 42 SF words 214 times for an SF density of 0.367

Hard SF tends to get significantly higher SF densities.

Ender's Game is 33% longer than A Fall of Moondust, but it uses only about half as many different science words and uses them less than half as often.
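The posted figures are consistent with density being the number of SF word uses per 1,000 characters. That formula is inferred from the numbers above, not taken from the program itself, and the function name `sf_density` is made up for this sketch. It reproduces both densities and the comparison between the two books:

```python
def sf_density(uses, characters):
    """Assumed formula: SF word uses per 1,000 characters of text."""
    return 1000 * uses / characters

# Figures copied from the program output quoted above.
moondust = {"chars": 439541, "words": 79, "uses": 450}
ender = {"chars": 582652, "words": 42, "uses": 214}

print(round(sf_density(moondust["uses"], moondust["chars"]), 3))  # 1.024
print(round(sf_density(ender["uses"], ender["chars"]), 3))        # 0.367

# How the two books compare.
length_ratio = ender["chars"] / moondust["chars"]  # about 1.33, so 33% longer
word_ratio = ender["words"] / moondust["words"]    # about 0.53, so about half
use_ratio = ender["uses"] / moondust["uses"]       # about 0.48, so under half
print(round(length_ratio, 2), round(word_ratio, 2), round(use_ratio, 2))
```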

psik
Title: Re: SF Forensic Semantic Analyzer
Post by: Bromptonboy on September 16, 2014, 07:12:25 AM
Hmmmmm..a systematic way to separate hard sci-fi from space opera.  :)
Title: Re: SF Forensic Semantic Analyzer
Post by: psikeyhackr on September 18, 2014, 08:39:18 AM
Quote from: Bromptonboy on September 16, 2014, 07:12:25 AM
Hmmmmm..a systematic way to separate hard sci-fi from space opera.  :)

It is definitely not 100% reliable.  Hitch Hiker's Guide to the Galaxy gets a density of 0.955, which is well within the Hard SF range.  Clarke's A Fall of Moondust gets 1.024.  But in general Hard SF scores above 0.8.  The Hunger Games books came in really low.  The first two scored less than 0.1.  Even Frankenstein beats them, and the word 'scientist' had not even been coined when it was written.

But I thought some readers might be interested in playing with it.
Title: Re: SF Forensic Semantic Analyzer
Post by: psikeyhackr on October 01, 2014, 10:49:44 PM
I got an email about my SF word counting program:
Quote from: Robert J. Sawyer <sawyer@sfwriter.com>, 1:33 PM (6 hours ago)

Fascinating!  What an interesting piece of research!  Thank you so, so much!

Please forgive me for being so long in replying -- I've been traveling.

All best wishes!

Rob

psik