Using Open Source Software to Unveil Anonymous Online Extremists

Dr Maurice Dawson

As the country watches and relives the horrific, violent insurrection at the nation’s capital earlier this year, it is clear that we are reactive in the fight against domestic terrorism.

Since 9/11, the United States has been focused on terrorists abroad. However, we have failed to watch and stop the growing, insidious Internet activity of domestic extremists on multiple forums, including 8chan and Reddit threads.

Researchers at the Illinois Institute of Technology and the University of Nebraska Omaha spent considerable time refining techniques to uncover these threats. Open Source Intelligence (OSINT) tools and techniques were used to gather the information, which was then processed and analyzed with analytics tools such as the R programming language. Leveraging proven techniques and methods found in documents such as Army Techniques Publication 2-22.9 allowed the researchers to examine the problem further. The intelligence process of planning, preparing, collecting, and producing OSINT provided the fundamental steps for understanding a landscape that includes nefarious actors.

Numerous open source software tools were used alongside proprietary tools to extract and view the collected data. This includes metadata found in photos, which provides the time stamp, the camera used to take the picture, and the geotag location. Items with text were placed into tools that allowed for sentiment analysis and word clouds, making it possible to look at the words most commonly used in a blog by multiple contributors.
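The word counts that feed a word cloud can be illustrated with a short Python sketch. This is only an illustration using the standard library, not the tooling the researchers used; the function name word_frequencies, the small stop-word list, and the sample posts are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "this"}


def word_frequencies(posts, top_n=10):
    """Return the top_n most common non-stop-words across a list of text posts."""
    counts = Counter()
    for text in posts:
        # Lowercase the text and keep alphabetic tokens (an inner apostrophe is allowed).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample text standing in for posts by multiple contributors.
    sample_posts = [
        "The forum thread repeats the same slogan.",
        "Another contributor repeats that slogan in a new thread.",
    ]
    for word, count in word_frequencies(sample_posts, top_n=5):
        print(word, count)
```

The resulting (word, count) pairs are the kind of input a word cloud tool typically scales its font sizes from.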

The El Paso shooting increased awareness of the 8chan social network and of pages that are still widely used. One of these pages was a subthread found on 4chan titled “Politically Incorrect.” In this thread, hate speech, racial bigotry, and religious intolerance are on public display in plain text. Because the content is plain text, an algorithm could be developed to collect it. The script is based on four different libraries: praw, requests, csv, and time. The praw library lets the script use the Reddit API with the credentials of a Reddit user. Running the code requires a Reddit user’s username, password, user agent, client ID, and client secret. All these parameters can be found under user settings, privacy and security settings, and inside the link app authorization. This configuration cannot be seen unless the user is logged in.
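A minimal sketch of how those five values might be supplied to praw follows. It is not the researchers' code: the environment variable names (REDDIT_CLIENT_ID and so on) and the helper names load_credentials and make_reddit_client are assumptions for illustration. The keyword arguments passed to praw.Reddit are the ones praw accepts for password-based script applications.

```python
import os

# The five values the text lists as required.
REQUIRED_KEYS = ("client_id", "client_secret", "username", "password", "user_agent")


def load_credentials(env=os.environ, prefix="REDDIT_"):
    """Read the credentials from an environment-style mapping.

    The variable names (prefix + upper-cased key) are an assumption of this sketch.
    Raises ValueError early if any value is missing or empty.
    """
    creds = {key: env.get(prefix + key.upper(), "") for key in REQUIRED_KEYS}
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise ValueError("missing Reddit credentials: " + ", ".join(missing))
    return creds


def make_reddit_client(creds):
    """Create a praw client from the credentials (requires the praw package)."""
    import praw  # imported lazily so the credential check works without praw installed

    return praw.Reddit(**creds)
```

Keeping the credentials in environment variables rather than in the script itself avoids committing a password or client secret alongside the code.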

There are three parts to the script: Fetch Input Values, Fetch Data, and Build CSV Document. The script accepts a subreddit URL and a date range. As an example, our code fetches sample data with information from all “Politically Incorrect” posts between two dates. Through the requests library, we retrieve a JSON file, which has different keys and their corresponding values, including ID. Each post is identified by a unique ID. With that ID, we can retrieve each post and all related data, and store it in a list. Fetch Data searches for all comments with the corresponding IDs and retrieves the associated data. The third part of the script builds a CSV document, which is structured in columns, namely ID, Author, Title, and Comments. The code finally produces a CSV file as the output. The algorithm developed by the researchers is shown below.
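As an illustration only, and not the researchers' own listing, the three parts described above might be sketched as follows. The sketch works on in-memory sample records instead of a live request, so no endpoint is assumed. The function names, the field names (id, author, title, created_utc, comments), and the choice to join comments with " | " in a single cell are all assumptions made for the example.

```python
import csv
import io
from datetime import date, datetime, timezone


def parse_date_range(start_text, end_text):
    """Part 1 (Fetch Input Values): turn two YYYY-MM-DD strings into dates."""
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end


def select_posts(posts, start, end):
    """Part 2 (Fetch Data): keep posts created inside the date range.

    `posts` stands in for the parsed JSON the script retrieves; the field
    names used here are assumptions of this sketch.
    """
    selected = []
    for post in posts:
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).date()
        if start <= created <= end:
            selected.append(post)
    return selected


def build_csv(posts, out):
    """Part 3 (Build CSV Document): write one row per post to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["ID", "Author", "Title", "Comments"])
    for post in posts:
        comments = " | ".join(post.get("comments", []))
        writer.writerow([post["id"], post["author"], post["title"], comments])


if __name__ == "__main__":
    # Made-up records; created_utc values are Unix timestamps in seconds.
    sample = [
        {"id": "abc1", "author": "user_a", "title": "First post",
         "created_utc": 1609545600, "comments": ["reply one", "reply two"]},
        {"id": "abc2", "author": "user_b", "title": "Older post",
         "created_utc": 1577836800, "comments": []},
    ]
    start, end = parse_date_range("2021-01-01", "2021-01-31")
    buffer = io.StringIO()
    build_csv(select_posts(sample, start, end), buffer)
    print(buffer.getvalue())
```

Writing to a file-like object keeps the CSV step easy to check in memory; passing an open file created with newline="" would produce the CSV file on disk instead.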

Using these techniques promises a more proactive approach to addressing domestic extremism online in the hopes of deterring, detecting, and stopping the next domestic terrorist attack.


Dr Maurice Dawson

Dr Maurice Dawson is Illinois Tech Assistant Professor of Information Technology and Management and director of Illinois Tech’s Center for Cyber Security and Forensics Education (C2SAFE).