This repository has been archived by the owner on May 24, 2021. It is now read-only.

News classifier in the fight against disinformation. Group Project for COMP3359 @ HKU.

vicw0ng-hk/fake-real-news

fake-real-news

python fastai flask

A news classifier in the fight against disinformation by HUANG, Sheng & LI, Yik Wai. 🤝

Presentation

🏷️ "If you tell a lie big enough and keep repeating it, people will eventually come to believe it. The lie can be maintained only for such time as the State can shield the people from the political, economic and/or military consequences of the lie. It thus becomes vitally important for the State to use all of its powers to repress dissent, for the truth is the mortal enemy of the lie, and thus by extension, the truth is the greatest enemy of the State."

Joseph Goebbels, Reich Minister of Propaganda, Nazi Germany

Mission ⚓

What is our relationship with the truth (or, the reality)? 😕 That is a philosophical question. 🤓

Great minds struggle with this question.

Young Sheldon Cooper struggled with this problem and wanted to switch his major to philosophy. But in the end, he returned to science when he realized that theories in physics could explain more patterns in nature.

As fake news spreads wildly today, we face a similar crisis. 😱 With so much information, how can you tell what is real and what is not? 😨 Especially, with the cost of reading a news article so low and the cost of verifying the facts so high, how can anyone judge the authenticity of news content? ☹️ Some may say nothing is real and lead a life without any thoughts on the world. 😫 Some may say they trust authoritative sources. But how do you define an authoritative source? Is everything put out by these authoritative sources guaranteed to be true? 😖 Can we find patterns in these articles, taking their sources into account as well, and then make a better judgment, or a more educated guess? 🙄

Indeed we CAN find linguistic patterns in news articles. 😃 These patterns may well correlate with whether an article is real or fake. This may be because a lot of fake news originates from authoritarian governments and malign forces that don't usually give their writers complete journalism training. But it's not that simple. 🙃 Remember that news media have biases. For example, in the United States, conservative media such as Fox News and liberal media such as MSNBC and CNN report news in different styles 🥶, but that doesn't automatically mean that either style amounts to fake news reporting. (Check out Media Bias Ratings) To avoid such biases when training our model, we recognize additional types of news articles that cannot be easily categorized as real or fake, such as strongly opinionated pieces. (Check out the types we have here 👈)

We concede that this approach is still flawed, as discussed in Limitations. 😑 However, it can serve as a reference when we judge the authenticity of an article, because we give explicit descriptions of how we categorize articles. 🙂 Still, users should keep in mind that we are not the arbiter of truth and that the model cannot replace the work of a professional fact checker: it cannot visit the places where events happened; it cannot interview the people involved in the stories; it cannot know the intentions of the publishers when they put out a story... 🙃

THERE IS NO ALGORITHM FOR TRUTH.

Tom Scott

Reports 📚

  • Proposal 📑 pdf
  • Interim Report 📑 pdf
  • Final Report 📑 pdf

Running 🏃‍♂️ 🏃‍♀️

It's highly 🔝 recommended to run the app on a Unix-like system (GNU/Linux, macOS, ...). ‼️ Using Windows may cause some issues when installing dependencies. 😢

0. Cloning the repository ⬇️

git clone https://github.com/vicw0ng-hk/fake-real-news.git

Or, clone through SSH for better security. 🔐

git clone git@github.com:vicw0ng-hk/fake-real-news.git

Or, clone with GitHub CLI :octocat:

gh repo clone vicw0ng-hk/fake-real-news

Due to the large size of our model, it is stored with Git LFS, and because of GitHub's bandwidth limit 🚧, please use this link 👈 to download app/model/model.pkl and replace the file in the cloned directory.
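After cloning, it can be useful to check whether app/model/model.pkl is still a Git LFS pointer stub rather than the downloaded model. The sketch below is not part of the repository. It relies on one assumption: a Git LFS pointer file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is hypothetical.

```python
from pathlib import Path

# First line of a Git LFS pointer file (assumption based on the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer stub."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Path taken from the instructions above; run this from the repository root.
    model_path = Path("app/model/model.pkl")
    if not model_path.exists():
        print(f"{model_path} is missing; download it first.")
    elif is_lfs_pointer(model_path):
        print(f"{model_path} is still an LFS pointer; replace it with the downloaded file.")
    else:
        size = model_path.stat().st_size
        print(f"{model_path} looks like a real model file ({size} bytes).")
```

If the script reports a pointer, replace the file with the downloaded model before starting the app.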

1. Installing environment 🌴

This may be different depending on the virtualization technology you are using 🤷, but generally do

cd app/
pip3 install -r requirements.txt

2. Run the app! 🚅

python3 app.py

Methodology 🛠️

Check out the Methodology document.

Functionalities ⚙️

Check out the Functionalities document.

Limitations 📐

  • One major limitation comes from the categorization of our dataset. 1️⃣ The dataset is single-label, which does not match reality. For example, many conspiracies are highly political, so a lot of the articles with the conspiracy tag may also fit the political tag. Because of this, accuracy has not been very high for some of the test cases. The model is also susceptible to overfitting if we train too much in pursuit of higher accuracy, which is why we chose to present the predictions in the app as probabilities. (Check out Functionalities)

  • Another limitation is our development time and resources. 2️⃣ We have a very large dataset (Check out Methodology). However, we cannot make full use of it because our time is limited and the resources allocated by GPU Farm are relatively restrictive compared to the size of our dataset. Hence, we used only a portion of the total data to train our model.

  • There is also the limitation of what machines can do. 3️⃣ We can only use the content (plus its URL, Title and Authors) to decide the categorization of new articles. For some articles, humans can easily tell their nature and authenticity based on common sense and general knowledge. However, the model cannot think that way, so some evidence that is easy for a human to recognize is difficult for the model to find.
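The first limitation above explains why predictions are shown as probabilities instead of a single label. As a rough illustration only, and not the app's actual code, the sketch below ranks per-category probabilities and formats the top few for display. The function names, the `reliable` label and the scores are hypothetical; only the conspiracy and political tags are named in this README.

```python
def rank_predictions(labels, probs, top_k=3):
    """Pair each label with its probability and return the top_k pairs, highest first."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def format_predictions(ranked):
    """Render (label, probability) pairs as percentage strings."""
    return [f"{label}: {prob:.1%}" for label, prob in ranked]


if __name__ == "__main__":
    # Hypothetical scores for one article; a real model would supply these.
    labels = ["conspiracy", "political", "reliable"]
    probs = [0.48, 0.41, 0.11]
    for line in format_predictions(rank_predictions(labels, probs, top_k=2)):
        print(line)
```

Showing the two closest categories side by side makes an overlap such as conspiracy versus political visible to the user, instead of hiding it behind one hard label.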

Terms and Conditions 📜

In addition to the restrictions of GNU Affero General Public License v3.0 of this repo, you also agree to the following terms and conditions:

YOUR USE OF THIS WEB APP CONSTITUTES YOUR AGREEMENT 
TO BE BOUND BY THESE TERMS AND CONDITIONS OF USE.

1. The classification of the text you submit to this 
web app is in no way legal recognition. The web app 
and/or its authors bear no legal responsibilities for 
its result. If you choose to publish the result, the 
web app and/or its authors shall not bear any legal 
consequences relating to this action.  
2. You shall be liable for the legal responsibilities 
of the copyright of the text you submit to this web 
app. You shall gain the right to copy the text before 
you submit it to the web app. 
3. This web app shall not be used by any political 
organization and/or any entity, partially or entirely, 
directly or indirectly, funded and/or controlled by a 
political organization in any jurisdiction. 
4. In case of any discrepancy with any other licenses, 
terms or conditions associated with this web app 
and/or its repository, this agreement shall prevail. 
