Scrape the most-reviewed news and tweet it using Python

Last Updated : 17 May, 2022

Many websites publish trending news for a given technology, and an article can be ranked by its review count. Suppose the news concerns cryptocurrencies and the articles are scraped from cointelegraph.com: we can easily get each news item's review count and store it in a MongoDB collection.
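As a quick illustration of the data this article works with, the script below assumes one MongoDB document per day whose "News" list holds [headline, review_count, link] triples, with the review count stored as a string. The headlines and links here are hypothetical placeholders:

```python
from datetime import date

# Shape of one daily document in the Coll_DailyNewsPlusReview
# collection (hypothetical sample data, not real articles).
doc = {
    "time": str(date.today()),
    "News": [
        ["Veteran Investor Says Bitcoin Will Rally", "1200", "https://example.com/a"],
        ["Bitcoin Hashrate Drops Sharply", "3400", "https://example.com/b"],
        ["The VC Who Backed Ethereum", "800", "https://example.com/c"],
    ],
}

# Review counts are stored as strings, so cast to int before
# sorting -- exactly what the main script does below.
top = sorted(doc["News"], key=lambda x: int(x[1]), reverse=True)
print(top[0][0])  # headline with the highest review count
```

Sorting on `int(x[1])` in descending order puts the most-reviewed item first, which is how the top 3 items are selected later.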
 

Modules Needed

 

  • Tweepy: Tweepy is the Python client for the official Twitter API. Install it using the following pip command: 
     
pip install tweepy
     
  • MongoClient: The MongoClient class lets your code open connections to a MongoDB server. It ships with the pymongo driver; install it using the following pip command: 
     
pip install pymongo
     
  • Pyshorteners: Pyshorteners is used to shorten, brand, share, or retrieve data from links programmatically. Install it using the following pip command: 
     
pip install pyshorteners

 

Authentication

In order to fetch tweets through the Twitter API, one needs to register an app with their Twitter account. Follow these steps:
 

  • Open this link https://apps.twitter.com/ and click the button: ‘Create New App’ 
     
  • Fill in the application details. You can leave the callback URL field empty. 
     
  • Once the app is created, you will be redirected to the app page. 
     
  • Open the ‘Keys and Access Tokens’ tab. 
     
  • Copy ‘Consumer Key’, ‘Consumer Secret’, ‘Access token’, and ‘Access Token Secret’ and paste them in the below code. 
     

Below is the implementation.
 

Python3




# Python program to tweet the top 3 trending news items 
   
  
import tweepy
import json
from datetime import date, timedelta, datetime
from pymongo import MongoClient
from html.parser import HTMLParser
import re
from pyshorteners import Shortener
  
  
NewsArrayIndex = 0
NewsArray = [None] * 3
  
class MyHTMLParser(HTMLParser):
      
    # This function collects the value
    # of each href and stores it in the
    # NewsArray list
    def handle_starttag(self, tag, attrs):
          
        # Only parse the 'anchor' tag.
        global NewsArrayIndex
        if tag == "a":
             
            # Check the list of defined attributes.
            for name, value in attrs:
                 
                # If href is defined, print it.
                if name == "href":
                     
                    # print(value + "\t" + News1)                                   
                    NewsArray[NewsArrayIndex] = value
                     
                    # print(NewsArray)
                    NewsArrayIndex += 1
                      
# This function is the primary place where the
# collected daily news is tweeted. News is retrieved
# from the Coll_DailyNewsPlusReview MongoDB collection,
# which holds the news headline, its review count and
# the news link; based on the review count, the
# most-reviewed items are taken. As Twitter allows only
# 280 characters, each retrieved news link is shortened
# using the Bitly API, and hashtags related to the news
# are appended under the top 3 items (altogether at most
# 280 characters). The top 3 items are then tweeted from
# one credential. Finally, the tweeted news is stored per
# day into another collection for audit purposes as well
# as for weekly posting
def tweetDailyNews():
      
    try:
          
        # This is the collection name in mongodb
        cursor_P = db1.Coll_DailyNewsPlusReview.find({"time": date_str})
                
        p0 = cursor_P[0]
        News = p0.get('News')
        sortedNews = sorted(News, key = lambda x: int(x[1]), reverse = True)
        print(sortedNews[0][0]+"--" + sortedNews[0][1],
              sortedNews[1][0] + ".."+ sortedNews[1][1],
              sortedNews[2][0] + ".." + sortedNews[2][1])
          
        hyperlink_format = '<a href ="{link}">{text}</a>'
        parser = MyHTMLParser()
        dailyNews = "Impactful News of the Day" + "\n"
          
        News0 = sortedNews[0][2]
        parser.feed(hyperlink_format.format(link = News0, text = News0))
          
        News1 = sortedNews[1][2]
        print("News1", News1)
        parser.feed(hyperlink_format.format(link = News1, text = News1))
          
        News2 = sortedNews[2][2]
        print(News2)
        parser.feed(hyperlink_format.format(link = News2, text = News2))
          
        # News shortening via the Bitly API
        BITLY_ACCESS_TOKEN = "xxxx"
        b = Shortener(api_key = BITLY_ACCESS_TOKEN) 
          
        # short() returns the shortened URL as a string
        NewsArray[0] = re.sub('\n', '', NewsArray[0])
        response1 = b.bitly.short(NewsArray[0]) 
          
        NewsArray[1] = re.sub('\n', '', NewsArray[1])
        response2 = b.bitly.short(NewsArray[1]) 
           
        NewsArray[2] = re.sub('\n', '', NewsArray[2])
        response3 = b.bitly.short(NewsArray[2]) 
          
        news1FewWords = sortedNews[0][0].split()
        dailyNews += (news1FewWords[0] + " "
                      + news1FewWords[1] + " " + news1FewWords[2]
                      + "...." + response1 + "\n")
          
        news2FewWords = sortedNews[1][0].split()
        dailyNews += (news2FewWords[0] + " "
                      + news2FewWords[1] + " " + news2FewWords[2]
                      + "...." + response2 + "\n")
          
        news3FewWords = sortedNews[2][0].split()
        dailyNews += (news3FewWords[0] + " "
                      + news3FewWords[1] + " " + news3FewWords[2]
                      + "...." + response3 + "\n"
                      + "#bitcoin #cryptocurrency #blockchain "
                      + "#investor #altcoins #fintech #investment")
        print(dailyNews)
          
        status = api.update_status(status = dailyNews)
        if status:
            for i in range(3):
                datas = {}
                datas['time'] = str(date.today())
                datas['posted_as'] = i
                datas['news'] = sortedNews[i][0]
                datas['shortenedlink'] = NewsArray[i]
                datas['reviewcount'] = sortedNews[i][1]
                datas['link'] = sortedNews[i][2]
                db1.Collection_tweeted_news.insert_one(datas)
                   
               
    except Exception as e:
        print(e)
        print("Error in getting today news data", str(date_str))
  
          
# Driver Code
News1 = ' '
News2 = ' '
  
date_str = str(date.today())
print("today", date_str)
client = MongoClient('mongodb://localhost:27017/')
  
# Connect your database here
db1 = client.xxxx 
  
# credentials to tweet
consumer_key ="xxxx"
consumer_secret ="xxxx"
access_token ="xxxx"
access_token_secret ="xxxx"
   
# authentication of consumer key and secret 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
   
# authentication of access token and secret 
auth.set_access_token(access_token, access_token_secret) 
api = tweepy.API(auth, wait_on_rate_limit = True)   
      
tweetDailyNews()


Output:
 

Impactful News of the Day 
Veteran Investor Says….https://bit.ly/2X1x51V 
Bitcoin Hashrate Drops….https://bit.ly/2T83xyS 
The VC Who….https://bit.ly/3czxVKb 
#bitcoin #cryptocurrency #blockchain #investor #altcoins #fintech #investment 
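The three-word headline trim and the Bitly shortening exist precisely to keep the composed status inside Twitter's 280-character limit. A rough budget check, using the sample output above as illustrative strings:

```python
# Illustrative character-budget check (not part of the script):
# header line + three trimmed headlines with shortened links
# + the hashtag line must fit in 280 characters.
header = "Impactful News of the Day\n"
lines = [
    "Veteran Investor Says....https://bit.ly/2X1x51V\n",
    "Bitcoin Hashrate Drops....https://bit.ly/2T83xyS\n",
    "The VC Who....https://bit.ly/3czxVKb\n",
]
hashtags = ("#bitcoin #cryptocurrency #blockchain "
            "#investor #altcoins #fintech #investment")
tweet = header + "".join(lines) + hashtags
print(len(tweet))  # comfortably under the 280-character limit
```

Each bit.ly link costs about 22 characters, so three trimmed headlines plus the hashtag block leave enough headroom under the limit.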
 

 



