28 commits
64066b4
Init commit
InvokerAndrey Nov 10, 2019
3d5dc1d
Added argument parser
InvokerAndrey Nov 13, 2019
b291b7f
Added rss reader class and main function
InvokerAndrey Nov 13, 2019
2872722
Implemented --json argument processing
InvokerAndrey Nov 13, 2019
247aae4
Refactored invoke methods
InvokerAndrey Nov 13, 2019
4abd526
Implemented human-readable format
InvokerAndrey Nov 15, 2019
f14e675
Added --version argument
InvokerAndrey Nov 15, 2019
f1f134d
Implemented --verbose argument
InvokerAndrey Nov 16, 2019
5cb2846
Delete rss_reader.py
InvokerAndrey Nov 16, 2019
83343d7
Create README.md
InvokerAndrey Nov 16, 2019
d7eec79
Create requirements.txt
InvokerAndrey Nov 16, 2019
934e2da
Merge branch 'final_task' of https://github.com/BntuHater/PythonHomew…
InvokerAndrey Nov 16, 2019
5d1778d
Implemented [Iteration 2] Distribution
InvokerAndrey Nov 17, 2019
71ab39b
Implemented [Iteration 3] News caching
InvokerAndrey Nov 21, 2019
0ae55a5
Refactored [Iteration 3] News caching
InvokerAndrey Nov 21, 2019
10ab7b2
Now cached news display specifically from the transmitted URL
InvokerAndrey Nov 21, 2019
c5b01e9
Implemented --to-pdf argument
InvokerAndrey Nov 25, 2019
e4cdb8b
Implemented --to-pdf argument
InvokerAndrey Nov 27, 2019
f4f457d
Refactored code
InvokerAndrey Nov 27, 2019
304ddb9
fixed rss-reader --help
InvokerAndrey Nov 27, 2019
177135c
Implemented --to-html argument
InvokerAndrey Nov 28, 2019
656376a
added --to-html argument
InvokerAndrey Nov 28, 2019
47245d4
Included fonts for pdf into setup.py
InvokerAndrey Nov 29, 2019
bfe71bc
implemented couple exceptions
InvokerAndrey Nov 30, 2019
589dc8d
RSSException, --colorize, test_RSSReader
InvokerAndrey Dec 1, 2019
12f9051
refactored args
InvokerAndrey Dec 1, 2019
b2fce55
version 0.5.0
InvokerAndrey Dec 1, 2019
e7e4e7c
exception and requests
InvokerAndrey Dec 8, 2019
3 changes: 3 additions & 0 deletions .gitignore
Expand Up @@ -102,3 +102,6 @@ venv.bak/

# mypy
.mypy_cache/

# IDE
.idea
19 changes: 19 additions & 0 deletions LICENSE
@@ -0,0 +1,19 @@
Copyright (c) 2018 The Python Packaging Authority

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
49 changes: 49 additions & 0 deletions README.md
@@ -0,0 +1,49 @@
# PythonHomework
[Introduction to Python] Homework Repository

# How to use
* `pip install .`
* `rss-reader "https://news.yahoo.com/rss/" --limit 2 --json --to-pdf C:\Users\User_name\Desktop`
* `--date` prints cached news that were previously parsed from the given URL.
  Parsed news are saved to a `cache` folder as JSON files, one file per
  publication date (e.g. `20191125.json`).
* For the `--to-pdf` argument, specify the path to the folder where the
  `news.pdf`/`cached_news.pdf` file will be saved. The file is overwritten
  on every run, so copy it elsewhere if you need to keep it. The same applies
  to the `--to-html` argument. Note that the HTML output references images
  from the original websites, so they will not be displayed without an
  internet connection.
* Custom fonts are bundled for the `.pdf` output to avoid encoding issues;
  they should be installed correctly by `pip install .`.
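The cache naming scheme described above can be sketched as follows (a minimal illustration; the actual reader uses `dateutil` to parse arbitrary RSS date strings, while this sketch assumes the common RFC 822 format many feeds use):

```python
from datetime import datetime

def cache_file_name(published):
    """Map an RSS publication date to its cache file name, e.g. 20191125.json."""
    # Assumes the RFC 822 date format commonly used in RSS <pubDate> fields
    date = datetime.strptime(published, '%a, %d %b %Y %H:%M:%S %z')
    return date.strftime('%Y%m%d') + '.json'

print(cache_file_name('Mon, 25 Nov 2019 10:00:00 +0000'))  # 20191125.json
```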


# Parameters
* --help (show the help message and exit)
* --limit LIMIT (limit the number of news topics if this parameter is provided)
* --json (prints the result as JSON in stdout)
* --verbose (outputs verbose status messages)
* --version (prints version info)
* --date (takes a date in YYYYmmdd format, for example: --date 20191020.
  The news from the specified day will be printed out.
  If no news is found, an error is returned.)
* --to-pdf TO_PDF (takes the path of the directory where the new PDF file will be saved)
* --to-html TO_HTML (takes the path of the directory where the new HTML file will be saved)

# JSON structure
```
feed = {
    'Title': 'feed title',
    'Published': 'date',
    'Summary': 'news description',
    'Link': 'original link to news',
    'Url': 'url of rss feed',
    'Image': 'original link to the image'
}
```

# Progress
- [x] [Iteration 1] One-shot command-line RSS reader.
- [x] [Iteration 2] Distribution
- [x] [Iteration 3] News caching
- [x] [Iteration 4] Format converter
- [x] [Iteration 5] Output colorization
- [ ] [Iteration 6] Web-server
181 changes: 181 additions & 0 deletions app/RSSReader.py
@@ -0,0 +1,181 @@
"""
Contains class RSSReader which receives arguments from cmd
and allows to parse URL with RSS feed and print it in stdout
in different formats
"""

import os
import json

import feedparser
from bs4 import BeautifulSoup
import dateutil.parser as dateparser
from colorama import init
from colorama import Fore
import requests

from app.rss_exception import RSSException


class RSSReader:
""" Reads news from RSS url and prints them """

def __init__(self, url, limit, date, logger, colorize=None):
self.url = url
self.limit = limit
self.date = date
self.logger = logger
self.colorize = colorize
init() # colorama

    def get_feed(self):
        """ Returns parsed feed entries (up to the limit) and caches them """
        response = requests.get(self.url).text
        news_feed = feedparser.parse(response)
        if not news_feed.entries:
            raise RSSException('Did not parse any news')
        for entry in news_feed.entries[:self.limit]:
            self.cache_news_json(entry)
        self.logger.info('News has been cached')
        return news_feed.entries[:self.limit]

def print_feed(self, entries):
""" Prints feed in stdout """

self.logger.info('Printing feed')

if self.colorize:
for entry in entries:
print(f'{Fore.GREEN}========================================================{Fore.RESET}')
print(f'{Fore.GREEN}Title:{Fore.RESET} {entry.title}')
print(f'{Fore.GREEN}Published:{Fore.RESET} {entry.published}')
print(f'{Fore.GREEN}Summary:{Fore.RESET} {BeautifulSoup(entry.summary, "html.parser").text}')
print(f'{Fore.GREEN}Image:{Fore.RESET} {self.get_img_url(entry.summary)}')
print(f'{Fore.GREEN}Link:{Fore.RESET} {entry.link}')
print(f'{Fore.GREEN}========================================================{Fore.RESET}')
else:
for entry in entries:
print('========================================================')
print(f'Title: {entry.title}')
print(f'Published: {entry.published}', end='\n\n')
print(f'Summary: {BeautifulSoup(entry.summary, "html.parser").text}', end='\n\n')
print(f'Image: {self.get_img_url(entry.summary)}')
print(f'Link: {entry.link}')
print('========================================================')

    def get_img_url(self, summary):
        """ Parses the image URL from <description> in the RSS feed """
        soup = BeautifulSoup(summary, 'html.parser')
        img = soup.find('img')
        return img['src'] if img else None

def print_feed_json(self, entries):
""" Prints feed in stdout in JSON format """

self.logger.info('Printing feed in JSON format')

for entry in entries:
feed = self.to_dict(entry)
if self.colorize:
print(Fore.GREEN + json.dumps(feed, indent=2, ensure_ascii=False) + Fore.RESET, end=',\n')
else:
print(json.dumps(feed, indent=2, ensure_ascii=False), end=',\n')

def to_dict(self, entry):
""" Converts entry to dict() format """

feed = dict()
feed['Title'] = entry.title
feed['Published'] = entry.published
feed['Summary'] = BeautifulSoup(entry.summary, "html.parser").text
feed['Link'] = entry.link
feed['Url'] = self.url
feed['Image'] = self.get_img_url(entry.summary)
return feed

def cache_news_json(self, entry):
""" Saves all printed news in JSON format (path = 'cache/{publication_date}.json')"""

date = dateparser.parse(entry.published, fuzzy=True).strftime('%Y%m%d')
directory_path = 'cache' + os.path.sep
if not os.path.exists(directory_path):
self.logger.info('Creating directory cache')
os.mkdir(directory_path)

file_path = directory_path + date + '.json'

feed = self.to_dict(entry)
news = list()
try:
with open(file_path, encoding='utf-8') as rf:
news = json.load(rf)
if feed in news:
# already cached
return
except FileNotFoundError:
self.logger.info('Creating new .json file')
except json.JSONDecodeError:
self.logger.info('Empty JSON file')

with open(file_path, 'w', encoding='utf-8') as wf:
news.append(feed)
json.dump(news, wf, indent=2)

def get_cached_json_news(self):
""" Returns the list of cached news with date from arguments """

file_path = 'cache' + os.path.sep + self.date + '.json'
cached_news = list()
try:
with open(file_path) as rf:
news = json.load(rf)
for new in news:
if new['Url'] == self.url:
cached_news.append(new)
if not cached_news:
# News with such url have not been found
raise FileNotFoundError
return cached_news[:self.limit]
        except (FileNotFoundError, json.JSONDecodeError):
            # No cache file for this date, an empty JSON file,
            # or no cached news for the given URL
            message = 'There are no cached news with such date by this url'
            if self.colorize:
                print(f'{Fore.RED}{message}{Fore.RESET}')
            else:
                print(message)
            return False

def print_cached_feed(self, cached_feed):
""" Prints saved news in stdout """

self.logger.info('Printing cached feed')
for new in cached_feed:
if self.colorize:
print(f'{Fore.GREEN}---------------------------------------------------------{Fore.RESET}')
for key, value in new.items():
print(f'{Fore.GREEN}{key}:{Fore.RESET} {value}')
print(f'{Fore.GREEN}---------------------------------------------------------{Fore.RESET}')
else:
print('---------------------------------------------------------')
for key, value in new.items():
print(f'{key}: {value}')
print('---------------------------------------------------------')

def print_cached_feed_json(self, cached_feed):
""" Prints saved news in stdout in JSON format """

self.logger.info('Printing cached feed in JSON format')
for new in cached_feed:
if self.colorize:
print(Fore.GREEN + json.dumps(new, indent=2) + Fore.RESET, end=',\n')
else:
print(json.dumps(new, indent=2), end=',\n')
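The `get_img_url` method above relies on BeautifulSoup; the same extraction can be sketched with only the standard library's `html.parser` (a simplified stand-in for illustration, not the project's actual code):

```python
from html.parser import HTMLParser

class ImgSrcFinder(HTMLParser):
    """Collects the src of the first <img> tag, mimicking RSSReader.get_img_url."""
    def __init__(self):
        super().__init__()
        self.img_url = None

    def handle_starttag(self, tag, attrs):
        # Remember only the first <img> encountered
        if tag == 'img' and self.img_url is None:
            self.img_url = dict(attrs).get('src')

def get_img_url(summary):
    finder = ImgSrcFinder()
    finder.feed(summary)
    return finder.img_url
```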
Empty file added app/__init__.py
Empty file.
6 changes: 6 additions & 0 deletions app/__main__.py
@@ -0,0 +1,6 @@
""" Package entry point """

from app.rss_reader import main

if __name__ == '__main__':
main()
77 changes: 77 additions & 0 deletions app/argparser.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
"""
Contains ArgParser class which allows parse arguments from cmd
"""

import argparse


__version__ = '0.5.0'


class ArgParser:
""" Reads arguments """

def __init__(self):
self.args = self.parse_args()

def parse_args(self):
""" Reads arguments from the cmd and returns them """

argparser = argparse.ArgumentParser(description='One-shot command-line RSS reader', prog='rss-reader')
argparser.add_argument(
'url',
type=str,
help='Input RSS url containing news'
)
argparser.add_argument(
'--limit',
type=int,
default=None,
help='Sets a limit for news output (default - no limit)'
)
argparser.add_argument(
'--json',
action='store_true',
help='Prints feed in JSON format in stdout'
)
argparser.add_argument(
'--version',
action='version',
version=f'%(prog)s version {__version__}',
default=None,
help='Prints version of program'
)
argparser.add_argument(
'--verbose',
action='store_true',
help='Prints all logs in stdout'
)
argparser.add_argument(
'--date',
type=str,
        help='It should take a date in YYYYmmdd format. For example: --date 20191020. '
             'The news from the specified day will be printed out. If no news is found, an error will be returned.'
)
argparser.add_argument(
'--to-pdf',
dest='to_pdf',
type=str,
help='It should take the path of the directory where new PDF file will be saved'
)
argparser.add_argument(
'--to-html',
dest='to_html',
type=str,
help='It should take the path of the directory where new HTML file will be saved'
)
argparser.add_argument(
'--colorize',
action='store_true',
help='Prints the result of the utility in colorized mode'
)
args = argparser.parse_args()
return args

def get_args(self):
""" Returns arguments """
return self.args
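A trimmed-down sketch of how these flags combine at parse time (an illustrative subset of the parser above, fed a sample command line instead of `sys.argv`):

```python
import argparse

parser = argparse.ArgumentParser(prog='rss-reader')
parser.add_argument('url', help='RSS feed URL')
parser.add_argument('--limit', type=int, default=None)
parser.add_argument('--json', action='store_true')
parser.add_argument('--to-pdf', dest='to_pdf', default=None)

# Parse a sample invocation; unsupplied options keep their defaults
args = parser.parse_args(['https://news.yahoo.com/rss/', '--limit', '2', '--json'])
print(args.limit, args.json, args.to_pdf)  # 2 True None
```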
Binary file added app/fonts/NotoSans-Black.cw127.pkl
Binary file not shown.
Binary file added app/fonts/NotoSans-Black.pkl
Binary file not shown.
Binary file added app/fonts/NotoSans-Black.ttf
Binary file not shown.
Binary file added app/fonts/NotoSans-Thin.cw127.pkl
Binary file not shown.
Binary file added app/fonts/NotoSans-Thin.pkl
Binary file not shown.
Binary file added app/fonts/NotoSans-Thin.ttf
Binary file not shown.