#1 Installation with pip and more bugs

Open
CyberTailor wants to merge 26 commits from CyberTailor/master into tinyrabbit/master
10 files changed, with 136 additions and 307 deletions
  1. +7 -2   .gitignore
  2. +1 -0   MANIFEST.in
  3. +45 -21 README.md
  4. +0 -27  URLHelper.py
  5. +0 -120 antennaDB.py
  6. +0 -98  customFilters.py
  7. +0 -39  feedqueuetool.py
  8. +83 -0  gemini_antenna/URLHelper.py
  9. +0 -0   gemini_antenna/__init__.py
  10. +0 -0  gemini_antenna/cgi/__init__.py

+ 7 - 2
.gitignore

@@ -1,5 +1,10 @@
-antenna.sqlite
+.coverage
 __pycache__
-blocklist.txt
 antenna.log
+antenna.sqlite
+blocklist.txt
+build/
 customfilters/*
+dist/
+gemini_antenna.dist-info/
+gemini_antenna.egg-info/

+ 1 - 0
MANIFEST.in

@@ -0,0 +1 @@
+include tests/*

+ 45 - 21
README.md

@@ -26,13 +26,13 @@ I believe that a better way to aid discoverability in any community is to let pu
 
 ## Preparation
 
-Antenna runs on python3 and mostly uses modules available in core. Two exceptions I remember are `feedparser` and `sqlite3`. Please tell me if you try to run this and run into any undocumented requirements.
+Antenna runs on python3 and mostly uses modules available in core. Two exceptions I remember are `feedparser` and `gemcall`. Please tell me if you try to run this and run into any undocumented requirements.
 
 The current code base makes a few assumptions that may or may not be true for your system:
 
 * That there are two folders in the same directory named `antenna` and `public_gemini`, respectively.
-* That the user which runs the `queuefeeds.py` script has both read and write access to the SQLite3 database and the folder `antenna`, which the database file will be in.
-* That there are files `about.gmi`, `log`, and `submit` in the `public_gemini` folder, because the generated page will link to them.
+* That the user which runs the `antenna-submit` and `antenna-filter` CGI scripts has both read and write access to the SQLite3 database and the folder `antenna` that the database file will be in.
+* That there are files `about.gmi`, `log`, `filter` and `submit` in the `public_gemini` folder, because the generated page will link to them.
 
 My setup is a useful reference. It looks like this:
 
@@ -43,54 +43,78 @@ My setup is a useful reference. It looks like this:
     |       |
     |       +--- README.md
     |       +--- LICENSE
-    |       +--- antennaDB.py
+    |       +--- db.py
     |       +--- ingestfeeds-wrapper.sh
-    |       +--- queuefeed.py
-    |       +--- ingestfeeds.py
     |       +--- antenna.log
     |       +--- antenna.sqlite
     |       +--- blocklist.txt
     |
     +--- public_gemini/
             |
+            +--- cgi-bin/
+                    |
+                    +--- filter
+                    +--- submit
+                    +--- log
             +--- index.gmi
             +--- about.gmi
-            +--- submit
-            +--- log
 ```
 
-Nothing outside of `~antenna/public_gemini/` is publicly reachable. The file `about.gmi` is a handwritten file about my instance. The `index.gmi` file is generated by Antenna. The two scripts `log` and `submit` change the working directory to ~antenna/antenna/ and then runs `tail -n 50 antenna.log` and `./queuefeed.py` respectively.
+Nothing outside of `~antenna/public_gemini/` is publicly reachable. The file `about.gmi` is a handwritten file about your instance. The `index.gmi` file is generated by Antenna. The scripts `log`, `filter` and `submit` change the working directory to ~antenna/antenna/ and then run `tail -n 50 antenna.log`, `antenna-filter` and `antenna-submit` respectively.
 
 ## Installation
 
-Clone this repo:
+[![Packaging status](https://repology.org/badge/vertical-allrepos/gemini-antenna.svg)](https://repology.org/project/gemini-antenna/versions)
 
-```
-git clone https://notabug.org/tinyrabbit/gemini-antenna.git antenna
+```sh
+# install from this repository
+pip install git+https://notabug.org/tinyrabbit/gemini-antenna.git#egg=gemini-antenna
+# or use the unofficial package on PyPI
+pip install antenna
 ```
 
-Enter the catalogue and create a database:
+Create a database in the `antenna` directory:
 
 ```
 cd antenna
 python3
-> import antennaDB
-> antennaDB.AntennaDB.createDB()
+> from gemini_antenna import db
+> db.AntennaDB.createDB()
 ```
 
-Make sure that the user that executes the `queuefeeds.py` script has read and write permissions to the directory `antenna` as well as `antenna/antenna.sqlite`.
+Make sure that the user that executes the `antenna-submit` and `antenna-filter` scripts has read and write permissions to the directory `antenna` as well as `antenna/antenna.sqlite`.
 
-Create a cron job that runs the ingestfeeds.py via the wrapper (substitute for whichever user should run the ingest job, and which directory the `ingestfeeds-wrapper.sh` is in):
+Create a cron job that runs `antenna refresh` via the wrapper (substitute whichever user should run the ingest job and whichever directory `ingestfeeds-wrapper.sh` is in):
 
-```
-sudo echo "*/10 * * * * antenna /home/antenna/antenna/ingestfeeds-wrapper.sh" > /etc/cron.d/antenna-ingestion
+```sh
+echo "*/10 * * * * antenna /home/antenna/antenna/ingestfeeds-wrapper.sh" | sudo tee /etc/cron.d/antenna-ingestion
 ```
 
 If there are any specific domains you'd like to block from publishing to your Antenna, list them in a file named `blocklist.txt`, one URL per line, each starting with `gemini://` or another scheme and separator, depending on what you'd like to block.
 
+### Example server configuration (gmid)
+
+A custom database location can be set with the `ANTENNA_DATAROOT` environment variable.
+
+```gmid.conf
+# old aliases
+location "/antenna/log" {
+	block return 31 "/antenna/cgi-bin/log"
+}
+location "/antenna/filter" {
+	block return 31 "/antenna/cgi-bin/filter"
+}
+location "/antenna/submit" {
+	block return 31 "/antenna/cgi-bin/submit"
+}
+
+location /antenna {
+	root "/home/antenna/public_gemini"
+	cgi "cgi-bin/*"
+}
+```
+
 ## Contributing
 
 * Be kind, humble and open-minded in discussions.
 * Send pull requests or submit issues here, or contact me directly with suggestions/feedback/thoughts/bug reports: bjorn.warmedal@gmail.com or ew0k@tilde.team.
-
-
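The blocklist described in the README is matched by simple prefix comparison (see `isBlocked` in `gemini_antenna/URLHelper.py` further down in this diff). A minimal standalone sketch of that behavior, with hypothetical example domains:

```python
# Sketch of blocklist prefix matching, mirroring URLHelper.isBlocked:
# a URL is blocked when it starts with any rule from blocklist.txt.
blockrules = {"gemini://spam.example", "https://ads.example/"}

def is_blocked(url: str) -> bool:
    return any(url.startswith(rule) for rule in blockrules)

print(is_blocked("gemini://spam.example/feed.xml"))  # True
print(is_blocked("gemini://good.example/feed.xml"))  # False
```

Because matching is prefix-based, a rule like `gemini://rockstar` also blocks `gemini://rockstarfamily.org`, as the filter help text in `customFilters.py` notes.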

+ 0 - 27
URLHelper.py

@@ -1,27 +0,0 @@
-#!/usr/bin/env python3
-# vim: tabstop=4 shiftwidth=4 expandtab
-
-import re
-from os.path import exists
-
-class URLHelper():
-
-    def __init__(self, blocklist = "blocklist.txt"):
-        self.blockrules = []
-        if exists(blocklist):
-            blockfile = open(blocklist, "r")
-            self.blockrules = set(blockfile.read().split("\n"))
-            self.blockrules.remove("")
-            blockfile.close()
-
-    def isBlocked(self, url):
-        for rule in self.blockrules:
-            if url.startswith(rule):
-                return True
-        return False
-
-    # Naive URL validation
-    def mightBeAURL(self, url):
-        pattern = '^[\w]+://[^/]+\.[^/]+.*'
-        return True if re.match(pattern, url) else False
-

+ 0 - 120
antennaDB.py

@@ -1,120 +0,0 @@
-#!/usr/bin/env python3
-# vim: tabstop=4 shiftwidth=4 expandtab
-
-import sqlite3
-import os
-from multiFeedParsing import FeedEntry,TwtxtEntry
-
-# There's a lot of opening and closing going on here,
-# because several processes will be sharing one sqlite3
-# db, which is just a single file. We want to hog it
-# as little as possible to minimize the risk of
-# collisions. Some errors are tolerable; this is a
-# good enough effort.
-class AntennaDB():
-
-    def __init__(self, dbPath="antenna.sqlite"):
-        self.dbPath = dbPath
-
-    def createDB(self):
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.execute("CREATE TABLE IF NOT EXISTS feedqueue (url text)")
-        cursor.execute("CREATE TABLE IF NOT EXISTS entries (feedurl text, author text, updated datetime, title text, link text primary key)")
-        cursor.execute("CREATE TABLE IF NOT EXISTS twtxt (feedurl text, author text, posted datetime, twt text)")
-        connection.close()
-
-    def queueFeed(self, urls):
-        urlTuples = []
-        for url in urls:
-            urlTuples.append((url,))
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.executemany("INSERT INTO feedqueue (url) VALUES (?)", urlTuples)
-        connection.commit()
-        connection.close()
-
-    def getQueue(self):
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.execute("SELECT * FROM feedqueue")
-        results = []
-        for result in cursor.fetchall():
-            results.append(result[0])
-        connection.close()
-        return results
-
-    def getEntries(self):
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.execute("SELECT feedurl, author, updated, title, link FROM entries ORDER BY updated DESC")
-        results = []
-        for result in cursor.fetchall():
-            results.append(FeedEntry(feedurl = result[0], author = result[1], updated = result[2], title = result[3], link = result[4]))
-        connection.close()
-        return results
-
-    def getTwts(self):
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.execute("SELECT feedurl, author, posted, twt FROM twtxt ORDER BY posted DESC")
-        results = []
-        for result in cursor.fetchall():
-            results.append(TwtxtEntry(feedurl = result[0], author = result[1], posted = result[2], twt = result[3]))
-        connection.close()
-        return results
-
-    def deleteFeeds(self, urls):
-        urlTuples = []
-        for url in urls:
-            urlTuples.append((url,))
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.executemany("DELETE FROM entries WHERE feedurl LIKE ?", urlTuples)
-        cursor.executemany("DELETE FROM twtxt WHERE feedurl LIKE ?", urlTuples)
-        connection.commit()
-        connection.close()
-
-    def deleteFromQueue(self, urls):
-        urlTuples = []
-        for url in urls:
-            urlTuples.append((url,))
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.executemany("DELETE FROM feedqueue WHERE url LIKE ?", urlTuples)
-        connection.commit()
-        connection.close()
-
-    # UPSERTs entries into the DB, if they're not too old. Returns how many entries were upserted.
-    def insertFeedEntries(self, entries, limit=0):
-        entries = [e for e in entries if e.updated > limit]
-        entrytuples = []
-        for entry in entries:
-            entrytuples.append((entry.feedurl, entry.author, entry.updated, entry.title, entry.link))
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.executemany("INSERT INTO entries (feedurl, author, updated, title, link) VALUES (?,?,?,?,?) ON CONFLICT (link) DO UPDATE SET author = excluded.author, updated = excluded.updated, title = excluded.title", entrytuples)
-        connection.commit()
-        connection.close()
-        return len(entries)
-
-    def insertTwtxtEntries(self, entries, limit=0):
-        entries = [e for e in entries if e.posted > limit]
-        entrytuples = []
-        for entry in entries:
-            entrytuples.append((entry.feedurl, entry.author, entry.posted, entry.twt))
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.executemany("INSERT INTO twtxt (feedurl, author, posted, twt) VALUES (?,?,?,?)", entrytuples)
-        connection.commit()
-        connection.close()
-        return len(entries)
-
-    def pruneDB(self, limit):
-        connection = sqlite3.connect(self.dbPath)
-        cursor = connection.cursor()
-        cursor.execute("DELETE FROM entries WHERE updated < ?", (limit,))
-        cursor.execute("DELETE FROM twtxt WHERE posted < ?", (limit,))
-        connection.commit()
-        connection.close()
-
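The `ON CONFLICT` upsert in `insertFeedEntries`, removed here and presumably carried over into the new `db.py`, can be exercised in isolation. This sketch copies the schema and statement from the removed `createDB`/`insertFeedEntries` and runs them against an in-memory database:

```python
import sqlite3

# In-memory database with the entries schema from the removed createDB.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entries (feedurl text, author text, "
            "updated datetime, title text, link text primary key)")

# Same upsert as insertFeedEntries: a second insert with the same link
# updates the row instead of failing on the primary key.
upsert = ("INSERT INTO entries (feedurl, author, updated, title, link) "
          "VALUES (?,?,?,?,?) ON CONFLICT (link) DO UPDATE SET "
          "author = excluded.author, updated = excluded.updated, "
          "title = excluded.title")
cur.execute(upsert, ("gemini://a/feed", "ew0k", 1, "First title", "gemini://a/1"))
cur.execute(upsert, ("gemini://a/feed", "ew0k", 2, "Edited title", "gemini://a/1"))

cur.execute("SELECT count(*), title FROM entries")
print(cur.fetchone())  # (1, 'Edited title')
```

The upsert is why re-ingesting a feed with edited posts updates titles and timestamps without creating duplicates (this syntax needs SQLite 3.24 or later).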

+ 0 - 98
customFilters.py

@@ -1,98 +0,0 @@
-#!/usr/bin/env python3
-# vim: tabstop=4 expandtab
-
-import datetime, time
-from urllib.parse import unquote
-from os import getenv
-from os.path import exists
-import string
-import random
-import URLHelper
-import antennaDB
-import signoffs
-
-def randomFileName():
-    return ''.join(random.choice(string.ascii_letters) for x in range(32))
-
-filterFile = getenv('PATH_INFO')
-filterFilePathParts = filterFile.split("/")
-if "" in filterFilePathParts:
-    filterFilePathParts.remove("")
-db = antennaDB.AntennaDB()
-lastRead = "Last Read:" + str(int(time.mktime(datetime.datetime.utcnow().utctimetuple()))) + "\n"
-
-if not filterFile or not exists("customfilters/"+filterFilePathParts[0]):
-    if not getenv('QUERY_STRING') or not getenv('QUERY_STRING').lower() in ["y","ye","yes"]:
-        print("10 Type 'y' if you want to create a filter, otherwise go back.\r")
-    else:
-        filterFile = randomFileName()
-        while exists("customfilters/"+filterFile):
-            filterFile = randomFileName()
-        with open("customfilters/"+filterFile, "w") as f:
-            f.write(lastRead)
-        print("30 gemini://"+getenv('SERVER_NAME')+getenv('SCRIPT_NAME')+"/"+filterFile+"\r")
-
-else:
-    f = open("customfilters/"+filterFilePathParts[0], "r")
-    rules = f.read().split("\n")
-    f.close()
-
-    f = open("customfilters/"+filterFilePathParts[0], "w")
-
-    rules = list(filter(("").__ne__, rules))
-    rules.pop(0)
-    affectedRule = getenv('QUERY_STRING')
-
-    if filterFilePathParts[-1] == "add":
-        if not affectedRule:
-            print("10 Rule to add:\r\n")
-        else:
-            rules.insert(0,unquote(affectedRule))
-            print("30 gemini://"+getenv('SERVER_NAME')+getenv('SCRIPT_NAME')+"/"+filterFilePathParts[0]+"\r\n")
-
-    elif filterFilePathParts[-1] == "remove":
-        if not affectedRule or not affectedRule in rules:
-            print("10 Rule to remove:\r\n")
-        else:
-            rules.remove(affectedRule)
-            print("30 gemini://"+getenv('SERVER_NAME')+getenv('SCRIPT_NAME')+"/"+filterFilePathParts[0]+"\r\n")
-
-    elif filterFilePathParts[-1] == "read":
-        print("20 text/gemini\r")
-        print("# Your Filter Rules\n")
-        if not rules:
-            print("Bookmark this URL. It's your personal filter. It'll be here forever if you check it often, but if you don't check it in 90 days it'll be removed.\n\nYou should add some rules here! The way it works is that any link on Antenna that starts with a matching rule is removed from this view.\n\nFor example: the rule \"gemini://rockstar\" will remove every link starting with that, including \"gemini://rockstar.com/posts/1\" or \"gemini://rockstarfamily.org\"\n")
-        for rule in rules:
-            print("=> remove?"+rule+" "+rule+" (Click to remove)")
-        print("")
-        feedURLSet = set()
-        for entry in db.getEntries():
-            if entry.feedurl in rules:
-                continue
-            feedURLSet.add(entry.feedurl)
-        for feedURL in feedURLSet:
-            print("=> add?"+feedURL+" Click to block '"+feedURL+"'")
-        print("\n=> "+getenv('SCRIPT_NAME')+"/"+filterFilePathParts[0]+"/add Click to add custom rule")
-
-    else: # This is where we end up when we just want to read the feed.
-        print("20 text/gemini\r")
-        print("=> "+getenv('SCRIPT_NAME')+"/"+filterFilePathParts[0]+"/read Configure your filter.\n")
-        print("# Your Filtered Feed")
-        datestamp = "0000-00-00"
-        for entry in db.getEntries():
-            blocked = False
-            for rule in rules:
-                blocked = entry.link.startswith(rule) or entry.feedurl.startswith(rule)
-                if blocked:
-                    break
-            if blocked:
-                continue
-            timestamp = datetime.datetime.utcfromtimestamp(entry.updated).strftime('%Y-%m-%d')
-            if not datestamp == timestamp:
-                datestamp = timestamp
-                print("")
-            print("=> " + entry.link + " " + timestamp + " " + entry.author + ": " + entry.title)
-
-        print("\n> " + signoffs.getsig())
-    f.write(lastRead + "\n".join(rules) + "\n")
-    f.close()

+ 0 - 39
feedqueuetool.py

@@ -1,39 +0,0 @@
-#!/usr/bin/env python3
-# vim: tabstop=4 expandtab
-
-import antennaDB
-import sys
-
-db = antennaDB.AntennaDB()
-progname = sys.argv.pop(0)
-mode = sys.argv.pop(0)
-
-def printqueue(dummy=None):
-    print('\n'.join(db.getQueue()))
-
-modes = {
-    'list': printqueue,
-    'delete': db.deleteFromQueue,
-    'add': db.queueFeed,
-}
-
-if mode in modes.keys():
-    modes[mode](sys.argv)
-else:
-    print(f'''CLI tool for easy handling of the feedqueue table in the Antenna database.
-
-Usage: {progname} [list | [ delete | add ] URLs]
-
-    list          Lists all URLs in the table.
-
-    delete URLs   Deletes all the given URLs from the table.
-
-    add URLs      Bulk addition of feeds to queue.
-
-Examples:
-
-    {progname} list
-
-    {progname} delete gemini://some.feed/atom another.info/feed and.athird.one
-''')
-

+ 83 - 0
gemini_antenna/URLHelper.py

@@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+# vim: tabstop=4 shiftwidth=4 expandtab
+
+import re
+import urllib.parse
+from pathlib import Path, PosixPath
+
+class URLHelper():
+
+    def __init__(self, blocklist: str = "blocklist.txt"):
+        self.blockrules: set = set()
+        if not Path(blocklist).exists():
+            return
+
+        with open(blocklist) as blockfile:
+            self.blockrules = set(blockfile.read().split("\n")) - {""}
+
+    def isBlocked(self, url) -> bool:
+        """
+        Check whether a URL is blocked by the rules.
+        This method calls :meth:`~URLHelper.resolve`.
+        """
+        url = self.resolve(url)
+        for rule in self.blockrules:
+            if url.startswith(rule):
+                return True
+        return False
+
+    @classmethod
+    def mightBeAURL(cls, url: str) -> bool:
+        """
+        Naive URL validation.
+
+        >>> URLHelper.mightBeAURL("gemini://example.com/feed")
+        True
+        >>> URLHelper.mightBeAURL("my feed")
+        False
+        """
+        pattern = r'^[\w]+://[^/]+\.[^/]+.*'
+        return bool(re.match(pattern, url))
+
+    @classmethod
+    def correct(cls, url: str) -> str:
+        """
+        Unquote a URL and add gemini:// scheme if needed.
+
+        >>> URLHelper.correct("example.com/my%20feed")
+        'gemini://example.com/my feed'
+        """
+        url = urllib.parse.unquote(url)
+
+        if not re.findall(r'^[\w:]*//', url):
+            url = "gemini://" + url
+        elif not urllib.parse.urlparse(url).netloc:
+            url = "gemini:" + url
+
+        return url
+
+    @classmethod
+    def resolve(cls, url: str) -> str:
+        """
+        Resolve relative paths in URLs.
+        This method calls :meth:`~URLHelper.correct` beforehand.
+
+        >>> URLHelper.resolve("gemini://example.com/1/../2")
+        'gemini://example.com/2'
+        """
+        url = urllib.parse.urlparse(cls.correct(url))
+
+        if not url.path:
+            path = ""
+        elif not url.path.startswith("/"):
+            raise ValueError("Not an absolute URL")
+        else:
+            path = str(PosixPath(url.path).resolve())
+            # restore lost trailing slash
+            if url.path.endswith("/"):
+                path += "/"
+        return urllib.parse.urlunparse(url._replace(path=path))
+
+if __name__ == "__main__":
+    import doctest
+    doctest.testmod()
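The relative-path handling in `URLHelper.resolve` above leans on `PosixPath.resolve()` to collapse `..` segments lexically. A standalone sketch of the same trick, without the class:

```python
import urllib.parse
from pathlib import PosixPath

# Parse the URL, collapse ".." segments in the path, reassemble.
url = urllib.parse.urlparse("gemini://example.com/1/../2")
path = str(PosixPath(url.path).resolve())
resolved = urllib.parse.urlunparse(url._replace(path=path))
print(resolved)  # gemini://example.com/2
```

Note that non-strict `Path.resolve()` also makes relative paths absolute against the current directory, which is why the class raises `ValueError` for URL paths that do not start with `/`.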

+ 0 - 0
gemini_antenna/__init__.py


+ 0 - 0
gemini_antenna/cgi/__init__.py


Some files were not shown because too many files changed in this diff