Commit graph

367 commits

Author SHA1 Message Date
Andrew Kvalheim
5ed751a8c6 Skip fetching context of private posts
Context fetching is performed without authentication, so it is only
possible for public and unlisted posts.
2024-07-01 18:21:57 -07:00
nanos
c58c5b5af0 use sha hashes to cache file names 2024-07-01 20:06:51 +01:00
nanos
e85384a5a6 name collision (fixes #134) 2024-06-28 12:46:12 +01:00
nanos
6c1ec2f1c5 fix logs 2024-06-28 09:08:59 +01:00
nanos
3639878df0 Update version 2024-06-28 08:50:32 +01:00
nanos
80c8937a88 Improve docs 2024-06-28 08:49:54 +01:00
Michael
5e6aa2bd66
Merge pull request #133 from nanos/lists
Support for fetching lists
2024-06-28 08:47:42 +01:00
nanos
42774c5195 documentation update 2024-06-28 08:47:21 +01:00
nanos
fd615cad15 Support for fetching lists 2024-06-28 08:22:34 +01:00
nanos
e7da9a1f61 Fix bug 2024-06-27 17:14:41 +01:00
Michael
e0faafb37a
Merge pull request #130 from nanos/cache-robots-on-disk
Cache robots.txt for 24 hours on disk to reduce load on servers
2024-06-27 16:46:01 +01:00
Michael
009fbe54b4
Merge pull request #131 from nanos/no-bot
Do not backfill users that have opted out
2024-06-27 09:18:20 +01:00
nanos
d2a14f687a log what's happening 2024-06-27 09:18:06 +01:00
nanos
aa589670eb Do not backfill users that have opted out of indexing 2024-06-27 09:16:59 +01:00
nanos
40b624aaff update version 2024-06-27 07:56:50 +01:00
nanos
90988872b7 fix 2024-06-26 16:45:30 +01:00
nanos
7e8ca17640 Cache robots.txt for 24 hours on disk to reduce load on servers 2024-06-26 16:41:51 +01:00
nanos
3651d028a6 update version number 2024-06-25 16:36:01 +01:00
nanos
01a2719918 shorten http timeout for robots.txt fetch 2024-06-25 16:32:47 +01:00
Michael
dec718db76
Merge pull request #129 from nanos/cache-robots
Cache robots.txt for each run of the script, to reduce load on the server
2024-06-25 16:25:26 +01:00
nanos
7b9896b5c0 Cache robots.txt 2024-06-25 16:24:37 +01:00
Michael
ac8044db83
Merge pull request #128 from nanos/user-agent
Use FediFetcher as User Agent to fetch the robots.txt
2024-06-25 16:16:49 +01:00
nanos
dd468d5956 Use FediFetcher as User Agent to fetch the robots.txt 2024-06-25 16:15:43 +01:00
nanos
e40d61d291 update version 2024-06-25 11:00:24 +01:00
nanos
ac2b648e05 change timeout periods to never allow more than once per minute 2024-06-25 10:54:00 +01:00
Michael
de656d1e0d
Merge pull request #125 from nanos/robots
respect robots.txt
2024-06-25 10:46:22 +01:00
nanos
885b84d598 ensure callbacks aren't blocked by robots 2024-06-25 10:38:47 +01:00
nanos
1b4c135f8f respect robots.txt 2024-06-25 10:24:45 +01:00
nanos
ed5f0ba3b4 update gitignore 2024-06-25 10:01:37 +01:00
nanos
1c7023819e try again 2024-06-25 09:03:59 +01:00
nanos
e6fd9c6b00 bug fix 2024-06-25 09:01:07 +01:00
Michael
721d2fc5bb
Merge pull request #124 from nanos/rate-limits
Rate limit fetching of context
2024-06-25 08:53:16 +01:00
nanos
120008ced0 shorten storage time 2024-06-25 08:50:27 +01:00
nanos
3278ce2f06 Rate limit fetching of context 2024-06-25 08:45:03 +01:00
nanos
f965b4f6fc stop rushing things ... 2024-06-24 17:19:27 +01:00
nanos
468e092e21 fix stupid bug 2024-06-24 17:18:02 +01:00
nanos
a23d6fe1fb Add Instance name to FediFetcher UA [fixes #122] 2024-06-24 17:13:23 +01:00
Michael
624801d0cb
Merge pull request #120 from nanos/dependabot/pip/urllib3-1.26.19
Bump urllib3 from 1.26.18 to 1.26.19
2024-06-18 07:00:24 +01:00
dependabot[bot]
9bc465c0bc
Bump urllib3 from 1.26.18 to 1.26.19
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-18 01:38:27 +00:00
Michael
df29cff634
Merge pull request #116 from nanos/dependabot/pip/requests-2.32.0
Bump requests from 2.31.0 to 2.32.0
2024-05-21 08:57:15 +01:00
dependabot[bot]
d10e60fa18
---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 07:56:07 +00:00
Michael
1bb8c9d2b7
Merge pull request #115 from AndrewKvalheim/log-level
Quiet urllib3 logs
2024-04-22 19:37:24 +01:00
Andrew Kvalheim
1fed215977 Quiet urllib3 logs
As described at https://github.com/urllib3/urllib3/blob/1.26.18/docs/user-guide.rst#logging
2024-04-22 11:17:38 -07:00
Michael
fe47a8a0e5
Merge pull request #114 from AndrewKvalheim/log-level
Add `--log-level` option
2024-04-22 16:31:37 +01:00
Michael
4038d2dde5
Merge pull request #110 from nanos/dependabot/pip/idna-3.7
Bump idna from 3.4 to 3.7
2024-04-22 16:24:14 +01:00
Andrew Kvalheim
c528efb1be Add --log-level option 2024-04-21 12:01:57 -07:00
dependabot[bot]
f6ef16b933
Bump idna from 3.4 to 3.7
Bumps [idna](https://github.com/kjd/idna) from 3.4 to 3.7.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.4...v3.7)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-12 02:06:56 +00:00
Michael
0dc65a60ef
Merge pull request #105 from Tealk/main
add systemd guide
2024-03-15 08:40:37 +00:00
Tealk
cc260ddfa2 add to readme
Signed-off-by: Tealk <tealk@rollenspiel.monster>
2024-03-15 09:25:04 +01:00
Tealk
f9c2b37647 add .timer explanation
Signed-off-by: Tealk <tealk@rollenspiel.monster>
2024-03-15 09:23:00 +01:00