
@debbiedub commented Oct 11, 2022

Changes to search for and find USKs instead of SSKs and to handle USKs in the database.

The most important changes are:

  • Change the database to store editions separately from the URI (USKs are stored with edition 0 in the database). This makes different editions of the same page appear as a single entry in the database.
  • Add a last fetched field in the database.
  • Change the indexes to distinguish between NEW and NEW_EDITION. Also separate PROCESSED_KSKs and PROCESSED_USKs from DONE. The NEW_EDITION index is ordered on last fetched instead of last handled.
  • Control fetching from NEW, NEW_EDITION and FAILED queues separately.
  • Change the packing format of the TermEntry objects sent to Library so that USKs can be sent. This requires a corresponding fix in plugin-Library, which receives these TermEntry objects.
  • Convert all SSK links that could be USKs into USKs before they are entered into the database.
  • If a fetch of a USK is redirected to a newer edition, catch the redirect and simply add the page to the queue again. For NEW_EDITION this means it goes in at the top; for NEW it goes in at the end.
  • Subscribe to each newly found USK.
  • Show an info file on the Spider GUI.
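The SSK-to-USK conversion and the edition-0 database key described in the list can be illustrated with a small sketch. It assumes the usual Freenet naming convention, where edition N of a site appears as `SSK@keys/site-N/path` and the matching USK is `USK@keys/site/N/path`. The class `UskNormalizer` and its `normalize` method are hypothetical, purely string-based illustrations (Java 16+ for the record), not code taken from plugin-Spider or from fred's `FreenetURI`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps a site URI to an edition-independent key plus its edition. */
final class UskNormalizer {

    /** key: USK with edition 0 (one database entry per site); edition: the edition found, or -1. */
    record Normalized(String key, long edition) {}

    // Assumed SSK convention for versioned sites: SSK@<keys>/<site>-<edition>[/<path>]
    private static final Pattern SSK_WITH_EDITION =
            Pattern.compile("^SSK@([^/]+)/([^/]+)-(\\d+)(/.*)?$");

    // USK form: USK@<keys>/<site>/<edition>[/<path>] (USK editions may be negative)
    private static final Pattern USK =
            Pattern.compile("^USK@([^/]+)/([^/]+)/(-?\\d+)(/.*)?$");

    static Normalized normalize(String uri) {
        Matcher m = SSK_WITH_EDITION.matcher(uri);
        if (!m.matches()) {
            m = USK.matcher(uri);
        }
        if (m.matches()) {
            String path = m.group(4) == null ? "" : m.group(4);
            // Store the key with edition 0; keep the real edition in a separate field.
            String key = "USK@" + m.group(1) + "/" + m.group(2) + "/0" + path;
            return new Normalized(key, Long.parseLong(m.group(3)));
        }
        // Not recognisable as a versioned site: keep the URI unchanged, no edition.
        return new Normalized(uri, -1);
    }
}
```

For example, both `SSK@abc/site-5/index.html` and `USK@abc/site/7/index.html` map to the key `USK@abc/site/0/index.html` (with editions 5 and 7), so the two editions share one database entry while the edition is tracked separately.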

The purpose is to have newly found pages fetched before attempting to fetch all failed pages again.
This is an attempt that was not working. Changing the page affects the list of pages and might confuse the iterator.
USKs treated as USKs, ...
Each USK is now stored once in the database rather than once per edition.
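The queue handling from the PR description (separate NEW, NEW_EDITION and FAILED queues, and re-queuing after a redirect to a newer edition) can be sketched in memory as follows. This is a hypothetical simplification: the Spider keeps these queues as database indexes (NEW_EDITION ordered on last fetched), whereas here plain FIFO deques stand in for them. The class `FetchQueues` and the strict priority order NEW, then NEW_EDITION, then FAILED are assumptions made for illustration, chosen so that new pages are taken before failed ones are retried.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the Spider's fetch queues. */
final class FetchQueues {

    // Declaration order doubles as the (assumed) fetch priority.
    enum Status { NEW, NEW_EDITION, FAILED }

    private final Map<Status, Deque<String>> queues = new EnumMap<>(Status.class);

    FetchQueues() {
        for (Status s : Status.values()) {
            queues.put(s, new ArrayDeque<>());
        }
    }

    /** Normal enqueue: append at the end of the given queue. */
    void add(Status status, String uri) {
        queues.get(status).addLast(uri);
    }

    /** A fetch was redirected to a newer USK edition: queue the page again. */
    void requeueAfterRedirect(Status status, String uri) {
        if (status == Status.NEW_EDITION) {
            queues.get(status).addFirst(uri); // NEW_EDITION: in at the top
        } else {
            queues.get(status).addLast(uri);  // NEW (and others): at the end
        }
    }

    /** Next URI to fetch, scanning queues in priority order; null when all are empty. */
    String next() {
        for (Status s : Status.values()) {
            String uri = queues.get(s).pollFirst();
            if (uri != null) {
                return uri;
            }
        }
        return null;
    }
}
```

A real implementation would more likely cap how many fetches each queue may start (the PR speaks of controlling the queues separately) instead of draining them in strict order.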
@debbiedub changed the title from "Search for usks" to "Search for usks - INCOMPATIBLE CHANGE REQUIRES MATCHING UPDATE IN plugin-Library" on Oct 11, 2022
@ArneBab (Contributor) commented Nov 27, 2022

Thank you! I’ll try to get this into 1496 and to create test jars until then, so interested people can test this before it is pushed to all users.

@Juiceman (Contributor) commented

Can I help get this merged somehow? This needs the plugin-Library changes merged first... Is Library backwards compatible with existing indexes, or do both need to go in at the same time?

@debbiedub (Author) commented

The changes I have made in plugin-Library do not affect using the library to find things in any of the indices, and the indices are stored in the same way in freenet/hyphanet.

The problem is for nodes that run the combination of plugin-Spider and plugin-Library to create the index. The communication between the two plugins is changed so that the current version of plugin-Library (without my fixes) cannot receive the information found by plugin-Spider with this PR.

@ArneBab (Contributor) commented Jul 23, 2024

That makes it much less risky to release: the unchanged Spider + Library are already broken after 3-4 updates, so there’s no new breakage. Thank you!

That matters, because the changes to Library are huge.

Where are the freenet.copied packages in Library copied from? Are they from the fred source tree?

@debbiedub (Author) commented

The freenet.copied packages are from the fred source tree.

One alternative is not to merge my changes into plugin-Library at all, and instead let plugin-Library be the plugin that is used in almost every node to read the index and nothing else. Eventually, the old features for creating the index could be removed. The functions that actually create the index could then either live in a stand-alone repo or be merged into plugin-Spider, since the two are always used together anyway.

One of the problems with this is that the code that maintains the B-trees is shared between reading and writing the index. When I restructured plugin-Library, I split it into src, shared, and updater, where shared is shared both between reading and writing and between running as part of the plugin (plugin-Library) and running outside the node. Unless this part is factored out of the plugin, as I have done by moving it to shared, there will be multiple implementations of it.

@debbiedub (Author) commented

My current thinking is this:

| Function            | Current main   | Debbie's current solution | Suggestion                     | Comment                              |
|---------------------|----------------|---------------------------|--------------------------------|--------------------------------------|
| Reading (a plugin)  | plugin-Library | plugin-Library (src)      | plugin-Library (src)           |                                      |
| Shared              | plugin-Library | plugin-Library (shared)   | plugin-Library (shared)        | Shared between reading and creating  |
| Creating            | plugin-Library | plugin-Library (uploader) | plugin-Spider (TBD)            | Compile and run dependency to Shared |
| Tools               |                | plugin-Library (uploader) | plugin-Spider (TBD)            | Tools to maintain the index          |
| Crawling (a plugin) | plugin-Spider  | plugin-Spider             | plugin-Spider (plugins.Spider) |                                      |

The suggested structure would make a lot more sense than my current approach, which was started with the aim of not modifying plugin-Spider.

@debbiedub (Author) commented

Another thought: I now question the need to move the shared code around. Maybe index creation could be made to work without modifying plugin-Library at all, or with only minor refactoring. That would reduce the work needed in plugin-Spider to almost nothing.
