pointless

  • 0 Posts
  • 20 Comments
Joined 1 year ago
Cake day: June 23rd, 2023

  • Another vote for Tesseract – just to clarify the terminology, though: PDF is a fragile format best used read-only; so you really don’t want to edit a PDF, but rather make a new one using the same (or cleaned-up) bitmaps and a new OCR text layer.

    Now, Tesseract is excellent at recognizing glyphs; but especially if the scanned image is a little fuzzy, the layout detection falters; and when it falters, you get redundant line breaks, & chunks of text in the wrong order – all of which gets incredibly annoying for searching & copying purposes. So if you can spare the time, and the text requires it, you may need to mark regions (paragraphs & titles mainly) on the bitmap image manually. There exist a few frontends to Tesseract that help with a task like that; check out, e.g., https://github.com/manisandro/gImageReader. Inside single-paragraph blocks of text, Tesseract doesn’t get confused as easily; and the text output is in the correct reading order, & w/o redundant breaks.
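
A rough sketch of the region-by-region idea, for anyone who'd rather script it than click through a frontend. It assumes the regions have already been cropped into separate image files (by hand or in an image editor) and listed in reading order, and that the `tesseract` binary is on the PATH. The function names are made up for illustration; `--psm 6` is Tesseract's "single uniform block of text" page segmentation mode. Words hyphenated across lines are not rejoined here.

```python
import subprocess
from typing import Callable, Sequence


def tesseract_command(image_path: str, psm: int = 6, lang: str = "eng") -> list[str]:
    """Build a Tesseract CLI call that prints the recognized text to stdout."""
    # --psm 6 treats the image as one uniform block of text, so Tesseract
    # skips most of its page-layout guessing for an already-cropped region.
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def run_tesseract(cmd: Sequence[str]) -> str:
    """Run the command and return its stdout (requires tesseract on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def unwrap(block_text: str) -> str:
    """Join the hard-wrapped lines of one paragraph into a single line."""
    return " ".join(line.strip() for line in block_text.splitlines() if line.strip())


def ocr_regions(
    region_images: Sequence[str],
    runner: Callable[[Sequence[str]], str] = run_tesseract,
) -> str:
    """OCR pre-cropped region images in the order given (the reading order)."""
    paragraphs = [unwrap(runner(tesseract_command(path))) for path in region_images]
    # One blank line between regions, none inside them.
    return "\n\n".join(p for p in paragraphs if p)
```

Calling `ocr_regions(["title.png", "para1.png", "para2.png"])` (hypothetical file names) would return the text with one paragraph per region, in the order the files were listed.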


  • I have a little extension of my own that just sends out selections from the <head> tag from a tab open on Firefox to my database; I haven’t been able to figure out how to add that to any collection — neither do I want to, because it’s of no use to anyone but me, as the ‘database’ in question is just postgrest running on my home router; so I don’t want to make this extension public. So for now I’m using HTTPShortcuts on Android for a similar purpose; though it can only send out a url from a ‘share’ option under Firefox.
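
To give an idea of what such a request looks like, here is a minimal Python analogue (the extension itself is not shown here, and this is not its code). It pulls the title and `<meta>` entries out of a page and builds a POST for a PostgREST table. The endpoint `http://router.lan:3000/bookmarks`, the table name and the column names `url`, `title` and `meta` are all assumptions for illustration.

```python
import json
import urllib.request
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and <meta name/property=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            key = a.get("name") or a.get("property")
            if key and a.get("content"):
                self.meta[key] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def bookmark_request(page_url: str, html: str,
                     endpoint: str = "http://router.lan:3000/bookmarks"):
    """Build (but do not send) a POST that inserts one row via PostgREST."""
    parser = HeadExtractor()
    parser.feed(html)  # scans the whole document; title/meta normally live in <head>
    row = {"url": page_url, "title": parser.title.strip(), "meta": parser.meta}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(row).encode("utf-8"),
        # PostgREST accepts a JSON object as a new row; "return=minimal"
        # asks it not to send the inserted row back.
        headers={"Content-Type": "application/json", "Prefer": "return=minimal"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would actually send it; building the request separately makes it easy to inspect the payload first.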




  • Not sure what the question is – are you looking to port extensions over yourself, or are you just exclaiming, “it can’t be so hard, so why won’t someone do it!”.

    There’s plenty of documentation over at MDN as to writing extensions, writing cross-browser extensions, porting mv2 firefox extensions over to mv3, the differences between Firefox’s mv3 implementation, and that found in Chrome, etc. etc. etc. The following are good starting points: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions & https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Build_a_cross_browser_extension

    For ground-level, basic stuff (managing a popup, communicating between popup & a ‘background’ script, between content loaded on the browser & your scripts, managing a context menu, etc.) writing an extension is straightforward once you develop some degree of understanding of the sometimes convoluted paths the data needs to take, the permissions you need to have in order to pass messages through, etc. Larger extensions are full fledged applications in their own right, though, so tackling them introduces difficulties of a different order of magnitude.

    The Falkon browser is extensible (in its own way) through QML; and the Nyxt browser is extensible in common lisp. These aren’t ‘webextensions’ in the precise sense of the term, though they could be just as useful. I wrote a basic bookmark manager that I use mainly on Firefox; but I ported its core functionality (just send the current page’s title, url, & selections from the <head> tag over to my database (postgresql via the postgrest http frontend, to which I just make a fetch request)) to QML, and it was pretty straightforward. Falkon is based on Qt’s QtWebEngine, which is Chromium-based; Nyxt is based on WebKit.

    edit: There’s also luakit and qutebrowser. The former is extensible via Lua 5.1 scripts, the latter via Python; there isn’t a wealth of documentation & examples, though (at least there wasn’t last time I checked), so the API can be a bit of a mystery. Luakit has WebKit as its engine; qutebrowser is built on QtWebEngine, just like Falkon.


  • Yeah, I’m really happy with my Leopold, which I’ve been using for the past 3 months. I used to have a Unicomp before that; and while the typing feel was a little better than the brown switches I currently have on the Leopold, its build quality was lower, and eventually it just died on me thanks to what I later found out was a notoriously failure-prone controller they used to use back then. I’m told that Unicomp’s build quality has improved a lot since then.

    … though the frustrating thing is that I was able to get the Unicomp only because I was living in the US at the time; and the Leopold I got thanks to relatives in S. Korea. Where I live, ‘mechanical keyboard’ is treated like a synonym for ‘gamer keyboard’, and all the BS associated with that.

    So excellent off the shelf brands exist, though one has to do some local research first.




  • The pdf standard is open, though criminally bloated. Their pdf software (‘pro’ as well as the freemium ‘reader’ which looks like adware nowadays) is used only because it’s the most lenient with respect to files barely complying with the ‘standard’ – which includes things like application forms from government agencies.

    … that is, if they can be said to ‘own’ the pdf format, it’s only because they smeared it all over with their shit. A bit like how hippos mark their territory, I guess.