I’m sick and tired of the Capitalist religion and all their fanatic believers. The Western right-wing population are the most propagandized and harmful people on Earth.

  • 1 Post
  • 24 Comments
Joined 1 year ago
Cake day: June 8th, 2023

  • I’m not an expert in any of this, so I’m just wildly speculating in the middle of the night about a huge hypothetical AI lab for one person:

    Super high-end equipment would probably eat such a budget pretty quickly (2-5 × H100?), but a ‘small’ rack of 20-25 ordinary GPUs (P40) with 8 GB+ VRAM, combined with a local petals.dev setup, would be my quick choice.
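
    To make that a bit more concrete, here is a rough sketch of the client side, assuming the rack runs a private petals.dev swarm (the model name and bootstrap peer address are placeholders, not a real setup):

    ```python
    # Hypothetical client for a private Petals swarm on the local GPU rack.
    # MODEL and PEERS are placeholders; check the petals docs for current usage.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    MODEL = "meta-llama/Llama-2-70b-hf"                      # whatever the rack actually serves
    PEERS = ["/ip4/192.168.1.10/tcp/31337/p2p/<peer-id>"]    # local bootstrap node

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoDistributedModelForCausalLM.from_pretrained(MODEL, initial_peers=PEERS)

    inputs = tokenizer("Testing the home rack:", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0]))
    ```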

    However, it’s hard to compete with the cloud on power efficiency, so power costs would quickly eat into the budget going forward. All non-sensitive traffic should probably go to something like Groq cloud, and the rest stay on private servers.
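
    A rough sketch of that routing idea, using two OpenAI-compatible endpoints (the URLs, keys and model names are assumptions; the local endpoint could be vLLM, llama.cpp, or whatever the rack exposes):

    ```python
    # Toy "route by sensitivity" dispatcher over two OpenAI-compatible endpoints.
    # Endpoint URLs, keys and model names are made up for illustration.
    from openai import OpenAI

    groq = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="GROQ_API_KEY")
    local = OpenAI(base_url="http://192.168.1.10:8000/v1", api_key="not-needed")

    def chat(prompt: str, sensitive: bool) -> str:
        client = local if sensitive else groq       # private data never leaves the LAN
        model = "home-rack-model" if sensitive else "some-groq-hosted-model"
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(chat("Summarize this public article ...", sensitive=False))
    print(chat("Summarize my medical records ...", sensitive=True))
    ```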

    An alternative solution is to go for an NPU setup (TPU, LPU, whatnot-PU), and/or even a small power generator (wind, solar, digester/burner) to drive it. A cluster of 50 Orange Pi 5B boards (RK3588, 32 GB RAM each) is within budget (50 × 6 TOPS ≈ 300 TOPS in theory, with 1.6 TB of total RAM on about 500 W). Afaik, the underlying software stack isn’t there yet for small NPUs, but more and more frameworks besides CUDA keep popping up (CUDA, ROCm, Metal, OpenCL, Vulkan, ...), so one for NPUs will probably appear soon.
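
    Back-of-envelope check of those cluster numbers (the per-board figures are rough guesses: ~6 TOPS NPU, 32 GB RAM, ~10 W under load):

    ```python
    # Sanity-check the Orange Pi 5B cluster math from the paragraph above.
    boards = 50
    tops_per_board = 6        # RK3588 NPU, theoretical peak
    ram_gb_per_board = 32
    watts_per_board = 10      # rough whole-board draw under load

    print(f"NPU compute: {boards * tops_per_board} TOPS (theoretical)")
    print(f"Total RAM:   {boards * ram_gb_per_board / 1000:.1f} TB")
    print(f"Power draw:  {boards * watts_per_board} W")
    ```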

    Transformers rely heavily on multiplications, but BitNet doesn’t (additions only), so perhaps models will move to less power-intensive hardware and model frameworks in the future?
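
    A toy illustration of why ternary (BitNet-style) weights get away without multiplications: with weights in {-1, 0, +1}, a matrix-vector product is just adding and subtracting activations.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)                 # activations
    W = rng.integers(-1, 2, size=(4, 8))       # ternary weights in {-1, 0, +1}

    y_mul = W @ x                              # usual multiply-accumulate
    y_add = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

    assert np.allclose(y_mul, y_add)           # addition-only path gives the same result
    ```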

    Last thing on my mind atm: you would probably also not spend all the money on inference/training compute. Any decent cognitive architecture around a model (agent networks) needs support functions: tool servers, home-served software for agents (fora/communication, scraping, modelling, code testing, statistics, etc.). Basically versions of the tools we ourselves use for different projects and for communication/cooperation in an organization.
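
    For example, the kind of tool registry an agent framework could dispatch against (the service names, URLs and request shape are all made up for illustration):

    ```python
    # Minimal sketch: forward an agent's tool call to a home-served HTTP service.
    import requests

    TOOLS = {
        "scrape":    "http://192.168.1.20:8001/scrape",      # home-served scraper
        "run_tests": "http://192.168.1.20:8002/run_tests",   # code-testing sandbox
        "stats":     "http://192.168.1.20:8003/stats",       # statistics / modelling service
    }

    def call_tool(name: str, payload: dict) -> dict:
        resp = requests.post(TOOLS[name], json=payload, timeout=60)
        resp.raise_for_status()
        return resp.json()

    # e.g. call_tool("scrape", {"url": "https://example.org"})
    ```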