Buy A Modem
Show HN: Clusterflock: An AI orchestrator for networked hardware
Hi HN!
We built Clusterflock to solve our own headaches with managing AI agents across distributed setups, different VRAM and RAM allowances, and the need to easily try out new models.
While we focus on infrastructure (we built this specifically for networked hardware), it does ship with a powerful mission runner (or orchestrator), which is multi-session and asynchronous.
Here is what it does best:
Hardware-aware auto-downloading: It profiles your networked hardware and automatically pulls down the b
Ask HN: Is it still worth making "Huge" Language Models for dev tools?
I just want to ask the frontier builders and developers who are working on the flagship models a few questions. Is it still cost-efficient and worth it to keep making huge language models, when smaller, specialized models should be enough?
Meaning that, when a user is working in a codebase with a certain framework, should the agent/model also know the complete chemical composition of an element, world history, and other random facts? Or should it only know the related and needed things? For
Tq-KV – Rust implementation of TurboQuant that works on GGUF models
TurboQuant came out at ICLR on March 25. We tried every available implementation on GGUF models. None of them produced usable output. Perplexity goes from 5.18 to 3,556. The model starts mixing languages mid-sentence, hallucinating citations, losing coherence entirely.
It's compound quantization error. GGUF models already have quantized weights. Quantize the KV cache on top of that, and the errors multiply through softmax. Nobody was handling this.
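The mechanism described above can be sketched in a few lines. This is an illustration under stated assumptions, not Tq-KV's code and not the TurboQuant algorithm: `quantize` is a plain symmetric round-to-nearest quantizer swapped in for simplicity, and the query and key vectors are made-up toy data. It compares the attention weights computed from exact keys with those computed from 4-bit keys.

```python
import math

def quantize(values, bits):
    """Symmetric round-to-nearest quantization, returned in dequantized form."""
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    scale = (max(abs(v) for v in values) / levels) or 1.0
    return [round(v / scale) * scale for v in values]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Toy data, made up for illustration only.
query = [0.9, -1.3, 0.4, 2.1]
keys = [
    [1.2, -0.7, 0.3, 1.9],
    [-0.4, 0.8, -1.1, 0.2],
    [0.6, -1.5, 0.9, 1.4],
]

exact = attention_weights(query, keys)
approx = attention_weights(query, [quantize(k, bits=4) for k in keys])
drift = max(abs(a - b) for a, b in zip(exact, approx))
print(f"largest change in an attention weight: {drift:.4f}")
```

Because softmax exponentiates the scores, an additive error of δ in one score scales that key's unnormalized weight by a factor of exp(δ). Keys that already carry error from quantized weights start from a perturbed score before the cache is quantized again.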
So we wrote our own from scratch. 13.7K li
It's not AI if the LLM is not in control
I always thought that the frontend of "AI" is awful, but now I know it for sure:
OAI5.1+ is good, but ChatGPT sucks: it doesn't have Gmail integration and is barely able to do anything but basic retrieval from the integrations it actually has.
Opus is amazing, but Claude web is mediocre at best. It has a very limited set of integrations even after 2 years, some don't even work (clay), and it uses way too many tokens to do basic stuff.
xAI is ok for social queries but Grok is very b
Show HN: Fabro – open-source dark software factory
Hi — I created Fabro to free myself from supervising a fleet of Claude Code tabs running in a REPL (read-eval-prompt-loop). REPLs are great for exploration, but once I know what I need I want to be able to walk away while the agents get it done.
(Before building Fabro, I looked for something off the shelf but couldn't find anything that was open source, hype-free, and full featured / ready.)
Fabro helps experienced engineers evolve towards a “dark” software factory where average time be
Hybrid Attention
TLDR: Forked PyTorch and Triton internals. Changed attention so it's a linear first layer, a quadratic middle layer, and a linear last layer.
Inference got much faster with a low perplexity hit in tests.
Full attention O(n²): 17.96s / 5.6 tok/s
HybridAttention O(n·W + n·D): 0.35s / 286.6 tok/s
I have been building a small Rust-focused language model from scratch in PyTorch. This is not a finetune. It is byte-level, trained from random initialization on a Rust-heavy corpus assembled here:
Show HN: Build queryable packs for AI agents from videos, podcasts, and files
Hi,
This started from a pretty personal use case.
There was this very technical person I follow who would go live on YouTube from time to time. He has a ton of experience, and would casually drop really good insights about software architecture, engineering tradeoffs, and just general "you only learn this after years" kind of stuff. He also posts shorter clips, but I wanted something else: I wanted that knowledge to be always there, queryable whenever I needed it.
At the same time, I was
Show HN: I built an open source multi-agent harness in Go
Hey HN. I built an AI agent harness over the past few months and I'm open sourcing it today.
Some context on why. I've been building with Claude Code daily using this harness. It orchestrates multiple AI agents as a team, with a dashboard, chat, kanban board, the works. I used it to build a full SaaS product (MyUpMonitor, https://myupmonitor.com) in about 24 hours of focused coding.
Then yesterday Anthropic announced Mythos and decided to keep it behind closed doors. Meanwhile
Show HN: 2500 vision benchmarks / evals for Vision Language Models
I love reading benchmark / eval papers. It's one of the best ways to stay up-to-date with progress in Vision Language Models, and understand where they fall short.
Vision tasks vary quite a lot from one to another. For example:
- vision tasks that require high-level semantic understanding of the image. Models do quite well in them. Popular general benchmarks like MMMU are good for that.
- visual reasoning tasks where VLMs are given a visual puzzle (think IQ-style test). VLMs perform quite
Toilet Paper Health GIF by Bonny Fiber
Best modem
You can’t have a home Wi-Fi network without a reliable modem. This is why most ISPs (internet service providers) give you one. The catch is, the modem isn’t free; you’re paying for it each and every ...
Show HN: Building your first ASGI framework – step-by-step lessons
I am writing a series of lessons on building an ASGI framework from scratch. The goal is to develop a deeper understanding of how frameworks like FastAPI and Starlette work.
A strong motivation for doing this: I have been using AI to write code lately. I prompt, I get code, it works. But somewhere along the way I stopped caring about what is actually happening. So, this is an attempt to think beyond prompts and build deeper mental models of the things we use in our day-to-day lives. I
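For readers new to ASGI, here is a rough sketch of what the smallest possible "framework" might look like. It is not taken from the lessons: `TinyApp`, `route`, and the `call` helper are hypothetical names invented for this example. Only the `(scope, receive, send)` call signature and the `http.response.start` / `http.response.body` message types come from the ASGI specification.

```python
import asyncio

class TinyApp:
    """Hypothetical minimal ASGI application with exact-match path routing."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        """Register an async handler that returns the response text for `path`."""
        def decorator(handler):
            self.routes[path] = handler
            return handler
        return decorator

    async def __call__(self, scope, receive, send):
        # An ASGI server calls the app with these three arguments.
        if scope["type"] != "http":
            return  # lifespan and websocket scopes are ignored in this sketch
        handler = self.routes.get(scope["path"])
        if handler is None:
            status, text = 404, "Not Found"
        else:
            status, text = 200, await handler()
        await send({
            "type": "http.response.start",
            "status": status,
            "headers": [(b"content-type", b"text/plain; charset=utf-8")],
        })
        await send({"type": "http.response.body", "body": text.encode("utf-8")})

app = TinyApp()

@app.route("/")
async def index():
    return "hello"

async def call(app, path):
    """Drive the app without a server by faking the scope and both channels."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent[0]["status"], sent[1]["body"]

print(asyncio.run(call(app, "/")))         # (200, b'hello')
print(asyncio.run(call(app, "/missing")))  # (404, b'Not Found')
```

Driving the app through a fake `send` keeps the example runnable with the standard library alone. With an ASGI server such as uvicorn installed, the same `app` object could be served directly.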
AI overly affirms users asking for personal advice
https://arxiv.org/abs/2602.14270
https://www.science.org/doi/10.1126/science.aec8352
Show HN: Epismo CLI – Make human-AI workflows reusable, like GitHub did for code
Hi HN, I'm Hiroki, founder of Epismo.
Just released the Epismo CLI (https://npmjs.com/package/epismo). 380+ downloads right after launch. Thank you.
The problem: I got a great result in a Claude Code thread. A week later I couldn't reproduce how I got there. The real workflow lived across chat histories, tabs, tool settings, and tiny follow-up prompts. Prompts copy easily, but multi-step processes don't.
If GitHub made code reusable and Hugging Face made models re
Show HN: Vyasa – A client-side AI writing detector (WASM, no API calls)
Wikipedia has now banned AI-generated articles: https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_wit...
I wanted to see if it was possible to build a somewhat decent engine based on the signs of AI writing listed by Wikipedia itself.
It runs entirely in the browser via WASM. I added instructions for adding more detection patterns as we find out more about LLMs.
Would love feedback, especially:
- cases where it completely fails
- patterns you think are stronger
Show HN: WhatToBuy – Describe your situation, get AI-curated shopping carts
Before reading the text, please try the app at https://www.whattobuy.app (to get great UX feedback).
Shopping research is one of the most challenging tasks, and people spend 30-60 min before buying an item. We developed a platform called “WhatToBuy” to save people time. In some cases shoppers are not fully aware of what they really need to order for a trip or occasion. Our app helps them get the range of products needed for each use case, saving time and money.
App workflow: Describe your situation i
Ask HN: M5 MacBook Pro buyers, worth spending the $$$ to maybe run LLMs local?
To anyone upgrading their daily driver Mac this year, are you considering going to a Max + high memory config, e.g. with the hope (now or in the near future) of being able to usefully run agents/LLMs locally on your main machine?
Or is the few thousand dollars' difference between a base and max-spec MBP still just better spent on literally any other practical option (like different hardware, remote hardware, cloud AI subscriptions or credits)? Or wait to see if there will be an M5 Studio o
Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)
Hi guys and gals, I made a TTS model based on my highly upgraded VITS base, conditioned on external speaker embeddings (Resemble AI's Resemblyzer).
The model, with ~31M parameters (ONNX), is tuned for latency and local inference, and comes already exported. I was trying to push the limits of what I could do with small, fast models. It runs 5.6x realtime on a server CPU.
It supports voice cloning, voice blending (mix two or more speakers to make a new voice), the license is Apache 2.0 and it uses
Show HN: HF-agents, CLI extension to find the best model/quant for your hardware
We've been building out CLI extensions for the Hugging Face hub, and hf-agents is a fun one to share.
It uses llmfit under the hood to profile your hardware and automatically select the best-fit model and quantization, with no manual GGUF hunting. It then launches a Pi Agent on top of it. One command, local, fully open.
If you've been using Claude Code or Codex CLI and want something that runs entirely on your own hardware/models, this is a nice lightweight alternative to try.
Happy to a
Show HN: ClawMem – Open-source agent memory with SOTA local GPU retrieval
So I've been building ClawMem, an open-source context engine that gives AI coding agents persistent memory across sessions. It works with Claude Code (hooks + MCP) and OpenClaw (ContextEngine plugin + REST API), and both can share the same SQLite vault, so your CLI agent and your voice/chat agent build on the same memory without syncing anything.
The retrieval architecture is a Frankenstein, which is pretty much always my process. I pulled the best parts from recent projects and researc