Buy A Modem

Show HN: Clor – give your agent claws

At my last job I spent a year building an agentic coding platform used by hundreds of thousands of people. Along the way I tried building a hosting service on OpenClaw, and also ran Hermes myself for a while. Both projects have some great feature ideas, but when I tried to use them for real work they failed more often than not, and their security models worried me. I just couldn't see either one becoming something I'd trust enough for myself/friends/family. After a lot of exp

Show HN: OMT – A simple Python CLI for testing local Ollama models

Selecting the "best" local model usually depends on the task and the hardware.I created this script as an easy way to test local Ollama models and keep the test output organized.When you run the script interactively, it asks which model you want to use, what your prompt is, how many times you want to run it, and (optional) the temperature you'd like to set. It can also be scripted with command-line flags.The output is saved in Markdown/JSON within an organized file structure

Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud

Hi HN, we’re Nick and Drew, and we’re building boxes.dev – the first cloud-only agentic dev environment (ADE) that gives every Codex and Claude Code agent its own cloud computer.We’re two engineers who previously built Gem (co-founder/CTO and first hire), and we spent the last year coding almost exclusively using Codex and Claude Code. It’s been a huge change to how we code, and it’s been exhilarating seeing the models keep getting better – but we eventually realized that developing on loca

Launch HN: Hyper (YC P26) – Company brain to power agentic development

Hey HN, we’re Shalin & Kanyes, best friends who've been hacking together for 10+yrs, and now founders of Hyper (https://heyhyper.ai/). Hyper is a shared “company brain” that plugs into information flowing inside a company to make AI agents and automations better and ultimately save people time.Models have gotten good enough that they can (mostly) take on long-horizon, complex tasks. We believe the bottleneck now is that these smart-enough models often lack information abo

Bad MCP design costs your agent 5x more tokens

I recently did some tests on two MCPs with identical functionalities. Turns out one of them has really bad performance. So I wanna share those bad MCP design patterns that cause this.It all started when I wrote an MCP Server (MCP-A) for a to-do list app. Later, the app officially released its own MCP Server (MCP-B). Both MCPs have the same functionalities and hit the same backend API.The experiment is set up as follows:- Both MCP Servers connect to the same ToDo list account, and it will be rese

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

Hey HN, Guanming and Bill here from General Instinct (https://general-instinct.com/).After years of working in robotics, we kept running into the same problem: the best models never fit the hardware we actually had available.The models that performed best were usually designed around datacenter assumptions: large GPUs, lots of memory bandwidth, and reliable network access. But most physical systems have the opposite constraints.That led us down the path of figuring out how much of

Show HN: Lazarus, a coding agent for long-horizon tasks

I have been interested in long-horizon coding tasks for a while, especially with benchmarks like FrontierSWE, where even the best coding agents like Codex and Claude Code struggle to complete tasks.These agents come with a collection of tools like bash, file edits, grep, glob, etc.Lazarus takes a different approach. The idea is to give the model exactly one tool: a persistent Python runtime.Model writes Python code, executes it, and receives stdout/stderr. Through Python it inspects repos,

Show HN: On-device transcriber that's 97% accurate at identifying speakers

I’ve spent the last seven months building a tool I wish I’d had in my previous roles. MimicScribe is a macOS menu bar app that fits the "AI notetaker" category. It has accurate on-device speaker identification (a first possibly?), real-time meeting talking points for discovery calls, and a fully keyboard- and voice-driven interface.I believe the accuracy of the speaker ID system is its biggest strength. I used fluid audio’s port of (https://github.com/fluidInference&#x2F

Show HN: I benchmarked LLM agents on fixing real-world security vulnerabilities

I built a benchmark with 20 real CVEs across 18 Python projects (Pillow, GitPython, yt-dlp, urllib3, etc). I've run it over 5 LLM agents (3 OpenAI, 2 poolside) and 3 different prompts (full advisory, locate, diagnose) with a total of 300 runs. The agents are tasked to fix security vulnerabilities in a sandboxed environment and they are scored against a hidden security tests from the maintainer's own fix.Best solve rate was 50%. On the other 50%, some fixes are sometimes coherent and pa

Show HN: I nerfed our coding agents on purpose

Tl;dr: I trained a classifier to route to the least expensive model and reasoning depth to complete the request. Coupling that with additional automated token efficiency techniques has yielded 3x usage for the same spend. For anyone interested in trying it themselves: https://nerfguard.comVarious teammates and I switched over to Codex from Claude Code recently. We still bounce between the tools, but Codex’s speed and steerability coupled with performance gains were hard to ignore. One

Ask HN: What happens when humans become as dumb as AI?

The existential risk that has received much attention is machines eventually becoming as smart as people, and then smarter still. What I see in the news, and, anecdotally, around me is rather the opposite. Thinking is hard. People, even those who went through rigorous university training to develop their critical thinking, are increasingly outsourcing thinking to machines. SOTA models don't need to get any better to catch up with us, they just need to wait. And maybe not even that long. I w

Show HN: I embedded 685M public texts in 32 minutes (on 8x A100, Rust, TensorRT)

Quick note on how it works and how I've done my batch embedding engine IgniteMS.The whole thing runs as one process using Rust, reading input, tokenizing, packing batches, keeping the queue full. TensorRT handles inference. Python is only as a wrapper.I built it this way because when you use more than couple of GPUs, the GPUs stop being the problem. CPU cannot feed them fast enough. One A100 can go through batches faster than Python can tokenize and feed, so the GPU just sits there idle wai

My Humble AI Market Prediction

In 12 - 18 months we get local models that match the capabilities of 4.6.The overall capabilities will peak, and we all get highly efficient code forges in a box available in everyone's home. AI the is a generator, an advanced compiler, not a place for runtime. It's best used as a powerful hammer aimed at one thing until the structure is 'built', then we put the hammer away and enjoy the spoils.These data centers are being built on the assumption that billions of people will

Show HN: Hitoku Draft – Context aware local assistant

Hi guys.I have been working on Hitoku Draft, an open-source, voice-first AI assistant that runs entirely locally. I posted about it already, and now it has also transcription with voice editing. Looking for feedback, as I found that outside tech circles other people still do not use this tech much.It's context-aware, in the sense that it reads your screen, documents, and active app to understand what you're working on. You can ask about PDFs, reply to emails, create calendar events, us

Anthropic Urges Global Pause in AI Development, Flags 'Self-Improvement' Risk

WSJ Anthropic is calling for top artificial intelligence labs to weigh slowing the pace of development, suggesting that AI systems are advancing so rapidly that they may soon be able to improve themselves without human intervention in ways that could pose significant societal risks.The ability to slow global AI development would “likely be a good thing,” the company said Thursday in a blog post that disclosed internal data documenting how quickly its most advanced models are improving.The post,

Show HN: Formally verified polygon intersection – Opus 4.8 oneshots, prev failed

To my knowledge, this is the first formally verified implementation of an intersection algorithm for polygons.The experience of working with AI agents on this project changed a lot with recent model releases, as I describe in the readme. Opus 4.8 is able to provide algorithm implementation with formal proof in one shot, whereas previous models required me to provide proof strategies in multiple steps.Trust in the correctness comes entirely from the Lean checker and human review of a small specif

Routers, Modems, and Why Knowing the Difference Is Crucial

Your modem and router are the dynamic duo you need to get online, so long as you don't mistake one for the other.

Troll Face Realism GIF

Troll Face Realism GIF

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.What it does:- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it- Ships with an eval harness and interactive das

Show HN: Dari-docs – Optimize your docs using parallel coding agents

It’s well known at this point that documentation needs to be optimized for AI agents - we’re all pointing our Claude Code / Codex / Pi agents at documentation, and expecting the models to figure out how to implement a product.This, however, changes the entire optimization problem when writing documentation. Good documentation now becomes more objective - you are solving the very concrete problem: can a dumb harness running the dumbest model implement this reliably?Humans can typically