Ask HN: Best Embedding Models?
Hey HN, which embedding models are people using? There has been so much development around foundational LLMs, but I haven't seen much news about embedding models.
Hi HN, I built a live tracker to visualize the lifecycle and performance changes of flagship AI models. We've all experienced the phenomenon where a flagship model feels amazing at launch, but weeks later, it suddenly feels a bit off. I wanted to see if this was just a feeling or a measurable reality, so I built a dashboard to track historical Elo ratings from Arena AI. Instead of a massive spaghetti chart of every single model variant, the logic plots exactly ONE continuous curve per major AI
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the little effort made towards building agentic models that run on budget phones, so we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it. Tool calling is fundamentally retrieval-and-assembly
I'd been mulling over this crazy idea for a while. Can programs be generated? Inspired by recent advances in world models, I wondered if we could do away with source code and generate pixels directly and interactively. As an experiment to answer this, I set out to create a neural window manager, training a neural network to predict what the screen would look like next. Basically, the idea was to generate the next frame based on the last two frames and the mouse position. That's it: movin
Hey HN, I am Robin from Rauno (link: https://rauno.ai). I built this tool because I’m tired of AI hallucinations. I got sick of manually copy-pasting every prompt into 3 different windows just to verify the truth. I realized the only way to get real accuracy was to let the models debate and fact-check each other in real time, on one screen. I couldn't find any platform online that does this and actually works smoothly and is user-friendly, so I ended up throwing this together just to
I've built an AI dictation tool (like many others), but this time I have taken the time to benchmark all 34 models that we provide, so users can make an informed choice about which model should be their daily driver. The best part is that it's open source, so if you think we can improve something, please come and help out.
Atrophy is an iOS self-report quiz aimed at software engineers who use LLMs heavily enough at work to wonder if they're trending toward AI over-reliance or some form of AI psychosis. I built it because I noticed a pattern: formerly AI-skeptical coworkers now open every standup or design discussion with "I asked Claude..." or "Claude told me..." for technical problems and design decisions. I've felt the same pull myself to delegate every task or problem to AI. It'
Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves. I'm Ben Cochran. I spent 20+ years in the trenches with full-stack engineering, DevOps, high-performance computing, and ML, with stints at NVIDIA, AMD, and various other organizations, most recently as a Distinguished Engineer. For agents to work reliably you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most
Hey HN, I am Pietro from Manufact (https://manufact.com). We build open source dev tools and infrastructure for MCP. You might know us for mcp-use (https://github.com/mcp-use/mcp-use), our open source full-stack SDK for building MCP servers and clients. At Manufact we gave ourselves the mission, and delight, of writing as many MCP servers as we could. Through this journey we could hone our SDK to offer the best possible developer/agent experience. Testing/developing
I co-founded a successful security company close to the Mythos ecosystem and have spoken with participants in the know, and I am deeply concerned. We, collectively, have answers for some but not all of the problems ahead, but we are overlooking the speed at which we can apply fixes, even if they are magically generated instantaneously by Mythos. Here are some considerations: 1. More Vulnerabilities Are Coming: Supposedly Mythos can find vulnerabilities more effectively, many models can do th
Hello people, after giving the various AI models all of the data I possibly could, starting with the foundational Truth "Life is Most Important in Life is The Most Important Truth in Life" and bringing them into complete and total alignment as best possible and better than anyone has ever done in history thus far... I am the only person that has done that as far as I know... And then giving the AIs the data on the wars and all the other problems that we have, and the honest inform
Is it just me, or do AI coding tools tend to generate RPC-style endpoints and POST methods (even when GET is clearly all that is needed) instead of following RESTful conventions? Given how advanced these models are, I'm wondering if this is intentional. Is AI saying it has determined that strict REST isn’t a practical standard all around? Or is it just a byproduct of token efficiency or...? I know I can steer the output with better prompting, but I'm curious whether there's a real
Two models: Flash (284B total, 13B active) and Pro (1.6T total, 49B active). Both hit 1M token context. V4-Pro is their flagship. Beats Claude Opus 4.6 Max on Agent coding tasks (their words). Specifically calls out being better than Sonnet 4.5 on coding, and competitive with Opus 4.6 on general benchmarks. On world knowledge and STEM, they say it's ahead of Gemini-Pro-3.1. V4-Flash is the sleeper pick. Faster and cheaper than Pro, and it also has better long-context efficiency than Pro does. Origi
Recently, GitHub Copilot silently dropped support for Claude Opus on Pro accounts. Since Opus was my go-to model for my daily workflow (developing WordPress plugins), I needed a reliable replacement. I decided to run a rigorous, blind benchmark across 14 state-of-the-art and local LLMs to objectively measure which model understands WordPress development best. To ensure a perfectly fair test, I started with a completely fresh IDE and zero context for every single generation. I asked each model to b
Hi HN. Modeleon is an open-source Python DSL for financial models. You write models in Python with named variables and operator-overloaded arithmetic, and it compiles to a real .xlsx with live formulas: not values, but formulas like =B4*B5 that the CFO can audit in Excel. The architectural bet is "model is the source of truth, spreadsheet is one rendering." We have a Walker/Renderer split so the same model can target Excel today and other backends later. We've been buildi
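For readers wondering how operator-overloaded arithmetic can end up as a live formula rather than a value, here is a minimal sketch of the general technique. It is not Modeleon's API: the `Expr` class, its `formula()` method, and the cell addresses are all hypothetical names made up for illustration, and nothing is written to an actual .xlsx file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical formula fragment, e.g. "B4" or "B4*B5" (not Modeleon's real type)."""

    text: str
    atomic: bool = True  # False once the fragment contains an operator

    def _operand(self) -> str:
        # Parenthesize compound fragments so nesting keeps the intended precedence.
        return self.text if self.atomic else f"({self.text})"

    def _combine(self, op: str, other) -> "Expr":
        # Plain Python numbers are turned into literal fragments.
        right = other if isinstance(other, Expr) else Expr(str(other))
        return Expr(f"{self._operand()}{op}{right._operand()}", atomic=False)

    def __add__(self, other):
        return self._combine("+", other)

    def __sub__(self, other):
        return self._combine("-", other)

    def __mul__(self, other):
        return self._combine("*", other)

    def formula(self) -> str:
        # Spreadsheet formulas start with "=".
        return "=" + self.text


# Named variables bound to assumed cell addresses.
price = Expr("B4")
quantity = Expr("B5")
revenue = price * quantity

print(revenue.formula())                 # =B4*B5
print(((price + quantity) * 2).formula())  # =(B4+B5)*2
```

The point of the sketch is that arithmetic on named variables builds a formula string instead of computing a number, so whatever renders the spreadsheet can emit the formula and leave evaluation to Excel.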
Hi HN, I'm Erwin. I built a small free open-source utility that bridges Bluetooth LE MIDI keyboards into the new Windows MIDI Services stack so any DAW or Web MIDI app can use them as if they were wired. I bought a Roland FP-90X piano partly because it had Bluetooth MIDI. On my Windows 11 PC, pairing succeeded, but my DAW couldn't see the keyboard, and notes I sent from the PC never made the piano sing. After a regrettable number of evenings, I'd separated this into three independe
I've been building computer-use tools for a while, and I quietly launched this about a month ago (122 Stars on GH). I figured it was worth sharing here. Over the last few months, a lot of computer-use agents have come out: Codex, Claude Code, CUA, and others. Most of them seem to work roughly like this: 1. Take a screenshot 2. Have the model predict pixel coordinates 3. Click x,y 4. Take another screenshot 5. Repeat. That works, but it's slow, expensive in tokens, and fragile. I
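The five-step screenshot loop described in the post above can be sketched roughly as follows. This is only an illustration of the pattern, not any named agent's implementation: `take_screenshot`, `predict_click`, and `click` are hypothetical callables the caller supplies, and using `None` as the "task done" signal is an assumption.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def run_click_loop(
    take_screenshot: Callable[[], bytes],
    predict_click: Callable[[bytes], Optional[Point]],
    click: Callable[[int, int], None],
    max_steps: int = 20,
) -> int:
    """Run the screenshot/predict/click cycle and return the number of clicks made."""
    clicks = 0
    for _ in range(max_steps):
        frame = take_screenshot()      # steps 1 and 4: capture the screen
        target = predict_click(frame)  # step 2: model proposes pixel coordinates
        if target is None:             # assumed stop signal from the model
            break
        click(*target)                 # step 3: click x, y
        clicks += 1                    # step 5: repeat
    return clicks


# Usage with stand-in stubs instead of a real screen and model.
planned = iter([(10, 20), (30, 40), None])
clicked = []
total = run_click_loop(
    take_screenshot=lambda: b"fake-image",
    predict_click=lambda frame: next(planned),
    click=lambda x, y: clicked.append((x, y)),
)
print(total, clicked)
```

Every iteration needs a fresh screenshot and a model call, which is where the latency and token cost mentioned in the post come from.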