Mixedbread AI Introduces mgrep for Natural Language Code Search

Emily Carter
Mixedbread AI's mgrep tool interface showing natural language code search in action with code snippets.

Mixedbread AI has launched mgrep, a command-line tool designed to integrate natural language search capabilities with code repositories. The tool aims to provide semantic, multilingual, and multimodal search functionality, with support for audio and video content planned for future updates.

mgrep features background indexing via mgrep watch, which monitors Git repositories and keeps the index up to date. It also offers device login and integration with coding agents. The tool is presented as a utility for both human developers and AI agents, with quiet output and configurable defaults. Mixedbread AI states that mgrep can cut agents' token usage roughly in half while maintaining result quality.

Core Functionality and Agent Integration

Users can install mgrep via npm, pnpm, or bun. They can authenticate through a browser-based login or, in CI/CD and other headless environments, by setting the MXBAI_API_KEY environment variable.

To index a project, users navigate to their repository and run mgrep watch. This command performs an initial synchronization, respects .gitignore rules, and then uses file watchers to keep the Mixedbread Store up to date. Once the project is indexed, users can run natural language searches such as mgrep "Where are the permission settings?" or mgrep -m 25 "store schema". The tool currently supports code, text, PDF, and image files.
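To make "initial sync, then incremental updates" concrete, here is a minimal Python sketch of one common approach: hash every non-ignored file, then compare two snapshots to decide what to upload or delete. It is an illustration under stated assumptions, not mgrep's implementation. The helper names (`load_ignore_patterns`, `snapshot`, `diff_snapshots`) are hypothetical, the ignore matching is a simplified glob check rather than full .gitignore semantics (no negation or anchored patterns), and the actual upload to the Mixedbread Store is left out.

```python
import fnmatch
import hashlib
from pathlib import Path


def load_ignore_patterns(root: Path, names=(".gitignore",)) -> list[str]:
    """Collect simple glob patterns from ignore files, skipping blanks and comments.

    Simplified on purpose: real .gitignore rules (negation, anchoring) are not handled.
    """
    patterns: list[str] = []
    for name in names:
        ignore_file = root / name
        if ignore_file.is_file():
            for line in ignore_file.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(line.rstrip("/"))  # "node_modules/" -> "node_modules"
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if a pattern matches the whole relative path or any single path component."""
    parts = rel_path.split("/")
    for pattern in patterns:
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False


def snapshot(root: Path, patterns: list[str]) -> dict[str, str]:
    """Map each non-ignored file's relative path to a SHA-256 digest of its contents."""
    result: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel.startswith(".git/") or is_ignored(rel, patterns):
            continue  # skip Git internals and ignored files
        result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to upload because they are new or changed, files to delete remotely)."""
    to_upload = sorted(p for p, digest in new.items() if old.get(p) != digest)
    to_delete = sorted(p for p in old if p not in new)
    return to_upload, to_delete
```

The initial sync is simply `diff_snapshots({}, snapshot(root, patterns))`, which uploads everything; a file watcher would then re-run the comparison after changes so only modified, added, or removed files touch the remote store.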

mgrep also includes auxiliary installation commands for various coding agents, including Claude Code, OpenCode, Codex, and Factory Droid. These commands facilitate login and integrate mgrep support directly into the agent environment. Mixedbread AI plans to add support for more agents, such as Cursor and Windsurf.

Performance and Design Philosophy

Mixedbread AI ran a 50-task benchmark pairing mgrep with Claude Code and compared it against a grep-based workflow. The company reported that the mgrep + Claude Code setup used roughly half the tokens while achieving comparable or better quality. It attributes the savings to mgrep finding relevant code snippets semantically, which lets the agent spend its effort on reasoning rather than on scanning large amounts of code.

The developers note that grep, which dates back to 1973, remains a robust tool but is built for precise pattern matching, which can be inefficient in large codebases or when searching for abstract business logic. As a result, coding agents often consume significant tokens on repeated grep attempts. mgrep is designed to address these limitations by applying advances in semantic understanding and code search.

Mixedbread AI positions mgrep as a complementary tool to grep, suggesting that grep remains suitable for exact matches and quick string locations, while mgrep excels at intent search, business logic lookup, cross-language descriptions, and multimodal content.

Configuration and Environment Variables

mgrep's search functionality includes options to limit results, display content, generate answers based on results, synchronize files before searching, and perform dry runs. For example, mgrep -m 10 "What is the maximum number of concurrent workers for code parsers?" limits results to 10, and mgrep -a "What code parsers are available?" generates an answer.

The mgrep watch command indexes repositories and keeps them synchronized, respecting both .gitignore and .mgrepignore files. Users can configure mgrep through a .mgreprc.yaml file in the project root or a global one, as well as through environment variables. Settings are resolved in priority order: CLI flags first, then environment variables, then the local and global configuration files, with default values lowest.
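The precedence rule can be sketched as a layered merge. This minimal Python example assumes each source has already been parsed into a dictionary and applies the layers from lowest to highest priority, so higher layers overwrite lower ones key by key. The keys (`max_count`, `content`, `answer`) and their values are hypothetical stand-ins, not mgrep's documented schema, and `None` is treated as "not set" so that an omitted CLI flag does not mask a lower layer.

```python
from typing import Any, Mapping


def resolve_settings(
    cli: Mapping[str, Any],
    env: Mapping[str, Any],
    local_file: Mapping[str, Any],
    global_file: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge configuration layers so the highest-priority layer wins for each key."""
    merged: dict[str, Any] = {}
    # Apply from lowest to highest priority; each update overwrites earlier values.
    for layer in (defaults, global_file, local_file, env, cli):
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged
```

For instance, with `max_count` set to 20 globally, 15 locally, and 25 on the command line, the resolved value is 25; drop the CLI value and it falls back to 15.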

Environment variables can be used for authentication and storage, such as MXBAI_API_KEY and MXBAI_STORE. Search options like MGREP_MAX_COUNT, MGREP_CONTENT, and MGREP_ANSWER can also be set via environment variables, which is useful for CI/CD pipelines. Command-line options always override environment variables.
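For a CI job, a small wrapper can set these variables and run a query. The Python sketch below relies only on what the article states: MXBAI_API_KEY authenticates without a browser, MXBAI_STORE selects storage (its value is assumed here to be a store name), MGREP_MAX_COUNT caps results, and `-m` on the command line takes precedence over the environment. The function name `build_ci_invocation` and the `CI_MXBAI_API_KEY` secret name are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Optional


def build_ci_invocation(
    query: str,
    api_key: str,
    store: Optional[str] = None,
    max_count: Optional[int] = None,
    cli_max_count: Optional[int] = None,
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a headless mgrep search; nothing is executed here."""
    env = dict(os.environ)
    env["MXBAI_API_KEY"] = api_key  # headless authentication, no browser login
    if store is not None:
        env["MXBAI_STORE"] = store  # assumed: a store name
    if max_count is not None:
        env["MGREP_MAX_COUNT"] = str(max_count)  # environment variables hold strings
    argv = ["mgrep"]
    if cli_max_count is not None:
        # Command-line options override environment variables.
        argv += ["-m", str(cli_max_count)]
    argv.append(query)
    return argv, env


if __name__ == "__main__":
    key = os.environ.get("CI_MXBAI_API_KEY")  # hypothetical CI secret name
    if key and shutil.which("mgrep"):
        argv, env = build_ci_invocation("store schema", key, max_count=10)
        subprocess.run(argv, env=env, check=True)
```

Keeping the command construction separate from `subprocess.run` makes the wrapper easy to check in a pipeline without network access or a real API key.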

mgrep is built on Mixedbread Search, which combines semantic retrieval models, context-aware parsing, and optimized inference. Files are pushed to the Mixedbread Store, and search results include relative paths and context hints for quick browsing. Because the store is cloud-based, agents and team members can query the same corpus without re-uploading files.
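At its core, semantic retrieval ranks stored chunks by how close their embedding vectors are to the query's vector. The toy Python sketch below illustrates that ranking with cosine similarity over hand-written 3-dimensional vectors; in a real system the vectors would come from a retrieval model. It is not Mixedbread's API or algorithm, and the `Chunk` type, its `start_line` field (standing in for a context hint), and the file paths are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str             # relative path of the source file (hypothetical examples below)
    start_line: int       # hypothetical context hint: where the chunk begins
    vector: list[float]   # embedding; hand-written here, model-generated in practice


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[Chunk], k: int) -> list[tuple[str, int, float]]:
    """Return (path, start_line, score) for the k chunks most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, c.vector), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(c.path, c.start_line, score) for score, c in scored[:k]]
```

Calling `top_k` with a query vector close to a permissions-related chunk returns that file first, followed by the next-closest matches, each with its relative path and starting line.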