Omnivore Git Command¶
The omnivore git command extracts and analyzes source code from Git repositories, both local and remote. It detects the codebase type, filters out non-source files, and generates an organized report of the source code.
Overview¶
The git command clones or analyzes repositories, detects the technology stack, filters out unnecessary files (like dependencies and build artifacts), and produces a clean, organized output of the actual source code.
Basic Usage¶
# Analyze a remote repository
omnivore git https://github.com/user/repo --output analysis.txt
# Analyze a local repository
omnivore git ./my-project --output project-code.txt
# Output to stdout
omnivore git https://github.com/user/repo --stdout
# Quick analysis with default output name
omnivore git https://github.com/user/repo
Features¶
Automatic Codebase Detection¶
The command automatically detects:
- Project Type: Web Application, CLI, Library, API, Mobile App, Monorepo, etc.
- Programming Languages: Rust, JavaScript, TypeScript, Python, Go, Java, C#, Ruby, PHP, etc.
- Frameworks: Next.js, React, Vue, Django, Flask, Rails, Laravel, Spring, etc.
- Build Tools: npm, yarn, cargo, pip, maven, gradle, etc.
Smart Filtering¶
By default, the command excludes:
- Package directories (node_modules, vendor, site-packages)
- Build outputs (dist, build, target, bin)
- Version control directories (.git, .svn)
- IDE files (.vscode, .idea)
- Large binary files (images, videos, compiled binaries)
- Machine learning models (*.pt, *.pkl, *.h5)
- Database files (*.db, *.sqlite)
- Archive files (*.zip, *.tar.gz)
- Lock files (package-lock.json, yarn.lock)
- Files larger than 10MB (configurable)
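As a rough illustration of what the list above means for individual paths, the sketch below approximates the name-based exclusions with shell glob patterns. The function name `is_excluded_by_default` is hypothetical, and this is not how omnivore itself is implemented; the size limit and the binary-file check are not modeled.

```shell
# Illustrative only: mirrors the documented default exclusions by name.
# Returns 0 if the path would be skipped, 1 otherwise.
is_excluded_by_default() {
    # Directory names from the list above, matched at any depth.
    for dir in node_modules vendor site-packages dist build target bin \
               .git .svn .vscode .idea; do
        case "/$1" in
            */"$dir"/*) return 0 ;;
        esac
    done
    # File patterns from the list above (models, databases, archives, lock files).
    case "$1" in
        *.pt|*.pkl|*.h5|*.db|*.sqlite|*.zip|*.tar.gz) return 0 ;;
        package-lock.json|*/package-lock.json|yarn.lock|*/yarn.lock) return 0 ;;
    esac
    return 1
}
```

A check such as `is_excluded_by_default "src/main.rs" || echo "kept"` can help predict whether a file is likely to appear in the report before running the command.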
Organized Output¶
The output is structured with:
1. Project metadata - Type, language, frameworks, build tools
2. File organization - Grouped by category (Source Code, Configuration, Tests, Documentation, etc.)
3. Full source code - With syntax-aware formatting and line numbers
Command Options¶
Core Options¶
| Option | Description | Example |
|---|---|---|
| `--output PATH` | Output file path (supports .txt or .json) | `--output analysis.txt` |
| `--stdout` | Output to stdout instead of file | `--stdout` |
| `--json` | Output in JSON format | `--json` |
| `--verbose` | Show detailed progress information | `--verbose` |
Filtering Options¶
| Option | Description | Example |
|---|---|---|
| `--only PATTERNS` | Include only files matching patterns (comma-separated) | `--only "*.rs,*.toml"` |
| `--include PATTERNS` | Include files matching patterns | `--include "*.js,*.ts"` |
| `--exclude PATTERNS` | Exclude files matching patterns | `--exclude "*.test.js"` |
| `--no-gitignore` | Ignore .gitignore rules | `--no-gitignore` |
| `--allow-binary` | Include binary files | `--allow-binary` |
| `--max-file-size SIZE` | Maximum file size in bytes (default: 10MB) | `--max-file-size 20971520` |
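Because `--max-file-size` takes a byte count, a small helper can make the value easier to read. This is a minimal sketch; the function name `mb_to_bytes` is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which is consistent with the `20971520` example (20 MB) in the table.

```shell
# Convert a size in MB to bytes, assuming 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Example (not executed here):
#   omnivore git ./my-project --max-file-size "$(mb_to_bytes 20)"
```

Under the same assumption, the documented 10MB default would correspond to 10485760 bytes.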
Repository Options¶
| Option | Description | Example |
|---|---|---|
| `--keep` | Keep cloned repository after completion | `--keep` |
| `--depth N` | Clone depth for remote repos (default: 1) | `--depth 10` |
Examples¶
Basic Repository Analysis¶
# Analyze a GitHub repository and save to a text file
omnivore git https://github.com/rust-lang/rust-clippy --output clippy-analysis.txt
# Analyze with automatic naming (repo_name_timestamp.txt)
omnivore git https://github.com/facebook/react
Filtering Specific Files¶
# Extract only Rust files
omnivore git https://github.com/rust-lang/cargo --only "*.rs,*.toml"
# Include test files that would normally be excluded
omnivore git ./my-project --include "**/tests/**" --output with-tests.txt
# Exclude specific directories
omnivore git ./my-project --exclude "**/experiments/**,**/legacy/**"
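The filtering options take a single comma-separated string, and each pattern must be quoted so the shell does not expand it before omnivore sees it. The sketch below shows one way to assemble such a string from separate patterns; `join_patterns` is a hypothetical helper, not part of omnivore.

```shell
# Join all arguments into one comma-separated pattern list.
# The body runs in a subshell "( )" so the IFS change does not leak out.
join_patterns() (
    IFS=,
    printf '%s\n' "$*"
)

# Example (not executed here); single quotes keep the globs literal:
#   omnivore git ./my-project \
#     --exclude "$(join_patterns '**/experiments/**' '**/legacy/**')"
```

Keeping patterns as separate quoted arguments makes long lists easier to maintain in scripts than one hand-edited string.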
Working with Large Repositories¶
# Shallow clone for faster processing (--depth 1 is already the default)
omnivore git https://github.com/torvalds/linux --depth 1 --only "*.c,*.h"
# Increase file size limit for larger source files
omnivore git ./ml-project --max-file-size 52428800 # 50MB
# Include normally excluded ML model files
omnivore git ./ml-project --include "*.pkl,*.pt" --allow-binary
Different Output Formats¶
# JSON output for programmatic processing
omnivore git https://github.com/user/repo --json --output analysis.json
# Stream to stdout for piping
omnivore git https://github.com/user/repo --stdout | grep "function"
# Pretty text format (default)
omnivore git https://github.com/user/repo --output report.txt
Local Repository Analysis¶
# Analyze current directory
omnivore git . --output this-project.txt
# Analyze specific local repository
omnivore git /path/to/project --output project-analysis.txt
# Analyze a local repository in place; the working tree is left as-is
omnivore git ./my-project --output analysis.txt
Output Format¶
Text Output (.txt)¶
The text output includes:
================================================================================
OMNIVORE CODE ANALYSIS REPORT
================================================================================
PROJECT INFORMATION
-------------------
Type: Web Application
Description: A web application written in TypeScript using NextJS
Language: TypeScript
Frameworks: NextJS, React
Build Tools: Npm, Yarn
Total Files: 156
================================================================================
PROJECT STRUCTURE
-----------------
📁 Documentation (4)
Project documentation and guides
• README.md
• CONTRIBUTING.md
• docs/API.md
• docs/Setup.md
📁 Configuration (8)
Project configuration and build files
• package.json
• tsconfig.json
• next.config.js
• .eslintrc.json
...
📁 Source Code (120)
Main application source code
• src/pages/index.tsx
• src/components/Header.tsx
...
================================================================================
SOURCE CODE
================================================================================
[Full source code with line numbers for each file]
JSON Output (.json)¶
The JSON output includes:
{
  "metadata": {
    "project_type": "WebApplication",
    "description": "A web application written in TypeScript using NextJS",
    "main_language": "TypeScript",
    "frameworks": ["NextJS", "React"],
    "build_tools": ["Npm", "Yarn"],
    "total_files": 156
  },
  "sections": [
    {
      "name": "Source Code",
      "description": "Main application source code",
      "files": [
        {
          "path": "src/index.ts",
          "content": "// File content here..."
        }
      ]
    }
  ]
}
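For quick inspection of a JSON report from the shell, the file paths can be pulled out with standard tools. The sketch below assumes the report follows the structure shown above (a `"path"` key per file entry) and that paths contain no escaped quotes; `list_report_paths` is a hypothetical helper, and a real JSON parser such as jq is the more robust choice for anything beyond a quick look.

```shell
# list_report_paths FILE: print the "path" value of every file entry.
# Text-based extraction, not a JSON parser; see the assumptions above.
list_report_paths() {
    grep -o '"path"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed 's/^"path"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Example (not executed here):
#   omnivore git https://github.com/user/repo --json --output analysis.json
#   list_report_paths analysis.json | sort
```

Counting the lines of this output and comparing the result with `total_files` is a simple sanity check on a generated report.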
Smart Defaults¶
The command uses intelligent defaults based on the detected project type:
For JavaScript/TypeScript Projects¶
- Includes: `*.js`, `*.jsx`, `*.ts`, `*.tsx`, `package.json`, `tsconfig.json`
- Excludes: `node_modules/`, `dist/`, `build/`, `.next/`, `*.min.js`
For Python Projects¶
- Includes: `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`
- Excludes: `__pycache__/`, `venv/`, `.venv/`, `*.pyc`, `site-packages/`
For Rust Projects¶
- Includes: `*.rs`, `Cargo.toml`, `Cargo.lock`
- Excludes: `target/`, `*.rlib`
For Go Projects¶
- Includes: `*.go`, `go.mod`, `go.sum`
- Excludes: `vendor/`, `*.exe`, `*.test`
Advanced Usage¶
Analyzing Machine Learning Projects¶
# Exclude large model files but keep training scripts
omnivore git ./ml-project \
--exclude "*.pt,*.pkl,*.h5,*.safetensors" \
--include "train.py,model.py" \
--output ml-code.txt
Analyzing Monorepos¶
# Focus on specific packages in a monorepo
omnivore git ./monorepo \
--include "packages/core/**,packages/cli/**" \
--exclude "packages/deprecated/**" \
--output core-packages.txt
CI/CD Integration¶
# Generate a code report in CI pipeline
omnivore git . \
--json \
--output code-report.json \
--no-gitignore \
--verbose
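In a pipeline it is usually worth failing the job when the report was not produced. The following is a minimal sketch of such a guard; `check_report` and `generate_report` are hypothetical names, and the omnivore invocation simply repeats the command above.

```shell
# check_report FILE: succeed only if the report exists and is not empty.
check_report() {
    if [ ! -s "$1" ]; then
        echo "error: report '$1' is missing or empty" >&2
        return 1
    fi
    echo "ok: $1 ($(wc -c < "$1" | tr -d ' ') bytes)"
}

# Hypothetical CI step: generate the report, then validate it.
# Defined as a function so nothing runs until the pipeline calls it.
generate_report() {
    omnivore git . --json --output code-report.json --no-gitignore --verbose &&
        check_report code-report.json
}
```

Calling `generate_report` as the last command of a CI step lets its exit status decide whether the step passes.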
Security Audit¶
# Extract only configuration and security-relevant files
omnivore git https://github.com/org/repo \
--only "*.env.example,*.yml,*.yaml,Dockerfile,*.json" \
--output security-audit.txt
Performance Tips¶
- Use shallow clones for large repositories: `--depth 1`
- Filter early with `--only` to avoid processing unnecessary files
- Adjust file size limits based on your needs
- Use default smart filtering - it excludes most unnecessary files automatically
Troubleshooting¶
No files found¶
- Check your filter patterns
- Use `--verbose` to see what's being excluded
- Try `--no-gitignore` if files might be gitignored
- Verify the repository path/URL is correct
Clone fails¶
- Check network connectivity
- Verify repository access (private repos need authentication)
- For SSH URLs, ensure SSH keys are configured
Output too large¶
- Use more specific `--only` patterns
- Reduce `--max-file-size`
- Exclude test directories with `--exclude "**/tests/**"`
Missing files in output¶
- Check if files exceed size limit (default 10MB)
- Verify files aren't binary (use `--allow-binary` if needed)
- Check if files match exclude patterns
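To check the size-limit case for a local repository, standard `find` can list the files that exceed a given byte count. This is a sketch under the assumption that the 10MB default means 10 * 1024 * 1024 bytes; `list_oversized` is a hypothetical helper, not an omnivore feature.

```shell
# list_oversized DIR [BYTES]: print files larger than BYTES
# (default 10485760), skipping the .git directory.
list_oversized() {
    find "$1" -name .git -prune -o -type f -size +"${2:-10485760}"c -print
}

# Example (not executed here):
#   list_oversized ./my-project
```

Any file this prints is a candidate for having been dropped by the size limit, which can then be raised with `--max-file-size`.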
Integration with Other Omnivore Commands¶
The git command output can be used with other Omnivore features:
# Extract code and then analyze with AI
omnivore git https://github.com/user/repo --output code.txt
omnivore analyze code.txt --ai "Find security vulnerabilities"
# Combine with web crawling
omnivore crawl https://docs.example.com --output docs.json
omnivore git https://github.com/example/repo --output code.txt
Best Practices¶
- Start with defaults - The smart filtering usually works well out of the box
- Use `--only` for focused extraction - More efficient than broad inclusion
- Keep output organized - Use descriptive output filenames
- Version control your analyses - Track code snapshots over time
- Respect rate limits - Don't repeatedly clone large repositories
Limitations¶
- Maximum file size: 10MB by default (configurable)
- Memory usage scales with repository size
- Private repositories require proper authentication setup
- Binary file analysis is limited
- Very large repositories may take time to process
Security Considerations¶
- The command respects `.gitignore` by default
- Sensitive files should be excluded manually if needed
- Cloned repositories are deleted after processing (unless `--keep` is passed)
- No data is sent to external services
- Authentication credentials are handled securely by the git2 library
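Since sensitive files have to be excluded manually, it can help to scan a finished report before sharing it. The sketch below looks for two well-known markers, PEM private key headers and AWS access key IDs; `report_has_secrets` is a hypothetical helper, and the patterns are examples rather than a complete secret scanner.

```shell
# report_has_secrets FILE: succeed if FILE contains text that looks like
# a PEM private key header or an AWS access key ID (example patterns only).
report_has_secrets() {
    grep -E -q -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
               -e 'AKIA[0-9A-Z]{16}' "$1"
}

# Example (not executed here):
#   report_has_secrets analysis.txt && echo "review before sharing" >&2
```

If the check fires, rerunning the command with a matching `--exclude` pattern keeps the offending file out of the report.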