# Model Card - Rippler LLM Service
## Model Overview
Rippler's LLM Service employs a multi-model strategy with automatic fallback capabilities to ensure reliable AI-powered impact analysis for code changes in microservice architectures.
## Base Models Used
### 1. OpenAI GPT-4o-mini (Primary Model)
**Version:** GPT-4o-mini (via the OpenAI Python SDK, v1.3.5+)
**Purpose:** Primary model for generating comprehensive impact analysis reports, risk assessments, and stakeholder recommendations.
**Why Chosen:**
- Cost-Effective: Significantly lower cost compared to full GPT-4, making it suitable for frequent PR analysis
- Fast Response Time: Optimized for low-latency applications (target: <10 seconds per analysis)
- Strong Reasoning: Excellent performance on code understanding and impact analysis tasks
- Context Window: 128K tokens, sufficient for analyzing large PRs with multiple file changes
- Reliability: High availability and consistent performance through OpenAI's infrastructure
- Structured Output: Strong capability for generating well-formatted JSON responses
**Use Cases in Rippler:**
- Code change impact analysis
- Risk scoring and assessment
- Stakeholder identification
- Recommendation generation
- Natural language summaries of technical changes
**Known Limitations:**
- API Dependency: Requires internet connectivity and valid API key
- Cost per Request: Charges per token (input + output), approximately $0.15 per 1M input tokens, $0.60 per 1M output tokens
- Rate Limits: Subject to OpenAI API rate limiting (10,000 RPM for tier 1, varies by tier)
- Data Privacy: Data sent to OpenAI servers (consideration for sensitive codebases)
- Outdated Knowledge: Training data cutoff means no knowledge of latest frameworks/libraries
- Hallucination Risk: May occasionally generate plausible but incorrect analysis
- Context Length: Although the 128K window is large, very large PRs (>100K tokens) may still require truncation
**Capabilities:**
- Natural language understanding of code diffs and technical documentation
- Reasoning about cascading impacts in distributed systems
- Risk assessment based on code patterns and architectural concerns
- JSON structured output generation (see the sketch below)
- Multi-file change analysis
- Confidence scoring for predictions
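As a concrete illustration of the structured-output capability above, here is a minimal sketch of such a call using the OpenAI Python SDK's JSON mode. The prompt wording and response fields (`summary`, `risk_level`, `impacted_services`, `confidence`) are hypothetical placeholders, not Rippler's actual schema:

```python
import json
from openai import OpenAI

client = OpenAI(timeout=30.0)  # reads OPENAI_API_KEY; 30s cap matches the fallback trigger

def analyze_with_openai(pr_context: str) -> dict:
    """Ask GPT-4o-mini for a structured impact analysis of a PR diff."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # enforce well-formed JSON output
        messages=[
            {
                "role": "system",
                "content": (
                    "You analyze code diffs in a microservice architecture. "
                    "Reply with a JSON object containing 'summary', 'risk_level' "
                    "(high/medium/low), 'impacted_services', and 'confidence' (0-1)."
                ),
            },
            {"role": "user", "content": pr_context},
        ],
    )
    return json.loads(response.choices[0].message.content)
```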
### 2. Anthropic Claude (Secondary Model)
**Version:** Claude 3 Sonnet/Haiku (via the Anthropic Python SDK, v0.7.0+)
**Purpose:** Alternative primary model with capabilities similar to GPT-4o-mini, selected based on availability and performance characteristics.
**Why Chosen:**
- Alternative Provider: Reduces vendor lock-in and provides fallback option
- Strong Safety Features: Enhanced safety guardrails for content generation
- Long Context: Up to 200K tokens context window
- Competitive Pricing: Haiku in particular has a cost profile close to GPT-4o-mini's (see the cost table below)
- High Quality: Excellent performance on code analysis tasks
**Use Cases in Rippler:**
- Same as GPT-4o-mini (primary alternative)
- Used when OpenAI API is unavailable or rate-limited
**Known Limitations:**
- API Dependency: Requires internet connectivity and valid API key
- Regional Availability: May have different availability than OpenAI in certain regions
- Rate Limits: Subject to Anthropic's rate limiting policies
- Data Privacy: Data sent to Anthropic servers
- Smaller Ecosystem: Less tooling and community support than OpenAI's ecosystem
**Capabilities:**
- Similar to GPT-4o-mini
- Strong at following complex instructions
- Excellent at structured output generation
- Good code understanding
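The equivalent call through Anthropic's Messages API looks much the same. This is a sketch assuming an SDK version recent enough to expose `client.messages.create`, with the same hypothetical response fields as the OpenAI example:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def analyze_with_anthropic(pr_context: str) -> dict:
    """Route the same impact-analysis request to Claude."""
    message = client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=1024,
        system=(
            "You analyze code diffs in a microservice architecture. "
            "Reply with only a JSON object containing 'summary', 'risk_level' "
            "(high/medium/low), 'impacted_services', and 'confidence' (0-1)."
        ),
        messages=[{"role": "user", "content": pr_context}],
    )
    return json.loads(message.content[0].text)
```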
### 3. Ollama Local Models (Fallback)
**Models Supported:**
- CodeLlama (7B, 13B, 34B)
- Llama 2 (7B, 13B, 70B)
- Mistral (7B)
- Other Ollama-compatible models
**Purpose:** Local fallback for offline operation, for when cloud APIs are unavailable or rate-limited, or to address data privacy concerns.
**Why Chosen:**
- Privacy: All data stays on-premises, critical for sensitive codebases
- No API Costs: Free to run (only infrastructure costs)
- Offline Capable: Works without internet connectivity
- Customizable: Can be fine-tuned on organization-specific code patterns
- No Rate Limits: Limited only by local hardware resources
- Vendor Independence: Complete control over the model
**Use Cases in Rippler:**
- Automatic fallback when cloud APIs fail or are unavailable
- Primary option for organizations with strict data privacy requirements
- Development/testing environments without API access
- Cost-sensitive deployments
**Known Limitations:**
- Hardware Requirements: Requires GPU for reasonable performance (recommended: 16GB+ VRAM for 13B+ models)
- Lower Quality: Generally produces less sophisticated analysis than GPT-4o-mini or Claude
- Slower Inference: Typically 2-5x slower than cloud APIs depending on hardware
- Limited Context: Smaller context windows (4K-32K tokens vs 128K+ for cloud models)
- Resource Intensive: Consumes significant CPU/GPU/Memory resources
- Model Management: Requires manual updates and model version management
- Narrower Training Coverage: May struggle with less common programming languages or frameworks
**Capabilities:**
- Basic code understanding and diff analysis
- Simple impact assessment
- Risk level classification (high/medium/low)
- JSON output generation
- Adequate for straightforward PRs with limited scope
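For the local path, Ollama exposes a simple REST API. Below is a minimal sketch using its `/api/generate` endpoint and the environment variables from the configuration section later in this card; the prompt shape is illustrative:

```python
import json
import os
import requests

def analyze_with_ollama(pr_context: str) -> dict:
    """Run the analysis against a local Ollama model via its REST API."""
    base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    resp = requests.post(
        f"{base_url}/api/generate",
        json={
            "model": os.getenv("OLLAMA_MODEL", "codellama:13b"),
            "prompt": (
                "Analyze this code diff and reply with only a JSON object "
                "containing 'summary', 'risk_level' (high/medium/low), and "
                "'impacted_services':\n\n" + pr_context
            ),
            "format": "json",   # constrain the model to emit valid JSON
            "stream": False,    # return one complete response object
        },
        timeout=120,  # local inference is slower than the cloud APIs
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])
```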
## Model Selection Strategy
Rippler implements an automatic fallback strategy to ensure reliability:
```text
1. Try OpenAI GPT-4o-mini (if API key configured)
        ↓ (on failure/timeout)
2. Try Anthropic Claude (if API key configured)
        ↓ (on failure/timeout)
3. Fall back to Ollama local model (if running)
        ↓ (on failure)
4. Return error with graceful degradation
```
**Fallback Triggers:**
- API authentication failures
- Network timeouts (>30 seconds)
- Rate limit errors (HTTP 429)
- Server errors (HTTP 5xx)
- Service unavailability
**Configuration:** Users can configure model preferences via environment variables; a Python sketch showing how these settings could drive the fallback chain follows the example:
```bash
# Primary model preference
LLM_PRIMARY_PROVIDER=openai  # or anthropic or ollama

# Enable/disable fallback
LLM_ENABLE_FALLBACK=true

# Ollama configuration (for local fallback)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=codellama:13b
```
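Putting the pieces together, here is a sketch of how these settings could drive the provider chain, reusing the `analyze_with_*` functions sketched in the model sections above (the shipped service's internals may differ):

```python
import os

PROVIDERS = {
    # analyze_with_openai/_anthropic/_ollama are the sketches shown earlier.
    "openai": analyze_with_openai,
    "anthropic": analyze_with_anthropic,
    "ollama": analyze_with_ollama,
}

def run_analysis(pr_context: str) -> dict:
    primary = os.getenv("LLM_PRIMARY_PROVIDER", "openai")
    order = [primary] + [p for p in PROVIDERS if p != primary]
    if os.getenv("LLM_ENABLE_FALLBACK", "true").lower() != "true":
        order = order[:1]  # fallback disabled: only the primary is tried

    failures = []
    for name in order:
        try:
            # Auth failures, timeouts, HTTP 429, and HTTP 5xx surface as
            # exceptions from the wrappers, triggering the next provider.
            return PROVIDERS[name](pr_context)
        except Exception as exc:  # in practice, catch provider-specific errors
            failures.append(f"{name}: {exc}")
    # Graceful degradation: report every failure rather than crashing callers.
    raise RuntimeError("All LLM providers failed: " + "; ".join(failures))
```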
## Performance Characteristics
### Response Times (Typical PR Analysis)
| Model | Average Response Time | P95 Response Time |
|---|---|---|
| GPT-4o-mini | 4-6 seconds | 8-10 seconds |
| Claude 3 Sonnet | 5-7 seconds | 10-12 seconds |
| Ollama CodeLlama 13B | 10-15 seconds | 20-25 seconds |
### Accuracy (Based on Internal Testing)
| Model | Impact Detection Accuracy | Risk Assessment Accuracy | Stakeholder Identification |
|---|---|---|---|
| GPT-4o-mini | 92% | 88% | 85% |
| Claude 3 Sonnet | 91% | 87% | 84% |
| Ollama CodeLlama 13B | 78% | 72% | 68% |
*Accuracy measured against human expert annotations on 100+ real-world PRs.*
### Cost (Per 1,000 Analyses)
| Model | Estimated Cost | Notes |
|---|---|---|
| GPT-4o-mini | ~$1 | 1,000 analyses at ~3K input + 1K output tokens each, at the rates quoted above ($0.15 / $0.60 per 1M tokens) |
| Claude 3 Sonnet | ~$24 | Same token profile at Sonnet's published rates ($3 / $15 per 1M tokens) |
| Ollama Local | $0 in API costs | Infrastructure/GPU costs apply |
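These estimates can be reproduced with a quick back-of-the-envelope calculation; the per-token rates below are the published prices cited in this card, and the token profile is the assumed average:

```python
# (input $/1M tokens, output $/1M tokens)
RATES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-sonnet": (3.00, 15.00),
}

def cost_per_1000(model: str, input_tokens: int = 3_000, output_tokens: int = 1_000) -> float:
    """API cost of 1,000 analyses at the given per-analysis token profile."""
    in_rate, out_rate = RATES[model]
    return 1000 * (input_tokens * in_rate + output_tokens * out_rate) / 1e6

print(f"gpt-4o-mini:     ${cost_per_1000('gpt-4o-mini'):.2f}")      # ~$1.05
print(f"claude-3-sonnet: ${cost_per_1000('claude-3-sonnet'):.2f}")  # ~$24.00
```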
## Ethical Considerations
### Bias and Fairness
- Models may exhibit bias in stakeholder identification based on training data
- Risk assessments may be influenced by common patterns in training data
- Regular human review recommended for critical decisions
### Privacy
- Cloud models (GPT-4o-mini, Claude) send code to external servers
- Consider using local Ollama models for sensitive/proprietary code
- No code is retained by Rippler service after processing
- Refer to OpenAI/Anthropic privacy policies for their data handling
### Environmental Impact
- Cloud API usage: Minimal environmental impact per request
- Local Ollama: Significant GPU power consumption (100-300W during inference)
- Consider batch processing and caching to reduce redundant inference
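As a sketch of the caching suggestion above, assuming analyses can be keyed by diff content (the helper below is illustrative, not part of the shipped service):

```python
import hashlib

# Illustrative in-memory cache keyed by a hash of the diff text.
# A production deployment might prefer Redis or an on-disk store.
_analysis_cache: dict[str, dict] = {}

def cached_analysis(diff_text: str, analyze) -> dict:
    key = hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
    if key not in _analysis_cache:
        _analysis_cache[key] = analyze(diff_text)  # runs only on a cache miss
    return _analysis_cache[key]
```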
## Model Updates and Maintenance
### Update Frequency
- OpenAI GPT-4o-mini: Managed by OpenAI, automatic updates
- Anthropic Claude: Managed by Anthropic, automatic updates
- Ollama Models: Requires manual updates (`ollama pull <model>`)
### Version Tracking
- API client versions are pinned in `requirements.txt` for reproducibility
- Model versions are logged in analysis metadata for traceability
- Breaking changes in API providers are monitored and tested before deployment
## Monitoring and Evaluation
### Metrics Tracked
- Model selection frequency (primary vs fallback usage)
- Average response times per model
- Token usage and costs
- Error rates and failure modes
- User feedback on analysis quality
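A sketch of the per-request record that could back these metrics; the field names are illustrative, not the service's actual schema:

```python
from dataclasses import asdict, dataclass

@dataclass
class AnalysisMetrics:
    """Illustrative per-request record covering the metrics listed above."""
    provider: str                   # which model actually served the request
    fallback_used: bool             # primary vs fallback usage
    response_time_s: float          # response time per model
    input_tokens: int               # token usage, for cost tracking
    output_tokens: int
    error: str | None = None        # error rates and failure modes
    user_rating: int | None = None  # user feedback on analysis quality

# Example: a primary-provider request that succeeded in 4.8 seconds.
record = AnalysisMetrics("gpt-4o-mini", False, 4.8, 3120, 940)
print(asdict(record))
```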
### Quality Assurance
- Random sampling of analyses for human review (5% of requests)
- A/B testing between models for quality comparison
- User feedback collection through UI
- Automated tests with known PR patterns
## Known Issues and Limitations
### All Models
- Cannot access external repositories or documentation beyond provided context
- No real-time knowledge of runtime behavior or production metrics
- Limited to static code analysis without execution
- May miss organization-specific conventions or patterns
### Integration Limitations
- Analysis quality depends on quality of input (diff quality, dependency graph accuracy)
- Cannot interview developers or gather additional context
- No access to issue trackers, project management tools, or team structures
### Language Support
- Best performance on popular languages (JavaScript, Python, Java, Go)
- Reduced accuracy for less common languages or domain-specific DSLs
- Framework-specific patterns may not be recognized for newer frameworks
## Responsible AI Usage
Rippler's LLM integration is designed as an assistive tool for developers, not an autonomous decision-maker:
- ✅ Recommendations require human review before action
- ✅ Confidence scores provided to indicate uncertainty
- ✅ Analysis is advisory, not prescriptive
- ✅ Developers maintain full control over code and deployment decisions
- ✅ Transparency: model used and reasoning provided in reports
## References and Resources
- OpenAI GPT-4o-mini: https://platform.openai.com/docs/models/gpt-4o-mini
- Anthropic Claude: https://www.anthropic.com/claude
- Ollama: https://ollama.ai
- Model Evaluation Metrics: Internal testing repository (confidential)
## Contact and Support
For questions about model selection, performance issues, or to report quality concerns:
- Open an issue: GitHub Issues
- Email team leads (see README.md for contacts)
---
**Last Updated:** November 2024
**Version:** 1.0
**Maintained By:** Rippler Team