About BugScribe
BugScribe addresses the challenges of incomplete, invalid, and inconsistent bug reports by guiding users to produce high-quality, actionable reports with minimal effort. It combines automated capture of user interactions, intelligent analysis, and interactive guidance to ensure essential details are included and ambiguities are clarified.
By evaluating the quality of each report and suggesting improvements before submission, BugScribe reduces the likelihood of invalid or duplicate reports. It also integrates with popular bug-tracking platforms, so developers receive structured, reliable, and reproducible bug reports.
Key Features
- AI-Assisted Bug Description Generation
- Automated Steps-to-Reproduce (S2R)
- Session Replay & Video Recording
- Conversational AI Agent
- Pre-Submission Invalid Bug Detection
- Duplicate Issue Finding
- Knowledge Base Integration
- Multi-Platform Bug Tracker Integration
Speed Up Bug Reporting
Faster reports, less effort, better quality. One short description from you becomes a full, submission-ready bug report in under two minutes.
Time & effort
- 20–40× faster than writing by hand (~20 min → under 2 min)
- One description → full structured report (no filling 5+ fields manually)
- Describe → review → submit; optional tweaks in natural language
What’s auto-filled
- Summary, steps to reproduce, expected vs observed, environment, extra context
- Validity check in seconds: valid/invalid, confidence %, explanation, route to engineering or suggested fixes
- Duplicate/similar issues surfaced before filing so you can link or skip
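As a rough illustration of the items above, the sketch below shows one possible shape for an auto-generated report, with the validity check and duplicate search attached. The field names and the `is_submittable` helper are illustrative assumptions, not BugScribe's actual schema or API.

```python
# Hypothetical payload for one auto-generated report (field names
# are illustrative, not BugScribe's real schema).
bug_report = {
    "summary": "App crashes when opening the seat map at checkout",
    "steps_to_reproduce": [
        "Open the booking flow and select a flight",
        "Proceed to checkout",
        "Tap 'Choose seat' to open the seat map",
    ],
    "expected": "Seat map renders and a seat can be selected",
    "observed": "App crashes with a blank screen",
    "environment": {"os": "iOS 17.4", "app_version": "5.2.1"},
    # Validity check: label, confidence, and an explanation string.
    "validity": {"label": "valid_bug", "confidence": 0.88, "explanation": "..."},
    # Similar issues surfaced before filing, so the user can link or skip.
    "possible_duplicates": [],
}

def is_submittable(report, threshold=0.5):
    """Illustrative gate: submit only if classified valid with enough
    confidence and no open duplicates were found."""
    v = report["validity"]
    return (
        v["label"] == "valid_bug"
        and v["confidence"] >= threshold
        and not report["possible_duplicates"]
    )

print(is_submittable(bug_report))  # True for this example
```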
Quality & consistency
- Structured format every time (same fields and layout) for easier triage and search
- Validity + confidence (e.g. 88%) so triage can prioritize without re-reading
- Fix suggestions and next steps generated with the report
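Because every report carries the same validity label and confidence score, a triage queue can be ranked mechanically. The snippet below is a minimal sketch of that idea (the report IDs and field names are made up for the example):

```python
# Hypothetical triage queue: each report carries a uniform validity
# label and confidence, so triage can rank without re-reading.
reports = [
    {"id": "BS-101", "validity": "valid_bug", "confidence": 0.88},
    {"id": "BS-102", "validity": "invalid_bug", "confidence": 0.91},
    {"id": "BS-103", "validity": "valid_bug", "confidence": 0.62},
]

# Likely-valid bugs first, highest confidence first; likely-invalid sink.
queue = sorted(
    reports,
    key=lambda r: (r["validity"] != "valid_bug", -r["confidence"]),
)
print([r["id"] for r in queue])  # ['BS-101', 'BS-103', 'BS-102']
```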
User journey
- 3 steps: describe the issue → review the draft → submit
- Single place for description, logs, optional screenshot/replay → one report, one link to share
Research & Validation
BugScribe’s Pre-Submission Invalid Bug Detection is backed by systematic LLM experiments. We evaluated a validity classifier on Turkish Airlines software bug reports to compare different prompt and inference strategies. The table below summarizes the experiments; the valid_bug class is treated as positive, so TP, FN, TN, and FP denote true positives, false negatives, true negatives, and false positives.
Summary of Experiments
| Experiment | Accuracy | Correct/Total | TP | FN | TN | FP | Calls per report |
|---|---|---|---|---|---|---|---|
| Basic | 64.0% | 32/50 | 29 | 0 | 3 | 18 | 1 |
| Enhanced | 78.0% | 39/50 | 22 | 7 | 17 | 4 | 1 |
| Enhanced v2 | 82.0% | 41/50 | 23 | 6 | 18 | 3 | 3 (majority vote) |
| Full 50 (iteration 1) | 82.0% | 41/50 | 26 | 3 | 15 | 6 | 1 |
| Full 50 (iteration 2) | 90.0% | 45/50 | 29 | 0 | 16 | 5 | 1 |
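The accuracy column follows directly from the confusion counts as accuracy = (TP + TN) / (TP + FN + TN + FP). This can be checked against the table:

```python
# Recompute each experiment's accuracy from its confusion counts.
def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)

# (TP, FN, TN, FP) per experiment, copied from the table above.
experiments = {
    "Basic": (29, 0, 3, 18),
    "Enhanced": (22, 7, 17, 4),
    "Enhanced v2": (23, 6, 18, 3),
    "Full 50 (iteration 1)": (26, 3, 15, 6),
    "Full 50 (iteration 2)": (29, 0, 16, 5),
}
for name, counts in experiments.items():
    print(f"{name}: {accuracy(*counts):.0%}")
# Basic: 64% ... Full 50 (iteration 2): 90%, matching the table.
```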
Experiment Overview
- Basic (64%): Minimal prompt (definitions only). Baseline; strong bias toward valid_bug.
- Enhanced (78%): Turkish Airlines context + classification criteria; better invalid_bug detection.
- Enhanced v2 (82%): Few-shot examples, rules checklist, JSON output; 3 calls per report with majority vote.
- Full 50 iteration 1 (82%): No few-shot examples and no leakage of evaluation data into the prompt; only context-derived rules and principles.
- Full 50 iteration 2 (90%): Same as iteration 1 plus refined guidance (nuanced distinctions from context). Best accuracy with a single call per report.
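The Enhanced v2 majority vote can be sketched as follows; `classify_once` stands in for a single LLM call returning `valid_bug` or `invalid_bug` (the actual call and prompt are not shown here, and the stub below is purely for illustration):

```python
from collections import Counter

def majority_vote(report, classify_once, n_calls=3):
    """Classify the same report n_calls times and return the most
    common label plus the fraction of votes that agreed with it."""
    votes = Counter(classify_once(report) for _ in range(n_calls))
    label, count = votes.most_common(1)[0]
    return label, count / n_calls

# Example with a stubbed classifier that disagrees on one of three calls.
answers = iter(["valid_bug", "invalid_bug", "valid_bug"])
label, agreement = majority_vote("report text", lambda r: next(answers))
print(label)  # valid_bug (2 of 3 votes)
```

Iterations 1 and 2 drop the vote and use a single call per report, which is what makes the 90% configuration cheap to run.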
Conclusion
Adding domain context, classification criteria, and refined guidance (derived from the same context document) improved accuracy from 64% to 90%. The best configuration (Full 50 iteration 2) achieves 90% accuracy with one LLM call per report and no use of evaluation data in the prompt, demonstrating a practical approach for pre-submission invalid bug detection in BugScribe.