BugScribe

Agentic Bug Report Generation Assistant

Bilkent University • Department of Computer Engineering

CS-491 Senior Design Project 1 • 2025-2026 Fall

About BugScribe

Project Name Pronunciation: /bʌɡ skraɪb/ (Bug-Scribe)

BugScribe addresses the challenges of incomplete, invalid, and inconsistent bug reports by guiding users to produce high-quality, actionable reports with minimal effort. It combines automated capture of user interactions, intelligent analysis, and interactive guidance to ensure essential details are included and ambiguities are clarified.

By evaluating the quality of each report and suggesting improvements before submission, BugScribe reduces the likelihood of invalid or duplicate reports while integrating with popular bug tracking platforms to provide developers with structured, reliable, and reproducible bug reports.

Key Features

  • AI-Assisted Bug Description Generation
  • Automated Steps-to-Reproduce (S2R)
  • Session Replay & Video Recording
  • Conversational AI Agent
  • Pre-Submission Invalid Bug Detection
  • Duplicate Issue Finding
  • Knowledge Base Integration
  • Multi-Platform Bug Tracker Integration

Speed Up Bug Reporting

Faster reports, less effort, better quality. One short description from you becomes a full, submission-ready bug report in under two minutes.

20–40× Faster than manual
5+ Fields auto-filled
1 Description from you
<2 min To submission-ready
~20 min Saved per report

Time & effort

  • 20–40× faster than writing by hand (~20 min → under 2 min)
  • One description → full structured report (no filling 5+ fields manually)
  • Describe → review → submit; optional tweaks in natural language

What’s auto-filled

  • Summary, steps to reproduce, expected vs observed, environment, extra context
  • Validity check in seconds: valid/invalid verdict, confidence percentage, short explanation, and a next step (route to engineering or suggested fixes)
  • Duplicate/similar issues surfaced before filing so you can link or skip

Quality & consistency

  • Structured format every time (same fields and layout) for easier triage and search
  • Validity verdict plus confidence score (e.g., 88%) so triage can prioritize without re-reading each report
  • Fix suggestions and next steps generated with the report
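
As a rough illustration of the structured format described above, the sketch below models a report with the auto-filled fields and a validity result, then orders reports for triage. The class names, field names, and the ordering policy (valid first, then higher confidence first) are assumptions made for this example; they are not BugScribe's actual schema. The sample reports are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ValidityCheck:
    is_valid: bool      # valid / invalid verdict
    confidence: float   # 0.0 to 1.0, displayed as a percentage
    explanation: str    # short rationale for the verdict


@dataclass
class BugReport:
    # Hypothetical field names mirroring the auto-filled fields listed above.
    summary: str
    steps_to_reproduce: List[str]
    expected: str
    observed: str
    environment: str
    extra_context: str = ""
    validity: Optional[ValidityCheck] = None


def triage_key(report: BugReport) -> Tuple[int, float]:
    """Assumed policy: valid first, then invalid, then unchecked reports."""
    check = report.validity
    if check is None:
        return (2, 0.0)
    return (0 if check.is_valid else 1, -check.confidence)


def triage_order(reports: List[BugReport]) -> List[BugReport]:
    return sorted(reports, key=triage_key)


# Illustrative data only; not taken from any real report.
reports = [
    BugReport("Seat map fails to load", ["Open booking", "Select seat"],
              "Seat map is shown", "Blank panel", "Chrome 126",
              validity=ValidityCheck(True, 0.61, "Likely rendering defect")),
    BugReport("Cannot log in with wrong password", ["Enter wrong password"],
              "Login succeeds", "Login rejected", "iOS 17",
              validity=ValidityCheck(False, 0.95, "Expected behavior")),
    BugReport("Login button unresponsive", ["Open app", "Tap Login"],
              "Login form opens", "Nothing happens", "Android 14",
              validity=ValidityCheck(True, 0.88, "Reproducible UI defect")),
]

for r in triage_order(reports):
    verdict = "valid" if r.validity.is_valid else "invalid"
    print(f"{verdict:<8}{r.validity.confidence:.0%}  {r.summary}")
```

Keeping the verdict and confidence as typed fields, rather than free text, is what would let a tracker sort or filter on them without anyone re-reading the report body.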

User journey

  • 3 steps: describe the issue → review the draft → submit
  • Single place for description, logs, optional screenshot/replay → one report, one link to share

Research & Validation

BugScribe’s Pre-Submission Invalid Bug Detection is backed by systematic LLM experiments. We evaluated a validity classifier on Turkish Airlines software bug reports to compare different prompt and inference strategies. Below is a concise summary of the experiments and results.

Dataset: 50 reports (title + content), ground-truth validity (valid_bug / invalid_bug). Model: meta-llama/Llama-3.3-70B-Instruct-Turbo (Together API). No data leakage in any experiment.

Summary of Experiments

Validity classifier experiments: accuracy and metrics

Experiment              Accuracy  Correct/Total  TP  FN  TN  FP  Calls per report
Basic                   64.0%     32/50          29   0   3  18  1
Enhanced                78.0%     39/50          22   7  17   4  1
Enhanced v2             82.0%     41/50          23   6  18   3  3 (majority vote)
Full 50 (iteration 1)   82.0%     41/50          26   3  15   6  1
Full 50 (iteration 2)   90.0%     45/50          29   0  16   5  1
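
The accuracy column can be re-derived from the TP/FN/TN/FP counts, and the same counts give per-class rates that accuracy alone hides. The sketch below does this for the five rows of the table. It assumes the positive class is valid_bug, which is inferred from the Basic row (no false negatives, many false positives, matching its reported bias toward valid_bug). The `Confusion` class is a helper written for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # valid_bug predicted as valid_bug (assumed positive class)
    fn: int  # valid_bug predicted as invalid_bug
    tn: int  # invalid_bug predicted as invalid_bug
    fp: int  # invalid_bug predicted as valid_bug

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def valid_recall(self) -> float:
        """Share of truly valid reports that were accepted."""
        return self.tp / (self.tp + self.fn)

    @property
    def invalid_recall(self) -> float:
        """Share of truly invalid reports that were caught."""
        return self.tn / (self.tn + self.fp)


# Counts copied from the results table.
RESULTS = {
    "Basic": Confusion(tp=29, fn=0, tn=3, fp=18),
    "Enhanced": Confusion(tp=22, fn=7, tn=17, fp=4),
    "Enhanced v2": Confusion(tp=23, fn=6, tn=18, fp=3),
    "Full 50 (iteration 1)": Confusion(tp=26, fn=3, tn=15, fp=6),
    "Full 50 (iteration 2)": Confusion(tp=29, fn=0, tn=16, fp=5),
}

for name, c in RESULTS.items():
    print(f"{name:<24}acc={c.accuracy:.1%}  "
          f"valid recall={c.valid_recall:.1%}  "
          f"invalid recall={c.invalid_recall:.1%}")
```

Under that assumption, the Basic prompt catches only 3 of the 21 invalid reports, while Full 50 (iteration 2) catches 16 of 21 without rejecting any of the 29 valid ones.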

Experiment Overview

  • Basic (64%): Minimal prompt (definitions only). Baseline; strong bias toward valid_bug.
  • Enhanced (78%): Turkish Airlines context + classification criteria; better invalid_bug detection.
  • Enhanced v2 (82%): Few-shot examples, rules checklist, JSON output; 3 calls per report with majority vote.
  • Full 50 iteration 1 (82%): No few-shot, no leakage; only context-derived rules and principles.
  • Full 50 iteration 2 (90%): Same as iteration 1 plus refined guidance (nuanced distinctions from context). Best accuracy with a single call per report.
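
The three-call majority vote used in Enhanced v2 can be sketched as follows. This is a minimal illustration, not the project's code: `classify_once` is a hypothetical stand-in for a single model call that returns one of the two labels, and `make_stub` replaces the real API with canned answers so the example runs offline.

```python
from collections import Counter
from typing import Callable, Iterable, List

LABELS = ("valid_bug", "invalid_bug")


def majority_vote(votes: List[str]) -> str:
    """Return the label that appears most often among the votes."""
    unknown = [v for v in votes if v not in LABELS]
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    label, _count = Counter(votes).most_common(1)[0]
    return label


def classify_with_vote(report: str,
                       classify_once: Callable[[str], str],
                       n_calls: int = 3) -> str:
    """Call the classifier n_calls times and take the majority label."""
    if n_calls < 1 or n_calls % 2 == 0:
        # An odd count rules out ties between the two labels.
        raise ValueError("n_calls must be a positive odd number")
    votes = [classify_once(report) for _ in range(n_calls)]
    return majority_vote(votes)


def make_stub(answers: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the model: replays canned answers."""
    it = iter(answers)
    return lambda _report: next(it)


stub = make_stub(["valid_bug", "invalid_bug", "invalid_bug"])
print(classify_with_vote("App crashes on launch", stub))  # prints invalid_bug
```

Voting trades cost for stability: each report needs three calls instead of one, which is why the single-call configurations are compared separately in the table.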

Conclusion

Adding domain context, classification criteria, and refined guidance (derived from the same context document) improved accuracy from 64% to 90%. The best configuration (Full 50 iteration 2) achieves 90% accuracy with one LLM call per report and no use of evaluation data in the prompt, demonstrating a practical approach for pre-submission invalid bug detection in BugScribe.

Team Members

Emre Furkan Akyol
22103352
Mehmet Can Bıyık
22102035
Emre Dinç
22103624
Mustafa Özkan İr
22103267
Akif Emre Köşüş
22103657

Project Supervisor

Eray Tüzün
Faculty Supervisor

Project Information

Course & Academic Details

Course: CS-491 Senior Design Project 1
Academic Year: 2025-2026 Fall
University: Bilkent University
Department: Computer Engineering