Step-by-Step Guide to Building a RAG-Powered JIRA Indexing System
Tri Ho
June 6, 2024

In today's fast-paced development environments, managing and extracting actionable insights from JIRA tickets can be daunting. At Futurify, we've embarked on a project to simplify this process using the power of Retrieval-Augmented Generation (RAG) and AI. Our goal is to seamlessly index our JIRA tickets and diagrams and provide precise answers to user queries. Here's a closer look at how we've brought this project to life.

Overview of Steps

  1. Reason for Having This System
  2. Setting Up the Flask Web Application
  3. Text Processing and Document Management
  4. Integrating OpenAI for Enhanced Query Responses
  5. Document Synchronization and Management
  6. Combining Components

Reason for Having This System

JIRA is an invaluable tool for tracking development tasks and issues. However, as projects grow, the sheer volume of tickets can make it challenging to find relevant information quickly. Teams often spend significant time sifting through tickets to find details about specific issues or historical decisions. We needed a solution to streamline this process, making information retrieval as efficient as possible.

Our solution combines traditional text processing techniques with AI capabilities from OpenAI to index JIRA tickets and answer questions about them. Instead of keyword-searching thousands of tickets by hand, team members can ask a question in plain language and get an answer grounded in the relevant tickets.

Step 1: Setting Up the Flask Web Application

First, we'll set up a Flask web application to serve as the interface for querying our indexed JIRA tickets.

Code Sample: Flask Application Setup

from flask import Flask, render_template, request, jsonify
from infrastructure.containers import Container
from flask_cors import CORS

app = Flask(__name__)
container = Container()
CORS(app)  # Enable Cross-Origin Resource Sharing

@app.route("/")
def home():
    return render_template('index.html', title='JIRA Query System', name='Futurify')

@app.route("/get-answer", methods=['POST'])
def get_answer():
    input_data = request.get_json()
    query = input_data.get("query")
    if not query:
        # Reject requests that are missing the query field
        return jsonify({"error": "Missing 'query' field"}), 400
    paragraph_service = container.paragraph_service()
    result = paragraph_service.get_analysis_response(query)
    return jsonify({"result": result})

if __name__ == "__main__":
    app.run(debug=True)
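Before wiring in the real services, it helps to confirm the endpoint's request and response shape. The sketch below uses Flask's built-in test client against a stubbed route; the echo response is a stand-in for the real paragraph service, which is not shown here:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/get-answer", methods=['POST'])
def get_answer():
    input_data = request.get_json()
    # Stubbed response; the real app delegates to paragraph_service
    return jsonify({"result": f"echo: {input_data['query']}"})

# Flask's test client exercises the route without starting a server
with app.test_client() as client:
    resp = client.post("/get-answer", json={"query": "login bug"})
    print(resp.get_json())  # {'result': 'echo: login bug'}
```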

Step 2: Text Processing and Document Management

Next, we need to preprocess and manage the text from our JIRA tickets. OpenAI embeddings turn each paragraph into a vector, and Elasticsearch stores those paragraph documents so they can be retrieved by similarity later.

Code Sample: Text Processing with OpenAI Embeddings

from sklearn.metrics.pairwise import cosine_similarity
import openai
import nltk
from nltk.corpus import stopwords
import string

# Download tokenizer models and stopword lists on first run
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)

class TextProcessService:
    def __init__(self, openai_api_key):
        openai.api_key = openai_api_key
        # Build the stopword set once instead of re-reading it for every token
        self.stop_words = set(stopwords.words('english'))

    def preprocess_text(self, text):
        tokens = nltk.word_tokenize(text.lower())
        return ' '.join(
            token for token in tokens
            if token not in self.stop_words and token not in string.punctuation
        )

    def get_embedding(self, text):
        # Uses the pre-1.0 openai client interface
        response = openai.Embedding.create(
            input=text,
            model="text-embedding-ada-002"
        )
        return response['data'][0]['embedding']

    def calculate_similarity(self, embedding1, embedding2):
        return cosine_similarity([embedding1], [embedding2])[0][0]
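To see how cosine similarity drives retrieval, here is a small standalone sketch that ranks candidate paragraphs against a query. The 4-dimensional vectors are made up for illustration; real ada-002 embeddings have 1,536 dimensions, but the ranking logic is identical:

```python
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_similarity(query_embedding, paragraph_embeddings, top_k=2):
    """Return indices of the top_k paragraphs most similar to the query."""
    scores = cosine_similarity([query_embedding], paragraph_embeddings)[0]
    # Sort paragraph indices from most to least similar
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

# Toy 4-dimensional "embeddings" standing in for real ada-002 vectors
query = [0.9, 0.1, 0.0, 0.0]
paragraphs = [
    [0.8, 0.2, 0.0, 0.0],  # close to the query
    [0.0, 0.0, 1.0, 0.0],  # unrelated
    [0.7, 0.3, 0.1, 0.0],  # also close
]
print(rank_by_similarity(query, paragraphs))  # [0, 2]
```

The indices returned here would map back to stored paragraph documents, which become the context passed to the language model.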

Step 3: Integrating OpenAI for Enhanced Query Responses

We integrate OpenAI to generate responses and embeddings for our documents, enhancing the accuracy and relevance of our query results.

Code Sample: OpenAI Integration

import requests

class OpenAiRequestHelper:
    COMPLETION_URL = "https://api.openai.com/v1/chat/completions"
    EMBEDDING_URL = "https://api.openai.com/v1/embeddings"
    EMBEDDING_MODEL = "text-embedding-ada-002"
    MODEL = "gpt-3.5-turbo"

    def __init__(self, api_key):
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }

    def get_answer(self, prompt, temperature=0.5):
        payload = {
            "messages": [{"role": "user", "content": prompt}],
            "model": self.MODEL,
            "temperature": temperature
        }
        # json= serializes the payload and sets the content type for us
        response = requests.post(self.COMPLETION_URL, headers=self.headers, json=payload)
        response.raise_for_status()  # Surface HTTP errors instead of failing on missing keys
        return response.json()["choices"][0]["message"]["content"]

    def get_embedded_vector(self, text):
        payload = {
            "input": text,
            "model": self.EMBEDDING_MODEL
        }
        response = requests.post(self.EMBEDDING_URL, headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]
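In a RAG pipeline, the prompt sent to get_answer is not the raw user question; it bundles the retrieved paragraphs with the question so the model answers from our data. A minimal prompt builder might look like this (the exact wording and the bracketed-citation format are illustrative choices, not a fixed API):

```python
def build_rag_prompt(paragraphs, question):
    """Assemble retrieved JIRA paragraphs and the user question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(paragraphs))
    return (
        "Answer the question using only the JIRA excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    ["PROJ-101: Login fails on Safari 17.", "PROJ-088: Session cookie expires early."],
    "Why does login fail on Safari?",
)
print(prompt)
```

The resulting string is what gets passed as the prompt argument of OpenAiRequestHelper.get_answer.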

Step 4: Document Synchronization and Management

We synchronize our document data with external sources like Google Drive, process the content, and store embeddings for efficient querying.

Code Sample: Document Synchronization

import io
import os
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

class DocumentSyncService:
    SCOPES = ['https://www.googleapis.com/auth/drive.readonly']

    def __init__(self, credentials_file):
        credentials = Credentials.from_service_account_file(
            credentials_file, scopes=self.SCOPES
        )
        # Drive v3 client; gspread only covers Sheets, so we call the Drive API directly
        self.drive = build('drive', 'v3', credentials=credentials)

    def list_files_in_folder(self, folder_id):
        query = f"'{folder_id}' in parents and trashed = false"
        results = self.drive.files().list(q=query, fields="files(id, name)").execute()
        return results.get("files", [])

    def download_and_process_files(self, folder_id, process_file):
        for file in self.list_files_in_folder(folder_id):
            temp_file_path = os.path.join('/tmp', file['name'])
            request = self.drive.files().get_media(fileId=file['id'])
            # Stream the file contents to disk in chunks
            with open(temp_file_path, 'wb') as f:
                downloader = MediaIoBaseDownload(f, request)
                done = False
                while not done:
                    _, done = downloader.next_chunk()
            process_file(temp_file_path)
            os.remove(temp_file_path)
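Each downloaded file still needs to be split into paragraph-sized chunks before embedding; embedding whole documents dilutes the vectors and makes retrieval less precise. A simple splitter along these lines could run inside the process_file callback (the blank-line heuristic and the minimum length are illustrative choices):

```python
def split_into_paragraphs(text, min_length=20):
    """Split raw document text into paragraphs worth embedding."""
    paragraphs = [p.strip() for p in text.split("\n\n")]
    # Drop headings and fragments too short to carry meaning on their own
    return [p for p in paragraphs if len(p) >= min_length]

doc = "Release notes\n\nPROJ-101: Login fails on Safari 17 after the cookie change.\n\nOK"
print(split_into_paragraphs(doc))
# ['PROJ-101: Login fails on Safari 17 after the cookie change.']
```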

Step 5: Combining Components

With all components in place, we combine them to create a cohesive system that indexes JIRA tickets, processes text, integrates AI for advanced queries, and synchronizes data.

Code Sample: Combining Components

from flask import Flask, request, jsonify
from application.services.text_process import TextProcessService
from application.services.openai_request_helper import OpenAiRequestHelper
from application.services.document_sync import DocumentSyncService

app = Flask(__name__)

# Initialize services
text_process_service = TextProcessService(openai_api_key="your_openai_api_key")
openai_helper = OpenAiRequestHelper(api_key="your_openai_api_key")
document_sync_service = DocumentSyncService(credentials_file="path_to_credentials.json")

def process_file(file_path):
    """Read a downloaded document, embed its text, and store it for retrieval."""
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    preprocessed = text_process_service.preprocess_text(text)
    embedding = text_process_service.get_embedding(preprocessed)
    # Persist the text and embedding to the paragraph store (e.g. Elasticsearch)

@app.route("/query", methods=['POST'])
def query():
    data = request.get_json()
    query_text = data['query']

    # Preprocess and get embedding for the query
    preprocessed_query = text_process_service.preprocess_text(query_text)
    query_embedding = text_process_service.get_embedding(preprocessed_query)

    # Use query_embedding to retrieve the most similar stored paragraphs,
    # then ask the model to answer against that retrieved context
    response = openai_helper.get_answer(preprocessed_query)

    return jsonify({"response": response})

@app.route("/sync", methods=['POST'])
def sync():
    folder_id = request.get_json()['folder_id']
    document_sync_service.download_and_process_files(folder_id, process_file)
    return jsonify({"status": "success"})

if __name__ == "__main__":
    app.run(debug=True)
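Step 2 named Elasticsearch as the paragraph store, and the retrieval step in the /query route is where it comes into play. As an illustration, the helper below builds an Elasticsearch kNN search body; the index name jira_paragraphs and the field names embedding, ticket_id, and text are assumptions for this sketch, and because the function only constructs the request body it runs without a live cluster:

```python
def build_knn_query(query_embedding, k=5):
    """Build an Elasticsearch kNN search body for the paragraph index."""
    return {
        "knn": {
            "field": "embedding",        # dense_vector field holding paragraph embeddings
            "query_vector": query_embedding,
            "k": k,
            "num_candidates": 50,        # candidates examined per shard before ranking
        },
        "_source": ["ticket_id", "text"],  # return only what the prompt needs
    }

body = build_knn_query([0.1, 0.2, 0.3], k=3)
print(body["knn"]["k"])  # 3
```

With the official Python client, such a body would be sent with a call along the lines of es.search(index="jira_paragraphs", ...), and the hits' text fields would become the context paragraphs for the model.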

Conclusion

By combining Flask, OpenAI embeddings, traditional text processing techniques, and efficient document management, we've built a robust system for querying and analyzing JIRA tickets. This system not only enhances our ability to retrieve relevant information quickly but also leverages AI to provide more accurate and contextually relevant answers.

Implementing such a solution can significantly improve efficiency in managing and utilizing project data, making it a valuable tool for any development team. Stay tuned as we continue to innovate and enhance this system, and feel free to reach out with any questions or feedback!