Automating Micro-Lesson Creation with AI: A Step-by-Step Guide
Tri Ho
June 6, 2024

We are building a new education platform for a client in Vietnam, intended for rollout across Southeast Asia, that teaches environmental protection. The materials target grade-school students, university students, and young professionals (under 35 years old). The platform's look and feel will be adapted to each age group to make it easy to use, while the core content remains consistent. With AI, we can explain the same concepts differently to each group. For younger audiences, we use simple language, familiar concepts, and actionable steps. For adults, we go deeper into topics such as global warming, climate change, and sea level rise. This keeps the facts consistent while matching the explanation to each audience.

Scope

This script is a proof of concept: it generates questions and answers from existing lesson content, breaking that content into smaller, more digestible pieces. In our production scripts, we will also adapt the content to different age groups. All generated content still needs human review to confirm that it is accurate and that the answers are derived solely from the lesson content.

High-Level Overview

  1. Importing Content: The script imports the content from the PowerPoint slides that would be presented to the learners.
  2. Chunking Content: The content is often too large for the context window, so we split it into manageable chunks.
  3. Vectorizing and Clustering: We use OpenAI embeddings to vectorize the chunks and then apply KMeans to group them into clusters, giving five main topics per lesson.
  4. Generating Subtopics and QA: We use GPT-3.5 and GPT-4 to expand the topics into subtopics. For each subtopic, we generate guiding questions and answers, ensuring that the answers come directly from the lesson content and not from other sources.
  5. Storing Data: Finally, we store the generated data for further use and presentation.
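
The five steps above can be wired together as a simple pipeline. The sketch below is only an illustration of that flow, not the production script: the function name build_micro_lessons and the record fields are hypothetical, and each step is passed in as a function so the skeleton stays independent of any particular loader, model, or database.

```python
from typing import Callable, Dict, List, Sequence


def build_micro_lessons(
    file_path: str,
    load_text: Callable[[str], str],
    chunk: Callable[[str], List[str]],
    embed: Callable[[List[str]], List[Sequence[float]]],
    cluster: Callable[[List[Sequence[float]]], List[int]],
    generate_qa: Callable[[List[str]], str],
    store: Callable[[Dict], None],
) -> List[Dict]:
    """Run the five steps in order; every step is supplied by the caller."""
    text = load_text(file_path)          # 1. import the slide content
    chunks = chunk(text)                 # 2. split it into chunks
    labels = cluster(embed(chunks))      # 3. vectorize, then cluster
    records = []
    for label in sorted(set(labels)):
        # Collect the chunks that were assigned to this cluster.
        members = [c for c, l in zip(chunks, labels) if l == label]
        record = {
            "topic_id": label,
            "chunks": members,
            "questions_and_answers": generate_qa(members),  # 4. generate Q&A
        }
        store(record)                    # 5. persist the result
        records.append(record)
    return records
```

Passing the steps as arguments also makes it easy to dry-run the flow with stub functions before spending API calls.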

Step-by-Step Guide

1. Document Processing

The first step is loading and preprocessing the documents. The script supports several document types but focuses on PowerPoint presentations, which it reads with the UnstructuredPowerPointLoader from the langchain_community package.

from langchain_community.document_loaders import UnstructuredPowerPointLoader

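def extract_text_from_powerpoint(file_path):
    loader = UnstructuredPowerPointLoader(file_path)
    # load() returns a list of Document objects, not a string.
    documents = loader.load()
    # Join the text so the chunking step receives one plain string.
    return "\n\n\n".join(doc.page_content for doc in documents)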

2. Chunking

Documents are split into smaller chunks so that each one fits in the model's context window. The chunk_document function splits the text at slide boundaries and packs consecutive slides into a chunk until adding the next slide would exceed the maximum token limit.

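def chunk_document(document, max_tokens):
    # Slides are expected to be separated by two blank lines in the extracted text.
    slides = document.split('\n\n\n')
    chunks = []
    current_chunk = []
    current_tokens = 0

    for slide in slides:
        # num_tokens_from_messages is a token-counting helper that is not shown here.
        num_token = num_tokens_from_messages(slide)
        if current_tokens + num_token <= max_tokens:
            current_chunk.append(slide)
            current_tokens += num_token
        else:
            # Flush the current chunk unless it is empty, which happens
            # when the very first slide is already over the limit.
            if current_chunk:
                chunks.append('\n'.join(current_chunk))
            current_chunk = [slide]
            current_tokens = num_token

    # Keep whatever is left after the last slide.
    if current_chunk:
        chunks.append('\n'.join(current_chunk))

    return chunks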
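
The token-counting helper is not shown in this post. As a rough stand-in, the sketch below estimates about four characters per token, a common rule of thumb for English text, and packs slides greedily in the same way. The names approx_num_tokens and pack_slides are hypothetical; for exact counts, a real tokenizer such as OpenAI's tiktoken library could be passed in as count_tokens instead.

```python
from typing import Callable, List


def approx_num_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def pack_slides(
    slides: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_num_tokens,
) -> List[str]:
    """Greedily pack consecutive slides into chunks of up to max_tokens.

    A single slide larger than max_tokens becomes its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for slide in slides:
        cost = count_tokens(slide)
        # Start a new chunk when this slide would push the current one over.
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(slide)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


slides = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
chunks = pack_slides(slides, max_tokens=20)
print(len(chunks))  # 2: the first two slides share a chunk, the third starts a new one
```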

3. Embedding and Clustering

The script uses OpenAI embeddings to vectorize the document chunks and KMeans clustering to organize the content into topics.

Generate Embeddings

OpenAI embeddings are used to convert text chunks into vector representations.

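import openai

# Note: openai.Embedding is the module-level interface of the pre-1.0 openai SDK.
def generate_embeddings(chunks):
    embeddings = []
    for chunk in chunks:
        # One request per chunk; the response holds one embedding per input.
        response = openai.Embedding.create(
            input=chunk,
            model="text-embedding-ada-002"
        )
        embeddings.append(response['data'][0]['embedding'])
    return embeddings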
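
Embeddings are useful here because chunks with similar meaning end up close together in vector space, which is what the clustering step relies on. A minimal sketch of that idea, using cosine similarity on toy three-dimensional vectors (real text-embedding-ada-002 vectors have 1536 dimensions; the values and variable names below are made up for illustration):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of three chunks.
sea_level = [0.9, 0.1, 0.0]
flooding = [0.8, 0.2, 0.1]
recycling = [0.0, 0.2, 0.9]

# The two coastal chunks are closer to each other than to the recycling chunk.
print(cosine_similarity(sea_level, flooding) > cosine_similarity(sea_level, recycling))  # True
```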

Clustering with KMeans

KMeans clustering is applied to the embeddings to group similar content together.

from sklearn.cluster import KMeans

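def cluster_documents(embeddings, num_clusters=5):
    # KMeans cannot fit more clusters than there are samples, so cap the count.
    n_clusters = min(num_clusters, len(embeddings))
    # n_init is set explicitly because its default differs across scikit-learn versions.
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    kmeans.fit(embeddings)
    # labels_ has one cluster index per chunk; cluster_centers_ has one vector per cluster.
    labels = kmeans.labels_
    return labels, kmeans.cluster_centers_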

4. AI Integration

Using OpenAI’s models, the script generates educational content. It creates topics and subtopics, followed by questions and answers.

GPT-3.5 for Topic Generation

The script uses the cluster centers as input to GPT-3.5 to generate the main topics.

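def gpt3_topic_generator(cluster_centers):
    topics = []
    for center in cluster_centers:
        # Each center is an embedding vector, so it is interpolated here as a list of numbers.
        prompt = f"Generate a main topic for the following content cluster:\n{center}"
        # gpt-3.5-turbo is a chat model, so it is called through the chat endpoint
        # (pre-1.0 openai SDK interface, matching the GPT-4 calls in this post).
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=150
        )
        topics.append(response.choices[0].message["content"].strip())
    return topics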
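
Because a cluster center is a vector of numbers rather than text, one common way to give the model something readable is to send the chunk whose embedding lies closest to each center. The sketch below shows that selection step in plain Python; it is an assumption about how the step could be done, not necessarily what the production script does, and the names squared_distance and representative_chunks are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representative_chunks(
    chunks: List[str],
    embeddings: List[Sequence[float]],
    centers: List[Sequence[float]],
) -> List[str]:
    """For each cluster center, return the chunk whose embedding is closest."""
    picks = []
    for center in centers:
        best = min(
            range(len(chunks)),
            key=lambda i: squared_distance(embeddings[i], center),
        )
        picks.append(chunks[best])
    return picks


# Toy two-dimensional embeddings standing in for real ones.
chunks = ["Sea levels are rising.", "Recycling reduces waste.", "Coasts are flooding."]
embeddings = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
centers = [[0.88, 0.15], [0.1, 0.9]]

print(representative_chunks(chunks, embeddings, centers))
# ['Sea levels are rising.', 'Recycling reduces waste.']
```

The selected text, rather than the raw vector, can then be placed in the topic prompt.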

GPT-4 for Subtopic Expansion

The script then uses GPT-4 to expand these topics into subtopics and generate questions and answers for each subtopic.

def gpt4_topic_expander(topics):
    sub_topics = []
    for topic in topics:
        sub_topic_prompt = f"Expand on the following topic and generate subtopics:\n{topic}"
        
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": sub_topic_prompt}]
        )
        
        sub_topic_content = response.choices[0].message["content"]
        sub_topics.append({
            'main_topic': topic,
            'sub_topics': sub_topic_content
        })
    return sub_topics
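
The model returns the subtopics as free-form text, so a later step that works per subtopic needs them as a list. A minimal parsing sketch, assuming the reply puts one subtopic per line with a number or bullet in front (the reply format and the name parse_subtopics are assumptions, and real replies may need a stricter prompt or a more forgiving parser):

```python
import re
from typing import List

# Matches leading list markers such as "1.", "2)", "-", "*", or "•".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_subtopics(text: str) -> List[str]:
    """Return one cleaned subtopic per marked line, skipping unmarked lines."""
    subtopics = []
    for line in text.splitlines():
        if _MARKER.match(line):
            subtopics.append(_MARKER.sub("", line).strip())
    return subtopics


reply = """Here are some subtopics:
1. Definition and Causes
2. Effects on the Environment
- Rising Sea Levels
"""
print(parse_subtopics(reply))
# ['Definition and Causes', 'Effects on the Environment', 'Rising Sea Levels']
```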

Combining Topics and Questions

The final step in AI integration takes the topics and subtopics and generates questions and answers for them. Importantly, the answers are derived directly from the lesson content to ensure accuracy and relevance.

def generate_questions_and_answers(sub_topics):
    qa_pairs = []
    for entry in sub_topics:
        main_topic = entry['main_topic']
        sub_topics_content = entry['sub_topics']
        
        question_prompt = f"Generate questions and answers for the following subtopics. Ensure that the answers are based solely on the provided content:\n{sub_topics_content}"
        
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question_prompt}]
        )
        
        qa_content = response.choices[0].message["content"]
        qa_pairs.append({
            'main_topic': main_topic,
            'questions_and_answers': qa_content
        })
    return qa_pairs
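
For the answers to come only from the lesson, the lesson text itself has to be part of the prompt, alongside the subtopic. The sketch below shows one way to build such a prompt; the wording, the <lesson> delimiters, and the name build_grounded_qa_prompt are illustrative assumptions rather than the prompt used in production.

```python
def build_grounded_qa_prompt(subtopic: str, lesson_text: str, num_questions: int = 2) -> str:
    """Build a prompt that carries the lesson text the answers must come from."""
    return (
        f"Write {num_questions} question-and-answer pairs about the subtopic below.\n"
        "Use only the lesson content between the <lesson> tags. "
        "If the lesson does not contain the answer, reply 'Not covered in this lesson.'\n\n"
        f"Subtopic: {subtopic}\n\n"
        f"<lesson>\n{lesson_text}\n</lesson>"
    )


prompt = build_grounded_qa_prompt(
    "Effects on the Environment",
    "Rising sea levels can lead to coastal erosion and flooding.",
)
print(prompt)
```

Giving the model an explicit fallback reply makes gaps in the lesson visible during review instead of being filled from the model's general knowledge.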

5. Data Management

Processed data is stored in Elasticsearch and retrieved from it with search queries.

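from elasticsearch import Elasticsearch

# Assumed local node; replace with the URL of your own cluster.
ES_URL = "http://localhost:9200"

def store_data(index, document):
    # Mapping types (doc_type) were removed in Elasticsearch 8, so only the index is given.
    es = Elasticsearch(ES_URL)
    es.index(index=index, body=document)

def retrieve_data(index, query):
    # query is a full search body, for example {"query": {"match_all": {}}}.
    es = Elasticsearch(ES_URL)
    res = es.search(index=index, body=query)
    return res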
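
What gets stored is a plain JSON-style document, and what gets sent to search is a query body. The sketch below shows one possible shape for each; the field names (lesson_id, main_topic, questions_and_answers, reviewed) and both function names are hypothetical, chosen only to illustrate the idea. The match query itself is standard Elasticsearch query DSL.

```python
from typing import Any, Dict


def build_lesson_document(lesson_id: str, main_topic: str, qa_text: str) -> Dict[str, Any]:
    """Shape one generated topic as a flat, JSON-serializable document."""
    return {
        "lesson_id": lesson_id,
        "main_topic": main_topic,
        "questions_and_answers": qa_text,
        "reviewed": False,  # set to True once a person has checked the content
    }


def topic_match_query(text: str) -> Dict[str, Any]:
    """Search body that runs a full-text match on the main_topic field."""
    return {"query": {"match": {"main_topic": text}}}


doc = build_lesson_document("lesson-1", "Understanding Climate Change", "Q: ... A: ...")
query = topic_match_query("climate")
print(doc["main_topic"], query)
```

A review flag like the one above is one way to keep unreviewed content out of what learners see.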

Example: Lesson 1 - Climate Change and Its Impacts

Main Topic 1: Understanding Climate Change

  • Sub-Topic 1.1: Definition and Causes
    • Question 1: What is climate change?
      • Answer: Climate change refers to long-term changes in temperature, precipitation, and other atmospheric conditions on Earth. It is primarily caused by human activities such as burning fossil fuels, deforestation, and industrial processes.
      • Answer for a 10-year-old: Climate change means the Earth is getting warmer and the weather is changing because of things people do, like driving cars and cutting down trees.
    • Question 2: What are the main causes of climate change?
      • Answer: The main causes of climate change include greenhouse gas emissions from burning fossil fuels, deforestation, and industrial activities.
      • Answer for a 10-year-old: Climate change happens because of pollution from cars and factories and cutting down trees.
  • Sub-Topic 1.2: Effects on the Environment
    • Question 1: How does climate change affect the environment?
      • Answer: Climate change affects the environment by causing more frequent and severe weather events, melting ice caps, rising sea levels, and disrupting ecosystems.
      • Answer for a 10-year-old: Climate change makes the weather more extreme, melts ice, raises sea levels, and harms animals and plants.
    • Question 2: What are the consequences of rising sea levels?
      • Answer: Rising sea levels can lead to coastal erosion, flooding, and loss of habitat for plants, animals, and even humans.
      • Answer for a 10-year-old: Rising sea levels can cause floods, destroy beaches, and make it hard for animals and people to live near the coast.
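
The age-specific answers in the example above can be produced by asking the model to rewrite an already-grounded answer for a given audience. A minimal sketch of such a prompt builder follows; the audience keys, the style guidance, and the name build_rewrite_prompt are illustrative assumptions, not the production prompts.

```python
# Hypothetical style guidance for the three audiences named in the introduction.
AUDIENCE_STYLE = {
    "grade_school": "Use short sentences, everyday words, and one familiar example.",
    "university": "Use precise terminology and mention the underlying mechanisms.",
    "young_professional": "Be concise and relate the point to work and daily decisions.",
}


def build_rewrite_prompt(answer: str, audience: str) -> str:
    """Ask for a rewrite at the audience's level without adding new facts."""
    if audience not in AUDIENCE_STYLE:
        raise ValueError(f"Unknown audience: {audience!r}")
    return (
        "Rewrite the answer below for the stated audience without adding new facts.\n"
        f"Audience guidance: {AUDIENCE_STYLE[audience]}\n\n"
        f"Answer: {answer}"
    )


print(build_rewrite_prompt(
    "Rising sea levels can lead to coastal erosion, flooding, and loss of habitat.",
    "grade_school",
))
```

Rewriting from the reviewed answer, rather than generating a fresh one per audience, keeps the facts identical across age groups.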

Conclusion

By following these steps, you can automate much of the work of turning slide decks into micro-lessons: the script imports the content, splits it into chunks, groups the chunks into topics, and generates subtopics, questions, and answers for each. This reduces the time and effort needed to produce educational content, though the generated material should still be reviewed for accuracy before it reaches learners.