images

Image generation module for Direktor.

`direktor.core.images`

This module handles image prompt generation and image creation using FLUX.

`generate_image_prompts(transcript, temp_dir)`

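Generate image prompts from a transcript using a GPT model through the OpenAI chat completions API. If `image_prompts.json` already exists in `temp_dir`, the cached prompts are loaded and returned without calling the model.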

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `transcript` | `dict[str, Any]` | Transcript dictionary with `chunks` containing text and timestamps. | required |
| `temp_dir` | `str \| PathLike[str]` | Temporary directory for output files. | required |

Returns:

| Type | Description |
| --- | --- |
| `list[dict[str, Any]]` | List of image prompt dictionaries, each with a `time` (the segment's start timestamp) and a `prompt` string. |
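
As a rough illustration of the shapes involved, the sketch below builds a transcript in the form the source reads (`chunks`, each with `text` and a `timestamp` pair) and renders the user message for one chunk. The sample text, the seconds-based timestamp values, and the `render_user_message` helper are assumptions made for illustration; only the key names and the template wording come from the function source. The real function also aggregates chunks before formatting them, which this sketch skips.

```python
from typing import Any

# Template wording copied from the function source.
USER_TEMPLATE = (
    "Generate a Stable Diffusion generation prompt for the following podcast "
    "transcript segment:\n\nText: {text}\nTimestamp: {start} - {end}"
)


def render_user_message(chunk: dict[str, Any]) -> str:
    """Hypothetical helper: format one chunk the way the source does inline."""
    return USER_TEMPLATE.format(
        text=chunk["text"],
        start=chunk["timestamp"][0],
        end=chunk["timestamp"][1],
    )


# Assumed transcript shape; the values are made up for this example.
transcript: dict[str, Any] = {
    "chunks": [
        {"text": "Welcome to the show.", "timestamp": (0.0, 4.2)},
        {"text": "Today we talk about deep-sea exploration.", "timestamp": (4.2, 9.8)},
    ]
}

message = render_user_message(transcript["chunks"][1])
print(message)

# Shape of one entry in the returned list: start time plus prompt text.
# The "..." stands in for whatever text the model would return.
example_result = {"time": transcript["chunks"][1]["timestamp"][0], "prompt": "..."}
```

Each returned entry carries only the start time of its segment, so downstream code that needs a segment's end time would have to derive it from the next entry or from the transcript.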

Source code in `direktor/core/images.py`

def generate_image_prompts(
    transcript: dict[str, Any], temp_dir: str | os.PathLike[str]
) -> list[dict[str, Any]]:
    """Generate image prompts from a transcript using GPT.

    Args:
        transcript: Transcript dictionary with ``chunks`` containing text and
            timestamps.
        temp_dir: Temporary directory for output files.

    Returns:
        List of image prompts with timestamps.
    """
    temp_path = Path(temp_dir)
    prompts_file = temp_path / "image_prompts.json"
    if prompts_file.exists():
        with prompts_file.open(encoding="utf-8") as f:
            data: list[dict[str, Any]] = json.load(f)
            return data

    settings = get_settings()
    raw_chunks = transcript.get("chunks", [])
    aggregated_chunks = aggregate_chunks(
        raw_chunks, target_duration=settings.target_segment_duration
    )

    system_prompt = (
        "You are an AI assistant that generates image prompts based on podcast "
        "transcripts. Generate a single, vivid image prompt that captures the main "
        "theme or most striking visual element from the given text."
    )
    user_template = (
        "Generate a Stable Diffusion generation prompt for the following podcast "
        "transcript segment:\n\nText: {text}\nTimestamp: {start} - {end}"
    )

    all_prompts: list[dict[str, Any]] = []
    for chunk in tqdm(aggregated_chunks, desc="Generating image prompts"):
        response = settings.client.chat.completions.create(
            model=settings.gpt4_model,
            messages=[
                {"role": "system", "content": system_prompt},
                {
                    "role": "user",
                    "content": user_template.format(
                        text=chunk["text"],
                        start=chunk["timestamp"][0],
                        end=chunk["timestamp"][1],
                    ),
                },
            ],
        )
        content = response.choices[0].message.content
        if content is None:
            raise ImageGenerationError("OpenAI returned empty image prompt.")
        all_prompts.append({"time": chunk["timestamp"][0], "prompt": content.strip()})

    with prompts_file.open("w", encoding="utf-8") as f:
        json.dump(all_prompts, f)
    return all_prompts
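
The listing above is cache-first: it returns the contents of `image_prompts.json` when that file exists, and only otherwise calls the model and writes the file. A minimal sketch of that pattern, with the model call replaced by an injected callable, might look like the following. `load_or_generate` and `fake_generate` are hypothetical names used for illustration and are not part of Direktor.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

Prompts = list[dict[str, Any]]


def load_or_generate(temp_dir: str, generate: Callable[[], Prompts]) -> Prompts:
    """Hypothetical helper: return cached prompts, or generate and cache them."""
    prompts_file = Path(temp_dir) / "image_prompts.json"
    if prompts_file.exists():
        # Cache hit: skip generation entirely.
        with prompts_file.open(encoding="utf-8") as f:
            return json.load(f)
    # Cache miss: generate, then persist for the next call.
    prompts = generate()
    with prompts_file.open("w", encoding="utf-8") as f:
        json.dump(prompts, f)
    return prompts


calls = 0


def fake_generate() -> Prompts:
    """Stand-in for the model call; counts how often it runs."""
    global calls
    calls += 1
    return [{"time": 0.0, "prompt": "a lighthouse at dawn"}]


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_generate(tmp, fake_generate)
    second = load_or_generate(tmp, fake_generate)  # served from the JSON file
```

One consequence of this design is that prompts are not regenerated when the transcript or settings change. To force a fresh run, the cached `image_prompts.json` has to be removed from `temp_dir` first.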

`generate_images(prompts, temp_dir)`

Generate images from prompts using the FLUX model.