S2s2t - Story to Screenplay to Trailer
This paper presents a novel process for transforming written narratives into cinematic video experiences. The system converts textual stories into structured screenplays, generates sequential image frames representing key scenes, and synthesizes them into cohesive videos with synchronized audio. This multi-stage approach addresses the challenge of translating static text into dynamic visual storytelling while preserving semantic integrity. The screenplay structure enables detailed scene descriptions, environmental context, and character dialogue generation, while the visual and audio modules contribute to narrative clarity and immersion. The proposed framework demonstrates how computational methods can bridge literature and cinema, making stories more engaging and accessible through psychologically informed cinematic techniques.

In the proposed implementation, a large language model converts input stories into structured screenplays and subsequently generates sentence-level visual prompts. These prompts are used to synthesize high-quality image frames through a diffusion-based image generation model, which are then transformed into short cinematic video clips using an AI-driven video synthesis system. Finally, text-to-speech models generate synchronized audio narration, and the complete trailer is assembled using multimedia processing libraries, resulting in an automated, non-real-time, end-to-end pipeline for story-to-trailer generation.

The system employs a sequential pipeline structured into six stages, ensuring modularity, traceability, and coherent storytelling. A four-lane GUI displays component outputs at every stage, and traceability is achieved by mapping each image frame to its corresponding screenplay line and the original story.
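To make the data flow of the six-stage pipeline concrete, the following is a minimal sketch in Python. All function names, data structures, and placeholder transformations are illustrative assumptions, not the paper's actual API; the model-dependent stages (LLM screenplay generation, diffusion image synthesis, video synthesis, and TTS) are stubbed out, while the frame-to-screenplay-line traceability mapping described above is shown explicitly.

```python
# Hypothetical sketch of the story-to-trailer pipeline. Stage boundaries
# and the traceability mapping follow the paper's description; everything
# else (names, placeholder logic) is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Frame:
    image_prompt: str      # sentence-level visual prompt for the diffusion model
    screenplay_line: str   # traceability: frame -> screenplay line
    story_sentence: str    # traceability: frame -> original story sentence

def story_to_screenplay(story: str) -> list[str]:
    # Stage 1 (stub): an LLM would rewrite each sentence as a screenplay
    # line with scene headings, action, and dialogue.
    return [f"SCENE: {s.strip()}" for s in story.split(".") if s.strip()]

def screenplay_to_frames(screenplay: list[str], story: str) -> list[Frame]:
    # Stage 2: derive one sentence-level visual prompt per screenplay line,
    # preserving back-references for traceability.
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    return [Frame(f"cinematic still of: {line}", line, sent)
            for line, sent in zip(screenplay, sentences)]

def assemble_trailer(frames: list[Frame]) -> dict:
    # Stages 3-6 (stubs): diffusion image generation, video synthesis,
    # TTS narration, and multimedia assembly would run here.
    return {"clips": len(frames),
            "trace": {i: f.screenplay_line for i, f in enumerate(frames)}}

story = "A lone sailor spots a storm. She turns the boat toward shore."
screenplay = story_to_screenplay(story)
frames = screenplay_to_frames(screenplay, story)
trailer = assemble_trailer(frames)
```

The `trace` dictionary is what a four-lane GUI could use to show, for each generated clip, the screenplay line and story sentence it came from.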
