Robots operate in a world of continuous, high-bandwidth sensory streams, yet they often lack a unified way to "remember"
the spatial and temporal context of their experiences. ChronosEmbodied explores the intersection of
embodied AI and multimodal memory, turning raw sensor data into a persistent, queryable world model.
## Technical Focus Areas
- Unified Multimodal Indexing: We are building a schema to co-index heterogeneous data (LiDAR point clouds, RGB video, semantic captions, and 6-DoF robot poses) into a single, queryable latent space (see the schema sketch after this list).
- Spatio-Temporal Anchoring: We are developing a memory system that allows a robot to query its history by where and when it saw an object or event relative to its own trajectory (query sketch below).
- Sensory Synthesis & Forgetting: We are designing intelligent "forgetting" algorithms that decay low-importance sensory frames while preserving high-salience landmarks and novel events (decay sketch below).
- Cross-Modal Recall: We are enabling the robot to use one modality to "trigger" another, for example using a text-based instruction to retrieve a specific LiDAR segment of a room (retrieval sketch below).
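The sketch below makes the indexing schema concrete: a per-snapshot record whose modality embeddings all live in one shared latent space, held in a flat index. The `MemoryRecord` and `MemoryIndex` names, their fields, and the list-backed storage are illustrative assumptions on our part; a production version would sit on an approximate-nearest-neighbor backend such as FAISS.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class MemoryRecord:
    """One co-indexed snapshot of the robot's sensory state (hypothetical schema)."""
    timestamp: float                  # capture time, seconds since epoch
    pose: np.ndarray                  # 6-DoF pose: (x, y, z, roll, pitch, yaw)
    salience: float = 0.5             # 0..1 importance, e.g. novelty from an upstream detector
    embeddings: dict = field(default_factory=dict)    # modality name -> latent vector
    payload_refs: dict = field(default_factory=dict)  # modality name -> URI of the raw blob

class MemoryIndex:
    """Flat in-memory index over records; a real system would use an ANN backend."""
    def __init__(self, dim: int):
        self.dim = dim
        self.records: list[MemoryRecord] = []

    def add(self, record: MemoryRecord) -> None:
        # All modality embeddings are assumed to be pre-projected into the
        # same shared latent space of dimension `dim`.
        for vec in record.embeddings.values():
            if vec.shape != (self.dim,):
                raise ValueError("embedding not in the shared latent space")
        self.records.append(record)
```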
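Continuing that sketch, spatio-temporal anchoring reduces to a query that restricts the history by a time window and by Euclidean distance from a point on the robot's trajectory. The function name and parameters are hypothetical.

```python
import numpy as np

def query_history(index, center_xyz, radius_m, t_start, t_end):
    """Records captured within `radius_m` of `center_xyz` during [t_start, t_end]."""
    center = np.asarray(center_xyz, dtype=float)
    return [
        rec for rec in index.records
        if t_start <= rec.timestamp <= t_end
        and np.linalg.norm(rec.pose[:3] - center) <= radius_m
    ]
```

A call like `query_history(idx, (2.0, 0.5, 0.0), radius_m=1.5, t_start=t0, t_end=t1)` then answers "what did I observe near the doorway during that interval".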
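One plausible shape for the forgetting policy, assuming the salience field above, is an exponential retention score that decays with age but is scaled by salience, so routine frames expire quickly while landmarks persist. The half-life and threshold below are placeholder values, not tuned project parameters.

```python
import math
import time

def decay_and_evict(index, now=None, half_life_s=3600.0, keep_threshold=0.05):
    """Drop records whose salience-weighted retention score has decayed away."""
    now = time.time() if now is None else now

    def score(rec):
        age_s = max(0.0, now - rec.timestamp)
        # Exponential decay with the given half-life, scaled by salience.
        return rec.salience * math.exp(-math.log(2.0) * age_s / half_life_s)

    index.records = [rec for rec in index.records if score(rec) >= keep_threshold]
```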
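Cross-modal recall then falls out of the shared space: embed the text instruction with a matched (e.g. CLIP-style) encoder and rank another modality's vectors by cosine similarity. `recall_by_text` and the "lidar" modality key are assumptions for illustration.

```python
import numpy as np

def recall_by_text(index, text_embedding, modality="lidar", top_k=3):
    """Rank records' `modality` embeddings against a text query vector."""
    q = text_embedding / np.linalg.norm(text_embedding)
    scored = []
    for rec in index.records:
        vec = rec.embeddings.get(modality)
        if vec is not None:
            # Cosine similarity in the shared latent space.
            scored.append((float(q @ vec) / float(np.linalg.norm(vec)), rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```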
## Practical Application: Spatio-Temporal Queries & Reasoning
We focus on solving complex retrieval problems that allow a robot to leverage its past experiences to navigate and assist in dynamic environments:
- Cross-Modal Retrieval: Answering queries like "Find the room where I heard the sound of water leaking while I was moving at high speed" (a combined query sketch follows this list).
- Temporal Change Detection: Identifying shifts in the physical environment over long durations: "Where did the blue toolbox go?"
- Spatial Grounding: Linking natural language captions (e.g., "The kitchen island") to specific 3D LiDAR point cloud segments for precise manipulation.
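As a hedged sketch of the first query type, the "water leaking while moving at high speed" example can be decomposed into an audio-embedding match gated by a kinematic filter, with speed estimated from consecutive poses in the index above. The `audio` modality key, the helper name, and the 1 m/s threshold are all illustrative assumptions.

```python
import numpy as np

def find_audio_event_at_speed(index, audio_query_vec, min_speed_mps=1.0):
    """Best audio match among moments when the robot moved faster than the threshold."""
    recs = sorted(index.records, key=lambda r: r.timestamp)
    q = audio_query_vec / np.linalg.norm(audio_query_vec)
    best, best_score = None, -np.inf
    for prev, cur in zip(recs, recs[1:]):
        dt = cur.timestamp - prev.timestamp
        audio = cur.embeddings.get("audio")
        if dt <= 0 or audio is None:
            continue
        # Instantaneous speed estimated from consecutive trajectory poses.
        speed = np.linalg.norm(cur.pose[:3] - prev.pose[:3]) / dt
        if speed < min_speed_mps:
            continue
        score = float(q @ audio) / float(np.linalg.norm(audio))
        if score > best_score:
            best, best_score = cur, score
    return best  # best.pose (if any) localizes the room for navigation
```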