Robots operate in a world of continuous, high-bandwidth sensory streams, yet they often lack a unified way to "remember"
the spatial and temporal context of their experiences. ChronosEmbodied explores the intersection of
embodied AI and multimodal memory, turning raw sensor data into a persistent, queryable world model.
## Technical Focus Areas
- Unified Multimodal Indexing: We are building a schema to co-index heterogeneous data (LiDAR point clouds, RGB video, semantic captions, and 6-DoF robot poses) into a single, queryable latent space (see the schema sketch after this list).
- Spatio-Temporal Anchoring: We are developing a memory system that allows a robot to query its history by where and when it saw an object or event relative to its own trajectory (query sketch below).
- Sensory Synthesis & Forgetting: We are designing intelligent "forgetting" algorithms that decay low-importance sensory frames while preserving high-salience landmarks and novel events (decay sketch below).
- Cross-Modal Recall: We are enabling the robot to use one modality to "trigger" another, for example using a text-based instruction to retrieve a specific LiDAR segment of a room (retrieval sketch below).
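The sketch below makes the indexing schema concrete: a per-snapshot record whose modality embeddings all live in one shared latent space, held in a flat index. The `MemoryRecord` and `MemoryIndex` names, their fields, and the list-backed storage are illustrative assumptions on our part; a production version would sit on an approximate-nearest-neighbor backend such as FAISS.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class MemoryRecord:
    """One co-indexed snapshot of the robot's sensory state (hypothetical schema)."""
    timestamp: float                  # capture time, seconds since epoch
    pose: np.ndarray                  # 6-DoF pose: (x, y, z, roll, pitch, yaw)
    salience: float = 0.5             # 0..1 importance, e.g. novelty from an upstream detector
    embeddings: dict = field(default_factory=dict)    # modality name -> latent vector
    payload_refs: dict = field(default_factory=dict)  # modality name -> URI of the raw blob

class MemoryIndex:
    """Flat in-memory index over records; a real system would use an ANN backend."""
    def __init__(self, dim: int):
        self.dim = dim
        self.records: list[MemoryRecord] = []

    def add(self, record: MemoryRecord) -> None:
        # All modality embeddings are assumed to be pre-projected into the
        # same shared latent space of dimension `dim`.
        for vec in record.embeddings.values():
            if vec.shape != (self.dim,):
                raise ValueError("embedding not in the shared latent space")
        self.records.append(record)
```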
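Continuing that sketch, spatio-temporal anchoring reduces to a query that restricts the history by a time window and by Euclidean distance from a point on the robot's trajectory. The function name and parameters are hypothetical.

```python
import numpy as np

def query_history(index, center_xyz, radius_m, t_start, t_end):
    """Records captured within `radius_m` of `center_xyz` during [t_start, t_end]."""
    center = np.asarray(center_xyz, dtype=float)
    return [
        rec for rec in index.records
        if t_start <= rec.timestamp <= t_end
        and np.linalg.norm(rec.pose[:3] - center) <= radius_m
    ]
```

A call like `query_history(idx, (2.0, 0.5, 0.0), radius_m=1.5, t_start=t0, t_end=t1)` then answers "what did I observe near the doorway during that interval".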
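One plausible shape for the forgetting policy, assuming the salience field above, is an exponential retention score that decays with age but is scaled by salience, so routine frames expire quickly while landmarks persist. The half-life and threshold below are placeholder values, not tuned project parameters.

```python
import math
import time

def decay_and_evict(index, now=None, half_life_s=3600.0, keep_threshold=0.05):
    """Drop records whose salience-weighted retention score has decayed away."""
    now = time.time() if now is None else now

    def score(rec):
        age_s = max(0.0, now - rec.timestamp)
        # Exponential decay with the given half-life, scaled by salience.
        return rec.salience * math.exp(-math.log(2.0) * age_s / half_life_s)

    index.records = [rec for rec in index.records if score(rec) >= keep_threshold]
```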
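Cross-modal recall then falls out of the shared space: embed the text instruction with a matched (e.g. CLIP-style) encoder and rank another modality's vectors by cosine similarity. `recall_by_text` and the "lidar" modality key are assumptions for illustration.

```python
import numpy as np

def recall_by_text(index, text_embedding, modality="lidar", top_k=3):
    """Rank records' `modality` embeddings against a text query vector."""
    q = text_embedding / np.linalg.norm(text_embedding)
    scored = []
    for rec in index.records:
        vec = rec.embeddings.get(modality)
        if vec is not None:
            # Cosine similarity in the shared latent space.
            scored.append((float(q @ vec) / float(np.linalg.norm(vec)), rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```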
## Practical Application: Spatio-Temporal Queries & Reasoning
We focus on solving complex retrieval problems that allow a robot to leverage its past experiences to navigate and assist in dynamic environments:
- Cross-Modal Retrieval: Answering queries like "Find the room where I heard the sound of water leaking while I was moving at high speed" (a combined query sketch follows this list).
- Temporal Change Detection: Identifying shifts in the physical environment over long durations: "Where did the blue toolbox go?"
- Spatial Grounding: Linking natural language captions (e.g., "The kitchen island") to specific 3D LiDAR point cloud segments for precise manipulation.
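As a hedged sketch of the first query type, the "water leaking while moving at high speed" example can be decomposed into an audio-embedding match gated by a kinematic filter, with speed estimated from consecutive poses in the index above. The `audio` modality key, the helper name, and the 1 m/s threshold are all illustrative assumptions.

```python
import numpy as np

def find_audio_event_at_speed(index, audio_query_vec, min_speed_mps=1.0):
    """Best audio match among moments when the robot moved faster than the threshold."""
    recs = sorted(index.records, key=lambda r: r.timestamp)
    q = audio_query_vec / np.linalg.norm(audio_query_vec)
    best, best_score = None, -np.inf
    for prev, cur in zip(recs, recs[1:]):
        dt = cur.timestamp - prev.timestamp
        audio = cur.embeddings.get("audio")
        if dt <= 0 or audio is None:
            continue
        # Instantaneous speed estimated from consecutive trajectory poses.
        speed = np.linalg.norm(cur.pose[:3] - prev.pose[:3]) / dt
        if speed < min_speed_mps:
            continue
        score = float(q @ audio) / float(np.linalg.norm(audio))
        if score > best_score:
            best, best_score = cur, score
    return best  # best.pose (if any) localizes the room for navigation
```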