Spatial Lingo is a spatialized language-practice experience that guides users to identify and describe objects in their environment in a target language. It is built with Meta libraries such as Llama, Mixed Reality Utility Kit (MRUK), and the Voice SDK, and supports both hand tracking and controllers.
Follow Golly Gosh (the polyglot!) as they lead you through your own real-world space, letting you practice your vocabulary with familiar objects. Grow the language tree by completing lessons from Golly Gosh, learning nouns, verbs, and adjectives along the way!
The Spatial Lingo project helps Unity developers understand and develop for multiple Meta features: Passthrough Camera API (PCA), Voice SDK, Interaction SDK, Mixed Reality Utility Kit (MRUK), Llama API, and Unity Sentis. The main scene and multiple sample scenes demonstrate the implementation and usefulness of each feature.
| Gym Scene | Word Cloud Scene | Character Scene | Camera Image Scene |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
First, ensure you have Git LFS installed by running this command:

```sh
git lfs install
```

Then, clone this repo using the "Code" button above, or this command:

```sh
git clone https://github.com/oculus-samples/Unity-SpatialLingo.git
```

For development, configure your Llama API key in Assets/SpatialLingo/Resources/ScriptableSettings/SpatialLingoSettings.asset.
Important: Do not ship Quest apps with embedded API keys, as they can be extracted from the app binary. For production, use LlamaRestApi.GetApiKeyAsync to implement server-side authentication. See the Llama API documentation for details.
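As a rough sketch of that server-side pattern (everything here is hypothetical: the backend URL, response shape, and KeyProvider class are placeholders, not the project's LlamaRestApi), the idea is to fetch a short-lived key from your own authenticated backend at runtime:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical sketch: fetching a short-lived Llama API key from your own
// backend at runtime instead of embedding it in the app binary.
public class KeyProvider : MonoBehaviour
{
    // Placeholder endpoint; your server authenticates the user and returns a key.
    private const string KeyEndpoint = "https://your-backend.example.com/llama-key";

    public IEnumerator FetchKey(System.Action<string> onKey)
    {
        using var request = UnityWebRequest.Get(KeyEndpoint);
        yield return request.SendWebRequest();

        if (request.result == UnityWebRequest.Result.Success)
            onKey(request.downloadHandler.text.Trim());
        else
            Debug.LogError($"Key fetch failed: {request.error}");
    }
}
```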
- Make sure you're using Unity 6000.0.51f1 or newer
- Load the Assets/SpatialLingo/Scenes/MainScene.unity scene
- Open the Meta XR Simulator
- Start Play Mode
Each of these features has been built to be accessible and scalable, so other developers can take them and build on them in their own projects.
Spatial Lingo identifies objects in the user's environment, enabling spatial placement and dynamic generation of language lessons.
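For orientation, this identification runs a YOLO model (COCO classes, per the requirements below) through Unity Sentis on passthrough camera frames. A minimal inference loop might look like the sketch below, assuming Sentis 2.x APIs; the input size and output decoding are placeholders, not the project's actual detector:

```csharp
using Unity.Sentis;
using UnityEngine;

// Minimal sketch: running a YOLO ONNX model with Unity Sentis on a camera frame.
// Input size and output decoding are placeholders for whatever model you use.
public class ObjectDetector : MonoBehaviour
{
    [SerializeField] private ModelAsset yoloModel; // YOLO weights imported as a Sentis asset
    private Worker worker;

    void Start()
    {
        worker = new Worker(ModelLoader.Load(yoloModel), BackendType.GPUCompute);
    }

    public void Detect(Texture cameraFrame)
    {
        // Convert the passthrough frame into the model's expected input tensor.
        using Tensor<float> input = TextureConverter.ToTensor(cameraFrame, 640, 640, 3);
        worker.Schedule(input);

        // Read back raw YOLO output; box/class decoding and NMS are omitted here.
        var raw = worker.PeekOutput() as Tensor<float>;
        using var output = raw.ReadbackAndClone();
    }

    void OnDestroy() => worker?.Dispose();
}
```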
Dynamic vocabulary lessons are generated as the user grows the language tree. After objects in the user's environment are identified, relevant verbs and adjectives for those objects are generated to add lesson variety.
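As a rough sketch of that generation step, one way to call a chat-completions-style endpoint from Unity is shown below. The endpoint URL, model name, prompt, and JSON shape are illustrative placeholders, not the project's actual client; check the Llama API documentation for the real request contract:

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical sketch: asking Llama for verbs/adjectives related to a detected object.
public class LessonGenerator : MonoBehaviour
{
    private const string Endpoint = "https://api.llama.com/v1/chat/completions"; // illustrative

    [System.Serializable] private class ChatMessage { public string role; public string content; }

    [System.Serializable]
    private class ChatRequest
    {
        public string model = "Llama-3.3-70B-Instruct"; // illustrative model name
        public ChatMessage[] messages;
    }

    public IEnumerator RequestWords(string apiKey, string detectedObject)
    {
        var body = new ChatRequest
        {
            messages = new[]
            {
                new ChatMessage
                {
                    role = "user",
                    content = $"List three verbs and three adjectives a beginner could use to describe a {detectedObject}. Reply as JSON."
                }
            }
        };

        using var request = new UnityWebRequest(Endpoint, "POST");
        request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(JsonUtility.ToJson(body)));
        request.downloadHandler = new DownloadHandlerBuffer();
        request.SetRequestHeader("Content-Type", "application/json");
        request.SetRequestHeader("Authorization", $"Bearer {apiKey}");

        yield return request.SendWebRequest();

        if (request.result == UnityWebRequest.Result.Success)
            Debug.Log(request.downloadHandler.text); // parse the verbs/adjectives from the reply
        else
            Debug.LogError(request.error);
    }
}
```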
Golly Gosh can speak several different languages. Voice is synthesized dynamically from text, so they can teach users proper pronunciation during language lessons.
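In Voice SDK terms, text-to-speech typically goes through a TTSSpeaker component. A minimal sketch (the PronunciationCoach wrapper is hypothetical) looks like:

```csharp
using Meta.WitAi.TTS.Utilities;
using UnityEngine;

// Minimal sketch: synthesizing speech with the Voice SDK's TTSSpeaker.
public class PronunciationCoach : MonoBehaviour
{
    [SerializeField] private TTSSpeaker speaker; // TTSSpeaker from the Voice SDK

    // Speaks the target-language word aloud so the user hears proper pronunciation.
    public void Pronounce(string word)
    {
        speaker.Speak(word);
    }
}
```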
When a word cloud is presented, the user's speech is transcribed; transcription is likewise supported in several languages.
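A minimal transcription hookup with the Voice SDK's AppVoiceExperience might look like the sketch below (the SpeechTranscriber wrapper is hypothetical; the project's word-cloud flow builds more on top):

```csharp
using Oculus.Voice;
using UnityEngine;

// Minimal sketch: live transcription with the Voice SDK's AppVoiceExperience.
public class SpeechTranscriber : MonoBehaviour
{
    [SerializeField] private AppVoiceExperience voice;

    void OnEnable()
    {
        voice.VoiceEvents.OnFullTranscription.AddListener(OnTranscribed);
    }

    void OnDisable()
    {
        voice.VoiceEvents.OnFullTranscription.RemoveListener(OnTranscribed);
    }

    // Begin listening; e.g. called when a word cloud is presented.
    public void StartListening() => voice.Activate();

    private void OnTranscribed(string transcript)
    {
        Debug.Log($"User said: {transcript}");
    }
}
```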
The user's response is sent to Llama to determine whether it is good enough to complete the given lesson's word cloud.
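That grading request could reuse the same chat-completions pattern shown above, with a prompt and pass/fail parsing along these lines (the rubric wording and the WordCloudGrader class are hypothetical, not the project's actual prompt):

```csharp
// Hypothetical sketch: framing a pass/fail grading prompt for Llama.
public static class WordCloudGrader
{
    public static string BuildGradingPrompt(string targetWord, string transcript) =>
        $"A language learner was asked to say a sentence using the word '{targetWord}'. " +
        $"They said: \"{transcript}\". Reply with exactly PASS if the word was used " +
        "correctly, otherwise FAIL.";

    // Interpret the model's reply conservatively: anything but PASS fails.
    public static bool IsPass(string llamaReply) =>
        llamaReply != null && llamaReply.Trim().ToUpperInvariant().StartsWith("PASS");
}
```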
This project makes use of the following plugins and software:
- Unity 6000.0.51f1 or newer
- YOLO (with COCO dataset)
- See MetaSdk.md for all Meta libraries used
More information about the services and systems of this project can be found in the Documentation section.
Sample scenes can be found at Assets/SpatialLingo/Scenes.
To run it, open WordCloudSample.unity and enter Play Mode with the simulator. Click the "Activate Microphone" button to start transcription through your microphone.
See LICENSE.md.
See CONTRIBUTING.md.