
Adding New Data Types to be Scanned by the Django Backend #24

@chw3k5

Description


Extending TeleView's data indexing to other (non-smurf) data formats was intentionally designed into the TeleView project during initial development.

The scope of this issue is to consider the steps needed to upgrade TeleView's backend Django project and the MongoDB database. Once this task is completed, a new issue can be created to consider upgrades to the frontend, which will be less abstract and will depend highly on the new data types and their use/value to users.

We expect to add 3 or fewer new data types over the lifetime of TeleView. The addition of new data types is expected, but not overly abstracted. Imagine you are a developer who wants to add a new data type called elves: anywhere you see a file or function named smurf (the data type used for initial development), you will need to make a new parallel file or function called elves.

Warning

This is not a comprehensive guide to adding new data types to TeleView; it is only an overview of the major systems that must be upgraded.

New data location, naming, and parsing

  1. TeleView was designed to find all data under a single location. Externally, this data location is specified in the .env file under the variable name TELEVIEW_PLATFORMS_DATA_DIR; we will use platforms_dir to refer to this location in the rest of this issue report.

  2. The platforms_dir is expected to have any number of subdirectories; each subdirectory is considered a platform_name.

  3. The location platforms_dir/platform_name is searched for subdirectories whose names match a data_scraper function in find.py. These functions are collected at Python import time based on the decorator @data_scraper. At the time of writing, this only includes a single function, smurf; see the image below.

[image: the smurf data_scraper function in find.py]
  4. A data-specific parsing generator must be designed for that data type. This function must be a generator so that single-threaded memory usage remains stable for any size of file system being scraped/scanned. See the smurf generator in the image and link provided in step 3.
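The import-time collection described in step 3 can be sketched as a simple decorator registry. This is a hypothetical illustration of the pattern, not the actual code in find.py; the real decorator and the smurf scraper's body will differ.

```python
# Hypothetical sketch of the @data_scraper registry pattern described above.
# Functions are collected at import time, keyed by function name, so that
# directory names under platforms_dir/platform_name can be matched to scrapers.
data_scraper_functions = {}


def data_scraper(func):
    """Register a scraper function at import time under its own name."""
    data_scraper_functions[func.__name__] = func
    return func


@data_scraper
def smurf(platform_dir):
    # Placeholder body: the real smurf scraper walks the smurf directory
    # and yields one parsed record at a time.
    yield {"platform_dir": platform_dir, "data_type": "smurf"}
```

With this pattern, adding an elves data type is just a matter of defining a new `@data_scraper`-decorated generator named elves; no central list needs editing.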

Important

New data must be specified in a per-platform directory.

Important

To be included in the MongoDB database, a data-specific parsing generator is expected to have the same name as the directory where the data is found, i.e. the smurf directory houses data of a type that is parsed by the smurf generator.
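Following the naming convention above, a new elves scraper would be a generator whose name matches the elves directory. The sketch below assumes a flat directory of data files, which is an illustration only; the real layout and record schema for a new data type would be defined by that type.

```python
import os


def elves(platform_dir):
    """Hypothetical elves scraper: the function name matches the per-platform
    directory ("elves") that houses this data type."""
    elves_dir = os.path.join(platform_dir, "elves")
    if not os.path.isdir(elves_dir):
        return
    for entry in sorted(os.listdir(elves_dir)):
        # Yield one record per file so memory use stays flat no matter
        # how large the scanned file system is.
        yield {"platform_dir": platform_dir, "data_type": "elves", "file": entry}
```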

New data upload to database

Database uploads occur in the Python class DatabaseEvents in database.py.

Smurf-specific operations are denoted by methods with smurf_ in the method name. The method DatabaseEvents.upload_data() fully resets the database and remakes the indexing operations. The method DatabaseEvents.update_data() uploads data to an existing MongoDB collection.
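The parallel-method pattern for a new data type can be sketched as follows. The method names smurf_upload and elves_upload are hypothetical, and the real DatabaseEvents class in database.py manages MongoDB clients and index creation rather than an in-memory dict; this only illustrates the "one parallel method per data type" structure.

```python
class DatabaseEvents:
    """Illustrative stand-in for the real DatabaseEvents class in database.py;
    an in-memory dict replaces the MongoDB collections for this sketch."""

    def __init__(self):
        self.collections = {}

    def smurf_upload(self, records):
        # Full reset for the smurf data type: replace the collection outright.
        self.collections["smurf"] = list(records)

    def elves_upload(self, records):
        # A new data type gets its own parallel method (hypothetical name).
        self.collections["elves"] = list(records)

    def update_data(self, data_type, records):
        # Append to an existing collection rather than resetting it.
        self.collections.setdefault(data_type, []).extend(records)
```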

The existing function definitions at the bottom of database.py may not need to be updated. These were designed to update multiple data types at once. If updating multiple data types per single event trigger is still the desired behavior, updating the methods described in the paragraph above may be sufficient.

You will need to update the allowed post data types at the top of the file post_status.py. Suppose a new data type called elves is being introduced; the string "scan_elves" should be added to the Python sets allowed_status_types and full_reset_types, see the image below.

[image: the allowed_status_types and full_reset_types sets in post_status.py]
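The whitelist change amounts to adding one string per set. The set contents below are illustrative (the source only confirms that "scan_elves" must be added to both sets; "scan_smurf" is an assumed existing entry):

```python
# Sketch of the status-type whitelists at the top of post_status.py.
# "scan_smurf" is assumed to be the pre-existing entry; "scan_elves" is the
# new string added for the hypothetical elves data type.
allowed_status_types = {"scan_smurf", "scan_elves"}
full_reset_types = {"scan_smurf", "scan_elves"}


def is_allowed(status_type):
    """Posting a status is rejected unless its type is whitelisted."""
    return status_type in allowed_status_types
```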

Important

The file database.py contains the code that uploads parsed data to MongoDB.

Important

Posting a status to the Django SQLite database is only allowed when that status type has been defined at the top of post_status.py. A future developer could choose to remove this restriction.

Metadata


Labels

database (Adding new data types/scheduling data tasks), enhancement (New feature or request)
