RohitShah1706/CodePod
Repl.it Clone

Browsers can't run arbitrary languages (e.g., C++, Rust, or Go) directly under the hood. WebAssembly is one workaround, but it does not scale well and does not support every package for every language. This is what repl.it solves: it runs the code on a server and sends the output back to the browser.


Why is building repl.it hard?

Remote code execution

Allowing users to execute any code on the server presents a significant security threat. It's crucial to implement a sandbox environment for code execution to mitigate this risk.

Long running processes

Resources are allocated for code execution and must remain so until the user explicitly stops the process or disconnects from the server. If not managed correctly, this could lead to substantial resource depletion.

Shell access inside browser

Implementing a shell-like interface within a browser is a complex task. It involves several potential challenges such as transmitting code to the server, retrieving and displaying the output with appropriate syntax highlighting, managing errors, and ensuring security measures are in place (for instance, preventing users from executing commands like rm -rf /).

File Storage

It's essential to securely and efficiently store all user-generated files. This process involves managing file permissions to ensure users can only access their personal files, offering a method for users to retrieve their files, and maintaining file persistence even after the user session ends.


Architecture of our repl.it clone v1

Storing Initial Code Templates

Upon visiting the website, users are presented with a code editor pre-populated with default code corresponding to their chosen language or framework. This initial code is stored in an object storage service (like S3) and is retrieved when the user accesses the website.

For example, base_go_code is used for the Go language, base_python_code for Python, and base_react_code for a React project.
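As a sketch of how the server might pick and fetch a template, here is a hypothetical TypeScript helper. The bucket name and the plain-`fetch` access (which works against LocalStack's unsigned endpoint) are assumptions; production code would use the AWS SDK:

```typescript
// Map each supported language to its template object key (keys from the doc).
const TEMPLATE_KEYS: Record<string, string> = {
  go: "base_go_code",
  python: "base_python_code",
  react: "base_react_code",
};

function templateKeyFor(language: string): string {
  const key = TEMPLATE_KEYS[language];
  if (!key) throw new Error(`no base template for language: ${language}`);
  return key;
}

// Fetch the template via a path-style GET against a LocalStack-like endpoint
// (assumed URL; real S3 would require signed requests via the AWS SDK).
async function fetchTemplate(
  language: string,
  endpoint = "http://localhost:4566/replit-clone-s3-bucket"
): Promise<string> {
  const res = await fetch(`${endpoint}/${templateKeyFor(language)}`);
  if (!res.ok) throw new Error(`template fetch failed: ${res.status}`);
  return res.text();
}
```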

Environment Setup and Initial File Transfer

The base code is transferred to a server where necessary dependencies are installed. For instance, a React project would require a setup with Node.js and React of appropriate versions. The root directory's contents are then sent to the user's browser, but only the top-level files and directories. Files within a directory are not immediately sent; they are transferred when the user opens the directory. Similarly, when a user double-clicks a file to open it, its contents are sent to the user's browser. This process is known as lazy loading.

Why lazy loading?

  • This approach reduces the amount of data sent to the user's browser, improving performance and reducing latency. It also allows for a more efficient use of resources on the server, as only the necessary files are transmitted.

  • The same incremental-transfer channel also underpins collaboration features: changes made by one user are sent to the server and then broadcast to all other users connected to the same environment.
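The lazy-loading flow can be sketched in TypeScript; `listTopLevel` and `readFileContents` are hypothetical names standing in for the real server handlers:

```typescript
import * as fs from "fs";
import * as path from "path";

type Entry = { name: string; isDir: boolean };

// Return only the immediate children of a directory; deeper levels are
// fetched with further calls when the user expands a folder in the IDE.
function listTopLevel(dir: string): Entry[] {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .map((d) => ({ name: d.name, isDir: d.isDirectory() }))
    .sort((a, b) => a.name.localeCompare(b.name));
}

// Read one file's contents, called only when the user opens that file.
// Rejects paths that resolve outside the user's workspace.
function readFileContents(root: string, relPath: string): string {
  const rootAbs = path.resolve(root);
  const full = path.resolve(rootAbs, relPath);
  if (full !== rootAbs && !full.startsWith(rootAbs + path.sep)) {
    throw new Error("path escapes workspace");
  }
  return fs.readFileSync(full, "utf8");
}
```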

Real-time Changes and Persistence

Any modifications made by the user in the browser-based IDE are sent to the server via a websocket connection. Only the differences, i.e., the changes made by the user, are transmitted. The server then applies these changes to the file in its file system. Writes of these files to an object storage service (like S3) are debounced, ensuring persistence even if the server crashes or restarts.

What/Why debouncing?

  • Debouncing is a technique used to limit the rate at which a function is called. In this context, it ensures that the changes made by the user are not sent to the object storage service too frequently, reducing the number of write operations and improving performance.

  • Batching changes and flushing them at intervals also bounds how much work can be lost to network issues or a server crash: at most the modifications made since the last flush.
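A minimal trailing-edge debounce of the kind one might wrap the S3 flush with (the 2-second interval and `flushToS3` name are illustrative choices, not the project's actual settings):

```typescript
// Trailing-edge debounce: `fn` runs only after `waitMs` of silence, so a
// burst of keystrokes collapses into a single flush to object storage.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Usage sketch: flushToS3 stands in for the real persistence call.
const flushToS3 = (file: string) => console.log(`flushing ${file} to S3`);
const debouncedFlush = debounce(flushToS3, 2000);
```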

Cleaning Up

When a user disconnects from the server or explicitly stops the process, the server cleans up the resources allocated for that user. This includes:

  • Stopping the process executing the code
  • Flushing any changes made by the user to the object storage service
  • Removing the user's files from the server
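The teardown steps above can be sketched as a single routine; the `Session` shape and helper names are assumptions standing in for the server's real per-user state:

```typescript
import * as fs from "fs";

// Stand-in for the per-user state the server tracks (field names assumed).
interface Session {
  kill: () => void;                    // stop the user's running process
  flushToStorage: () => Promise<void>; // push pending changes to S3
  workspaceDir: string;                // the user's directory on the server
}

// Flush BEFORE deleting local files, otherwise unsynced edits are lost.
async function cleanUpSession(session: Session): Promise<void> {
  session.kill();
  await session.flushToStorage();
  await fs.promises.rm(session.workspaceDir, { recursive: true, force: true });
}
```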

Limitations & how to overcome them

Rudimentary Terminal

The v1 of our repl.it clone will have a basic terminal-like interface that allows users to execute commands. This interface will be limited to a predefined set of commands, such as ls, cd, cat, and echo. This limitation is imposed to prevent users from executing potentially harmful commands. In v2, we explore pseudo-terminal (PTY) implementations to provide a more comprehensive terminal experience.
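The allow-list check might look like the sketch below. This is illustrative only: a real implementation would also sandbox at the OS level rather than rely on string filtering alone.

```typescript
// v1 allow-list: the predefined commands from the design above.
const ALLOWED = new Set(["ls", "cd", "cat", "echo"]);

// Reject anything whose first token is not allow-listed, plus shell
// metacharacters that could chain a forbidden command onto an allowed one.
function isCommandAllowed(line: string): boolean {
  if (/[;&|`$<>]/.test(line)) return false;
  const cmd = line.trim().split(/\s+/)[0];
  return cmd !== undefined && ALLOWED.has(cmd);
}
```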

Limited Scalability

All users share the same server in v1, which can lead to performance issues and resource contention. In v2, we will explore containerization technologies like Docker to isolate users' environments and improve scalability, along with Nix-based package managers to install dependencies efficiently.

Port Conflicts

Since all users share the same server in v1, port conflicts can arise when multiple users run code that requires the same port (e.g., a React dev server defaulting to port 3000). In v2, we will explore port-forwarding techniques to dynamically assign ports to users' processes and avoid conflicts.
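One common building block for dynamic assignment is asking the OS for an unused port by listening on port 0, then handing that port to the user's process instead of a hard-coded 3000. A sketch:

```typescript
import * as net from "net";

// Bind to port 0 so the OS picks any free port, read it back, release it,
// and return it for the user's process to use.
function getFreePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const srv = net.createServer();
    srv.once("error", reject);
    srv.listen(0, () => {
      const { port } = srv.address() as net.AddressInfo;
      srv.close(() => resolve(port));
    });
  });
}
```

Note the small race: another process could grab the port between release and reuse, so the caller should retry on bind failure.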


Architecture of our repl.it clone v2

(Diagrams: Repl.it Architecture; Repl.it Request Flow)


Pseudo Terminals (PTY)

Libraries used:

  • xterm.js: (client side) is a widely used library that enables the creation of terminal-like interfaces within a web browser. It offers a terminal emulator capable of executing commands and presenting the output. It allows us to capture and relay keystrokes to a server, where they are run in a pseudo-terminal (PTY) environment.

  • node-pty: (server side) is a Node.js module that provides an API for interacting with PTYs. It allows us to spawn a PTY process, send commands to it, and receive the output. This module is used in conjunction with xterm.js to create a full-fledged terminal experience within the browser.
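The server-side glue between the websocket and the PTY can be sketched as a small message router. The message shape here is an assumption (not the project's actual wire format); the `pty` parameter mirrors node-pty's `IPty` surface (`write`/`resize`) so the real object could be passed in:

```typescript
// Messages the browser-side xterm.js client would send (shape assumed).
type ClientMsg =
  | { type: "input"; data: string }
  | { type: "resize"; cols: number; rows: number };

// Route a raw websocket frame to the matching PTY call. `pty` structurally
// matches node-pty's IPty, which exposes write() and resize().
function routeToPty(
  pty: { write: (d: string) => void; resize: (c: number, r: number) => void },
  raw: string
): void {
  const msg = JSON.parse(raw) as ClientMsg;
  if (msg.type === "input") pty.write(msg.data);
  else if (msg.type === "resize") pty.resize(msg.cols, msg.rows);
}
```

Output flows the other way: the server subscribes to the PTY's data events and forwards each chunk over the websocket for xterm.js to render.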


Dockerizing a Next.js App with NEXT_PUBLIC Environment Variables

The main issue is that NEXT_PUBLIC_ environment variables are not available in the CI/CD pipeline: they are needed at build time, not at runtime/deploy time, so values injected at deploy never reach the client bundle. To solve this, we can pass these variables as build arguments to the Dockerfile so they are available when the build runs.

```dockerfile
# ...
ARG NEXT_PUBLIC_RUNNER_URL
ARG NEXT_PUBLIC_API_URL
RUN touch .env.production
RUN echo "NEXT_PUBLIC_RUNNER_URL=${NEXT_PUBLIC_RUNNER_URL}" >> .env.production
RUN echo "NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}" >> .env.production
RUN npm run build
# ...
```

```bash
docker build -t rohitshah1706/replit_frontend \
  --build-arg NEXT_PUBLIC_RUNNER_URL=runner.local \
  --build-arg NEXT_PUBLIC_API_URL=http://api-service.runner.local .
```

Setting up AWS S3 bucket locally with LocalStack

References:

  1. Setup AWS S3 bucket locally with LocalStack
  2. Using Localstack to Emulate AWS S3 and SQS With Node

Steps taken:

  1. Create the LocalStack S3 service in the docker-compose file:

```yaml
services:
  s3:
    image: localstack/localstack-s3-test:latest-s3
    container_name: localstack_s3
    ports:
      - "4566:4566"
```

  2. Create a new local AWS profile (called "localstack") to work with LocalStack:

```bash
aws configure --profile localstack
```

```
AWS Access Key ID [None]: test
AWS Secret Access Key [None]: test
Default region name [None]: ap-south-1
Default output format [None]:
```

  3. Check that the profile was created:

```bash
aws configure list --profile localstack
```

```
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile               localstack           manual    --profile
access_key     ****************test shared-credentials-file
secret_key     ****************test shared-credentials-file
    region               ap-south-1      config-file    ~/.aws/config
```

  4. Create the S3 bucket ("replit-clone-s3-bucket") with the "localstack" profile:

```bash
aws s3 mb s3://replit-clone-s3-bucket --endpoint-url http://localhost:4566 --profile localstack

# List all buckets
aws s3 ls --endpoint-url http://localhost:4566 --profile localstack
```

  5. Copy a folder or file to the bucket:

```bash
aws s3 cp ./target_folder s3://replit-clone-s3-bucket/ --recursive --endpoint-url http://localhost:4566 --profile localstack
```

  6. List all files inside a bucket:

```bash
aws s3 ls s3://replit-clone-s3-bucket/ --recursive --endpoint-url http://localhost:4566 --profile localstack
```

Future improvements

Protect websocket server

Put it behind an authentication mechanism, e.g., validating the user's session token during the websocket handshake before accepting the connection.

Limited privileges for the user in runner service

The runner service has S3 credentials in environment variables, so anyone who gains access to the runner service can access the entire S3 bucket. We can limit the privileges of the user inside the runner service to only their own workspace directory.


Todo:
