Open
974 commits
17b13c2
only consider initial load a success when a certain file is created t…
williamstein Sep 26, 2025
1042a4e
add ssh server to projects
williamstein Sep 26, 2025
1ba64a0
make ssh gateway understand standard ways to represent project_id for…
williamstein Sep 27, 2025
bc27f98
update the frontend ui with new username
williamstein Sep 27, 2025
8e1414e
add an endpoint to get ssh keys for a specific project
williamstein Sep 27, 2025
d7f6252
refactor code for calling hub from project
williamstein Sep 27, 2025
da7de58
add some info about where conat hub api is used
williamstein Sep 27, 2025
0bffaf4
project ssh: make it have all the env vars; also make logs visible vi…
williamstein Sep 27, 2025
c46d260
write code for managing authorized keys files
williamstein Sep 27, 2025
a1770b9
automatically set authorized keys
williamstein Sep 27, 2025
798615b
project api: endpoint to update authorized_keys file
williamstein Sep 27, 2025
33e3088
make it so ssh keys are automatically updated for projects as soon as…
williamstein Sep 27, 2025
b2ab927
account for stale state in updating ssh keys
williamstein Sep 27, 2025
32fc778
Merge branch 'master' into fs2
williamstein Sep 27, 2025
2917aab
Merge branch 'master' into fs2
williamstein Sep 27, 2025
5c32b6f
Merge branch 'master' into fs2
williamstein Sep 27, 2025
62d747d
Merge branch 'master' into fs2
williamstein Sep 27, 2025
1fb33f1
so a stack trace for this close -- hopefully this is better (?)
williamstein Sep 27, 2025
aaa4e1d
improve project start/stop/restart -- not done yet
williamstein Sep 27, 2025
9444e6d
make project start more robust
williamstein Sep 27, 2025
a21a4b0
project frontend: way to surface any start/stop error and a "force st…
williamstein Sep 27, 2025
96fbc22
mainly wiring up proxy server for project (doesn't work properly yet)
williamstein Sep 27, 2025
d83e2eb
work in progress on project proxy server
williamstein Sep 27, 2025
bb6ccd7
project proxy is fully working
williamstein Sep 27, 2025
a91e026
implement http proxy for projects
williamstein Sep 28, 2025
065e91c
refactor file-server http proxy code to be more flexible
williamstein Sep 28, 2025
fda4d38
integrate project proxy servers with hub (deleting the old ones)
williamstein Sep 28, 2025
1a87271
make the UI for managing ssh keys more like what you get with a file …
williamstein Sep 28, 2025
62e915a
allow "proxy" as alias for "server" for compat with vscode
williamstein Sep 28, 2025
274bedb
clean up the port naming; ensure core mutagen forwards are defined on…
williamstein Sep 28, 2025
c811daa
project startup -- moving scripts around, etc.
williamstein Sep 28, 2025
8f82dda
project --> sshd rename, which makes more sense; always kill file-ser…
williamstein Sep 28, 2025
1294d59
fix some timeouts related to starting projects
williamstein Sep 28, 2025
4828f56
fix some backup conflict issues (mostly in the UI)
williamstein Sep 28, 2025
a26b099
project runner: unmounting when point doesn't exist shouldn't throw e…
williamstein Sep 28, 2025
6fc1790
render bootlog info more nicely
williamstein Sep 29, 2025
6e43c34
support multiple project runners with stickiness and explicit ability…
williamstein Sep 29, 2025
0c16f5c
fix some errors in testing
williamstein Sep 29, 2025
6d6dd20
update project runner
williamstein Sep 29, 2025
cdd6807
rewrite named server launchers
williamstein Sep 29, 2025
e38be79
revamp the runner recipes to use cmd/args and also properly manage ch…
williamstein Sep 29, 2025
f4e38d3
get basic launching of servers to work again...
williamstein Sep 30, 2025
7b843de
add xpra as a server app
williamstein Sep 30, 2025
a26b133
factor out getting the rootfs base from an OCI image
williamstein Sep 30, 2025
4235db0
project runner: working on improving startup -- WIP on better error …
williamstein Sep 30, 2025
27aefd0
fix issues with jupyter launcher
williamstein Sep 30, 2025
6549230
reorg how project startup works to be more log friendly and better tr…
williamstein Sep 30, 2025
58f0567
make project control frontend slightly more usable
williamstein Sep 30, 2025
f663de7
add progress reporting for pulling podman image
williamstein Sep 30, 2025
770e183
Merge branch 'master' into fs2
williamstein Oct 1, 2025
88f4d0f
project-runner: add image extract progress bar updates
williamstein Oct 1, 2025
909bd9e
show time estimate in bootlog
williamstein Oct 1, 2025
808c7be
Merge branch 'master' into fs2
williamstein Oct 1, 2025
3adcbc1
fix bug in an error message and a confusing kernel switch message
williamstein Oct 1, 2025
f9e4831
make install of mutagen more cross-platform
williamstein Oct 1, 2025
5225d89
fixing install of sandbox binaries for macos
williamstein Oct 1, 2025
b03757c
small fix so automated app signing for mac works again
williamstein Oct 1, 2025
df42e08
share server -- get it to work using conat fs; not efficient yet
williamstein Oct 1, 2025
4b6856d
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Oct 1, 2025
1dfdee1
implement share server directory listing efficiently using the getLis…
williamstein Oct 1, 2025
f0ea98d
rewriting share server to work with conat/fs2
williamstein Oct 1, 2025
8e7db5c
Merge branch 'master' into fs2
williamstein Oct 1, 2025
0e4a543
write an architecture overview
williamstein Oct 2, 2025
9d99e48
Merge branch 'revert-sticky-routing' into fs2
williamstein Oct 2, 2025
775a611
remove sync-mtime -- we will use rsync for this when we need somethin…
williamstein Oct 2, 2025
d27f613
add openssh to the sandbox tools
williamstein Oct 2, 2025
e8c8ebc
add way for user to explicitly save a project
williamstein Oct 2, 2025
5969b1c
fixing issues with cloning projects (not done)
williamstein Oct 2, 2025
37170dd
working on cloning/saving project stuff
williamstein Oct 3, 2025
cf64ed5
fix frontend issue with opening a file after searching
williamstein Oct 3, 2025
e270a4e
rewriting ssh keys approach
williamstein Oct 3, 2025
d4dc558
properly configure ssh config when cloning project
williamstein Oct 3, 2025
0c4689c
change base quota to 1GB instead of 3GB
williamstein Oct 3, 2025
61bfe3e
refactor so project-runner now depends on file-server instead of the …
williamstein Oct 3, 2025
e7d0722
pasta --> slirp networking, to workaround a bug (and also maybe it is…
williamstein Oct 3, 2025
13c5366
implement support for main file-server for using existing btrfs files…
williamstein Oct 3, 2025
b15520b
supporting using btrfs on the project-runner
williamstein Oct 3, 2025
8602812
script to easily build podman from source
williamstein Oct 3, 2025
88a6130
implement swap properly and also disk quota on runners
williamstein Oct 3, 2025
9bfc8bc
add /scratch for projects
williamstein Oct 3, 2025
ef8527d
Merge branch 'master' into fs2
williamstein Oct 3, 2025
b6bf952
ts: tests
williamstein Oct 3, 2025
3b5fd3d
upgrade chokidar (which I think we don't even use); rewrite our actua…
williamstein Oct 3, 2025
c1e225f
do not use simple quotas with btrfs for now -- it's way too confusing…
williamstein Oct 3, 2025
415fc7e
sandbox: fix two security bugs checking parameters
williamstein Oct 3, 2025
0591ea0
implement quota for snapshots
williamstein Oct 3, 2025
19f249f
implement api so users can get the exact snapshot quotas
williamstein Oct 3, 2025
61816fb
fix bug in idle timeout monitor (uncaught exception)
williamstein Oct 4, 2025
5d77b18
add mutagen sync to file-server
williamstein Oct 4, 2025
00cae8b
save some code related to running bees
williamstein Oct 5, 2025
acfb6bd
switch to running bees using a simple nodejs function instead of some…
williamstein Oct 5, 2025
2b16adc
install and use bees
williamstein Oct 5, 2025
c3ba3e2
work in progress on wiring together filesystem sync api's
williamstein Oct 5, 2025
d5efcda
internal file-sync: first working version
williamstein Oct 5, 2025
d3ae3c8
file-sync: add ignores and ban certain paths
williamstein Oct 5, 2025
a6a88fd
sync: normalize output path when getting sync
williamstein Oct 5, 2025
aef97d5
sync: tweaking file watching code
williamstein Oct 5, 2025
c426f91
re-implementing how file watching and loading from disk for sync work
williamstein Oct 6, 2025
66dfc48
switch to lru cache for sandbox file state
williamstein Oct 6, 2025
7ecccdc
remove traces of older ignore on save approach
williamstein Oct 6, 2025
b293392
file watching: improve typescript
williamstein Oct 6, 2025
92b4906
upgrade rspack
williamstein Oct 6, 2025
2cad2f5
handling of directories/files being deleted now that we switched to c…
williamstein Oct 6, 2025
17b1a75
add back signal option
williamstein Oct 6, 2025
e4e065d
do not close on any file unlink
williamstein Oct 6, 2025
f4bf516
Merge branch 'master' into fs2
williamstein Oct 6, 2025
3923467
further comment out file stars and make a note that the pr was too ha…
williamstein Oct 6, 2025
21bda0f
refactoring backend file watcher to make it easier to understand and …
williamstein Oct 6, 2025
12c3c75
improve watching of file on disk by editor
williamstein Oct 6, 2025
22d7192
...
williamstein Oct 6, 2025
1ef285f
improve directory listing watcher (using stats from chokhidar, fixing…
williamstein Oct 6, 2025
1f7f308
add another missing "deadline" to dmp
williamstein Oct 7, 2025
a75d21e
change stability thresh
williamstein Oct 7, 2025
71cfaf7
Merge branch 'master' into fs2
williamstein Oct 7, 2025
77e785f
add back patch
williamstein Oct 7, 2025
72f3c7d
sync -- it doesn't matter when the file was read, just that it has be…
williamstein Oct 7, 2025
5e9edbc
backend watch: adjust some params and disable extremely verbose logging
williamstein Oct 7, 2025
7db8b39
implement patch approach to loading ipynb file from disk
williamstein Oct 8, 2025
8775c90
for now at least, disable the clean webpack plugin by default, since …
williamstein Oct 8, 2025
782400e
a disk usage button in explorer
williamstein Oct 8, 2025
7450767
disk usage indicator done-ish
williamstein Oct 8, 2025
d4cfddf
disable disk quota for lite mode (use your OS instead)
williamstein Oct 8, 2025
45cbb13
remove the frontend explorer "stale" warning when project not running…
williamstein Oct 8, 2025
f148b02
delay the disk spinner
williamstein Oct 8, 2025
e29c889
try an idea to prevent any save-to-disk-while-editing interference
williamstein Oct 8, 2025
a46ab15
issue #6377 (not finished) -- remove ALL coffeescript from @cocalc/fr…
williamstein Oct 8, 2025
b8931b0
remove the backend sagews support completely
williamstein Oct 8, 2025
7a0fb43
delete all the old HTML templates (that were used for the jquery+coff…
williamstein Oct 8, 2025
054d7c9
do not install ancient threejs -- no longer need it
williamstein Oct 8, 2025
b4b35c1
remove +New sagews
williamstein Oct 8, 2025
530f4b0
start working on sagews converter; delete old code
williamstein Oct 8, 2025
33d9145
tweak sandbox file watch params a bit
williamstein Oct 8, 2025
5f2749b
deprecate sage worksheets in the share server
williamstein Oct 8, 2025
17adf29
get content of sagews
williamstein Oct 8, 2025
71be18b
less reliance on time (for sync)
williamstein Oct 8, 2025
980174b
finish writing basic sagews converter
williamstein Oct 8, 2025
e287c1e
delete a lot of sagews scripts from smc_pyutil
williamstein Oct 8, 2025
118e00f
delete more sagews code from the frontend
williamstein Oct 8, 2025
41980ab
add sagews --> markdown as well
williamstein Oct 8, 2025
ddaf3bc
upgrade to newest http-proxy-3
williamstein Oct 9, 2025
003d31e
Merge branch 'master' into fs2
williamstein Oct 9, 2025
c740478
merge conflict
williamstein Oct 9, 2025
e9a688d
merge conflict
williamstein Oct 9, 2025
49d9e65
Merge branch 'master' into fs2
williamstein Oct 14, 2025
36c8879
Merge branch 'master' into fs2
williamstein Oct 22, 2025
f6dfe1a
Merge branch 'master' into fs2
williamstein Oct 23, 2025
6155220
fix issue with flex
williamstein Oct 23, 2025
e05f783
Merge branch 'master' into fs2
williamstein Nov 2, 2025
1d355e5
terminal: allow CPR responses (needed for jupyter console and codex)
williamstein Nov 2, 2025
f3fbf2d
fix showing CPR sequences on terminal refresh
williamstein Nov 2, 2025
1211b0b
install reflect sync
williamstein Nov 2, 2025
23e53f6
update reflect-sync
williamstein Nov 2, 2025
db677a5
re-vamp terminals so that control messages are processed from only th…
williamstein Nov 3, 2025
1d34f25
if sshpiperd fails to start, show clear error and terminate
williamstein Nov 3, 2025
31971e9
better strategy to determine control respones from xterm.js -- do the…
williamstein Nov 3, 2025
8728367
work in progress switching from mutagen to reflect for port forwarding
williamstein Nov 3, 2025
10fe4db
upgrade reflect-sync
williamstein Nov 3, 2025
3e03806
upgrade reflect-sync again
williamstein Nov 3, 2025
4d8b901
run port forwards in the sidecar
williamstein Nov 3, 2025
5a43b88
broken work in progress on adding delta saving
williamstein Nov 3, 2025
38cb2f1
add frontend side of writing file using delta
williamstein Nov 3, 2025
880bda8
use writeFileDelta to make writing sync editing files to disk more ef…
williamstein Nov 3, 2025
85155a3
make sandbox watch close idempotent
williamstein Nov 3, 2025
a85aa51
rewriting some docs a little
williamstein Nov 3, 2025
5d73c98
create script to bundle cocalc-lite properly
williamstein Nov 3, 2025
30e70cd
update reflect-sync
williamstein Nov 3, 2025
5fdda4c
Merge branch 'fs2' of github.com:sagemathinc/cocalc into fs2
williamstein Nov 3, 2025
5fe6108
package command to build lite bundle, and also place it in lite package
williamstein Nov 4, 2025
6676bf7
working on lite bundle: put in stub for new function needed for non-l…
williamstein Nov 4, 2025
68fb584
switch from using the brittle adhoc inefficient script "build-tarball…
williamstein Nov 4, 2025
78cceb3
cocalc-lite: support macos ncc bundles
williamstein Nov 4, 2025
2c9e8a6
project-runner: use ncc to build bundle instead of a one-off brittle …
williamstein Nov 4, 2025
6e67b2f
bundle the project package as well
williamstein Nov 4, 2025
200d42f
modify podman project runner code to optionally support using project…
williamstein Nov 4, 2025
3b8ed45
package up the backend bin directory with the project bundle
williamstein Nov 4, 2025
94ee60d
properly install reflect-sync binaries
williamstein Nov 4, 2025
81f3ce6
incorporating reflect-sync -- work in progress
williamstein Nov 5, 2025
45d19f5
switching from mutagen to reflect for project file sync
williamstein Nov 5, 2025
c509566
starting a new sqlite based db query for cocalc-lite
williamstein Nov 5, 2025
3f48371
update reflect-sync
williamstein Nov 5, 2025
ef8ea55
cocalc 2 misc
williamstein Nov 9, 2025
4701efd
reflect sync integration work
williamstein Nov 10, 2025
da5d58d
Merge branch 'master' into fs2
williamstein Nov 15, 2025
962a06f
Merge branch 'master' into fs2
williamstein Nov 15, 2025
0ea44b5
Merge branch 'master' into fs2
williamstein Nov 15, 2025
b7cd503
fix merge conflict
williamstein Nov 15, 2025
884aa9a
fixing a bunch more merge conflicts...
williamstein Nov 15, 2025
354821c
fix failing depcheck for backend; update reflect-sync
williamstein Nov 16, 2025
61fb91f
fix a bunch of depcheck issues
williamstein Nov 16, 2025
9fb7b73
fix version inconsistency
williamstein Nov 16, 2025
06de699
fix problem with open command being shadowed in project
williamstein Nov 16, 2025
9619925
reflect-sync upgrade
williamstein Nov 17, 2025
a979d02
upgrade rustic and fix a unit test
williamstein Nov 17, 2025
b9e1665
remove the patch length thresh for using patches to represent file ch…
williamstein Nov 17, 2025
4a959d4
fix rustic install
williamstein Nov 17, 2025
181241c
fix another failing test involving file watching on the backend
williamstein Nov 17, 2025
5fa10b5
fix the sandbox unit tests
williamstein Nov 17, 2025
15beb60
fixing more unit tests (due to chokidar subs for node watch)
williamstein Nov 18, 2025
49d6703
fix in conat core service test
williamstein Nov 18, 2025
98bbb8d
fix some sync conflict unit tests
williamstein Nov 18, 2025
99c4d2f
fix unit tests of sync (which were broken by changes elsewhere)
williamstein Nov 18, 2025
7372756
fix another badly written sync test
williamstein Nov 18, 2025
465228e
sync: fix event not being emited related to loading from disk
williamstein Nov 18, 2025
4251070
explain why a test fails involving sync editing and deleting a file
williamstein Nov 18, 2025
1c11569
updating more unit tests to chokidar semantics
williamstein Nov 18, 2025
edb6cc2
do not show disk usage when quota isn't known
williamstein Nov 18, 2025
ada9f62
only show ssh gateway when configured; move copy button to left side …
williamstein Nov 18, 2025
674df54
delete the "ghost file tabs" tracking -- that was a leftover hack tha…
williamstein Nov 18, 2025
c4c9cbd
simplify the file tab resize prevention code
williamstein Nov 18, 2025
d194c82
typescript error
williamstein Nov 18, 2025
e195276
lite mode: set that project is running in context (since it is always…
williamstein Nov 18, 2025
92d2e75
cocalc-lite --> cocalc-plus
williamstein Nov 18, 2025
01834b2
lite: better-sqlite --> node:sqlite
williamstein Nov 18, 2025
5604397
lite: hook user queries into the new sqlite system
williamstein Nov 18, 2025
f905108
add unit test of new sqlite hub lite functionality
williamstein Nov 18, 2025
5d781cd
lite: new sqlite db -- make changefeed provide initial result
williamstein Nov 18, 2025
c95a131
remove the old dkv-based user-query implementation of the lite databa…
williamstein Nov 18, 2025
36cb858
lite: surface admin settings via customize
williamstein Nov 18, 2025
98a31a0
lite: add no-op touch
williamstein Nov 18, 2025
3608272
fix how reflect-sync is installed into backend
williamstein Nov 18, 2025
07a62d5
add back normalize openai models
williamstein Nov 18, 2025
da9c590
fix issue with an import from file-server and our jest tests
williamstein Nov 18, 2025
97fb42d
switch to using evaluateWithLangChain for user defined model evaluation
williamstein Nov 18, 2025
f3ce894
llm: rewrite user defined llm to use langchain instead of our old cus…
williamstein Nov 18, 2025
dcbb4e2
new @cocalc/ai package
williamstein Nov 18, 2025
999d78f
drop heavy gpt3-tokenizer from @cocalc/ai and add a simple heuristic …
williamstein Nov 18, 2025
c846050
add basic test suite to @cocalc/ai
williamstein Nov 18, 2025
8fcfdc1
add more unit tests
williamstein Nov 18, 2025
a9ef434
@cocalc/ai: add streaming test
williamstein Nov 18, 2025
95bd9d1
@cocalc/ai: more tests of history
williamstein Nov 18, 2025
dd63513
ai: more unit tests
williamstein Nov 18, 2025
8c67958
ai: more tests
williamstein Nov 18, 2025
7bd445d
work in progress switching server/llm to use the new @cocalc/ai package
williamstein Nov 18, 2025
78cda79
ai dic generator -- don't allow request to create notebook if no kern…
williamstein Nov 18, 2025
4e6c1a5
switch to using @cocalc/ai from server
williamstein Nov 18, 2025
f05e19a
swap hint v solution
williamstein Nov 18, 2025
7b348bc
lite: support llm conat server endpoint
williamstein Nov 18, 2025
fbc478c
lite: more settings
williamstein Nov 19, 2025
cdedb9d
lite: get llm @mention to work
williamstein Nov 19, 2025
7085f85
lite: add easy way to configure AI api keys
williamstein Nov 19, 2025
33b5e64
make state of lite ai config better
williamstein Nov 19, 2025
2812a65
hopefully make ai keys not be mistaken for passwords
williamstein Nov 19, 2025
d37f2db
remove some leftover not-important websocketfs remnants
williamstein Nov 19, 2025
5da29fd
remove zstd-napi (switch to native nodejs)
williamstein Nov 19, 2025
b40ae85
be explicit about needing nodejs v22+
williamstein Nov 19, 2025
8d2c776
switch better-sqlite3 --> node:sqlite, which shrinks size and removes…
williamstein Nov 19, 2025
a6c1bd5
fix more typescript issues related to using sqlite:node
williamstein Nov 19, 2025
541c833
aligning versions
williamstein Nov 19, 2025
03279df
finish aligning version
williamstein Nov 19, 2025
a3b7fec
fix bug in better-sqlite3 --> node:sqlite switch
williamstein Nov 19, 2025
54 changes: 25 additions & 29 deletions .github/workflows/make-and-test.yml
@@ -39,7 +39,7 @@ jobs:
detached: true
- uses: actions/checkout@v4
- name: Install python3 requests
run: sudo apt-get install python3-requests
run: sudo apt-get install python3-requests python3-yapf
- name: Check doc links
run: cd src/scripts && python3 check_doc_urls.py || sleep 5 || python3 check_doc_urls.py

@@ -91,19 +91,15 @@ jobs:
# cache: "pnpm"
# cache-dependency-path: "src/packages/pnpm-lock.yaml"

- name: Download and install Valkey
run: |
VALKEY_VERSION=8.1.2
curl -LOq https://download.valkey.io/releases/valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
tar -xzf valkey-${VALKEY_VERSION}-jammy-x86_64.tar.gz
sudo cp valkey-${VALKEY_VERSION}-jammy-x86_64/bin/valkey-server /usr/local/bin/
- name: Install btrfs-progs and bup for @cocalc/file-server
run: sudo apt-get update && sudo apt-get install -y btrfs-progs bup

- name: Set up Python venv and Jupyter kernel
run: |
python3 -m pip install --upgrade pip virtualenv
python3 -m virtualenv venv
source venv/bin/activate
pip install ipykernel
pip install ipykernel yapf
python -m ipykernel install --prefix=./jupyter-local --name python3-local --display-name "Python 3 (Local)"


@@ -128,30 +124,30 @@ jobs:
name: "test-results-node-${{ matrix.node-version }}-pg-${{ matrix.pg-version }}"
path: 'src/packages/*/junit.xml'

report:
runs-on: ubuntu-latest
# report:
# runs-on: ubuntu-latest

needs: [test]
# needs: [test]

if: ${{ !cancelled() }}
# if: ${{ !cancelled() }}

steps:
- name: Checkout code
uses: actions/checkout@v4
# steps:
# - name: Checkout code
# uses: actions/checkout@v4

- name: Download all test artifacts
uses: actions/download-artifact@v4
with:
pattern: "test-results-*"
merge-multiple: true
path: test-results/
# - name: Download all test artifacts
# uses: actions/download-artifact@v4
# with:
# pattern: "test-results-*"
# merge-multiple: true
# path: test-results/

- name: Test Report
uses: dorny/test-reporter@v2
with:
name: CoCalc Jest Tests
path: 'test-results/**/junit.xml'
reporter: jest-junit
use-actions-summary: 'true'
fail-on-error: false
# - name: Test Report
# uses: dorny/test-reporter@v2
# with:
# name: CoCalc Jest Tests
# path: 'test-results/**/junit.xml'
# reporter: jest-junit
# use-actions-summary: 'true'
# fail-on-error: false

32 changes: 27 additions & 5 deletions .gitignore
@@ -89,6 +89,8 @@ src/restart_hub
src/restart_compute
.coffee

src/packages/project/build

src/smc-hub/run/hub[0-9].js
src/smc-hub/lti
src/smc-hub/landing
@@ -110,11 +112,6 @@ src/smc-build/prometheus/alertmanager.yml
.gitignore
postgres-env

# related to testing
src/smc-hub/test/*.js
src/smc-hub/test/*.map
src/smc_sagews/smc_sagews/metastore_db/

# autogenerated
src/smc-webapp/_colors.sass

@@ -164,8 +161,33 @@ src/.claude/settings.local.json

# test reports by jest-junit
junit.xml


sea-prep.blob
cocalc
cocalc-lite.tar.*
*.egg-info
.python-version
src/packages/lite/sea/cocalc*gz
src/packages/lite/sea/cocalc*xz
src/packages/lite/sea/cocalc*zip
src/packages/lite/sea/cocalc*gnu


# autogenerated docs
**/cocalc-api/site/**
*.pkg
*.zip

src/packages/lite/build/
src/packages/project/build/
src/packages/project-runner/build/


codex.sh
g
g-cs
g-cs2
g-ssh-server
g-bundle
g2
236 changes: 236 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,236 @@
# CoCalc2 Architecture Overview (Draft)

> This is a working draft meant to capture the current design in one place.

---

## Goals & Non‑Goals

**Goals**

- Fast, durable, multi‑tenant project storage with clear quotas.
- Predictable save from project runner VMs to the central file server \(no “I did work but it can’t be saved”\).
- Efficient storage via transparent compression; simple mental model for users.
- Rolling snapshots for user self‑service restore, with a separate snapshot quota that users mostly don't need to think about.

**Non‑Goals**

- Per‑user UID separation on runners \(we rely on containerization and subvolume quotas instead\).
- Snapshots on runner VMs \(server owns snapshot history; runners are ephemeral\).

---

## High‑Level Components

1. **Central File Server** \(single large Btrfs filesystem\)
- One Btrfs **subvolume per project** \(live working set\).
- Compression enabled \(e.g., `zstd`\).
- **Qgroups/Quotas** enabled for hard limits.
- **Rolling snapshots** per project for user restore.
- Named **user created snapshots**.

2. **Project Runner VMs** \(many; fast local SSD\)
- Also Btrfs with compression and **per‑project subvolumes**.
- Hard quotas sized slightly below the server quota to maintain save‑back headroom.
- No persistent snapshots \(might use short‑lived read only snapshots for atomic rsync of rootfs\).

3. **Sync Layer**
- **Mutagen**: near real‑time sync for user home files.
- **rsync**: periodic sync for the container rootfs upper overlay.

4. **Web UI & Services**
- Surfaces usage and limits \(live and snapshots\), a snapshot browser with restore, and quota warnings.

---

## Storage Model & Quotas

### Per‑Project Subvolume (File Server)

- Each project lives at `/mnt/project-<project-id>` as its **own subvolume**.
- **Compression** is enabled at the filesystem level; **quotas are enforced** _**after compression**_.
- Two distinct quota budget buckets:
- **Live quota**: applies to the live subvolume.
- **Snapshots quota**: applies to the aggregate of _all_ snapshots for that project.
- Quota for snapshots will be a simple function \(probably 2x\) of the live quota.

### Qgroups Structure

- Btrfs assigns each subvolume an implicit qgroup `0/<live-id>`.
- We create an **aggregate qgroup** `1/<live-id>` for that project’s snapshots.
- We apply limits:
- **Live**: limit `0/<live-id>` \(or the path directly\) to, say, **10 GiB**.
- **Snapshots**: limit `1/<live-id>` to, say, **20 GiB** total across all snapshots.
- On snapshot creation, we assign the snapshot’s `0/<snap-id>` **into** `1/<live-id>`.
- Using the **live subvolume ID as the aggregate id** avoids external ID bookkeeping.

### Runner VM Quotas

- Each runner has a **per‑project subvolume** with **quota set to ~85–90%** of the server’s live quota.
- Rationale: keeps **headroom** so save‑back to the server succeeds even if compression ratios differ.
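
The quota relationships described above reduce to simple arithmetic. The following is an illustrative sketch, not the code in `packages/file-server`: the function names are hypothetical, and the 2x snapshot factor and 90% runner fraction are only the example values mentioned in this document.

```bash
# Hypothetical helpers: derive the snapshot and runner quotas from the live quota.
# All values are in bytes; integer division rounds the runner quota down.

# snapshot_quota_bytes LIVE_BYTES -> twice the live quota (assumed 2x factor)
snapshot_quota_bytes() {
  echo $(( $1 * 2 ))
}

# runner_quota_bytes LIVE_BYTES [PERCENT] -> PERCENT% of the live quota (default 90)
runner_quota_bytes() {
  local live=$1 percent=${2:-90}
  echo $(( live * percent / 100 ))
}

GIB=$(( 1024 * 1024 * 1024 ))
live=$(( 10 * GIB ))
echo "live=$live snapshots=$(snapshot_quota_bytes "$live") runner=$(runner_quota_bytes "$live")"
```

Rounding down keeps the runner limit at or below the intended fraction, which is the direction that preserves headroom.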

### User‑Facing Explanation (docs‑ready blurb)

> **Storage quota is measured after compression.** Your project has a quota that measures the actual space consumed on disk. If your data compresses well, the sum of file sizes you see in the editor may exceed your quota and still fit. Snapshots have a separate quota \(twice the project quota\) that limits how much historical data is retained.

---

## Snapshots

- **Where**: server only, per project \(no long‑term snapshots on runners\).
- **How**: periodic RO snapshots \(e.g., 15 minute/hourly/daily/weekly retention\).
- **Budget**: snapshots all share the project’s **snapshot quota** \(`1/<live-id>` limit\). When the budget is exceeded, the snapshot retention policy prunes oldest automatic snapshots until under budget. Explicit user created named snapshots are not automatically deleted.
- **Self‑service**: UI lets users browse/restore from snapshots; command line restore via rsync is also supported.

> **Note**: Runner nodes may take a **short‑lived RO snapshot** strictly for consistent `rsync` (copy‑on‑write point‑in‑time view), then delete it immediately after sync completes. This does not change policy: history lives on the server.
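
The pruning rule in the **Budget** item can be sketched as a dry-run planner. This is a simplified illustration under stated assumptions: `prune_plan` is a hypothetical name, the input format (`NAME KIND BYTES`, oldest first) is invented for the example, and per-snapshot sizes are treated as additive, which real Btrfs exclusive usage is not (deleting one snapshot can shift shared extents onto another).

```bash
# prune_plan BUDGET_BYTES
# Reads "NAME KIND BYTES" lines on stdin, oldest first, where KIND is
# "auto" (rolling snapshot) or "named" (user-created, never pruned).
# Prints the names of automatic snapshots to delete, oldest first,
# stopping as soon as the running total is within the budget.
prune_plan() {
  awk -v budget="$1" '
    { name[NR] = $1; kind[NR] = $2; size[NR] = $3; total += $3 }
    END {
      for (i = 1; i <= NR && total > budget; i++) {
        if (kind[i] == "auto") { print name[i]; total -= size[i] }
      }
    }'
}

# Demo with made-up sizes: 27 units used against a budget of 20.
printf '%s\n' \
  '20250101T000000Z auto 8' \
  'keep-me named 9' \
  '20250102T000000Z auto 6' \
  '20250103T000000Z auto 4' | prune_plan 20
```

Timestamps in the `%Y%m%dT%H%M%SZ` form sort lexicographically in chronological order, so a plain `sort` is enough to produce the oldest-first input. If named snapshots alone exceed the budget, the planner simply runs out of candidates; how the real system handles that case is not specified here.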

---

## Data Flow

1. **Active work on runner**
- User edits files in their per‑project subvolume on a runner.
- **Mutagen** streams home‑dir changes to the server nearly immediately. When file changes conflict, the central file server's version always wins.
- **rsync** pushes the rootfs overlay periodically \(e.g., every minute\) from a transient snapshot for consistency.

2. **File Server receives changes**
- Writes land in the project’s live subvolume, bounded by the live quota.
- Periodic snapshots capture history and consume from the snapshots quota.

3. **Restore**
- Users restore individual files or directories from snapshots via UI or CLI.
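
One way to get the "server always wins" behavior from step 1 is Mutagen's `two-way-resolved` mode, in which the alpha endpoint wins every conflict. The sketch below only prints the command rather than running it. Whether the project actually uses this mode is an assumption, and the helper name, session name and endpoints are hypothetical placeholders.

```bash
# mutagen_sync_cmd NAME SERVER_ENDPOINT RUNNER_ENDPOINT
# Prints (does not execute) a Mutagen session-creation command.
# Assumption: two-way-resolved mode, so the first endpoint (alpha) must be
# the file server for its version to win on conflict.
mutagen_sync_cmd() {
  local name=$1 server_endpoint=$2 runner_endpoint=$3
  printf 'mutagen sync create --name=%s --sync-mode=two-way-resolved %s %s\n' \
    "$name" "$server_endpoint" "$runner_endpoint"
}

# Placeholder endpoints, for illustration only.
mutagen_sync_cmd home-demo server:/srv/home /home/user
```

The ordering of the two endpoints is the design point: swapping them would silently make the runner authoritative.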

---

## Operational Procedures

The following is roughly what the actual JavaScript code in `packages/file-server` does.

### One‑Time Setup (per filesystem)

```bash
# Enable quotas once
sudo btrfs quota enable /mnt
# Optional after bulk ops or enabling late
sudo btrfs quota rescan -w /mnt
```

### Create a New Project (Server)

```bash
# Live subvolume
sudo btrfs subvolume create /mnt/project-$PROJECT_ID

# Set live quota (example: 10 GiB)
sudo btrfs qgroup limit 10G /mnt/project-$PROJECT_ID

# Snapshot aggregate group uses the live subvolume ID
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')

# Create and limit the snapshots group
sudo btrfs qgroup create 1/$LIVEID /mnt/
sudo btrfs qgroup limit 20G 1/$LIVEID /mnt/ # example snapshots budget
```

### Snapshot Creation (Server)

```bash
# Create RO snapshot
TS=$(date -u +%Y%m%dT%H%M%SZ)
SNAP=/mnt/project-$PROJECT_ID/.snapshots/$TS
sudo btrfs subvolume snapshot -r /mnt/project-$PROJECT_ID "$SNAP"

# Assign snapshot to the project’s snapshot group
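# Match the full "Subvolume ID:" label; a bare /ID:/ would also match the UUID and Parent ID lines
SNAPID=$(sudo btrfs subvolume show "$SNAP" | awk '/Subvolume ID:/ {print $3}')
LIVEID=$(sudo btrfs subvolume show /mnt/project-$PROJECT_ID | awk '/Subvolume ID:/ {print $3}')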
sudo btrfs qgroup assign 0/$SNAPID 1/$LIVEID /mnt
```
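
Extracting the subvolume ID is the fragile part of this procedure, because `btrfs subvolume show` prints several lines containing `ID:` (`UUID:`, `Parent UUID:`, `Parent ID:`, `Top level ID:`). A small helper that matches the full label keeps the lookup unambiguous. The name `subvolume_id` is hypothetical, and the sketch assumes the usual `Subvolume ID: <number>` line in the command's output.

```bash
# subvolume_id: read `btrfs subvolume show` output on stdin and print the
# numeric subvolume ID. Matching "Subvolume ID:" skips the UUID and
# Parent/Top level ID lines; `exit` stops after the first match.
subvolume_id() {
  awk '/Subvolume ID:/ { print $3; exit }'
}

# Intended use (requires root and a Btrfs mount, so it is left commented out):
#   LIVEID=$(sudo btrfs subvolume show "/mnt/project-$PROJECT_ID" | subvolume_id)
#   SNAPID=$(sudo btrfs subvolume show "$SNAP" | subvolume_id)
```

Keeping the parsing in one place means a change in the tool's output format needs a fix in only one function.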

### Runner Subvolume & Quota

```bash
# Create per‑project subvolume on runner
sudo btrfs subvolume create /runnerfs/project-$PROJECT_ID

# Set runner quota to ~90% of server limit (example: 9 GiB)
sudo btrfs qgroup limit 9G /runnerfs/project-$PROJECT_ID
```

### Rsync from Runner \(optional transient snapshot\)

```bash
# (TODO)
P=/runnerfs/project-$PROJECT_ID
TS=$(date -u +%Y%m%dT%H%M%SZ)
rsync -aHAX --delete ... file-server:/mnt/project-$PROJECT_ID/.local/overlay/...
```

### Inspecting Usage

```bash
# Qgroup usage for all subvolumes (referenced/exclusive plus their limits; sizes are human‑readable by default)
sudo btrfs qgroup show -re /mnt | less

# Filesystem space by class (useful with compression)
sudo btrfs filesystem df /mnt
```

---

## Policies & Safety

- **Hard quotas**: enforced by the kernel via qgroups \(both server and runner\). When a project exceeds its quota, writes fail with `EDQUOT` \(“Disk quota exceeded”\), scoped to that subvolume.
- **Headroom on runners**: prevents the common failure mode where work done on a runner can’t be saved back to the server due to tighter server limits or different compression ratios.
- **User guidance**: expose a `~/scratch` directory \(separate subvolume and policy\) for large temporary files not intended for sync—reduces quota pressure on the live budget.
- **Performance knobs**: `compress=zstd[:3]`, `ssd`, `discard=async`. Consider `autodefrag` only for heavy small‑random‑write workloads. Set `chattr +C` sparingly on paths needing no‑CoW \(trades off checksumming\).
- **Dedup** on runners: optional **bees** on runners to reduce local SSD usage; measure CPU/IO overhead under realistic load. Use reflink copy\-on\-write when possible \(e.g., cloning projects\).
- **Dedup** on file server: optional **bees** to reduce disk usage. Also extensively use copy\-on\-write, e.g., when copying files between projects.
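
The reflink copy mentioned in the **Dedup** items can be sketched with GNU `cp`. The helper name `clone_tree` is hypothetical, and the sketch assumes GNU coreutils. With `--reflink=auto`, the copy shares extents on Btrfs and falls back to an ordinary copy on filesystems without reflink support.

```bash
# clone_tree SRC DST
# Copy a directory tree, preserving attributes (-a) and sharing data blocks
# copy-on-write where the filesystem supports it (--reflink=auto).
# DST must not already exist, otherwise cp copies SRC *into* DST.
clone_tree() {
  cp -a --reflink=auto -- "$1" "$2"
}
```

Use `--reflink=always` instead when a silent fallback to a full copy would be a problem, for example when disk usage matters more than the copy succeeding.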

---

## Failure Modes & Mitigations

- **Runner quota exceeded** → user sees `EDQUOT` \(“Disk quota exceeded”\) early; save‑back fails fast and visibly. UI should warn near 80–90%.
- **Server live quota exceeded** → incoming syncs fail; UI callouts \+ guidance to delete files or increase quota.
- **Snapshot budget exceeded** → retention pruner deletes oldest snapshots until under budget.
- **Qgroup counter drift** \(rare, after crashes/bulk ops\) → `btrfs quota rescan -w` to reconcile.
- **Filesystem nearly full** → monitor `btrfs filesystem df`; alert admins before metadata pools are pressured.
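
The "warn near 80–90%" guidance can be sketched as a small classifier. This is illustrative only: `quota_status` is a hypothetical name, the 80% default is just the lower end of the range above, and the `unknown` result for a missing limit is an assumption about how an unset quota would be reported.

```bash
# quota_status USED_BYTES LIMIT_BYTES [WARN_PERCENT]
# Prints one of: unknown (no limit set), ok, warn, full.
quota_status() {
  local used=$1 limit=$2 warn_percent=${3:-80}
  if [ "$limit" -le 0 ]; then
    echo "unknown"
    return
  fi
  local percent=$(( used * 100 / limit ))
  if [ "$percent" -ge 100 ]; then
    echo "full"
  elif [ "$percent" -ge "$warn_percent" ]; then
    echo "warn"
  else
    echo "ok"
  fi
}

quota_status 8 10
```

The same function works for both the live and the snapshot budgets, since each is just a used/limit pair read from its qgroup.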

---

## Observability (What to Monitor)

- Live and snapshots usage per project (qgroup referenced/exclusive).
- Runner vs server usage deltas (to detect pathological compression differences).
- Snapshot creation latency; pruner actions count.
- Error rates from mutagen/rsync; `EDQUOT` and `ENOSPC` events; quota rescans.

---

## FAQ (User‑Facing)

**Q: My files add up to more than my quota, but I’m not blocked. Why?**
A: Quotas measure space **after compression**. If your data compresses well, you can store more than the sum of uncompressed file sizes.

**Q: Do snapshots count against my main quota?**
A: No. Snapshots have a **separate budget which is twice your main quota**. When that fills, older snapshots are pruned automatically.

**Q: What happens if I hit the quota while working?**
A: New writes fail with a “Disk quota exceeded” error. Delete data or request a higher quota, then try again.

**Q: Can I keep big temporary outputs?**
A: Use `~/scratch` \(limited retention and a separate quota\). Only the project’s live area is synced and counted against your main quota.

---

## Appendix: Rationale for Design Choices

- **Per‑project subvolumes** enable kernel‑level quotas, small blast radius, and fast deletion.
- **Server‑side snapshots only** simplify reasoning about history, save SSD cycles on runners, and reduce operational complexity.
- **Aggregate snapshot qgroup** provides a single dial for “how much history a project can accumulate.”
- **Runner quotas < server quotas** provide a simple, robust guardrail against save‑back failures due to compression variance.

---

_End of draft._

5 changes: 3 additions & 2 deletions src/AGENTS.md
@@ -158,7 +158,6 @@ CoCalc is organized as a monorepo with key packages:
- Prefix git commits with the package and general area. e.g. 'frontend/latex: ...' if it concerns latex editor changes in the packages/frontend/... code.
- When pushing a new branch to Github, track it upstream. e.g. `git push --set-upstream origin feature-foo` for branch "feature-foo".


## React-intl / Internationalization (i18n)

CoCalc uses react-intl for internationalization with SimpleLocalize as the translation platform.
@@ -215,6 +214,7 @@ Same flow as above, but **before 3. i18n:upload**, delete the key. Only new keys
- Ignore files covered by `.gitignore`
- Ignore everything in `node_modules` or `dist` directories
- Ignore all files not tracked by Git, unless they are newly created files
- DO NOT DELETE ANY FILES NOT TRACKED BY GIT.

# Important Instruction Reminders

@@ -223,4 +223,5 @@ Same flow as above, but **before 3. i18n:upload**, delete the key. Only new keys
- ALWAYS prefer editing an existing file to creating a new one
- REFUSE to modify files when the git repository is on the `master` or `main` branch
- NEVER proactively create documentation files (`*.md`) or README files. Only create documentation files if explicitly requested by the User
- when modifying a file with a copyright banner at the top, make sure to fix/add the current year to indicate the copyright year
- when modifying a file with a copyright banner at the top, make sure to fix/add the current year to indicate the copyright year
