kew6688 (Collaborator) commented Dec 1, 2025

Highlights

  • Refactored the integration of the Habitat evaluation flow into the InternNav framework for better readability and reusability.
  • Refactored the VLNPE evaluation flow, improved DistributedEvaluator, and added an option to bypass the agent server.
  • Bumped the version to v0.0.2.

Improvements

  • Adapted distributed evaluation for Slurm and AliCloud.
  • Added support for distributed evaluation of InternVLA-N1.
  • Added an easy-to-use Habitat env wrapper.
  • Unified the interface and workflow across multiple simulators.
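One common way to split evaluation episodes across distributed workers is static, rank-based sharding. The sketch below is a hypothetical illustration only and is not taken from InternNav's DistributedEvaluator: the helper names `get_rank_and_world_size` and `shard_episodes`, and the fallback to a single-process run, are assumptions. `SLURM_PROCID` and `SLURM_NTASKS` are the standard Slurm environment variables for a task's rank and the total number of tasks; other launchers (such as those used on AliCloud) would need their own mapping.

```python
import os
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_rank_and_world_size() -> Tuple[int, int]:
    """Hypothetical helper: read this worker's rank and the total worker count.

    Uses Slurm's SLURM_PROCID / SLURM_NTASKS when set (e.g. under srun);
    otherwise falls back to a single-process run (rank 0 of 1).
    """
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    world_size = int(os.environ.get("SLURM_NTASKS", "1"))
    return rank, world_size


def shard_episodes(episodes: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Hypothetical helper: give this rank every world_size-th episode.

    Strided (round-robin) sharding keeps shard sizes within one episode of
    each other and needs no coordination between workers.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    return list(episodes[rank::world_size])


if __name__ == "__main__":
    rank, world_size = get_rank_and_world_size()
    episode_ids = list(range(10))  # placeholder episode IDs
    print(rank, shard_episodes(episode_ids, rank, world_size))
```

Because each rank derives its shard purely from its own rank and the world size, no extra communication is needed to assign work; per-rank results would still have to be gathered and merged at the end.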

Bug Fixes

  • Fixed a logger bug in evaluation.
  • Fixed a bug in the dataset loading start position.
  • Fixed a dimension mismatch in the visualization utilities during InternVLA-N1 evaluation.

Distributed Eval Time

Setup: 16 nodes, each with 1× RTX 4090 GPU, 8 CPUs, and 60 GB of RAM.

| Model Variant | GPUs Used | Previous Runtime (1× RTX 4090) | Distributed Runtime (16× RTX 4090) | Speedup |
|---|---|---|---|---|
| InternVLA-N1 Flash | 1 GPU → 16 GPUs | ~13.5 hours | ~1 hour | 13× faster |
| InternVLA-N1 PE | 1 GPU → 16 GPUs | ~21 hours | ~1.6 hours | 13× faster |
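The speedup figures can be cross-checked from the approximate runtimes in the table above. The sketch below computes speedup as single-GPU runtime divided by distributed runtime, and parallel efficiency as speedup divided by GPU count; the function names are illustrative, and the inputs are the table's rounded values, so the results are approximate. With these numbers, the roughly 13× speedup corresponds to about 82% to 84% of ideal 16× linear scaling.

```python
def speedup(baseline_hours: float, distributed_hours: float) -> float:
    """Speedup = single-GPU runtime / distributed runtime."""
    return baseline_hours / distributed_hours


def parallel_efficiency(speedup_value: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfect scaling)."""
    return speedup_value / num_gpus


# Approximate runtimes (hours) from the table: (1x RTX 4090, 16x RTX 4090).
runs = {
    "InternVLA-N1 Flash": (13.5, 1.0),
    "InternVLA-N1 PE": (21.0, 1.6),
}

for name, (single, multi) in runs.items():
    s = speedup(single, multi)
    print(f"{name}: speedup {s:.1f}x, efficiency {parallel_efficiency(s, 16):.0%}")
```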

Performance after Refactor

  • Re-ran the InternNav benchmarks (InternUtopia VLNPE + Habitat VLNCE) three times after the refactor; the results match the previous performance.
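
| Model | Dataset/Benchmark | NE (m) | OS (%) | SR (%) | SPL (%) |
|---|---|---|---|---|---|
| InternVLA-N1 | Habitat R2R | 4.88 | 62.2 | 57.0 | 52.4 |
| InternVLA-N1 | Flash | 4.17 | 67.2 | 59.8 | 53.9 |
| InternVLA-N1 | PE | 4.87 | 55.7 | 50.0 | 42.9 |
| RDP | Flash | 7.11 | 41.7 | 24.3 | 17.4 |
| RDP | PE | 6.73 | 38.0 | 26.3 | 18.6 |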
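NE (navigation error), OS (oracle success rate), SR (success rate), and SPL (success weighted by path length) are the standard VLN metrics. The sketch below shows how they are typically computed from per-episode trajectories, with SPL_i = S_i · l_i / max(p_i, l_i), where l_i is the shortest-path length and p_i the length of the path actually taken. It is not InternNav's evaluation code: the `Episode` record and its field names are hypothetical, Euclidean distance stands in for the geodesic distance simulators normally use, and the 3 m success radius is the common R2R convention assumed here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

SUCCESS_RADIUS_M = 3.0  # assumed R2R-style success threshold (meters)


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    trajectory: List[Point]      # agent positions from start to stop
    goal: Point                  # goal position
    shortest_path_length: float  # l_i: shortest-path length from start to goal


def dist(a: Point, b: Point) -> float:
    # Euclidean stand-in; real evaluations typically use geodesic distance.
    return math.dist(a, b)


def path_length(traj: Sequence[Point]) -> float:
    # p_i: sum of segment lengths along the trajectory.
    return sum(dist(p, q) for p, q in zip(traj, traj[1:]))


def compute_metrics(episodes: Sequence[Episode]) -> Dict[str, float]:
    """Average NE (m) and OS / SR / SPL (%) over a set of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    ne_sum = os_sum = sr_sum = spl_sum = 0.0
    for ep in episodes:
        final_error = dist(ep.trajectory[-1], ep.goal)
        success = final_error < SUCCESS_RADIUS_M
        # Oracle success: the agent came within the radius at any point.
        oracle = any(dist(p, ep.goal) < SUCCESS_RADIUS_M for p in ep.trajectory)
        taken = path_length(ep.trajectory)
        shortest = ep.shortest_path_length
        denom = max(taken, shortest)
        ne_sum += final_error
        os_sum += float(oracle)
        sr_sum += float(success)
        if success:
            # SPL_i = l_i / max(p_i, l_i) on success; zero-length edge case scores 1.
            spl_sum += shortest / denom if denom > 0 else 1.0
    n = len(episodes)
    return {
        "NE": ne_sum / n,
        "OS": 100.0 * os_sum / n,
        "SR": 100.0 * sr_sum / n,
        "SPL": 100.0 * spl_sum / n,
    }
```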

kew6688 marked this pull request as ready for review on December 1, 2025 09:43.
kew6688 marked this pull request as draft on December 1, 2025 11:09.
kew6688 changed the title from "Bump version to 0.0.2" to "Bump version to 0.2.0" on December 2, 2025.