updates to migrated medium blog

kscalelabs · Jan 31, 2025 · b7e79a9
1 parent 648daa9
commit b7e79a9

Showing 3 changed files with 6 additions and 22 deletions.
diff --git a/src/content/research/introducing-k-scale-labs.mdx b/src/content/research/introducing-k-scale-labs.mdx
@@ -5,8 +5,6 @@ date: "August 7, 2024"
 image: "/images/research/css-pattern2.png"
 ---
 
-
-
 # Introducing K-Scale Labs
 
 In 1964, Soviet astronomer Nikolai Kardashev proposed a scale for measuring a civilization’s level of technological advancement based on energy consumption. A Type 1 civilization on this scale is one which can harness all the energy available on its planet. A Type 2 civilization is one which can harness all of the energy from a star. A Type 3 civilization is one which can harness all the energy from a galaxy.
@@ -22,7 +20,6 @@ Our mission at K-Scale Labs is to move humanity to a Type 1 Kardashev civilizati
 
 ### Kardashev scale projections for Earth
 
-
 <Image
   src="/images/research/time-to-t1.webp"
   alt="Time-to-T1 for different growth rates in energy consumption"

diff --git a/...thats-not-an-option-llms-robustness-with-in-correct-multiple-choice-options.mdx b/...thats-not-an-option-llms-robustness-with-in-correct-multiple-choice-options.mdx
@@ -7,7 +7,6 @@ image: "/images/research/css-pattern4.png"
 
 # Wait, That’s Not an Option
 
-
 Reflective judgment is a critical process that enables individuals to evaluate and analyze information to form well-founded conclusions. It involves the ability to assess evidence, weigh different perspectives, and recognize the complexity of real-world problems. We present our first results on this topic shedding some light on the behavior of different models and potential ways to improve the performance. You can also see our project website and the Github code.
 
 ## What do we measure?
@@ -18,13 +17,7 @@ We investigate Reflective Judgment (RJ), a model’s ability to override its ten
 
 Blindly adhering to instructions can result in incorrect or harmful outputs, especially in high-stakes settings like healthcare and decision-making systems. Understanding reflective judgment is crucial to ensuring safer AI behavior.
 
-<Image
-  src="/images/research/why-rj.webp"
-  alt="Why RJ?"
-  width={600}
-  height={300}
-/>
-
+<Image src="/images/research/why-rj.webp" alt="Why RJ?" width={600} height={300} />
 
 ## How do we measure RJ?
 

diff --git a/src/content/research/we-are-here-for-the-long-haul.mdx b/src/content/research/we-are-here-for-the-long-haul.mdx
@@ -5,22 +5,21 @@ date: "August 8, 2024"
 image: "/images/research/css-pattern3.png"
 ---
 
-
 # We Are Here for the Long Haul
 
 We believe the world will soon see an exponential increase in the number of useful and affordable humanoids deployed across labs, warehouses, and households worldwide. The price of hardware will soon drop below the cost of your favorite VR headset¹, and the AI software will become much more sophisticated and mature.
 
 We also believe advancements in this field should be publicly accessible, which is why we created K-Scale Labs. The best part of building in the open-source spirit is sharing everything with the community instead of keeping it behind closed doors. With our updates, we want to share what we build and what we learn along the way.
 
-## Laying the foundation  
+## Laying the foundation
 
 My co-founder Ben likes to say that the main reason GPT-2 was adopted faster and more widely than BERT is because you could immediately see the poetry it generated after training.² We are far from that setup in robotics, and K-Scale’s mission is to change that. In upcoming weeks we will share some updates on our affordable humanoidal platform that will open up new possibilities to roboticians, MLEs and enthusiasts so they can easily test new models and skills at home or lab.
 
 In 2018, I collected the largest task-oriented dialogue dataset available to the community. 10,000 dialogues felt like more than enough (sic!), but the reality was that the foundation model was still missing. Just like in the NLP world, the robotics world is still searching for that foundation. And it feels like we’re making the same mistakes along the way. In the world of ubiquitous robots you still have to make them truly generalizable. We believe that achieving this is only possible being ML-first while fully hardware-aware.
 
 We recognize the long road ahead in creating truly useful and widely adopted robots, but we’re excited to embrace the challenge and contribute to working towards a world where embodied intelligence is cheap, plentiful and useful.
 
-## The data challenge: quality and diversity  
+## The data challenge: quality and diversity
 
 Collecting the right data at scale is challenging, to put it mildly. There are two main issues with every collection: data quality and data diversity. Having spent a considerable portion of my ML career collecting conversational data, I believe you can’t escape these constraints, even with unlimited resources. Recent datasets like OXE, Droid (and many more) are fantastic steps forward, but the problem of quality will only become more challenging. Let’s look at some examples from the Droid dataset below:
 
@@ -34,7 +33,6 @@ Collecting the right data at scale is challenging, to put it mildly. There are t
   allowfullscreen
 ></iframe>
 
-
 ### The instruction is ‘Spread the jeans on the couch’.
 
 <iframe
@@ -47,7 +45,6 @@ Collecting the right data at scale is challenging, to put it mildly. There are t
   allowfullscreen
 ></iframe>
 
-
 ### The instruction is ‘Move the cup to the left and cover it’.
 
 The ambiguity of the annotations is a major issue. With the challenges of the real world and the lack of a foundation model, any noise in the annotation, instead of helping the model generalize, will cause it to overfit to the noise. We will soon share our first datasets and tools to build initial filter models that can act as initial tests against incorrect annotations. Nevertheless, they will never be perfect.
@@ -63,7 +60,6 @@ Teleoperation feels like an obvious path, and most major robotics companies are
 
 We are quite skeptical of teleop as a solution since it distracts from focusing on the core of the problem. The harsh reality is that our “a lot” of data isn’t really that much, and data diversity will be a major bottleneck if we don’t have these robots in real-world environments. And just like in the NLP world, we try to collect vast amounts of data to train on specific tasks, incentivizing overfitting.
 
-
 <iframe
   width="600"
   height="300"
@@ -74,15 +70,14 @@ We are quite skeptical of teleop as a solution since it distracts from focusing
   allowfullscreen
 ></iframe>
 
-## Streamlining development  
+## Streamlining development
 
 Dealing with CUDA issues when working on ML models is already frustrating. Adding to that world, ad-hoc URDF changes, different simulators with their quirks, and optimizing for sim-to-real makes ML robotics development a truly painful experience. That’s why we’re building tools to quickly go from CAD designs to modeling in your favorite simulator with URDF or XML files. All of this is shared at the K-Scale Onshape library.
 
-## Modeling through simplicity  
+## Modeling through simplicity
 
 The UMI and Aloha projects popularized ACT and diffusion architectures. IsaacLab, LeRobot, and many other packages significantly lower the barrier to entry for newcomers to the field. Below, you can see a simple policy trained on a handful of examples, with the model generalizing to a new background.
 
-
 <iframe
   width="600"
   height="300"
@@ -93,8 +88,7 @@ The UMI and Aloha projects popularized ACT and diffusion architectures. IsaacLab
   allowfullscreen
 ></iframe>
 
-
-## Conclusions  
+## Conclusions
 
 There will be billions of autonomous, affordable, and helpful robots in the world, enabling us to do more productive and creative work. K-Scale Labs’ mission is to bring them to market at an affordable price with a simple software ecosystem where hardware and software are tightly integrated and driven by a single model.