Project Info
Viability of Static and Convolutional Embeddings for AI and Robotics Vision
Project Goals and Description:
This project explores a fundamental question in artificial intelligence and computer vision: Can we replace complex, resource-intensive contextual embeddings with simpler, more efficient alternatives without significantly sacrificing performance? Modern deep learning models for tasks like object recognition, scene understanding, and multimodal AI rely on contextual embeddings—representations that capture relationships between elements in data. While powerful, these embeddings demand substantial computing power and vast amounts of labeled training data. This project investigates whether static and convolutional embeddings—simpler, non-contextual alternatives—can achieve comparable results with fewer resources.
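To make the distinction concrete, the following PyTorch sketch (illustrative only; the patch size, embedding width, and module choices are assumptions rather than project specifications) contrasts a static lookup-table embedding and a convolutional patch embedding, each of which encodes a patch independently of its surroundings, with a contextual self-attention layer whose outputs depend on the whole image.

```python
# Minimal sketch of the three embedding styles discussed above (dimensions are illustrative).
import torch
import torch.nn as nn

PATCH, DIM, VOCAB = 16, 256, 8192  # hypothetical patch size, embedding width, codebook size

# 1) Static embedding: a fixed lookup table indexed by a discrete patch/token id.
#    Each id always maps to the same vector, regardless of surrounding content.
static_table = nn.Embedding(VOCAB, DIM)
ids = torch.randint(0, VOCAB, (1, 196))                 # hypothetical patch ids
static_vecs = static_table(ids)                         # (1, 196, DIM), non-contextual

# 2) Convolutional patch embedding: a strided convolution projects each image patch
#    to a vector; still non-contextual (no interaction across patches).
conv_embed = nn.Conv2d(in_channels=3, out_channels=DIM,
                       kernel_size=PATCH, stride=PATCH)

# 3) Contextual embedding: a transformer encoder layer mixes information across
#    patches via self-attention, so every output vector depends on the whole image.
context_layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)

img = torch.randn(1, 3, 224, 224)                       # dummy image
patches = conv_embed(img).flatten(2).transpose(1, 2)    # (1, 196, DIM), non-contextual
contextual = context_layer(patches)                     # (1, 196, DIM), contextual
```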
Building on influential studies from Stanford University (2020) and Beijing University of Technology (2022), this project examines whether findings in natural language processing, where non-contextual embeddings performed surprisingly well against contextual models, also hold for computer vision. If successful, this work could pave the way for more accessible AI systems, enabling high-performance vision models to run on less powerful hardware, operate with limited training data, and execute tasks faster. Such advancements are crucial for applications in robotics, real-time image processing, and AI deployment in resource-constrained environments.
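The resource gap behind these claims can be illustrated with a back-of-the-envelope parameter count. The sketch below (depth, width, and patch size are again illustrative assumptions, not project specifications) compares a convolutional patch embedding against a small contextual transformer encoder of the kind it might replace; even at this modest scale the contextual stack carries many times more parameters.

```python
# Rough illustration of the resource question: parameter counts of a non-contextual
# convolutional patch embedding vs. a small contextual encoder (sizes are assumptions).
import torch.nn as nn

DIM, PATCH = 256, 16
conv_embed = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True),
    num_layers=12,
)

def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(f"convolutional patch embedding: {param_count(conv_embed):,} parameters")
print(f"12-layer contextual encoder:   {param_count(encoder):,} parameters")
```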
More Information:
- Stanford non-contextual text embeddings paper: https://arxiv.org/abs/2005.09117
- Beijing University of Technology convolutional image patch embeddings paper: https://arxiv.org/abs/2207.13317
- BERT paper: https://arxiv.org/abs/1810.04805
- GloVe paper: https://arxiv.org/pdf/1902.11004
- ARIA Labs website home page: https://www.ariarobotics.com/
Primary Contacts:
Student Preparation
Qualifications
- Basic knowledge of and experience with Git (centralized workflows) and Docker (images, containers, volumes, etc.).
- Familiarity with self-supervised neural network architectures and methodologies (experience with textual and/or vision embeddings preferred), deep learning practices, data processing and cleaning, and software engineering.
- Knowledge and experience with PyTorch.
- Experience with Debian-based Linux, Python 3, Bash scripting, and OpenCV.
- Basic knowledge of linear algebra and systems programming.
- Comfortable with formal mathematical and computer science notation.
- Comfortable reading and reviewing computer science literature.
Time Commitment (hrs/wk)
Skills/Techniques Gained
- Significant experience with PyTorch and vision embedding architectures.
- Intermediate experience with Git, Docker, OpenCV, software engineering with Python, and Debian-based Linux systems programming (Bash).
- Intermediate knowledge of self-supervised model training and hyperparameter tuning.
- Significant experience with sound computer science research practices and academic writing.