Project Info
Viability of Static and Convolutional Embeddings for AI and Robotics Vision
Project Goals and Description:
This project explores a fundamental question in artificial intelligence and computer vision: Can we replace complex, resource-intensive contextual embeddings with simpler, more efficient alternatives without significantly sacrificing performance? Modern deep learning models for tasks like object recognition, scene understanding, and multimodal AI rely on contextual embeddings—representations that capture relationships between elements in data. While powerful, these embeddings demand substantial computing power and vast amounts of labeled training data. This project investigates whether static and convolutional embeddings—simpler, non-contextual alternatives—can achieve comparable results with fewer resources.
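To make the distinction concrete, the following PyTorch sketch (illustrative only; the patch size, embedding width, and module choices are assumptions rather than project specifications) contrasts a static lookup-table embedding and a convolutional patch embedding, each of which encodes a patch independently of its surroundings, with a contextual self-attention layer whose outputs depend on the whole image.

```python
# Minimal sketch of the three embedding styles discussed above (dimensions are illustrative).
import torch
import torch.nn as nn

PATCH, DIM, VOCAB = 16, 256, 8192  # hypothetical patch size, embedding width, codebook size

# 1) Static embedding: a fixed lookup table indexed by a discrete patch/token id.
#    Each id always maps to the same vector, regardless of surrounding content.
static_table = nn.Embedding(VOCAB, DIM)
ids = torch.randint(0, VOCAB, (1, 196))                 # hypothetical patch ids
static_vecs = static_table(ids)                         # (1, 196, DIM), non-contextual

# 2) Convolutional patch embedding: a strided convolution projects each image patch
#    to a vector; still non-contextual (no interaction across patches).
conv_embed = nn.Conv2d(in_channels=3, out_channels=DIM,
                       kernel_size=PATCH, stride=PATCH)

# 3) Contextual embedding: a transformer encoder layer mixes information across
#    patches via self-attention, so every output vector depends on the whole image.
context_layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)

img = torch.randn(1, 3, 224, 224)                       # dummy image
patches = conv_embed(img).flatten(2).transpose(1, 2)    # (1, 196, DIM), non-contextual
contextual = context_layer(patches)                     # (1, 196, DIM), contextual
```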
Building on influential studies from Stanford University (2020) and Beijing University of Technology (2022), this project examines whether findings in natural language processing, where non-contextual embeddings performed surprisingly well against contextual models, also hold for computer vision. If successful, this work could pave the way for more accessible AI systems, enabling high-performance vision models to run on less powerful hardware, operate with limited training data, and execute tasks faster. Such advancements are crucial for applications in robotics, real-time image processing, and AI deployment in resource-constrained environments.
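The resource gap behind these claims can be illustrated with a back-of-the-envelope parameter count. The sketch below (depth, width, and patch size are again illustrative assumptions, not project specifications) compares a convolutional patch embedding against a small contextual transformer encoder of the kind it might replace; even at this modest scale the contextual stack carries many times more parameters.

```python
# Rough illustration of the resource question: parameter counts of a non-contextual
# convolutional patch embedding vs. a small contextual encoder (sizes are assumptions).
import torch.nn as nn

DIM, PATCH = 256, 16
conv_embed = nn.Conv2d(3, DIM, kernel_size=PATCH, stride=PATCH)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True),
    num_layers=12,
)

def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print(f"convolutional patch embedding: {param_count(conv_embed):,} parameters")
print(f"12-layer contextual encoder:   {param_count(encoder):,} parameters")
```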
More Information:
- Stanford non-contextual text embeddings paper: https://arxiv.org/abs/2005.09117
- Beijing University of Technology convolutional image patch embeddings paper: https://arxiv.org/abs/2207.13317
- BERT paper: https://arxiv.org/abs/1810.04805
- GloVe paper: https://arxiv.org/pdf/1902.11004
- ARIA Labs website home page: https://www.ariarobotics.com/
Primary Contacts:
Student Preparation
Qualifications
- Basic knowledge of and experience with Git (centralized workflows) and Docker (images, containers, volumes, etc.).
- Familiarity with self-supervised neural network architectures and methodologies (experience with textual and/or vision embeddings preferred), deep learning practices, data processing and cleaning, and software engineering.
- Knowledge and experience with PyTorch.
- Experience with Debian-based Linux, Python 3, Bash scripting, and OpenCV.
- Basic knowledge of linear algebra and systems programming.
- Comfortable with formal mathematical and computer science notation.
- Comfortable reading and reviewing computer science literature.
Time Commitment (hrs/wk)
Skills/Techniques Gained
- Significant experience with PyTorch and vision embedding architectures.
- Intermediate experience with Git, Docker, OpenCV, software engineering with Python, and Debian-based Linux systems programming (Bash).
- Intermediate knowledge of self-supervised model training and hyperparameter tuning.
- Significant experience with sound computer science research practices and academic writing.