AI2 Incubator Insights 8
In this issue we cover interesting developments around large models, their real-world deployment, and their optimization for the specific domain of commerce. The special topic for this issue is a review of key technologies and tools that are handy for founders building AI-first companies. First we give a quick update on two of the AI2 incubator companies, Flowdex and Yoodli, and share our picks for cool AI startups and research papers.
AI2 Incubator Update
William Cheng and Justin Crenshaw launched Flowdex, a note-taking application that automatically tags notes using AI so users have their notes at their fingertips. Six months after coming out of stealth, Yoodli opened up its free product for anyone to try. Yoodli is an AI-powered product that helps everyone improve their public speaking abilities. Shortly afterward, Yoodli's co-founders, Esha and Varun, brought home GeekWire's Young Entrepreneur of the Year award. Congratulations William, Justin, Esha, and Varun! We have more exciting news to share in our next newsletter, so stay tuned!
AI Startup and Research Paper Picks
Our cool AI startup for this issue is Deepset, which recently announced its $14M Series A. In the last issue we talked about Vespa.ai and Jina.ai as modern options to build AI-driven search experiences. A modern search stack should support features such as vector search, question answering, summarization, and ease of integration with Hugging Face’s platform. Deepset is one such contender, building on top of the open-source project Haystack. Unlike Vespa and Jina, Deepset is modular and can therefore be integrated with existing backends such as Elasticsearch. The bar has definitely been raised when it comes to building search experiences!
Our pick for a cool paper is from Google Research, titled LiT: Zero-Shot Transfer with Locked-image Text Tuning, to appear at CVPR 2022 (see the accompanying blog post and demo). In this paper the authors propose a contrastive training regime to align image models and text models for the purpose of zero-shot image classification and retrieval. They found that by locking the image model and allowing only the text model to be updated (so that it learns to read out good representations from the locked image model), the resulting model improves the SOTA significantly in zero-shot settings (from 76.4% to 84.5% zero-shot accuracy on ImageNet). This can come in handy in startup proof-of-concept settings, where founders can get strong results out of the box with these models without the need for additional training data collection and fine-tuning.
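To make the zero-shot idea concrete, here is a minimal sketch of how classification works once image and text embeddings live in the same space: each candidate label is written as a text prompt, and the image is assigned the label whose text embedding is most similar. This is not LiT's code. The tiny hand-made vectors below are assumptions standing in for the outputs of a locked image model and a tuned text model, and the function names are ours.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label prompt whose text embedding is closest to the image."""
    return max(
        label_embeddings,
        key=lambda label: cosine(image_embedding, label_embeddings[label]),
    )


# Toy 3-d vectors (assumed values); a real system would get these from the
# image and text models, with far more dimensions.
LABEL_EMBEDDINGS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}

image_embedding = [0.8, 0.3, 0.1]  # pretend output of the image model
prediction = zero_shot_classify(image_embedding, LABEL_EMBEDDINGS)
print(prediction)
```

Because new classes are added just by writing new prompts, no labeled training data is needed for the target task, which is what makes this setup attractive for a POC.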
Founder technical toolkit
At the AI2 incubator we work with founders building companies from Day One. At this stage, our companies typically have no more than three entrepreneurs, and at times just one. When it comes to building a proof-of-concept (POC) for an AI product, such a small team represents a unique challenge. We are sharing some advice, technologies, and tools that we found to be useful in tackling this challenge.
- ML libraries: PyTorch, Hugging Face, spaCy. If you need to run ML on a mobile phone or inside a browser, consider TensorFlow Lite/JS. Definitely check out our guide on machine learning for startups. In particular, there is rarely a need for MLOps or large-scale data labeling efforts at this stage.
- If you do not have an engineering background, you can still build a POC using no-code tools such as Bubble.io and AI APIs such as OpenAI’s API and Hugging Face’s API. Paul Yacoubian started copy.ai by tinkering with OpenAI’s GPT-3, for example.
- If you do have an engineering background but have been working at big technology companies (e.g. MAANG), expect a bit of adjustment, as the tech stacks at such companies are not portable to the startup environment. View this as a challenge that you can crush, and at the AI2 incubator we can help! It’s also more enjoyable than practicing binary tree inversion on LeetCode just to pass another MAANG interview.
- Our recommended front-end/UX toolkit for web applications consists of ReactJS and Chakra UI. Alternatives such as Next.js and Remix.run can be compelling but are unlikely to be worth the added learning curve for a POC. We have also heard good things about Svelte and Vue, if those are your thing.
- REST vs GraphQL. We recommend staying with REST. The benefit/cost considerations in a pre-seed project heavily favor simplicity. GraphQL is powerful, but not simple.
- For the service layer, use either Node/Express or Flask. Use Go if you are an expert in it, but don’t pick Go just to add another programming language to your arsenal/resume.
- For a data backend, Postgres, Mongo, Dynamo, etc. are all fine options. Pick the one that you are most comfortable with.
- For deployment, use AWS, specifically the Elastic Container Service (ECS). Consequently, some familiarity with Docker is required.
- At the AI2 incubator we deploy to ECS using Pulumi, writing our infra-as-code in TypeScript. This allows us to pick up minimal cloud architecture and devops knowledge in a pay-as-you-go fashion. Pulumi’s TypeScript documentation is superb and available at your fingertips inside your favorite IDE. Pulumi is used at AI2 incubator companies such as Ozette and WhyLabs.
- Consider a continuous deployment (CD) strategy with GitHub Actions. Continuous integration (CI) is definitely optional. Write tests only if you absolutely have to.
- At a high level, aim to optimize for speed and simplicity. Dial up the KISS principle to a pretty high setting, especially compared to the level at a larger organization.
- ML: PyTorch, Hugging Face, spaCy, TensorFlow Lite/JS.
- Rapid prototyping: Bubble.io, Streamlit, Gradio.
- Frontend: React, TypeScript, Chakra UI.
- Services: Express/Flask.
- Cloud/infrastructure: AWS, ECS, Pulumi, Docker, GitHub Actions.
- AI API services: OpenAI, Hugging Face.
- Look for discounts and promotions from major cloud providers. Companies incubated at the AI2 incubator get awesome packages of cloud credits from AWS, GCP, and Azure.
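To illustrate how little code the REST-plus-Flask recommendation requires, here is a minimal sketch of a service with two endpoints. The `/notes` resource, the field names, and the in-memory dictionary are hypothetical choices for illustration; a real POC would swap the dictionary for whichever data backend you picked.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store standing in for Postgres/Mongo/Dynamo (illustrative only;
# contents are lost whenever the process restarts).
NOTES = {}


@app.route("/notes", methods=["POST"])
def create_note():
    """Create a note from a JSON body like {"text": "..."}."""
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "").strip()
    if not text:
        return jsonify({"error": "text is required"}), 400
    note_id = len(NOTES) + 1
    NOTES[note_id] = {"id": note_id, "text": text}
    return jsonify(NOTES[note_id]), 201


@app.route("/notes/<int:note_id>", methods=["GET"])
def get_note(note_id):
    """Fetch a single note by its integer id."""
    note = NOTES.get(note_id)
    if note is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(note)
```

Run it locally with the `flask run` command and call it from the React frontend with plain `fetch`. Each resource is just a URL and an HTTP verb, which is the simplicity argument for REST at the pre-seed stage.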
The world of technology is constantly evolving. Even with lots of simplification, the above list could still be rather overwhelming. You may decide to pick a subset of these technologies and work with contractors to fill the remaining gap. For your first engineering hire, recruit someone who may not necessarily know a given set of technologies but has an insatiable drive to learn whatever tool is needed to get the job done.
We wrap up this discussion by sharing two GitHub repos that contain starter project templates that use our recommended toolkit.
- AI2 Incubator’s webapp template is a full-stack web application with a React/TypeScript frontend and a Flask backend. The app is deployed to AWS’s ECS using Pulumi and GitHub Actions. The deployed HTTPS endpoint URL can be configured by simply specifying a domain name and a subdomain name (e.g. https://demo.example.com). Pulumi takes care of setting up the necessary Route53 and certificates behind the scenes.
- AI2 Incubator’s streamlit template uses the same tools as the webapp template for the purpose of deploying a Streamlit demo to the domain URL of your choice. You may also take a look at Hugging Face’s Spaces as an alternative to host your Streamlit/Gradio apps.
AI in practice
Let’s recap recent announcements in AI/ML with an eye on applicability to the startup space.
As neural networks get ever bigger, it is becoming increasingly challenging to deploy them in production in a way that hits the sweet spot among size, cost, accuracy, latency, throughput, etc. In this issue we note two recent updates from AWS and Hugging Face that are relevant. In a blog post co-authored by folks from the SageMaker and Hugging Face teams, we learn how to take advantage of SageMaker Serverless Inference. While the supported models seem to be small (e.g. up to a few gigabytes), this is a welcome development and a preview of more powerful features coming down the pike. The SageMaker team also reminds us that they offer other solutions for real-time inference, batch transform, and asynchronous inference.
When it comes to deploying state-of-the-art neural networks, we feel that Hugging Face may provide the most powerful option on the market. In a recent blog post, the HF team gave a deep dive into how to take advantage of their accelerated inference technology called Optimum, discussing a use case where the model size and inference latency are both cut in half while still maintaining 99.61% of the original accuracy.
Lastly, we discuss TensorFlow JS/Lite. While PyTorch is often the preferred deep learning library among practitioners, TensorFlow JS/Lite are required if we want to deploy neural networks to edge environments such as mobile phones and browsers. At Google I/O, the TensorFlow JS team gave an impressive update on the TensorFlow JS ecosystem. We highly recommend you check out their YouTube video of this update. It’s amazing to see how the community has come up with so many creative ideas using this platform.
What’s new with large models
AI21 Labs introduced J1-Grande, a 17B-param model that they claim provides similar performance to the 10x larger J1-Jumbo. Meta released OPT, a GPT-3-like (175B parameters) pre-trained model, together with training code and a detailed logbook of the training process. Hugging Face wasted little time in importing the 30B version into the HF transformers library and deploying it to their Accelerated Inference infrastructure. You can play with this puppy on Colab. Meta also released the model and the code for the 6.7B-param InCoder, a GitHub Copilot-like model that is trained with bidirectional context, allowing it to perform infill tasks such as type inference, comment generation, and variable renaming.
At the AI2 incubator we are fans of Snorkel.ai and their highly pragmatic approach to building ML solutions using weakly supervised (WS) learning. Snorkel started as a research project at Stanford University at a time when large models did not exist. It’s only a matter of time before the Snorkel team integrates the latest advances from large models into their technology. In a recent blog post, the Snorkel team shared early findings in harnessing LM-enabled zero-shot models as labeling functions (LF):
- WS + LM outperforms LM directly as a predictor (41.6% error reduction), suggesting that the other LFs help correct or fill in gaps of the LM.
- WS + LM outperforms WS only (20.1% error reduction), suggesting that the LM does indeed provide useful additional information to the problem.
This is great news for startups building ML-driven minimal-viable-products as greater performance can be achieved without significant data labeling efforts.
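The idea of treating an LM as just one more labeling function can be sketched in a few lines. This is not Snorkel's implementation: Snorkel fits a label model over LF outputs, whereas this sketch uses a plain majority vote as a simplification. The labels, the keyword rules, and `lf_zero_shot_lm` (a keyword-based stand-in for a real zero-shot LM call) are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # an LF returns this when it has no opinion


def lf_keyword_refund(text):
    return "complaint" if "refund" in text.lower() else ABSTAIN


def lf_keyword_thanks(text):
    return "praise" if "thank" in text.lower() else ABSTAIN


def lf_zero_shot_lm(text):
    """Hypothetical stand-in for prompting a zero-shot LM for a label."""
    lowered = text.lower()
    if any(word in lowered for word in ("broken", "terrible", "refund")):
        return "complaint"
    if any(word in lowered for word in ("love", "great")):
        return "praise"
    return ABSTAIN


def weak_label(text, labeling_functions):
    """Majority vote over non-abstaining LFs; abstain on ties or no votes."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # no clear winner
    return top


LFS = [lf_keyword_refund, lf_keyword_thanks, lf_zero_shot_lm]
print(weak_label("Thanks, but I want a refund", LFS))
```

In the last example the two keyword LFs disagree and the LM-style LF breaks the tie, which mirrors the finding that the LM and the hand-written LFs each fill gaps left by the other.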
We wrap up our discussion on large models with Meta’s CommerceMM, a commerce-optimized multi-modal large model (see blog post). While large models have demonstrated remarkable success across a wide range of tasks, we have always felt that custom optimization for a focused domain such as commerce would have an advantage. In commerce, since product content is often a mix of text and images, a multi-modal approach seems logical. With CommerceMM, Meta researchers demonstrated SOTA performance across seven tasks such as product categorization and retrieval. Unlike with OPT and InCoder, this time Meta decided to release neither the pre-trained model nor the proprietary dataset, which draws from Instagram Shops, Facebook Shops, and Facebook Marketplace, for obvious reasons. Given commerce’s massive market size, we anticipate that at least one startup will replicate CommerceMM and offer its capabilities across next-generation search, discovery, recommendation, data integration, etc. for different players in this industry. Depict.ai, which raised a $17M Series A earlier this year, has demonstrated “Amazon-level product recommendation” by crawling and processing commerce content. If this claim holds, it’s quite remarkable that Depict was able to accomplish this without Amazon’s proprietary customer data or CommerceMM-like technology. We nevertheless believe that large models can take their technology to the next level, and are bullish on the bright future of large models that are optimized for various domains.
Additional Readings That We Found Interesting
- AI2's researchers Jesse Dodge, Michael Schmitz, and Oren Etzioni awarded 10-year Test of Time Awards at ACL 2022, the premier NLP conference.
- AI2 launches GRIT, a new benchmark for general-purpose computer vision models.
- Shopify’s blog post on its machine learning platform, Merlin.
- OpenAI’s DALL·E 2.
- TechCrunch's story on machine learning feature stores.
- Oren Etzioni's op-ed on the recent legal ruling against LinkedIn on data scraping.
- Anthropic’s $580M Series B.
- TechCrunch's story: The emerging types of language models and why they matter.
- Synthesis AI raises $17M to generate synthetic data for computer vision.
- Robert Dale's The Voice Synthesis Business: 2022 Update.