AI2 Incubator Insights - #006
We started the new year with a trio of companies graduating from the AI2 Incubator, raising a combined $11M in seed funding. All three were covered by GeekWire.
- Augment AI is building a radically new productivity tool. Congrats Saurav, Jordan, Shu, and Daniel!
- BirchAI is building tools to streamline customer support for healthcare companies. TheSequence also interviewed Birch's CTO, Yinhan Liu. Congrats Yinhan, Sumant, and Kevin!
- Measure Labs is developing ways to measure patients' vital signs remotely. Congrats Eric, Matt, and Jamien!
Large models: new and noteworthy
GPT-3's release in the summer of 2020 was a watershed moment for the AI/ML community. One of the key findings is that size matters, a lot. Training large models on lots of data requires a huge capex investment. Last month, Meta announced the ongoing construction of its AI supercomputer, the Research SuperCluster (RSC). It currently comprises 760 Nvidia DGX A100 systems and will grow to 2,000 by the end of the year. That's 16,000 A100 GPUs!
We are still in the early innings of developing large self-supervised models. Research on LSMs falls into two broad categories: 1) methods that achieve the same or better performance with smaller models and/or lower training and inference costs, and 2) methods that get LSMs to do new things: fine-tune, follow instructions, learn from audio/images, search the Web for info, generate code, etc.
Do more with less
Research in this category has the flavor of achieving better performance than GPT-3 using a smaller model.
- Google DeepMind announced RETRO (Retrieval-Enhanced Transformer), a model that matches GPT-3's performance on knowledge-intensive tasks with 25x fewer parameters (7 billion).
- Microsoft shared results reaching human parity on CommonsenseQA using an ensemble of 39 relatively small models (about 1 billion parameters each). Even with a single model, they came close to human parity and outperformed fine-tuned GPT-3. The key technique is bringing external information, referred to as external attention, into the prediction process.
- Microsoft's DeepSpeed team shared several results on mixture-of-experts (MoE) models with lower training costs and inference latency.
- AI2's Macaw model has 11 billion parameters but outperforms GPT-3 on a question answering benchmark by 10%.
- Meta published XGLM, a 7.3 billion-parameter cross-lingual model trained on a diverse set of languages. It achieves a new state of the art in few-shot learning across more than 20 representative languages, outperforming a GPT-3 model of comparable size on multilingual commonsense reasoning and machine translation.
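The common thread behind RETRO and external attention is simple: instead of storing all knowledge in model weights, look relevant text up in an external corpus and let the model condition on it. A toy sketch of that retrieval step, using bag-of-words cosine similarity as a stand-in for learned embeddings (all names and the example corpus are ours, not from any of the papers above):

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use learned dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def augmented_prompt(query, corpus):
    """Prepend retrieved passages so a (smaller) model can condition on them."""
    context = "\n".join(retrieve(query, corpus, k=2))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The Research SuperCluster is Meta's AI supercomputer.",
    "RETRO augments a 7B-parameter transformer with retrieval.",
    "Seattle is home to the AI2 Incubator.",
]
print(retrieve("Which model uses retrieval?", corpus, k=1)[0])
```

Because the corpus lives outside the network, it can be far larger than what a model of the same size could memorize, which is how a 7B-parameter model keeps up with GPT-3 on knowledge-intensive tasks.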
OpenAI continues to be THE pioneer in this second category. We can now fine-tune GPT-3 with our own data, instruct it with a technique called reinforcement learning from human feedback (RLHF), command it to surf the Web to find more accurate answers, even train it to solve math problems from the IMO (International Mathematical Olympiad). If all you want are embeddings, OpenAI has you covered as well.
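To make the fine-tuning bit concrete: at the time of writing, OpenAI's fine-tuning endpoint takes training data as a JSONL file of prompt/completion pairs. A minimal sketch of preparing such a file (the customer-support examples and the file name are ours, purely illustrative):

```python
import json

# Hypothetical support-ticket examples in OpenAI's documented
# prompt/completion JSONL fine-tuning format.
examples = [
    {"prompt": "Customer: My card was declined.\nAgent:",
     "completion": " Sorry to hear that! Let me check the card on file."},
    {"prompt": "Customer: How do I reset my password?\nAgent:",
     "completion": " You can reset it from Settings > Security."},
]

# One JSON object per line, as the endpoint expects.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting file is then uploaded and referenced when creating a fine-tune job via OpenAI's CLI or API.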
There were several 2021 retrospectives that we found interesting: