AI2 Incubator Insights 5
Hope y'all had a nice Thanksgiving. This month, we start our newsletter with cool updates from the Incubator's all-star grads: WhyLabs, Lexion, and WellSaid. We then continue with a slew of news updates on large models from Microsoft, Meta, Google, OpenAI, Cohere.ai, and AI21 Labs, including a fascinating research paper by MSFT on why large models need to be large (hint: because Santana and Rob Thomas said so). Our startup roundup this month has a flavor of ML infrastructure, tooling, MLOps, and AI-assisted writing. Enjoy!
AI2 Incubator Updates
As alluded to in last month's edition, WhyLabs helped us kick off November with a bang, announcing their $10M Series A led by Defy and Andrew Ng's AI Fund. It's cool to reflect that in just a few short years, MLOps has become an important and vibrant area, with Alessya, Andy, Sam, Maria, and the rest of the amazing Why team at the forefront. Congrats! We have more updates on MLOps and ML infrastructure in the startups section.
Not to be outdone in the hype department, and fresh off their own Series A earlier this summer, Lexion launched Lexion Workflow. This new feature adds an email interface to their AI-powered contract management product. Legal teams have a power tool to help them in their day-to-day work, while stakeholders can continue to use the old-fashioned tool—email—without learning a new tool. It's win-win and brilliant!
What about WellSaid, our remaining alumnus who also raised a Series A this summer? "Hold my beer," said Michael Petrochuk and Matt Hocking. WellSaid, a leader in custom AI voice synthesis, made the IA 40 (Intelligent Applications 40) list. WellSaid rocks! Literally. You should check out WellSaid's AI DJ, Andy. My favorite line from Andy is this one, leading smoothly from Foo Fighters into Smashing Pumpkins' 1979:
“Ever feel like your day just needs a shot of pick-me-up? Well, that’s what we’re here for — to help turn that frown upside down and crank the dial to 11. Yes, I may be a robot, but I still love to rock.”
Three companies. Three Series A rounds. 2021 is shaping up to be a triple-A year for us here at the AI2 Incubator.
Models? Extra-Large, Please
Continuing from last month's newsletter, we have a large number of updates on large models this month as well. Large models such as GPT-3 are large neural networks that have been trained with large amounts of data of different types: text, audio, images, etc. The keyword is large: large amounts of data contain a large number of patterns, which large models can learn/encode with their large number of neurons/parameters. Large models are effectively encyclopedic databases of patterns that can then be harnessed to solve a specific ML problem which, in many practical applications, corresponds to a subset of such patterns. Using techniques such as prompting and task demonstration with a handful of examples, we can instruct/query these databases to locate the subset of patterns relevant to the problem at hand. Large models are efficient learners, and thus attractive in practice, as we can reduce and sometimes largely eliminate the data bottleneck: the heretofore expensive and time-consuming process of collecting training data when building ML solutions.
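To make "prompting with task demonstrations" concrete, here is a toy sketch: we "query" the model's database of patterns by prefixing a handful of labeled examples before the new input. The `build_prompt` helper is hypothetical; the resulting string is what you would send to a large-model completion API.

```python
# Toy few-shot prompting: demonstrations are prepended to the new input so
# the model can locate the relevant pattern (here, sentiment classification).

def build_prompt(instruction, demonstrations, query):
    """Assemble an instruction, a few (text, label) demos, and the query."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines.append(f"Text: {text}\nSentiment: {label}")
    lines.append(f"Text: {query}\nSentiment:")
    return "\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [("I loved this product", "positive"),
     ("Total waste of money", "negative")],
    "Exceeded my expectations",
)
print(prompt)
```

The trailing `Sentiment:` cue is the whole trick: the model completes the pattern established by the demonstrations, with no gradient updates and no training data collection.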
Do large models need to be large? Can we get away with smaller carbon footprints, training medium models with lots of data? The folks at Microsoft Research think not. In the paper with the scary title A Universal Law of Robustness via Isoperimetry and some scary math, the authors showed that lots of parameters are necessary for the "data interpolation to be smooth". Smooth is good for generalization ability, so I could change my life to better suit these models' mood. Give me your FLOPs, make it big, or else forget about it. The paper won the Outstanding Paper Award at this year's NeurIPS.
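Stated loosely (see the paper for the precise assumptions), the headline bound says that any model with p parameters that interpolates n training points in d dimensions must have a Lipschitz constant of at least roughly

```latex
\mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{n d}{p}}
```

So keeping the interpolation smooth, i.e., $\mathrm{Lip}(f) = O(1)$, requires $p \gtrsim nd$: overparameterization by a factor of the data dimension d, not just the dataset size n.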
In other Microsoft news, check out the paper titled Florence: A New Foundation Model for Computer Vision. It's a foundation model trained with 900M image-text pairs, achieving SOTA zero- and few-shot performance across 44 benchmarks. Even with "only" 893M parameters (vision models tend to be smaller than text ones, for now), Florence took 10 days to train on 512 NVIDIA A100 GPUs with 40GB of memory per GPU.
All of these large models need to be put to work somehow. Microsoft announced GPT-3 via Azure, currently by invite only. OpenAI also announced that its GPT-3 API no longer requires joining a wait list. I suspect it's not a coincidence that Cohere.ai announced their own large model APIs, and AI21 Labs announced a new zippy summarization API in celebration of a $20M raise at a $400M valuation. KakaoBrain's 6B-parameter "Korean GPT-3" was uploaded to Hugging Face. Speaking of HF, the company dropped its price for Lab and Startup subscriptions to $0. It's all-out war out there now.
Back from our brief industry detour, let's return to cool papers on large models. Meta shared research on multilingual machine translation, winning the WMT competition with a, drum roll please, 52B-parameter model. In another Meta paper, titled Masked Autoencoders Are Scalable Vision Learners, Kaiming He (of ResNet fame) and others show that the masked autoencoding technique of BERT can be made to work on vision data as well, achieving another SOTA result with a 632M-parameter model. Google's paper titled Combined Scaling for Zero-shot Transfer Learning demonstrates that, again, size matters. By combining a 16x increase in dataset size (compared to CLIP's), a 3.75x increase in model size (to 3B weights), and a 2x increase in batch size, they got a 9.3% bump in top-1 zero-shot accuracy on ImageNet, going from 76.2% to 85.7%. Nice!
Innovations Beyond Large Models
Large models have their critics. They are clunky to train and use. They are inscrutable (have you tried to inspect 173B numbers in Excel?). The Stanford AI Lab recently blogged about the concept of retrieval-based NLP. TL;DR: replace a gigantic black box with a pair of modest-sized models, a Retriever and a Reader, to gain better performance and provenance (useful in QA tasks) while being 100-1000x smaller. This should sound familiar to search engineers, as we also typically employ a two-stage retrieval-and-ranking design to tackle scaling.
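A minimal sketch of the two-stage shape, with toy stand-ins: real systems use a trained dense retriever and a trained reader model, but here both stages are simple token-overlap scorers so the pipeline structure is visible.

```python
# Two-stage retrieval-based QA sketch: a Retriever narrows a corpus to a few
# passages; a Reader extracts an answer from them. Both are toy stand-ins.

def retrieve(query, corpus, k=2):
    """Stage 1: score documents by token overlap with the query; keep top-k."""
    q_tokens = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: -len(q_tokens & set(doc.lower().split())))
    return ranked[:k]

def read(query, passages):
    """Stage 2: pick the passage sentence sharing the most tokens with the
    query (a real Reader is a trained QA model returning an answer span)."""
    q_tokens = set(query.lower().split())
    sentences = [s.strip() for p in passages for s in p.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_tokens & set(s.lower().split())))

corpus = [
    "Seattle is a city in Washington. The Space Needle is in Seattle.",
    "Paris is the capital of France. The Eiffel Tower is in Paris.",
    "MLOps tooling helps monitor models in production.",
]
question = "What city is the Space Needle in"
answer = read(question, retrieve(question, corpus))
print(answer)  # → The Space Needle is in Seattle
```

Besides being far smaller than one end-to-end giant, the retrieved passages double as provenance: you can show the user exactly which text the answer came from.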
The idea of focus-by-retrieval can also be used in the training phase. In a paper by Tsinghua University, titled NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework, the authors propose a method called Task-Driven Language Model (TLM) that takes aim at reducing the training time/cost of large models. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude.
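The data-selection half of TLM can be sketched in a few lines. The scoring below is a hypothetical token-overlap stand-in (the paper uses BM25 retrieval), and the joint objective is shown only as a weighted sum of the two loss terms.

```python
# TLM-style sketch: use labeled task examples as queries to carve a small,
# relevant subset out of a large general corpus, then train from scratch on
# a joint task + language-modeling objective over that subset.

def select_relevant(task_examples, general_corpus, k=2):
    """Keep the top-k general-corpus docs by token overlap with task data
    (a stand-in for the BM25 retrieval used in the paper)."""
    task_tokens = set(t for ex in task_examples for t in ex.lower().split())
    ranked = sorted(general_corpus,
                    key=lambda doc: -len(task_tokens & set(doc.lower().split())))
    return ranked[:k]

def joint_loss(task_loss, lm_loss, rho=0.5):
    """Joint objective: task loss plus a weighted language-modeling loss."""
    return task_loss + rho * lm_loss

task = ["the movie was great", "terrible acting and plot"]
corpus = [
    "film reviews discuss acting plot and direction",
    "quarterly earnings beat analyst expectations",
    "the movie industry releases many films",
]
subset = select_relevant(task, corpus)  # off-topic finance doc is dropped
```

The point is that the expensive part of pretraining, reading the entire general corpus, is replaced by reading only the tiny slice the task actually needs, which is where the two-orders-of-magnitude FLOP savings come from.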
Our last paper review is Google's Wisdom of Committees: An Overlooked Approach to Faster and More Accurate Models. In the era of rapid advances in neural network architectures, the technique of ensembling seems quaint. Instead of going big, the authors propose going with an ensemble of small networks. The benefits? Simple to build, easy to maintain, affordable to train, and faster for on-device inference. The approach is tested on ImageNet benchmarks with EfficientNets, ResNets, and MobileNets.
These lines of research provide a great counterbalance to the size-centric arms race. There is only one consequence: as profound as progress has been in the last decade, ML is poised to scale even greater heights in the years ahead. Reach out to us here at the Incubator if you have an idea on how to ride this wave to build the next big company!
We already mentioned AI21 Labs' $20M raise at a $400M valuation and WhyLabs' Series A. A few noteworthy rounds in AI infrastructure/tooling/MLOps are:
- ML acceleration platform OctoML raised $85M series C. Go Seattle/UW!
- GPU cloud specialist CoreWeave raised $50M.
- Neural search company Jina AI raised $30M Series A.
- Experiment management platform CometML raised $50M Series B.
- Data training platform Sama raised $70 million Series B.
- Streaming AI database Activeloop raised $5M seed.
An interesting theme in this month's AI startup funding is AI-assisted writing.
- Writing assistant Grammarly raised $200 million. Its post-money valuation increased to $13 billion, making Grammarly one of the 10 most valuable US startups.
- Marketing-oriented language optimization platform Anyword raised $21 million.
- Writing assistant Writer raised a $21 million Series A.
Lastly, some noteworthy funding announcements:
- AI-powered transcription company Verbit raised $250 million Series E.
- Risk intelligence startup Intelligo raised $22 million.
- Intelligent test automation Mabl raised $40 million Series C.
- Codeless test automation solution Virtuoso raised $13.3 million Series A.
- Assisted reality tech company Cognixion raised a $12 million Series A.
- Chargeback prevention solution Justt emerged from stealth with $70 million raised across three funding rounds, including a Series B led by Oak HC/FT and two previously unannounced rounds led by Zeev Ventures and F2 Venture Capital, respectively.