Weakly and Self-Supervised AI, Perceiver IO, OpenAI Codex


Insights 2: Weakly and Self-Supervised AI, Perceiver IO, OpenAI Codex

August 31, 2021 Vu Ha

This is the second technology newsletter from the AI2 startup incubator. As a reminder, our Venn diagram is the intersection of news about AI and news about startups, assuming the reader can get updates from either of these two angles elsewhere. Topics discussed in this issue include TheSequence's profiles of our alumni Emad Elwany and Greg Finak, weakly supervised and self-supervised learning, edge AI, Perceiver IO, and OpenAI Codex. We’ll also sample a few funding rounds announced in August, highlighting trends around computer vision, edge AI, conversational tech, remote health, therapeutics, Covid-19 pandemic adaptation, and more.

Community Update

TheSequence is a daily AI newsletter “curated by industry insiders” with >100K readers. In August they did a nice job profiling two members of our community: Lexion’s Emad Elwany and Ozette’s Greg Finak. What stood out for us in these pieces is the common theme of finding ways to rely less on hand-labeled data. You know, for us poor startups, gold datasets are just like gold jewelry - luxury items. Emad spoke about how his team uses weakly supervised learning to great success at Lexion. Greg shared his thoughts on unsupervised learning at Ozette (emphasis is ours):

TheSequence: Data labeling and classification for cytometric data structures has been a known challenge to apply machine learning to cell analysis. What type of techniques and frameworks do you use at Ozette to address this challenge?

Greg Finak: We're aware of these challenges: how many cell types are truly there, what are they, how should they be labeled. We’ve been working on these questions for several years, and we’ve developed techniques that perform cell type discovery and annotation in an unsupervised fashion, without the need for labeled training samples.

A Crash Course on the Many Forms of Learning

Curious about all of these terms: supervised, unsupervised, semi-supervised, self-supervised, weakly supervised learning? We include a dummies' guide below.

Supervised learning. Have labels, will train. This is the standard, most well-known ML concept.
Unsupervised learning. Do stuff without labels: clustering, anomaly detection, dimensionality reduction (e.g. PCA), etc. Training (large) language models can also be considered a form of unsupervised learning, but more recently tends to be referred to as self-supervised learning (more on this below).
Semi-supervised learning. Have some labels, but a boatload of unlabeled data. For an unlabeled data point X’ that is similar to a labeled data point X, assume that X’ has the same label as X (think by analogy). Voilà, more labels. A recent noteworthy development is Unsupervised Data Augmentation from Google Research.
Self-supervised learning (SSL). This is the hot stuff: BERT, GPT-3, need we say more? Facebook AI recently published a cool post calling it the dark matter of intelligence. Why is it called self-supervised? Consider a language modeling task, which used to be called unsupervised learning. Given a sentence, a word is masked out (BERT-style; GPT-3 instead hides the next word). The neural network is asked to predict the hidden word given the rest. Backprop. Do that a gazillion times. The outcome: BERT, GPT-3, etc. No hand labeling is required: the sentence supervises itself. Compared to NLP, doing this in speech and vision is trickier, but clever researchers keep finding ways to make it work, sort of. See wav2vec for speech and SEER for vision.
Weakly supervised learning (WSL). Have lots of labels, but noisy ones, produced cheaply by heuristics, rules, or other imperfect sources instead of careful hand labeling. Quantity is good. Quality? Not so much. But quantity has a quality all its own.
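To make the "sentence supervises itself" idea concrete, here is a minimal sketch of how one unlabeled sentence can be turned into training pairs for the masked-word task. The `masked_pairs` helper and the `[MASK]` placeholder string are our own illustrative choices, and real systems such as BERT work on subword tokens and mask only a random fraction of them.

```python
MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def masked_pairs(sentence):
    """Build one (masked_sentence, hidden_word) training pair per word.

    No human labels are involved: the target is simply the word we hid.
    """
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        masked = words[:i] + [MASK] + words[i + 1:]
        pairs.append((" ".join(masked), word))
    return pairs


pairs = masked_pairs("the sentence supervises itself")
for masked, target in pairs:
    print(f"{masked!r} -> {target!r}")
```

A model would then be trained to predict the hidden word from the masked sentence, over a very large corpus.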

On-Going Efforts at the Incubator

So how are we using these different techniques here at the incubator?

Weakly Supervised Learning, Lexion.ai, and Snorkel.ai

First, let’s talk about weakly supervised learning. As discussed in our recent post on Minimum Algorithmic Performance, Lexion has been using WSL to great success. This month, two-and-a-half-year-old Snorkel.ai reached unicorn status, just two years after its seed round in the summer of 2019. This has to be a record for an AI tool startup, zooming past public darlings such as Hugging Face and Streamlit and hot on the heels of category leader Scale.ai.

What’s behind Snorkel.ai’s seemingly overnight success? We believe there are three factors. First is the founders’ ongoing, significant innovations in application-focused WSL research, as part of Chris Ré’s group at Stanford, that predate the founding of the company by several years. Second, Snorkel’s approach of incorporating domain knowledge into the modeling task in the form of labeling functions seems to resonate with practitioners. Third, Snorkel.ai is a one-stop-shop AI vendor, covering the whole lifecycle of an AI project: labeling, model development, deployment, and monitoring. With their data-centric view, these different aspects of an AI project are not isolated but fluidly intertwined.
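For readers who have not seen labeling functions before, here is a rough sketch of the idea in plain Python. It does not use Snorkel's API. The three heuristics, the contract clauses, and the label names are hypothetical, and we combine votes with a simple majority, whereas Snorkel's research describes learning how much to trust each function.

```python
from collections import Counter

ABSTAIN, OTHER, RENEWAL = -1, 0, 1  # hypothetical label scheme


# Each labeling function encodes one piece of domain knowledge as a noisy rule.
def lf_mentions_renew(text):
    return RENEWAL if "renew" in text.lower() else ABSTAIN


def lf_mentions_successive(text):
    return RENEWAL if "successive" in text.lower() else ABSTAIN


def lf_mentions_payment(text):
    lowered = text.lower()
    return OTHER if "invoice" in lowered or "payment" in lowered else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_renew, lf_mentions_successive, lf_mentions_payment]


def weak_label(text):
    """Majority vote over non-abstaining functions; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


clauses = [
    "This Agreement shall automatically renew for successive one-year terms.",
    "Customer shall pay each invoice within 30 days.",
    "Either party may renew; payment terms remain unchanged.",
    "This Agreement is governed by the laws of Washington.",
]
weak_labels = [weak_label(c) for c in clauses]
print(weak_labels)  # [1, 0, -1, -1]
```

The clauses that receive a label become (noisy) training data for a regular supervised model, and the ones where the rules abstain or disagree are left out or sent for review.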

Snorkel.ai’s success unequivocally validates that WSL is ready for prime time, at least in the NLP space. Unfortunately, Snorkel.ai is not responding to our multiple requests to demo their product, Snorkel Flow. They are probably too busy making money hand over fist from their enterprise customers. Lexion built their own WSL framework. Other startups probably need to build off Snorkel’s abandoned OSS repo, or at least use it as an inspiration to build something in-house. At the incubator we are working with one of our companies in this direction. Snorkel.ai’s non-prioritization of startups could also open up the opportunity for another player to enter the fray - this does not seem like a winner-takes-all market.

Self-Supervised Learning: A Pilot Effort

Next up, let’s talk about self-supervised learning. SSL’s use in NLP tasks is pervasive with the popularity of BERT, transformers, and Hugging Face’s tooling. Inspired by a recent conversation with Ani Kembhavi, who leads the computer vision group at AI2, we are experimenting with SSL in a computer vision task, using Facebook’s VISSL library. SSL for non-NLP tasks is still at the bleeding edge of research (e.g. there’s no SSL unicorn yet), so stay tuned for updates on how this goes. This is also a good opportunity to remind readers that, as part of the unique AI2 credits program, brilliant minds like Ani can point us to new ideas and directions.

Cool AI Tech Updates

This month we picked three cool AI updates to share: AI at the edge, Perceiver IO, and OpenAI’s Codex.

AI at the Edge: MoveNet, FaceMesh, BlazePose and HandPose

Google released MoveNet, a pose estimation and classification neural network architecture for edge devices, including Android and Raspberry Pi. This followed an earlier release for browsers running on top of TensorFlow.js.

They teamed up with IncludeHealth, a digital health and performance company, to understand whether MoveNet can help unlock remote care for patients. IncludeHealth has developed an interactive web application that guides a patient through a variety of routines (using a phone, tablet, or laptop) from the comfort of their own home. The routines are digitally built and prescribed by physical therapists to test balance, strength, and range of motion.

There is also a 3D version.

Speaking of edge AI, Latent AI just raised $19M series A, about 18 months after XNOR’s exit to Apple.

The company says it makes software designed to train, adapt and deploy edge AI neural networks, irrespective of hardware constraints or the inexpensive chips that are typically found in edge devices. It also says it can compress common AI models by ten times without a noticeable change in accuracy, partly through an “attention mechanism” that enables it to save power and run only what is needed based on environment and operational context. Lastly, it promises its users (who are edge device developers) that its software can do all of this with nearly zero latency (thus the company name).

With these advances in edge AI, let’s continue to keep an eye out for the next killer app.

DeepMind’s Perceiver IO

Deep learning with neural network architectures such as ConvNets and Transformers has been transformational. In addition to having great performance, neural networks reduce to a great extent the need for feature engineering (there was a time when the vision kernels were hand tuned instead of just conveniently popping out from backprop). But have you ever wondered why we rely on ConvNets for vision tasks but Transformers (or earlier, LSTMs) for NLP tasks? The reason is that these architectures incorporate our specific knowledge and intuition of the given problem. For vision tasks, ConvNets’ convolution kernels were inspired by biological receptive fields. For NLP tasks, the attention mechanism that is central to the Transformer architecture was invented to mimic cognitive attention. In Jay Alammar’s excellent tutorial titled The Illustrated Transformer, he used the sentence “The animal didn't cross the street because it was too tired” as an example. To encode that the word “it” refers to the word “animal” but not the word “street”, we need to pay strong attention to the word “animal” and weak attention to the word “street”. These weights are of course learned through back propagation from a massive amount of text data.
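The "strong attention" and "weak attention" in the example above are just numbers that sum to one. Here is a minimal sketch of how scaled dot-product attention turns vectors into such weights. The two-dimensional vectors for “it”, “animal”, and “street” are hand-picked by us for illustration. In a real Transformer they come from learned projections of learned embeddings.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Toy vectors (assumptions, not learned values): "it" points in the same
# direction as "animal" and is orthogonal to "street".
query_it = [1.0, 0.0]
keys = {"animal": [2.0, 0.0], "street": [0.0, 2.0]}

weights = dict(zip(keys, attention_weights(query_it, list(keys.values()))))
print(weights)
```

With these toy vectors, “animal” gets most of the weight and “street” gets the rest. Training adjusts the vectors so that useful pairings like this one end up with high scores.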

Neural network architecture engineering is at the heart of today’s machine learning. A group of researchers (Andrew Jaegle et al.) at DeepMind are still unsatisfied with simply graduating from feature engineering to architecture engineering. They asked whether there is a generic architecture that works well across multiple modalities. In two recent papers, they introduced Perceiver and then an extension called Perceiver IO that attempted to achieve this. Perceiver IO incorporates an interesting adaptation of the attention mechanism from Transformers and the encoder-decoder architectures that are often used in high-bandwidth domains - computer vision, audio, multimodal processing. The initial findings are encouraging: Perceiver IO is competitive with SOTA computer vision models without any 2D convolutions, matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization, and achieves state-of-the-art performance on Sintel optical flow estimation. Perceiver IO’s code is open source.
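As we understand the papers, the adaptation is that a small, fixed-size latent array attends over the (possibly huge) input, rather than every input element attending to every other one. The sketch below is ours, not DeepMind's code. It shows only that cross-attention step, leaves out the learned query/key/value projections, and uses made-up sizes and values.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, inputs):
    """Each query vector becomes a softmax-weighted average of the inputs.

    Simplification: inputs serve as both keys and values, with no learned
    projections.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(inputs[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, x)) / scale for x in inputs]
        weights = softmax(scores)
        outputs.append(
            [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(dim)]
        )
    return outputs


latents = [[1.0, 0.0], [0.0, 1.0]]                             # M = 2 latent vectors
pixels = [[float(i % 3), float(i % 5)] for i in range(1000)]   # N = 1000 toy inputs

summary = cross_attend(latents, pixels)
print(len(summary), len(summary[0]))  # 2 2
print("attention scores:", len(latents) * len(pixels), "vs", len(pixels) ** 2)
```

The output has one row per latent no matter how many inputs there are, and the number of attention scores grows as M x N instead of N x N. That is what lets the same architecture take in pixels, audio samples, or raw bytes without a modality-specific front end.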

Feeling fatigue from too many neural network types? Perceiver IO may point towards an interesting direction. This matters for us practitioners as any time research transitions from rapid and chaotic innovation to a more consolidated phase, adoption increases and new opportunities arise. Hugging Face grabbed such an opportunity around the time Transformers emerged.

OpenAI Codex/Aleph Alpha

You should check out OpenAI’s demo of Codex on YouTube (it’s somewhat strange that Wojciech Zaremba, the project lead, handed off the demo duty to his co-founders, Sutskever and Brockman). Codex is a sequel (think Terminator 2) to GitHub Copilot that can compile English to code (Python, HTML, JavaScript, etc.), sort of. Cherry-picked examples aside, it’s really impressive how OpenAI continues to innovate in this space. Europe decided to do something to keep up: Aleph Alpha raised $27M series A to “build Europe’s OpenAI”. Exciting times ahead.

AI Startup Scene

We wrap up this month’s newsletter with a recap of fundraising announcements from AI startups. This month’s themes include computer vision, conversational tech, remote health/therapeutics, and Covid-19 adaptation. But first, it’s worth looking at Zeni’s $34M series B.

Zeni’s AI-powered finance concierge platform offers bookkeeping, accounting, tax and CFO services, managing these for a flat monthly fee starting at $299 per month. Founders have real-time access to financial insights via the Zeni Dashboard, including cash in and out, operating expenses, yearly taxes and financial projections.

Check out Zeni for your startup!

Computer Vision

In the AI for construction space, Buildots raised $30M series B and Doxel raised $40M series B, seemingly going after the same opportunity. The technical problem seems challenging, but the pain seems severe enough that AI does not need to be perfect on day one to provide immediate value.

Buildots: While construction processes would seem similar to manufacturing processes, building to the design or specs didn’t happen often due to different rules and reliance on numerous entities to get their jobs done first, he said. Buildots’ technology is addressing this gap using AI algorithms to automatically validate images captured by hardhat-mounted 360-degree cameras, detecting immediately any gaps between the original design, scheduling and what is actually happening on the construction site. Project managers can then make better decisions to speed up construction.

Doxel has developed software that uses computer vision to help track and monitor progress on construction job sites. “Our predictive analytics gives building owners and general contractors a way to identify critical risk factors that threaten to derail their project before they even know the risks exist,” he said. “So they are not finding out about problems when it’s too late to actually solve them.”

Cardiomatics raised a $3.2M seed to read ECG with AI.

The data set that we use to develop algorithms contains more than 10 billion heartbeats from approximately 100,000 patients and is systematically growing. The majority of the data-sets we have built ourselves, the rest are publicly available databases.

Singapore-based YC alum Adra wants to turn all dentists into cavity-finding super dentists, using computer vision for X-Ray, raising $250K so far. Adra claimed dentists misdiagnose cavities up to 40% of the time, and AI can do this 25% more accurately - wow! The global dental services market is $435.08 billion this year.

California-based YC alum Revery.ai is building a virtual dressing room, raising $125K so far. Want to try on a dress but Covid is giving you pause about going to a real dressing room? Revery.ai can help:

This technology comes at a time when online shopping jumped last year as a result of the pandemic. Just in the U.S., the e-commerce fashion industry made up 29.5% of fashion retail sales in 2020, and the market’s value is expected to reach $100 billion this year.

Conversational Tech

Level AI raised $13M series A to build conversational intelligence for customer service.

Our product helps agents in real time to perform better, resolve customer queries faster and make them clear faster. Then after the call, it helps the auditor, the folks who are doing quality assurance and training audits for those calls, do their jobs five to 10 times faster.

ConverseNow raised $15M series A.

The Austin-based company’s AI voice ordering assistants George and Becky work inside quick-serve restaurants to take orders via phone, chat, drive-thru and self-service kiosks, freeing up staff to concentrate on food preparation and customer service. Restaurants were some of the hardest-hit industries during the pandemic, and as they reopen, Shukla said their two main problems will be labor and supply chain, and “that is where our technology intersects.”

Remote Health/Therapeutics

Lucid Lane raised $16M series A to help treat patients with medication dependency.

Its technology utilizes web and mobile-based applications to provide remote patient monitoring and connection to dedicated therapists on a daily basis. A newly developed analytics engine collects health signals from patients to measure symptoms like anxiety, depression, pain levels and withdrawal effects so that the platform and therapists can personalize their treatments. If needed, the engine will connect patients instantly with an on-call counselor.

As with the healthcare industry itself, the global pandemic helped adoption of the company’s telehealth platform surge as remote care became more mandatory than a discretionary feature. In addition, Asar said it would have normally taken two years for the company to get into Medicare, but with the government’s updated regulations around telehealth, Lucid Lane is now nationwide with Medicare.

Ultrahuman raised $17.5M series A for continuous glucose monitoring.

The product (a wearable and a subscription service) — which it’s branded “Cyborg” — consists of a skin patch that extracts glucose from the interstitial fluid under the skin, per founder and CEO, Mohit Kumar, with the data fed into a companion app for analysis and visualization. The patch tracks the wearer’s blood glucose levels as they go about their day — eating, exercising, sleeping, etc. — with the biomarker used to trigger the app to nudge the user to “optimize your lifestyle,” as Ultrahuman’s website puts it — such as by alerting the user to a high blood glucose event and suggesting they take exercise to bring their level down.

BrainQ raised $40M to build a device that stimulates the damaged part of the brain and promotes self-repair for stroke patients.

To achieve this, we have analyzed a large-scale amount of healthy and unhealthy individuals’ brainwaves (electrophysiology data). Our technology uses explanatory machine learning algorithms to observe the natural spectral characteristics and derive unique therapeutic insights. During the pandemic, many of those recovering from a stroke who would normally visit the hospital for regular care were (and some remain) unable to do so. A home-based therapy with low risk and potentially great outcomes would be of enormous benefit for people currently recovering from a stroke.

Others

Brain Technologies raised $50M for the launch of a natural language search engine, powered by “one shot learning”. An impressive amount given that the app has only just launched.

For example, “I’d like sushi tonight,” will bring back options (in theory) for ordering sushi, and possibly your most favored dishes, from a selection of restaurants by way of food ordering apps that you use, or places to go eat it, as well as options for making that sushi yourself (and buying the ingredients online to do so, as well as a method).

Similarly, travel searches return results that dip into multiple silos from, say, airlines and airline aggregators that are easily editable and that you can buy directly from those results, if you already have payment details on your device. (While Google provides this to some degree, you eventually have to navigate to sites to buy tickets, which might end up significantly more expensive when you actually visit said sites.) The more you use the app, the theory is that it will learn more about what you might want from your questions.

AI that anticipates what we are trying to say or do is something that has been attempted before, of course, but the difference here, Yue said, is in Brain’s approach, which is based on the concept of “one shot” learning, which he described as a kind of general purpose AI, “a tool that learns to use other tools.”

Singapore-based Nektar.ai raised a $6M seed to help B2B sales teams collaborate more effectively.

Lovo raised $4.5M series A for synthetic voice.

LOVO’s four core markets are marketing, education, movies and games in entertainment and AR/VR, Lee said. The movie “Spiral,” the latest film of the Saw Series, features LOVO’s voice in the film, he noted.

It is expected that LOVO will create additional synergies in the entertainment industry in the wake of the latest funding from a South Korean entertainment firm.

VP of CEO Vision Office at Kakao Entertainment J.H. Ryu said, “I’m excited for LOVO’s synergies with Kakao Entertainment’s future endeavors in the entertainment vertical, especially with web novels and music,” Ryu also added, “AI technology is opening the doors to a new market for audio content, and we expect a future where an individual’s voice will be utilized effectively as an intellectual property and as an asset.”

Dataiku raised $400M series E at a $4.6B valuation. There are a lot of opportunities to help enterprises take baby steps in adopting analytics and data science without trying to adopt cutting-edge AI.
