Subscribe to 0x06AE
Receive the latest updates directly to your inbox.

Google extracted ChatGPT’s Training Data using a silly trick

0x06AE…f87c
January 22
In their publication, Scalable Extraction of Training Data from (Production) Language Models, Deepmind researchers were able to extract several megabytes of ChatGPT’s training data for about two hundred dollars. They estimate that it would be possible to extract ~a gigabyte of ChatGPT’s training dataset from the model by spending more money querying the model. Their, “new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly.”

Temporal Graph Learning in 2024

0x06AE…f87c
January 22
Many complex networks evolve over time including transaction networks, traffic networks, social networks and more. Temporal Graph Learning (TGL) is a fast growing field which aims to learn, predict and understand evolving networks. See our previous blog post for an introduction to temporal graph learning and a summary of advancements last year.

ADMET-AI: A machine learning ADMET prediction platform. Article review

0x06AE…f87c
January 22
With the rapidly increasing amounts of chemical data, it is still a huge problem to characterize it and be able to predict the properties of compounds accurately. Currently, this is still a challenge due to the lack of open-source data and tools, so it is super exciting to see great projects that attempt to solve this problem. I recently discovered a platform called ADMET-AI developed by Greenstone Biosciences and Stanford University. It not only has a web interface but can be used locally as well providing great prediction speed. Here are some details on how this service affects the efficiency of research work, what specific functions it provides, and what benefits it has brought to the field of Drug Discovery.

How to Write Memory-Efficient Classes in Python

0x06AE…f87c
January 22
A few years ago, I wrote a blog post on how to write memory-efficient loops in Python which became quite popular. The positive response encouraged me to write a second part, where I delve into additional memory optimization methods.

Towards Hybrid Reasoning: Assimilating Structure into Subsymbolic Systems

0x06AE…f87c
January 22
The recent advances in large language models (LLMs) have demonstrated their remarkable fluency and adaptability when generating text. After exposure to just a few examples, these models can produce surprisingly coherent continuations on a wide array of topics, exhibiting signs of flexible understanding and reasoning.