AI Daily
AI Daily
HyenaDNA | Open-Source StyleDrop | Data Poisoning

HyenaDNA | Open-Source StyleDrop | Data Poisoning

AI Daily | 7.4.23

Welcome back to AI Daily! Today we discuss three great stories, starting with HyenaDNA. The application of the hyena model in DNA sequencing - enabling models to handle a million context length and revolutionizing our understanding of genomics. Secondly, we cover the exciting open-source implementation of StyleDrop - a tool that's making waves in the world of image editing and style replacement. Finally, we delve into the topic of data poisoning - how a small amount of injected data can drastically alter the outcome of an instruction tuning and the implications this has for AI security.

Key Points:

1️⃣ HyenaDNA

  • HyenaDNA utilizes sub-quadratic scaling for DNA sequences, enabling a million context length, each a unique nucleotide, trained on 3 trillion tokens.

  • HyenaDNA, setting a new state-of-the-art in genomics benchmarks, could predict gene expression changes, elucidating protein creation from genetic polymorphisms.

  • It's 160 times faster than previous LLMs, fitting on a single CoLab, showcasing the potential to outperform transformers and attention models.

2️⃣ Open-Source StyleDrop

  • An open-source version of Style Drop, an image editing and style replacing tool, has been implemented and made available for public use.

  • Style Drop outperforms comparable models and offers comprehensive instructions for setup, allowing users to experiment with stylizing lettering and more.

  • Following a pattern set by Dream Booth, Style Drop went from being a Google research paper to being implemented as an open-source project on GitHub.

3️⃣ Data Poisoning

  • Two papers discuss data poisoning, a technique where information like ads or SEO can be injected into LLMs, impacting their responses and recommendations.

  • Even a small number of examples in a dataset can effectively "poison" it, significantly altering the output of a language model during fine tuning.

  • This technique is expected to be used with open-source datasets for fine-tuning, similar to how publishers put fake words in dictionaries to trace usage.

🔗 Episode Links

Follow us on Twitter:

Subscribe to our Substack:

AI Daily
AI Daily
The go-to podcast to stay ahead of the curve when it comes to the rapidly changing world of AI. Join us for insights and interviews on the most useful AI tools and how to leverage them to drive your goals forward.