  • 275,849 views
  • Published 3 years ago by Robert Miles AI Safety

Why Does AI Lie, and What Can We Do About It?

How do we make sure language models tell the truth?

The new channel!: @aisafetytalks
How to Help:

Sources:
Evan Hubinger's Talk: https:/
ACX Blog Post:

Links:
Eliciting latent knowledge from AI:
Do current AI models show deception:
What is deceptive alignment:
Scaling Laws:
LLMs as simulators:

With thanks to my wonderful Patrons at :
- Tor Barstad
- Kieryn
- AxisAngles
- Juan Benet
- Scott Worley
- Chad M Jones
- Jason Hise
- Shevis Johnson
- JJ Hepburn
- Pedro A Ortega
- Clemens Arbesser
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Teague Lasser
- Andrew Blackledge
- Brad Brookshire
- Cam MacFarlane
- Olivier Coutu
- CaptObvious
- Girish Sastry
- Ze Shen Chin
- Phil Moyer
- Erik de Bruijn
- Jeroen De Dauw
- Ludwig Schubert
- Eric James
- Atzin Espino-Murnane
- Jaeson Booker
- Raf Jakubanis
- Jonatan R
- Ingvi Gautsson
- Jake Fish
- Tom O'Connor
- Laura Olds
- Paul Hobbs
- Cooper
- Eric Scammell
- Ben Glanton
- Duncan Orr
- Nicholas Kees Dupuis
- Will Glynn
- Tyler Herrmann
- Reslav Hollós
- Jérôme Beaulieu
- Nathan Fish
- Peter Hozák
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Report Techies
- Andrew Harcourt
- Nicholas Guyett
- 12tone
- Oliver Habryka
- Chris Beacham
- Zachary Gidwitz
- Nikita Kiriy
- Art Code Outdoors
- Andrew Schreiber
- Abigail Novick
- Chris Rimmer
- Edmund Fokschaner
- April Clark
- John Aslanides
- DragonSheep
- Richard Newcombe
- Joshua Michel
- Quabl
- Richard
- Neel Nanda
- ttw
- Sophia Michelle Andren
- Trevor Breen
- Alan J. Etchings
- Jenan Wise
- Jonathan Moregård
- James Vera
- Chris Mathwin
- David Shaffer
- Jason Gardner
- Devin Turner
- Andy Southgate
- Lorthock The Banisher
- Peter Lillian
- Jacob Valero
- Christopher Nguyen
- Kodera Software
- Grimrukh
- MichaelB
- David Morgan
- little Bang
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Ammar Mousali
- Miłosz Wierzbicki
- Tendayi Mawushe
- Wr4thon
- Martin Ottosen
- Alec Johnson
- Kees
- Darko Sperac
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Patrick Henderson
- Daniel Munter
- And last but not least
- Ian Reyes
- James Fowkes
- Len
- Alan Bandurka
- Daniel Kokotajlo
- Yuchong Li
- Diagon
- Andreas Blomqvist
- Qwijibo (James)
- Zannheim
- Daniel Eickhardt
- lyon549
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- Stuart Alldritt
- Alexander Brown
- Ted Stokes
- DeepFriedJif
- Chris Dinant
- Johannes Walter
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Georg Grass
- Jim Renney
- Edison Franklin
- Jacob Van Buren
- Piers Calderwood
- Matt Brauer
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Rajeen Nabid
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Rogstad
- Caleb Larson
- Max Chiswick
- Sam Freedo
- slindenau
- Nicholas Turner
- FJannis
- Grant Parks
- This person's name is too hard to pronounce
- Jon Wright
- Everardo González Ávalos
- Knut
- Andrew McKnight
- Andrei Trifonov
- Tim D
- Bren Ehnebuske
- Martin Frassek
- Valentin Mocanu
- Matthew Shinkle
- Robby Gottesman
- Ohelig
- Slobodan Mišković
- Sarah
- Nikola Tasev
- Voltaic
- Sam Ringer
- Tapio Kortesaari