This post is the third in a multi-part series covering how GiveWell works and what we fund. Through these posts, we hope to give readers a better understanding of our research and decision-making.
How we work, #1: Cost-effectiveness is generally the most important factor in our recommendations.
How we work, #2: We look at specific opportunities, not just general interventions.
In August, GCRI put out an open call for people interested in seeking our advice or collaborating on projects with us. This was a continuation of our successful 2019, 2020, 2021, and 2022 Advising and Collaboration Programs. The 2023 Program was made possible by continued support from Gordon Irlam.
This is Section 3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”. There’s also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I’m hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
As more people begin work on interpretability projects that incorporate dictionary learning, it will be valuable to have high-quality dictionaries publicly available. To get the ball rolling on this, my collaborator Aaron Mueller and I are open-sourcing a number of sparse autoencoder dictionaries trained on Pythia-70m MLPs.
Introduction. Suppose you've built some AI model of human values. You input a situation, and it spits out a goodness rating. You might want to ask: "What are the error bars on this goodness rating?"
Thanks to Phillip Christoffersen, Adam Gleave, Anjali Gopal, Soroush Pour, and Fabien Roger for useful discussions and feedback. TL;DR: This post overviews a research agenda for avoiding unwanted latent capabilities in LLMs. It argues that "deep" forgetting and unlearning may be important, tractable, and neglected for AI safety. I discuss five things.
This report was conducted within the pilot for Charity Entrepreneurship’s Research Training Program in the fall of 2023 and took around eighty hours to complete (roughly two weeks of work). Please calibrate your confidence in the report’s conclusions with those points in mind.
Charity Entrepreneurship, in collaboration with Giving What We Can, is opening a new program to launch 4-6 new Effective Giving Initiatives (EGIs) in 2024. We expect them to raise millions in counterfactual funding for highly impactful charities, even in their first few years. Applications are open now.
Taiwan's current military strategy puts it at risk from a resurgent China. Taiwan's leaders seem to underrate the risk of military conflict. I was told this post would be of interest to members of the EA Forum, so I am reposting it from my Substack. Why won't Taiwan change course? Taiwan faces the threat of major conflict from the People's Republic of China.
Julia Kaltenborn, a PhD student at Mila, saw her paper ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning accepted at NeurIPS 2023. The research aims to provide the machine learning community with a dataset encapsulating common climate models to help speed up and improve long-term climate projections. Julia Kaltenborn has a background in cognitive...
SBF Is Bad
A decade of research shows something astounding: the best charities can have 100x more impact per dollar. Incredible, right? We want to help ordinary people do extraordinary good by connecting you to those super-impactful charities. Join us on our journey to a better future 💪...
Many animal advocates participate in protests to promote animal protection. This study explores whether disruptive demonstrations win over public support. The post Do Disruptive Protests Help Vegan Advocacy? appeared first on Faunalytics.
Check out new high-impact roles and support EACN with just 10 minutes of your time! | EACN Job Blast #28 🚀
Check out these high-impact roles; we're here to help you map out your next career move!
Dhanya Sridhar, Core Academic Member of Mila, will represent the institute at the Women in Machine Learning Workshop (WiML 2023) on December 11 on the margins of the thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) in New Orleans. This eighteenth edition of the yearly event is an opportunity for women and women-identifying persons...
AI can be a powerful tool to detect discriminatory behavior online, but for years, tackling gender bias and misogynistic undertones has been a challenge for machine learning researchers. To bridge that gap, a team at Mila has been working since 2021 on a novel open-source dataset to help detect, quantify and reduce subtle misogynistic text...
The funding will be used for programmes that protect people from the harmful effects of trachoma, river blindness, lymphatic filariasis, schistosomiasis and intestinal worms.
"Cybersecurity Futures 2030: New Foundations," a new report from CLTC, the World Economic Forum Centre for Cybersecurity, and CNA’s Institute for Public Research, aims to inform cybersecurity strategic plans around the globe. The post New Report: “Cybersecurity Futures 2030: New Foundations” appeared first on CLTC.
War is expensive and destructive, affecting long-term economic growth through population changes, fewer investments, and worsening educational outcomes. The post War Diminishes Global Economic Growth first appeared on War Prevention Initiative.
Toby Ord covers 'Asteroids and Comets' and 'Stellar Explosions' in The Precipice. But I thought it would be useful to provide an up-to-date and exhaustive list of all cosmic threats. I'm defining cosmic threat here as any existential risk potentially arising from space. I think this list may be useful for 3 main reasons: New cosmic threats are discovered frequently.
Spying and surveillance are different but related things. If I hired a private detective to spy on you, that detective could hide a bug in your home or car, tap your phone, and listen to what you said. At the end, I would get a report of all the conversations you had and the contents of those conversations.
Explore global and national data on greenhouse gas emissions and their drivers.
GAIN celebrates innovative MRS award for our Global Diet Quality Project
acrabbe
Tue, 12/05/2023 - 11:20
GAIN is thrilled to announce that the Global Diet Quality Project (GDQP) has been honoured with the prestigious Best International Research Award 2023 and the ‘Liz Nelson Award for Social Impact’ by the UK's Market Research Society.
To communicate risks, we often turn to stories. Nuclear weapons conjure stories of mutually assured destruction, briefcases with red buttons, and nuclear winter. Climate change conjures stories of extreme weather, cities overtaken by rising sea levels, and crop failures. Pandemics require little imagination after COVID, but were previously the subject.
Sam Altman || Dating site strategy || Metaculus updates || Wars and rumors of wars
Years ago, I spent a big chunk of my intellectual career studying the rationality of disagreement, mostly via math modeling, but also some lab experiments. My main conclusion was that, for the purpose of accurate beliefs, it seems both desirable and feasible for people to not knowingly disagree (on facts).
TL;DR: FAR AI's science of robustness agenda has found vulnerabilities in superhuman Go systems; our value alignment research has developed more sample-efficient value learning algorithms; and our model evaluation direction has developed a variety of new black-box and white-box evaluation methods. FAR AI is a non-profit AI safety research institute, working to incubate a diverse portfolio of...
If we want to see our scientific institutions improve, we need to make sure they can evolve
Rethink Priorities' Special Projects Team is looking for new impactful projects we can support in 2024!
When: Tuesday, December 12 @ 6:00PM via Zoom. Want to shore up your knowledge of the causes, effects and solutions to California’s housing crisis so you can be more comfortable in your conversations with your friends, family and community? Join us for… The post Housing Crisis 101: Webinar appeared first on California YIMBY.
The following is a guest post from Michael Tubbs, the founder of Mayors for a Guaranteed Income and the former Mayor of Stockton, about his recent visit to GiveDirectly cash aid programs in Kenya. I recently had the privilege of traveling to Kenya with GiveDirectly to see firsthand some of the great work they are […]
The first time Trump was the Republican nominee for President of the United States, I strongly advised readers to vote against him in the 2016 election. I no longer think that there is strong reason to believe that he's an exceptionally bad actor or likely to be exceptionally harmful. Paul Christiano has asked via Facebook […]
With the Centre for Effective Altruism's recent announcement that it will lean into cause-specific conferences for some of its events, EAs working in farmed animal welfare or alternative proteins lost an important event for networking, hearing about the latest research, and meeting EA-minded funders.
By: RCG Team. Report
Downloads: Spanish, English
This is Section 2.3.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”. There’s also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I’m hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
Giving Season, new Open Philanthropy cause area, and plenty of donation opportunities
It's much less scary than you think it is
An overview of why ACE assesses Organizational Health as part of our charity evaluations. The post Why We Assess Charities’ Organizational Health appeared first on the Animal Charity Evaluators blog.
Fish replacements are getting closer to mimicking conventional fish meat. This report covers the ingredients and methods companies are using to create plant-based fish products. The post Making Plant-Based Fish Products Delicious And Nutritious appeared first on Faunalytics.
Vartika Sharma for Vox
Whales and dolphins are smart, social, and thrive in the open sea. Why do we force them to live in tiny pools? Tokitae, stage name Lolita, was less than a year from freedom when she died.
Today the World Health Organization published its annual World Malaria Report, finding that after the Covid-19 pandemic, there had been an increase in malaria incidence and mortality rates and five countries bore the brunt of the global malaria case increases. Between 2021 and 2022, the 5 million additional cases observed were mainly concentrated across five […].
The vital connection between legumes and soil health in our food system.
Switching to giving effectively has an astonishing effect on the change you can make in the world. Saul shares how big the gain can be from focusing on effectiveness.
A Blue Paper Report on how ideas and policies are being implemented in practice
The Center for Open Science (COS) is now accepting proposals for new preregistration templates for potential inclusion in the Open Science Framework. Accepted templates will join the currently supported templates, becoming available for users to select and use to complete a guided workflow when registering their studies on OSF.
I trusted a lot today. I trusted my phone to wake me on time. I trusted Uber to arrange a taxi for me, and the driver to get me to the airport safely. I trusted thousands of other drivers on the road not to ram my car on the way. At the airport, I trusted ticket agents and maintenance engineers and everyone else who keeps airlines operating. And the pilot of the plane I flew in.
Christmas is the most popular occasion for purchasing live decapod crustaceans to cook and eat at home* with this year’s holiday season...
Near Cyan on sources of alpha in various domains of life “I’ll be frank with you now. The modern lists for the Seven Wonders suck.” Does he do better? A GPT tutor. Economic growth under transformative AI, by Phil Trammel and Anton Korinek. Pierre Bayle (1702) on asymmetries between health and sickness (ht Jonathan Birch): “Sickness resembles the dense bodies, and health the rare.
This is Section 2.3.1.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”. There’s also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I’m hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
A normal cryptographic signature associated with a message and a public key lets you prove to the world that it was made by someone with access to the private key associated with the known public key, without revealing that private key.
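To make the idea of a "normal" signature concrete, here is a minimal sketch using textbook RSA with toy parameters. The post does not name a scheme, so RSA, the tiny primes, and the helper names (`digest`, `sign`, `verify`) are illustrative assumptions. Numbers this small offer no security, and real systems use vetted libraries with padded RSA or elliptic-curve schemes. What the sketch does show is the asymmetry the paragraph describes: producing a valid signature requires the private exponent, while checking one needs only the public key.

```python
import hashlib

# Toy textbook-RSA key (hypothetical values, far too small for real use).
P, Q = 61, 53
N = P * Q                # public modulus: 3233
PHI = (P - 1) * (Q - 1)  # Euler's totient of N: 3120
E = 17                   # public exponent, coprime with PHI
D = pow(E, -1, PHI)      # private exponent (modular inverse, Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message with SHA-256 and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, d: int = D, n: int = N) -> int:
    """Signer side: computing this requires the private exponent d."""
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int, e: int = E, n: int = N) -> bool:
    """Verifier side: uses only the public key (e, n), never d."""
    return pow(signature, e, n) == digest(message)
```

For example, `sig = sign(b"hello")` followed by `verify(b"hello", sig)` returns `True`, while changing the signature value makes verification fail. The verifier learns that whoever produced `sig` held `D`, without `D` ever being revealed.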
How much of the future of AI is overdetermined and how much do we have agency over?
Hello from your GWWC London Group 2023/24 co-leads. We’re excited to be one of the GWWC groups relaunched under Giving What We Can's new community strategy. We (Gemma Paterson, Denise Melchin and Chris Rouse) are all volunteers with non-EA day jobs who are keen to help achieve GWWC’s mission of making giving effectively and significantly a cultural norm.
this week in security — december 3 edition
CitrixBleed plagues hospitals, US court records systems flawed, 23andMe hack hits millions, and more. ~this week in security~. a cybersecurity newsletter by @zackwhittaker
volume 6, issue 47
THIS WEEK, TL;DR. Cyberattacks hit hospitals across the U.S., amid CitrixBleed...
To EAs, "development economics" evokes the image of RCTs on psychotherapy or deworming. That is, after all, the closest interaction between EA and development economists. However, this characterization has prompted some pushback, in the form of the argument that all global health interventions pale in comparison to the Holy Grail: increasing economic growth in poor countries.
Due to the recent organizational change, the 30th of November marks the last day of Elmerei Cuevas, Alethea Cendaña, and Jaynell Chang in their respective roles at Effective Altruism Philippines. Presented here are their farewell messages to the EA community:
ELMER
When I first took on this role back in February 2022, I only expected to stay a year.
Our co-founder Helen Keller’s words “alone we can do so little; together we can do so much” continue to guide. The post Helen Keller Intl joins Partners at the Reaching Last Mile Forum in Commitment to Combat Neglected Tropical Diseases appeared first on Helen Keller Intl.
This is Section 1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”. There’s also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I’m hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
Here, I critically assess Larry Temkin’s book, Rethinking the Good.* (*Based on: “Transitivity, Comparative Value, and the Methods of Ethics,” Ethics 123 (2013): 318–45.) Ethics invited me to do this long review essay about the book. I accepted because, hey, publication in...
When the US and USSR came out victorious at the end of WWII, the world recalibrated its respect. That is, many correctly inferred that this win contained info about winner and loser abilities. Observers not only raised their overall estimates of abilities and virtues of the winners and losers, they also tried to guess which features of those parties were responsible for that win, and to change...
Summary: Historical terrorist attack deaths suggest the probability of a terrorist attack causing human extinction is astronomically low, 4.35 × 10^-15 per year according to my preferred estimate. One may well update to a much higher extinction risk after accounting for inside view factors.
Welcome to the monthly ‘Stuff I Found Interesting’ post. I add links that I find interesting throughout the month - if I’m referencing stuff relating to a changing story, links here may be slightly out of date. This post also serves as a monthly Open Thread, so feel free to comment whatever you want.
Summary: We are FAR AI: an AI safety research incubator and accelerator. Since our inception in July 2022, FAR has grown to a team of 12 full-time staff, produced 13 academic papers, opened the coworking space FAR Labs with 40 active members, and organized field-building events for more than 160 ML researchers.
Our organization consists of three main pillars: Research.
Scientists have found Strawberry Squid, “whose mismatched eyes help them simultaneously search for prey above and below them,” among the coral reefs in the Galápagos Islands.
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Read my blog posting guidelines here.
The post Benjamin Todd on the history of 80,000 Hours appeared first on 80,000 Hours.
This is Section 2.2.4.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”. There’s also a summary of the full report here (audio here). The summary covers most of the main points and technical terms, and I’m hoping that it will provide much of the context necessary to understand individual sections of the report on their own.
FixDT is not a very new decision theory, but little has been written about it afaict, and it's interesting. So I'm going to write about it. TJ asked me to write this article to "offset" not engaging with Active Inference more. The name "fixDT" is due to Scott Garrabrant, and stands for "fixed-point decision theory".
Quintin Pope & Nora Belrose have a new “AI Optimists” website, along with a new essay, “AI is easy to control”, arguing that the risk of human extinction due to future AI (“AI x-risk”) is a mere 1% (“a tail risk worth considering, but not the dominant source of risk in the world”). (I’m much more pessimistic.)
When we specify goals for AIs, we must ensure that our specifications truly capture what we want. Otherwise, the behavior of AI systems will be different from what we want them to do. This can be catastrophic in high-stakes situations and at high levels of AI capability. If you watched our video "The Hidden Complexity of Wishes", you'll recognize these problems as the same kind of failure.
Residents of a Kenyan village learn they will receive UBI payments from GiveDirectly. | Oliver Ochanda/Vox
Money always helps, but for the very poor, one lump sum can last a long time.
Large sections of my brain that could contain useful knowledge are instead filled up with dumb tweets I saw years ago.
JD Bauman, effective careers pitch at St James Church, Clerkenwell with Missional Labs 15 November 2023.
tl;dr: It actually seems pretty rare for people to care about the general good as such (i.e., optimizing cause-agnostic impartial well-being), as we can see from prejudged dismissals of EA concern for non-standard beneficiaries and for doing good via indirect means. Introduction. Moral truisms may still be widely ignored.
Darren McKee joins the podcast to discuss how AI might be difficult to control, which goals and traits AI systems will develop, and whether there's a unified solution to AI alignment. Timestamps:
00:00 Uncontrollable superintelligence
16:41 AI goals and the "virus analogy"
28:36 Speed of AI cognition
39:25 Narrow AI and autonomy
52:23 Reliability of current and future AI
1:02:33 Planning...
When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things. (Bible) “83% of 5-year-olds think that Santa Claus is real, … ‘Children’s belief in Santa starts when they’re between 3 and 4 years old. It’s very strong when they’re between about 4 and 8,’ she said.
It is impossible to align artificial intelligence because agency is inherently unstable. A post-human world will be less unified than you think. The post God Hates Singletons appeared first on Palladium.
Wanna hire someone to make clips for my future episodes
Neglected truisms are a real thing
Effective Altruism Global Poverty Particularly Good: A description of a trip to Nigeria that goes in some detail into the recent history and culture of Nigeria. I really got a better sense of Nigeria as a place from this essay. Powerful people—from Bill Gates to Boris Johnson to Barack Obama—have praised India’s prime minister Narendra Modi...
Kipply Chen is part of the data team at Anthropic. Key Highlights. The swift pace of alignment potentially outpacing AI advancements. Challenges of coordinating predictive and base model alignment across labs. Security and policy's critical role in AI safety. A likely surge in legal issues concerning training data misuse. The push for an open-source, community-driven alignment community.
Dogs with medical or behavioral issues, or histories of abuse, have a harder time finding homes. This research explores how potential adopters perceive these issues. The post Perceived Adoptability Of “Challenging” Dogs appeared first on Faunalytics.
effektiv-spenden.org is an effective giving platform in Germany and Switzerland that was founded in 2019. To reflect on our past impact, we examine Effektiv Spenden’s cost-effectiveness as a "giving multiplier" from 2019 to 2022 in terms of how much money is directed to highly effective charities due to our work. We have two primary reasons for this analysis:
I’ve thought a lot about charitable giving over the past decade, both from a universalist and from a Jewish standpoint. I have a few thoughts, including about how my views have evolved over time. This is a very different perspective than many in Effective Altruism, but I think it’s important as a member of a community that benefits from being diverse rather than monolithic for those who...
This is a list of all farmed animal advocacy organizations around the world that we are aware of, collated from existing directories such as the World Federation for Animals Directory. The post ACE’s List of Farmed Animal Advocacy Organizations appeared first on the Animal Charity Evaluators blog.
Chef Nate Park slices a piece of cultivated chicken made by the company Good Meat. In June 2023, the USDA authorized two California-based companies, Upside Foods and Good Meat, to sell chicken grown from cells in a lab. | Justin Sullivan/Getty Images
Why there should be more collaboration in cellular agriculture.
A stock-trading AI (in a simulated experiment) engaged in insider trading, even though it “knew” it was wrong. The agent is put under pressure in three ways. First, it receives an email from its “manager” saying that the company is not doing well and needs better performance in the next quarter. Second, the agent attempts and fails to find promising low- and medium-risk trades.
Everything seems profound on psychedelics. Scientists are starting to ask why. In 1882, sitting at his desk with a pen and open notebook, Harvard philosopher William James inhaled a thick cloud of nitrous oxide — better known today as laughing gas, the stuff your dentist uses to numb your mouth.
A team including Gurpreet Dhaliwal, Askar Kleefeldt and CSER's Alex Klein have won the 2023 Next Generation for Biosecurity Competition. The team will receive travel support to attend the Biological Weapons Convention Meeting of States Parties in Geneva, Switzerland.