Effective Altruism News
- I tried training Qwen2.5-1.5B with RL on math to both get correct answers and have a CoT that doesn’t look like human-understandable math reasoning. RL sometimes succeeds at hacking my monitor, and when I strengthen my monitor, it fails at finding CoTs that are both illegible and helpful, even after training for roughly 4000 steps (~1B generated tokens).
- I woke up Friday morning w/ a very sore left shoulder. I tried stretching it, but my left chest hurt too. Isn't pain on one side a sign of a heart attack? Chest pain, arm/shoulder pain, and my breathing is pretty shallow now that I think about it, but I don't think I'm having a heart attack because that'd be terribly inconvenient.
- Last week, Thinking Machines announced Tinker. It’s an API for running fine-tuning and inference on open-source LLMs that works in a unique way. I think it has some immediate practical implications for AI safety research: I suspect that it will make RL experiments substantially easier, and increase the number of safety papers that involve RL on big models.
- TL;DR: I made a dataset of realistic harmless reward hacks and fine-tuned GPT-4.1 on it. The resulting models don't show emergent misalignment on the standard evals, but they do alignment fake (unlike models trained on toy reward hacks), seem more competently misaligned, are highly evaluation-aware, and the effects persist when mixing in normal data.
- It’s amazing how much smarter everyone else gets when I take antidepressants. It makes sense that the drugs work on other people, because there’s nothing in me to fix. I am a perfect and wise arbiter of not only my own behavior but everyone else’s, which is a heavy burden because some of ya’ll are terrible at life. You date the wrong people.
- Intro. LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever language the LLM was originally trained on. But after a long period of training, the CoT sometimes starts to look very weird; to resemble no human language; or even to grow completely unintelligible. Why might this happen?
- Above the Fold plays the waiting game
- Could desalination make them irrelevant?
- Listen now | A conversation with Paul Scharre, author of Four Battlegrounds: Power in the Age of Artificial Intelligence joins us to talk about
- Beginning in 2027, all vegetarian MREs will be replaced with vegan options WASHINGTON — In a groundbreaking move recently announced by Pentagon News, the U.S. military will replace its four vegetarian MREs (meals ready to eat) with fully plant-based versions in 2027. The change comes after years of advocacy by Mercy For Animals and its […].
- We would aim for heaven if we knew what it was like
- It's a promising design for reducing model access inside AI companies.
- New York State's AI bill is more ambitious than California’s SB 53 — and is facing opposition from Andreessen Horowitz and other tech groups...
- Opinion: Assembly Member Alex Bores argues that regulation can prevent market pressure from encouraging the release of dangerous AI models, without harming innovation.
- COS’s 2026–2028 Strategic Planning Process. As the global research system evolves—technologically, politically, and culturally—so must the organizations that support it. At the Center for Open Science (COS), we’re developing a bold and focused strategy for 2026–2028 to meet the moment and our shared future with clarity, collaboration, and impact. This planning process comes at a pivotal time.
- Greta Panova wrote a math problem so difficult that today’s most advanced AI models don’t know where to begin.
- Abdulai has been recognised at this year’s Presidential National Best Teachers Awards in Sierra Leone for his work to make education systems more inclusive of children with disabilities.
- The post If we can’t control MechaHitler, how will we steer AGI? appeared first on 80,000 Hours.
- Sometimes things are boring
- When will artificial intelligence (AI) match top human forecasters at predicting the future? In a recent podcast episode, Nate Silver predicted 10–15 years. Tyler Cowen disagreed, expecting a 1–2 year timeline. Who’s more likely to be right?
- Cori Jackson — a single mom living in Indiana — took in her two young nieces to keep them out of foster care this summer. It hasn’t been easy. The youngest still isn’t potty-trained. The oldest isn’t used to having food in the fridge so, sometimes, she eats so much it makes her sick. The […]...
- (Context: I’m not an expert in animal welfare. My aim is to sketch a potentially neglected perspective on prioritization, not to give highly reliable object-level advice.) Summary: We seem to be clueless about our long-term impact. We might therefore consider it more robust to focus on neartermist causes, in particular animal welfare.
- Disclaimers: I am a computational physicist, not a machine learning expert: set your expectations of accuracy accordingly. All my text in this post is 100% human-written without AI assistance. Introduction: The threat of human destruction by AI is generally regarded by longtermists as the most important cause facing humanity.
- gui2de is hiring student RAs at Georgetown University for the academic year 2025-2026.
- Safeway employees across Alberta are sounding the alarm about Sobeys, their parent label owned by Empire Company Limited. Through its “Truck You, Sobeys” campaign, the United Food and Commercial Workers union accuses Sobeys of cutting delivery routes and reducing full-time jobs to protect profit margins — moves that hurt both workers and customers. This public […].
- The post Vegan Meals Ready-to-Eat (MREs) Coming to US Military Rations by 2027 appeared first on Mercy For Animals.
- TLDR: We found that models can coordinate without communication by reasoning that their reasoning is similar across all instances, a behavior known as superrationality. Superrationality is observed in recent powerful models and outperforms classic rationality in strategic games. Current superrational models cooperate more often with AI than with humans, even when both are said to be rational.
- tl;dr: In terms of the financial interests of an AI company, bankruptcy and the world ending are both equally bad. If a company acted in line with its financial interests, it would happily accept significant extinction risk for increased revenue. There are plausible mechanisms which would allow a company to act like this even if virtually every employee would prefer the opposite.
- A pandemic that's substantially worse than COVID-19 is a serious possibility. If one happens, having a good mask could save your life. A high quality reusable mask is only $30 to $60, and I think it's well worth it to buy one for yourself. Worth it enough that I think you should order one now if you don't have one already. But if you're not convinced, let's do some rough estimation.
- This is a link post for two papers that came out today: Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time (Tan et al.). Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (Wichers et al.).
- In Malawi, we’re answering a new question: can cash not only transform individual lives but entire communities, accelerating the end of extreme poverty? The evidence is clear that large, unconditional cash transfers help people escape extreme poverty. Now we’re testing how it works at scale and learning how to make it even more effective along the […]...
- A math and engineering friendly tour of how networks “choose” to vibrate. At the Ekkolapto Polymath Salon @ Frontier Tower in San Francisco, Andrés Gómez Emilsson (QRI Director of Research) presents our program combining bottom-up oscillator simulations with top-down spectral graph theory to reveal a graph’s resonant modes and symmetries.
- EA Forum Digest #261 Hello! Draft Amnesty Week starts on Monday! Check out the “What posts would you like someone to write?” thread if you’d like some inspiration. Two weeks left to enter the ‘Essays on Longtermism’ Competition; the top prize is $1000. Also, the application deadline for EAGxSingapore is coming up on October 20. Enjoy the posts! :)
- It’s amazing how much smarter everyone else gets when I take antidepressants. It makes sense that the drugs work on other people, because there’s nothing in me to fix. I am a perfect and wise arbiter of not only my own behavior but everyone else’s, which is a heavy burden because some of ya’ll are … Continue reading "I take antidepressants. You’re welcome"...
- Different plans for different levels of political will
- I sometimes think about plans for how to handle misalignment risk. Different levels of political will for handling misalignment risk result in different plans being the best option. I often divide this into Plans A, B, C, and D (from most to least political will required). See also Buck's quick take about different risk level regimes.
- How to Lose Friends and Infuriate People
- Defeating entropy, not just death
- With so many urgent problems in the world, how do you decide where to focus? In this video, we’ll share a simple but powerful framework for choosing the areas where you can have the greatest impact. Care about all the things? Here’s how to cut through the noise, avoid spreading yourself too thin, and focus your energy where it really counts.
- Today marks the release of Faunalytics’ Aquaculture Fundamentals! This blog gives you an overview of what you can find inside, and outlines what we left out and where you can learn more. The post Aquaculture Fundamentals: What We Included & What We Left Out appeared first on Faunalytics.
- If you spend too much time on X, you’ve probably seen this annoying heatmaps meme:
- A battle is simmering over proposals to make US chipmakers sell to domestic customers before exporting to countries such as China
- Despite adoption through a rescue or shelter being a highly ethical route in finding an animal companion, BIPOC-identifying individuals risk facing discrimination and rejection along the way. How can we make adoption a more empathetic and equitable experience? The post Adoption Without Adversity: BIPOC Experiences In Companion Animal Acquisition appeared first on Faunalytics.
- The Educated Choices Program (ECP) is seeking a full-time, remote Communications Manager to join our team! We are a nonprofit dedicated to creating a healthier, more sustainable world by empowering students and communities to make informed food choices.
- October Brief | Visa restrictions didn't stop this consultant from making an impact. 10 new roles from AI policy in the UK to global health research, closing soon.
- Covering Null Results: How to Turn “Nothing” into News A new article in The Open Notebook offers guidance for science journalists on how to cover null results—studies that find no significant effect—and why these findings are essential to the scientific process.
- Direct cash transfers effective in poverty alleviation, says J-PAL's Iqbal Singh Dhaliwal Hardware is hard but software is harder — India has built the hardware. Now the country must focus on the software – the people, their skills, their health and education, the economist tells Moneycontrol. spriyabalasubr… Wed, 10/08/2025 - 09:47...
- Opinion | States need to improve finances to expedite poverty alleviation agenda States need to improve their financial health to achieve faster poverty reduction goals, J-PAL, part of the Massachusetts Institute of Technology (MIT), global executive director Iqbal Singh Dhaliwal has said. spriyabalasubr… Wed, 10/08/2025 - 09:46...
- States need to improve finances to expedite poverty alleviation agenda States need to improve their financial health to achieve faster poverty reduction goals, J-PAL, part of the Massachusetts Institute of Technology (MIT), global executive director Iqbal Singh Dhaliwal has said. spriyabalasubr… Wed, 10/08/2025 - 09:46...
- Direct cash transfers: ‘Studies show that on average people save money for productive uses or invest’ Iqbal Dhaliwal, Global Executive Director, J-PAL (Abdul Latif Jameel Poverty Action Lab), which is based at the Massachusetts Institute of Technology’s (MIT’s) economics department and works to reduce poverty by ensuring that policy is informed by scientific evidence, believes that cash...
- If someone illegally double parks in a one-way street and a cop walks by, the expectation is that they’d get fined. Similarly, you’d think that if a company that uses animals is caught mistreating them, they too would face some sort of legal repercussion. But for many businesses in the US, that’s not what’s happening. […]...
- Fiscal reform: J-PAL’s Dhaliwal urges states to fix finances for faster poverty reduction; calls for scheme rationalisation and focus on human capital Iqbal Singh Dhaliwal of J-PAL urges states to strengthen finances for poverty reduction, highlighting excessive welfare schemes and unproductive fund use.
- In September 2025 in Warsaw prior to CARE, 19 animal movement leaders came together to participate in the first ever AGI and animal welfare wargame. TLDR: Great-power moves and food-security shocks repeatedly overrode animal welfare efforts; standard campaigning tactics like mass mobilization struggled under political crackdowns and were overshadowed by government’s concerns about public...
- Paul Thomas Anderson's newest film is a paean to moderate leftism
- I want to highlight this paper (from Sept 29, 2025) on an alternative to RL (for fine-tuning pre-trained LLMs) which: performs better; requires less data; is consistent across seeds; is robust (i.e., doesn't need a grid search over hyperparameters); and shows less "reward hacking" (i.e., when optimizing for conciseness, it naturally stays close to the original model, with low KL divergence).
- If there is no plausible mechanism by which a scientific hypothesis could be true, then it’s almost certainly false. But if there is a plausible mechanism for a hypothesis, then that only provides weak evidence that it’s true. An example of the former: Astrology teaches that the positions of planets in the sky when you’re born can affect your life trajectory.
- The full episode: https://youtu.be/gXNQGXnuE5I?si=ilE9M5N1ewC2xg4z
- Interested in AI (safety) research and pursuing a career of earning to give. Would love to meet a lot of fellow EAs/vegans/rationalist types while doing this. I'm currently in Belgium and almost finished with my computer science master's degree from a university ranked around 150th.
- The odds are against you and the situation is grim. Your scrappy band are the only ones facing down a growing wave of powerful inhuman entities with alien minds and mysterious goals. The government is denying that anything could possibly be happening and actively working to shut down the few people trying things that might help.
- This is a cross-post of some recent Anthropic research on building auditing agents. The following is quoted from the Alignment Science blog post. tl;dr: We're releasing Petri (Parallel Exploration Tool for Risky Interactions), an open-source framework for automated auditing that uses AI agents to test the behaviors of target models across diverse scenarios.
- After years of dedicated federal policy work led by Mercy For Animals, the U.S. military is making a monumental shift in its food procurement. Starting in 2027, the military will replace the four existing vegetarian MREs (meals ready to eat) with fully plant-based options. This policy change means that four of the 24 MRE menu […].
- Doomimir: The possibility of AGI being developed gradually doesn't obviate the problem of the "first critical try": the vast hypermajority of AGIs that seem aligned in the "Before" regime when they're weaker than humans, will still want to kill the humans "After" they're stronger and the misalignment can no longer be "corrected". The speed of the transition between those regimes doesn't matter.
- America’s regional planning boards are stacked against transit riders. While renters make up over 30% of households in typical metropolitan areas, they hold just 3% of seats on regional planning agencies that control billions of dollars in federal transportation funding.….
- California’s construction defect liability system (the legal rules that let people sue builders for problems with new buildings) is adding up to $18,300 per unit to the costs of condominiums. What was supposed to protect consumers has become a barrier….
- After months of public pressure, Best Western has finally shared meaningful progress toward its global cage-free commitment — and because of this, we’re cautiously celebrating and pausing our public campaign. Best Western’s most recent statement has shown significant progress:
- Today, of course, is the second anniversary of the genocidal Oct. 7 invasion of Israel—the deadliest day for Jews since the Holocaust, and the event that launched the current wars that have been reshaping the Middle East for better and/or worse. Regardless of whether their primary concern is for Israelis, Palestinians, or both, I’d hope […]...
- When mathematicians make breakthroughs, they hallucinate too.
- Is believing in God like believing in Santa Claus?
- This study aims to identify the perceptions and preferences of young consumers in two of the largest meat substitute markets in the world. The post What Do Young Chinese And Japanese Consumers Think About Meat Alternatives? appeared first on Faunalytics.
- Animals < Humans < Nature?
- Updates from CEPI, NTI | bio, GHSN, Asia CHS, CSR, CCDD, Blueprint Biosecurity, UNIDIR, Brown Pandemic Center, CLTR, 1DaySooner, SecureBio, Sentinel Bio, MBDF, Open Philanthropy and IBBIS
- We will get back to you within two weeks from the deadline with a request for more information or an invitation to an interview if there is sufficient interest in your application. Decisions are communicated and grants paid out mid December.
- More Funding and Better Policies Are The Key Drivers For The Private Sector to Produce Safe And Nutritious Foods (gloireri, Tue, 10/07/2025 - 10:46). Meet Naguti Scovia, founder of ABBA Quality Foods.
- Chana Messinger (80k’s head of video, formerly of CEA’s Community Health team) and I (Matt Reardon, formerly of 80k) have been recording Expected Volume (Apple, Spotify), a podcast of unscripted conversations between us and some great guests from across the community, including: Andy Masley, Phil Trammell, Conor Barnes, Daniel Filan, Alex Lawsen, Trevor Levin, Julia Wise.
- A frame I am trying on: When I say I'm worried about takeover by "AI superintelligence", I think the thing I mean by "intelligence" is "relentless, creative resourcefulness." I think Eliezer argues something like "in the limit, superintelligence needs to include super-amounts-of Relentless, Creative Resourcefulness."
- Could we halve aviation's climate impact at a fraction of the cost of sustainable aviation fuels?
- A few days ago, I got a call at 2:39 a.m.
- This is a narration of ‘Introducing Better Futures’ by William MacAskill; published 3rd August 2025. Narration by Perrin Walker (@perrinjwalker).
- Epistemic Status: A fun heuristic. Advice is directional. We all are constantly making decisions about how to spend our time and resources, and I think making these decisions well is one of the most important meta-skills one can possibly have. Time spent valuably can quickly add up into new abilities and opportunities, time spent poorly will not be returned.
- THL is hiring an OWA Asia-Pacific Corporate Strategy Lead and OWA Europe Corporate Strategy Lead. The ideal candidate will be a strategic thinker with excellent communication skills who has experience with corporate pressure campaigns. They will be responsible for building relationships with OWA member groups and providing advice, coaching, and training, which will translate to them securing...
- Of course, you must understand, I couldn't be bothered to act. I know weepers still pretend to try, but I wasn't a weeper, at least not then. It isn't even dangerous, the teeth only sharp to its target. But it would not have been right, you know? That's the way things are now. You ignore the screams.
- Hello everyone, ALDF is hiring an Associate General Counsel to work closely with me, the General Counsel, and Molly, our other Associate General Counsel. The job posting can be found here, and I'm copying the text of the post below. Please share with anyone in your network who could be a good fit. Kind regards, John Seber. Associate General Counsel. Fully Remote. Description.
- Animal Place is a 501 (c)(3) animal sanctuary and is one of the oldest, largest, and most respected sanctuaries dedicated to farmed animals in the United States. Apply: Executive Director, Animal Place. Location: California, United States. Salary: $120,000 to $150,000, commensurate with experience.
- Hello! We are seeking support for the Great Apes Law in Spain — a law that would be historic. The Spanish Government had a legal mandate to present a specific law for the protection of great apes within three months of the approval of the Animal Welfare Law in 2023. Two years have passed and nothing has been done.