Presslei

How to Find Journalists to Pitch

How I Built a 27,000+ Journalist Database From Scratch

DIY Media Database

Built from scratch without paying for a media list


27K+
journalist contacts

Most PR agencies pay $500 to $1,000 a month for media databases like Prowly, Muck Rack, or Cision. We did too, briefly. Then I realized something: those databases are incomplete, outdated, and everyone else has the same list.

Key Takeaway

A database of 500 highly targeted journalists with recency data is worth more than a purchased list of 50,000 generic media contacts. Quality and recency beat volume every time.
18,871
Verified journalist contacts in Presslei’s database — each with beat classification and recency data from real campaigns

“A journalist database isn’t a spreadsheet of email addresses. It’s a living system of relationships, recency signals, and coverage patterns.”

— Salva Jovells, Presslei

Key Takeaway

We built a database of 27,000+ journalists without buying a single media list. Using free tools, public data from sitemaps and bylines, LinkedIn exports, and smart automation, any PR team can build a better contact list than what paid services offer.

So I built our own. From scratch. It now has 27,000+ journalists with beat information, contact details, and engagement history. And I am going to show you exactly how we did it.

IN THIS ARTICLE
Why Commercial Media Databases Fall Short
Source 1: Mining Your Own Placement History
Source 2: Competitor Backlink Mining
Source 3: Publication Sitemap Scraping
Source 4: LinkedIn Connections
Source 5: Email Pattern Engineering
The Merge: Deduplication and Scoring
Keeping It Fresh

KEY TAKEAWAYS
27,000+ journalist contacts built from scratch
15,525 with verified email addresses (57%)
Zero spent on expensive media databases
Multiple sources combined: placements, Prowly, Ahrefs, bylines
27,000+
Verified Contacts
$0
Media List Cost
5
Data Sources Used
18,871
Verified journalist contacts in Presslei’s database — built from 3 years of active campaign work

60 days
Maximum age of relevant coverage before a journalist’s recency signal degrades

25–50
Optimal contact count for a single campaign — quality over quantity every time

4–8hrs
Time investment to build a properly researched, verified media list from scratch

Why Commercial Media Databases Fall Short

I am not saying these tools are useless. They are convenient. But they have three fundamental problems:

  1. Everyone has the same contacts. If you and every other agency are pitching the same list from Muck Rack, those journalists are drowning in pitches. Your “exclusive” data story lands in an inbox alongside 50 other agencies using the same database.
  2. The data decays fast. Journalists change beats, switch publications, go freelance. Commercial databases update quarterly at best. By the time you pitch, the journalist may have moved on months ago.
  3. They miss the long tail. Freelancers, regional journalists, niche trade press, new hires who have not been indexed yet. Some of our best placements came from journalists who are not in any commercial database.

Building your own database is more work upfront. But the contacts are fresher, more targeted, and exclusively yours.

Pro Tip

Update your journalist database monthly. Remove contacts who’ve changed beats or publications, and add journalists who’ve recently covered topics relevant to your upcoming campaigns. A stale database is worse than no database.

Source 1: Mining Your Own Placement History

If you have done any PR before, start here. Your past placements are a goldmine of journalist contacts.

We went through over 5,200 historical placements and extracted every journalist byline, email pattern, and publication. This gave us roughly 900 journalists we knew were receptive to stories like ours, because they had already covered them.

The process:

  1. Export all your placement URLs into a spreadsheet
  2. Visit each article and extract the journalist’s name and any contact info
  3. Note what topics they covered and which pitches they responded to
  4. Cross-reference with LinkedIn to find current roles

Yes, this is tedious. We ended up automating most of it with scripts that scrape bylines and author pages. But even doing it manually for your top 100 placements gives you a strong starting list.
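Here is a minimal sketch of the byline-scraping step. It uses only Python's standard library and assumes the publisher exposes the byline in one of two common ways: a `<meta name="author">` tag or an `<a rel="author">` link. Many sites use custom markup instead, so treat `BylineParser` and `extract_bylines` (hypothetical names) as a starting point rather than a finished scraper.

```python
from html.parser import HTMLParser


class BylineParser(HTMLParser):
    """Collects author names from two common byline conventions (an assumption, not universal)."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._in_author_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # <meta name="author" content="Jane Doe">
        if tag == "meta" and (attrs.get("name") or "").lower() == "author":
            content = (attrs.get("content") or "").strip()
            if content:
                self.authors.append(content)
        # <a rel="author" href="/author/jane-doe">Jane Doe</a>
        elif tag == "a" and "author" in (attrs.get("rel") or "").lower().split():
            self._in_author_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_author_link = False

    def handle_data(self, data):
        if self._in_author_link and data.strip():
            self.authors.append(data.strip())


def extract_bylines(html: str) -> list:
    """Unique author names found in one article's HTML, in order of appearance."""
    parser = BylineParser()
    parser.feed(html)
    return list(dict.fromkeys(parser.authors))  # dict keys keep insertion order
```

To run it over a placement spreadsheet, fetch each URL (for example with `urllib.request.urlopen(url).read().decode()`). Then store the returned names next to the URL and publication domain.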

The best predictor of a future placement is a past placement. Journalists who covered your kind of story before are significantly more likely to cover it again.

Pro Tip

Personalize every pitch. Reference the journalist’s most recent article and explain why your story matters to their specific audience.

Warning

Never scrape journalist email addresses from websites or social media profiles without verifying them through a legitimate verification service first. Sending pitches to unverified scraped emails results in high bounce rates that damage your sender reputation and can get your domain blacklisted by email providers.

DO

  • Start with Google News searches for recent relevant coverage
  • Record the last 3 relevant articles for each journalist
  • Verify email addresses before adding to your outreach list
  • Include freelancers who write for multiple publications
  • Update your database after every campaign with response data

DON’T

  • Buy pre-built journalist databases without verification
  • Add journalists based on publication prestige alone
  • Include journalists who haven’t covered your topic in 90+ days
  • Store journalist data without a legitimate business purpose
  • Skip the LinkedIn employment verification step
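Two items in the DO and DON'T lists above, "record the last 3 relevant articles" and "skip journalists who haven't covered your topic in 90+ days", come down to one small recency check. The sketch below shows it with hypothetical `Article` and `Journalist` records. The `relevant` flag stands in for whatever judgement or keyword rule you use, and `max_age_days` defaults to the 90-day cutoff from the list.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Article:
    url: str
    published: date
    relevant: bool  # your judgement (or a keyword rule) that it matches a topic you pitch


@dataclass
class Journalist:
    name: str
    articles: list = field(default_factory=list)

    def last_relevant(self, n: int = 3) -> list:
        """The n most recent relevant articles, newest first."""
        relevant = [a for a in self.articles if a.relevant]
        return sorted(relevant, key=lambda a: a.published, reverse=True)[:n]


def is_active(j: Journalist, today: date, max_age_days: int = 90) -> bool:
    """True if the journalist's latest relevant article is at most max_age_days old."""
    latest = j.last_relevant(1)
    return bool(latest) and today - latest[0].published <= timedelta(days=max_age_days)
```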

Source 2: Competitor Backlink Mining

This is one of our most effective methods and I have never seen another agency talk about it publicly.

The logic: if a journalist wrote about your competitor’s data study and linked to them, they will probably be interested in your data study on a similar topic.

Here is the method:

  1. Identify 10 to 15 competitors or similar brands that have earned media coverage
  2. Pull their backlink profiles using Ahrefs, Semrush, or Moz
  3. Filter for editorial links (exclude directories, forums, guest posts)
  4. Extract the journalist bylines from each linking article
  5. Cross-reference against your existing database to find new contacts
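Steps 3 and 5 can be sketched as below, assuming you have exported a backlink report to CSV. `referring_page_url` is a placeholder column name: Ahrefs, Semrush, and Moz each label their export columns differently, so map it to your file. The URL fragments in `NON_EDITORIAL_HINTS` are an illustrative heuristic only. Guest posts in particular often look like ordinary articles and may still need a manual check.

```python
import csv
import io
from urllib.parse import urlparse

# Illustrative URL fragments that usually mark non-editorial pages; extend for your niche.
NON_EDITORIAL_HINTS = ("/forum", "/directory", "/guest-post", "/write-for-us", "/tag/", "/profile/")


def editorial_links(csv_text: str, url_column: str = "referring_page_url") -> list:
    """Referring pages from a backlink export that look like editorial articles (deduplicated)."""
    keep = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row[url_column].strip()
        path = urlparse(url).path.lower()
        if any(hint in path for hint in NON_EDITORIAL_HINTS):
            continue  # step 3: drop forums, directories, guest-post pages
        if url not in keep:
            keep.append(url)
    return keep


def new_contacts(found: list, known: set) -> list:
    """Step 5: (name, domain) byline pairs not already in the database (known is lowercased)."""
    return [(n, d) for n, d in found if (n.lower(), d.lower()) not in known]
```

Step 4, pulling bylines from the surviving URLs, is the same byline extraction you would use on your own placement history.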

When we ran this process across 35 PR agency domains, we found 655 new, scored contacts that were not in any commercial media database. These journalists have already shown that they cover data-driven PR stories.

We take it a step further by scoring each contact based on the domain rating of publications they write for, whether we have an email, whether we have a LinkedIn profile, and their geographic region. The highest-scored contacts go into our priority outreach queue.
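The article names the scoring factors but not their weights. The sketch below is therefore a hypothetical additive score. It gives up to 50 points for publication authority, fixed bonuses for having an email or a LinkedIn profile, and a per-region bonus. Every number in it, including `REGION_BONUS`, is an illustrative assumption to tune against your own response data.

```python
from dataclasses import dataclass

# Hypothetical weights and regions: tune them to your own campaigns.
REGION_BONUS = {"UK": 10, "US": 8}


@dataclass
class Contact:
    name: str
    domain_rating: int  # 0-100 authority score of the publication
    has_email: bool
    has_linkedin: bool
    region: str


def score(c: Contact) -> int:
    points = c.domain_rating // 2            # authority: up to 50 points
    points += 25 if c.has_email else 0       # directly reachable by email
    points += 15 if c.has_linkedin else 0    # reachable on LinkedIn
    points += REGION_BONUS.get(c.region, 0)  # target-market bonus
    return points


def priority_queue(contacts: list, size: int = 50) -> list:
    """Highest-scoring contacts first, capped at size."""
    return sorted(contacts, key=score, reverse=True)[:size]
```

An additive score keeps every factor's contribution visible. That makes it easy to see why a contact landed in the queue and to adjust one weight at a time.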

Source 3: Publication Sitemap Scraping

Nearly every news website has a sitemap. Most sitemaps contain author URLs, and those author pages list journalist names, beats, and sometimes contact information.

We wrote a script that crawls publication sitemaps, extracts author pages, and pulls byline data. Running this across 60 major publications gave us over 700 journalist records, of which about 580 were genuinely new contacts not in our database.

The approach:

  1. Find the publication’s sitemap (usually at /sitemap.xml or /sitemap_index.xml)
  2. Look for author-specific sitemaps or URLs containing /author/
  3. Extract names and any associated metadata
  4. Cross-reference with the publication’s recent articles to identify active beats

This works best with mid-size publications. Massive outlets like the BBC have complex sitemap structures, but regional news groups and trade publications are straightforward.
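Steps 2 and 3 can be sketched with Python's built-in XML parser. The sketch relies on the standard sitemaps.org namespace, which both `urlset` and `sitemapindex` files use. It keeps any `<loc>` containing `/author/` and guesses a display name from the URL slug. `author_urls` and `name_from_author_url` are hypothetical helpers, and a slug-derived name is only a guess to confirm on the author page.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def author_urls(sitemap_xml: str) -> list:
    """<loc> URLs that look like author pages (contain /author/)."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return [u for u in locs if "/author/" in u]


def name_from_author_url(url: str) -> str:
    """Turn https://example.com/author/jane-doe/ into 'Jane Doe' (a guess; confirm on the page)."""
    slug = url.rstrip("/").split("/author/")[-1].split("/")[0]
    return " ".join(part.capitalize() for part in re.split(r"[-_]+", slug) if part)
```

For a sitemap index, run `author_urls` on each child sitemap it lists rather than on the index itself.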

Source 4: LinkedIn Connections

If you have been networking in your industry, your LinkedIn connections are an untapped source of journalist contacts.

We exported all 8,100+ LinkedIn connections, filtered for media professionals (editors, journalists, reporters, correspondents), and found 580 media contacts we had never messaged.
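The filtering step might look like the sketch below. It assumes the export is a CSV whose header row includes a `Position` column, which LinkedIn's connections export has provided. LinkedIn can change that format, and some exports begin with a short notes preamble that you would strip first. The regex uses word boundaries so that a title such as "Creditor Relations" is not mistaken for an editor.

```python
import csv
import io
import re

# Word-boundary match so "Creditor" does not count as "editor".
MEDIA_TITLE = re.compile(r"\b(editor|journalist|reporter|correspondent)s?\b", re.IGNORECASE)


def media_connections(csv_text: str) -> list:
    """Rows whose Position looks like a media role."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if MEDIA_TITLE.search(row.get("Position") or "")]
```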

The advantage: these are people who already accepted a connection request. There is a baseline relationship. A LinkedIn DM from a connection gets read far more reliably than a cold email.

We now maintain a dashboard that tracks which media connections have been contacted, which responded, and which should be avoided (because they explicitly asked not to be pitched). That kind of tracking prevents embarrassing double-pitches.

Key Takeaway

The best pitches answer one question: why should this journalist’s readers care about this right now?

Source 5: Email Pattern Engineering

Here is where it gets technical. Once you have journalist names and publications, you still need email addresses. And most journalists do not list their email publicly.

But email addresses follow patterns. Most publications use one of a few formats:

Pattern                  Example
firstname.lastname@      john.smith@publication.com
firstname@               john@publication.com
firstinitial.lastname@   j.smith@publication.com
firstnamelastname@       johnsmith@publication.com

We built a pattern map covering 265 publication domains with their specific email formats. When we add a new journalist from The Telegraph or Express or Metro, we can generate a likely email address instantly.

How we built the pattern map:

  1. Start with journalists whose emails we already know (from previous correspondence, public bios, etc.)
  2. Extract the pattern for that publication domain
  3. Apply it to other journalists at the same publication
  4. Verify using email validation tools
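Steps 1 to 3 can be sketched as follows, using the four formats from the pattern table. `infer_pattern` checks which format reproduces an email you already know. `guess_email` applies a domain's stored pattern to a new name. Names are ASCII-folded and stripped to letters, which is an assumption: some publications keep hyphens or handle accents differently. The function names and the `pattern_map` shape are hypothetical, and every guessed address still needs step 4.

```python
import unicodedata
from typing import Optional

# The four formats from the pattern table, as str.format templates.
PATTERNS = {
    "firstname.lastname": "{first}.{last}",
    "firstname": "{first}",
    "firstinitial.lastname": "{initial}.{last}",
    "firstnamelastname": "{first}{last}",
}


def clean(name: str) -> str:
    """ASCII-fold and keep letters only: 'José' -> 'jose', "O'Neil" -> 'oneil'."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return "".join(ch for ch in folded.lower() if ch.isalpha())


def local_part(pattern: str, first: str, last: str) -> str:
    first_c, last_c = clean(first), clean(last)
    return PATTERNS[pattern].format(first=first_c, last=last_c, initial=first_c[:1])


def infer_pattern(first: str, last: str, known_email: str) -> Optional[str]:
    """Steps 1-2: which format reproduces an email we already know?"""
    local = known_email.split("@")[0].lower()
    for pattern in PATTERNS:
        if local_part(pattern, first, last) == local:
            return pattern
    return None


def guess_email(first: str, last: str, domain: str, pattern_map: dict) -> Optional[str]:
    """Step 3: apply the domain's pattern to a new journalist; step 4 (verification) still applies."""
    pattern = pattern_map.get(domain)
    if pattern is None:
        return None
    return f"{local_part(pattern, first, last)}@{domain}"
```

A pattern inferred from a single known address can be a coincidence. Confirming it against two or three known emails at the same domain before storing it in the map is a sensible safeguard.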

This alone took our email coverage from roughly 25% to over 42% of our database. For a free, DIY approach, it is remarkably effective.

The Merge: Deduplication and Scoring

The hardest part of building a multi-source database is merging everything without creating duplicates. A journalist might appear in your placement history, your competitor backlinks, AND your LinkedIn connections under slightly different name spellings.

Our deduplication process:

  1. Normalize names (lowercase, remove middle initials, handle hyphenated surnames)
  2. Match on name + publication domain first
  3. Then match on email address for cross-publication matches
  4. Manual review for fuzzy matches (similar names at the same publication)
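The four deduplication steps can be sketched in Python as follows. Each input record is assumed to be a dict with `name`, `domain`, optional `email`, and `source` keys, which is a hypothetical shape. Fuzzy matching uses the standard library's `difflib.SequenceMatcher`. The 0.85 threshold is an illustrative choice, and fuzzy pairs are returned for human review rather than merged automatically. Comparisons are grouped by domain so the pairwise check stays manageable on a large database.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name: str) -> str:
    """Lowercase, drop middle initials, tidy hyphens: 'Jane A. Smith - Jones' -> 'jane smith-jones'."""
    cleaned = re.sub(r"\s*-\s*", "-", name.lower().replace(".", " "))
    tokens = cleaned.split()
    if len(tokens) > 2:
        tokens = [tokens[0]] + [t for t in tokens[1:-1] if len(t) > 1] + [tokens[-1]]
    return " ".join(tokens)


def merge(records: list, fuzzy_threshold: float = 0.85) -> tuple:
    """Merge source records; return (unique records, name pairs needing manual review)."""
    by_key, by_email, merged = {}, {}, []
    for rec in records:
        key = (normalize_name(rec["name"]), rec["domain"].lower())
        email = (rec.get("email") or "").lower()
        # Step 2: name + domain; step 3: email, which also catches cross-publication matches.
        match = by_key.get(key) or (by_email.get(email) if email else None)
        if match is None:
            match = {"name": key[0], "domain": key[1], "email": "", "sources": set()}
            merged.append(match)
        match["sources"].add(rec["source"])
        if email and not match["email"]:
            match["email"] = email
        by_key.setdefault(key, match)
        if email:
            by_email.setdefault(email, match)
    # Step 4: similar names at the same publication go to a human instead of auto-merging.
    names_by_domain = defaultdict(list)
    for rec in merged:
        names_by_domain[rec["domain"]].append(rec["name"])
    review = [
        (a, b)
        for names in names_by_domain.values()
        for a, b in combinations(names, 2)
        if SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold
    ]
    return merged, review
```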

After merging 17 different sources, our unified database settled at 5,909 unique journalist records. About 42% have verified email addresses. About 26% have LinkedIn profiles. And we tag every contact with their source, beat, engagement history, and a priority tier.

Keeping It Fresh

A database is only as good as its last update. We have a few systems for keeping ours current:

  • Bounce tracking: Every email that bounces gets flagged. If a journalist’s email bounces, we check if they changed publications.
  • Response tagging: Every response (positive, negative, or redirect) gets logged. This builds a picture of each journalist’s preferences over time.
  • Quarterly re-scraping: We re-run our sitemap and backlink scripts every quarter to catch new journalists and publication changes.
  • LinkedIn monitoring: Job change notifications for key contacts alert us when someone switches publications.
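Bounce tracking and response tagging reduce to a small per-contact log. The sketch below uses a hypothetical `ContactHistory` record. A bounce flags the contact for a publication-change check, and the three response types named above are counted. Any other event type is rejected so that typos do not silently pollute the history.

```python
from collections import Counter
from dataclasses import dataclass, field

RESPONSE_TYPES = {"positive", "negative", "redirect"}


@dataclass
class ContactHistory:
    email: str
    needs_review: bool = False  # set on bounce: check for a publication change
    responses: Counter = field(default_factory=Counter)


def log_event(history: ContactHistory, event: str) -> None:
    """Record a bounce or a tagged response against one contact."""
    if event == "bounce":
        history.needs_review = True
    elif event in RESPONSE_TYPES:
        history.responses[event] += 1
    else:
        raise ValueError(f"unknown event type: {event!r}")
```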

Do You Need 27,000+ Contacts?

No. For most campaigns, you are pitching 50 to 80 journalists. But having a large database means you can be selective. Instead of pitching everyone who might be relevant, you pitch the 50 people who are most likely to respond, based on their beat, their publication’s authority, and their past engagement with similar stories.
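Picking those 50 can be sketched as a filter plus a ranking. The sketch keeps only contacts on the campaign's beat, then orders them by past response rate, using publication authority as a tiebreaker. The response rate uses add-one (Laplace) smoothing. That is a technique swapped in here rather than one the article describes: it stops a contact pitched once without reply from scoring a flat zero. `Candidate` and `shortlist` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    beats: set
    domain_rating: int
    pitches: int    # how many times we have pitched them
    responses: int  # how many of those pitches got any reply


def response_rate(c: Candidate) -> float:
    """Add-one smoothed reply rate: an unpitched contact scores 0.5, not 0 or 1."""
    return (c.responses + 1) / (c.pitches + 2)


def shortlist(candidates: list, beat: str, n: int = 50) -> list:
    """Top n on-beat contacts by smoothed reply rate, then by authority."""
    on_beat = [c for c in candidates if beat in c.beats]
    return heapq.nlargest(n, on_beat, key=lambda c: (response_rate(c), c.domain_rating))
```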

Start small. Even 200 well-researched, correctly targeted contacts will outperform a 10,000-name list from a commercial database. The value is not in the volume. It is in the accuracy and the freshness.

Want access to our journalist network for your campaign? When you work with Presslei, you get the benefit of 27,000+ contacts built over months of research and real outreach. Get in touch.

Frequently Asked Questions

How do you build a journalist database without paying for one?

Start with three free sources: scrape article bylines from publication sitemaps to find active journalists, export your LinkedIn connections to identify media professionals, and mine competitor backlinks to find journalists who cover your industry. Then enrich with email patterns based on each publication’s format.

Are paid media databases worth it?

For most small agencies and in-house teams, no. Paid databases like Cision or Muck Rack charge $5,000 to $15,000 per year and often contain outdated contacts. A self-built database using public data is more current, more targeted, and free.

How do you find journalist email addresses?

Most publications follow predictable email patterns like firstname.lastname@publication.com. Once you identify the pattern for a publication, you can generate emails for any journalist there. Verify them using free tools before sending.


Salvador Jovells

Founder of Presslei, a reactive digital PR agency based in Zurich. Previously led marketing for two ecommerce brands where he discovered that data-driven reactive PR outperforms traditional approaches by every metric. Connect on LinkedIn.


About the Author

Salva Jovells

Founder of Presslei. 12+ years in ecommerce SEO across international markets. After a decade of link buying for Hockerty and Sumissura, I reverse-engineered 5,272 earned media placements and founded a reactive PR agency that builds authority through data-driven stories journalists actually want to publish. Based in Zurich.


Ready to earn links instead of buying them?

Get 8–14 editorial placements in top-tier publications. No contracts. No risk. Just results.

Book a Free Strategy Call

$3,000 per campaign · 8–14 links guaranteed · Zero penalty risk
