How to Find Journalists to Pitch

DIY Media Database

Built from scratch without paying for a media list

Most PR agencies pay $500 to $1,000 a month for media databases like Prowly, Muckrack, or Cision. We did too, briefly. Then I realized something: those databases are incomplete, outdated, and everyone else has the same list.

Key Takeaway: A database of 500 highly targeted journalists with recency data is worth more than a purchased list of 50,000 generic media contacts. Quality and recency beat volume every time.

18,871 verified journalist contacts in Presslei’s database, each with beat classification and recency data from real campaigns

“A journalist database isn’t a spreadsheet of email addresses. It’s a living system of relationships, recency signals, and coverage patterns.”
— Salva Jovells, Presslei

In This Article

Why Commercial Media Databases Fall Short
Source 1: Mining Your Own Placement History
Source 2: Competitor Backlink Mining
Source 3: Publication Sitemap Scraping
Source 4: LinkedIn Connections
Source 5: Email Pattern Engineering
The Merge: Deduplication and Scoring
Keeping It Fresh
Do You Need 27,000+ Contacts?

Key Takeaway

We built a database of 27,000+ journalists without buying a single media list. Using free tools, public data from sitemaps and bylines, LinkedIn exports, and smart automation, any PR team can build a better contact list than what paid services offer.

So I built our own. From scratch. It now has 27,000+ journalists with beat information, contact details, and engagement history. And I am going to show you exactly how we did it.

KEY TAKEAWAYS

• 27,000+ journalist contacts built from scratch

• 15,525 with verified email addresses (57%)

• Zero spent on expensive media databases

• Multiple sources combined: placements, Prowly, Ahrefs, bylines

• 18,871: verified journalist contacts in Presslei’s database, built from 3 years of active campaign work

• 60 days: maximum age of relevant coverage before a journalist’s recency signal degrades

• 25–50: optimal contact count for a single campaign (quality over quantity every time)

• 4–8 hours: time investment to build a properly researched, verified media list from scratch

Why Commercial Media Databases Fall Short

I am not saying these tools are useless. They are convenient. But they have three fundamental problems:

Everyone has the same contacts. If you and every other agency are pitching the same list from Muckrack, those journalists are drowning in pitches. Your “exclusive” data story lands in an inbox alongside 50 other agencies using the same database.
The data decays fast. Journalists change beats, switch publications, go freelance. Commercial databases update quarterly at best. By the time you pitch, the journalist may have moved on months ago.
They miss the long tail. Freelancers, regional journalists, niche trade press, new hires who have not been indexed yet. Some of our best placements came from journalists who are not in any commercial database.

Building your own database is more work upfront. But the contacts are fresher, more targeted, and exclusively yours.

Pro Tip: Update your journalist database monthly. Remove contacts who’ve changed beats or publications, and add journalists who’ve recently covered topics relevant to your upcoming campaigns. A stale database is worse than no database.

Source 1: Mining Your Own Placement History

If you have done any PR before, start here. Your past placements are a goldmine of journalist contacts.

We went through over 5,200 historical placements and extracted every journalist byline, email pattern, and publication. This gave us roughly 900 journalists who we knew had covered stories similar to ours, because they already had.

The process:

Export all your placement URLs into a spreadsheet
Visit each article and extract the journalist’s name and any contact info
Note what topics they covered and which pitches they responded to
Cross-reference with LinkedIn to find current roles

Yes, this is tedious. We ended up automating most of it with scripts that scrape bylines and author pages. But even doing it manually for your top 100 placements gives you a strong starting list.
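As an illustration only (this is not the actual script mentioned above), a byline extractor can be sketched with Python's standard library. It assumes the article HTML has already been downloaded and that the page exposes the author through a `<meta name="author">` tag or a `rel="author"` link. Many sites do, but plenty use custom markup that would need its own rule. The names `BylineParser` and `extract_bylines` are hypothetical.

```python
from html.parser import HTMLParser


class BylineParser(HTMLParser):
    """Collect author names from <meta name="author"> tags and rel="author" links."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._in_author_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "author":
            content = (attrs.get("content") or "").strip()
            if content:
                self.authors.append(content)
        elif tag == "a" and "author" in (attrs.get("rel") or "").lower().split():
            # The visible link text is assumed to be the journalist's name.
            self._in_author_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_author_link = False

    def handle_data(self, data):
        if self._in_author_link and data.strip():
            self.authors.append(data.strip())


def extract_bylines(html):
    """Return the unique author names found in one article's HTML, in order."""
    parser = BylineParser()
    parser.feed(html)
    return list(dict.fromkeys(parser.authors))
```

Running `extract_bylines` over each saved placement page gives a name per URL. Topics and responses still have to be noted by hand or pulled from your own campaign records.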

The best predictor of a future placement is a past placement. Journalists who covered your kind of story before are significantly more likely to cover it again.

Pro Tip

Personalize every pitch. Reference the journalist’s most recent article and explain why your story matters to their specific audience.

Source 2: Competitor Backlink Mining

Warning: Never scrape journalist email addresses from websites or social media profiles without verifying them through a legitimate verification service first. Sending pitches to unverified scraped emails results in high bounce rates that damage your sender reputation and can get your domain blacklisted by email providers.

DO

Start with Google News searches for recent relevant coverage
Record the last 3 relevant articles for each journalist
Verify email addresses before adding to your outreach list
Include freelancers who write for multiple publications
Update your database after every campaign with response data

DON’T

Buy pre-built journalist databases without verification
Add journalists based on publication prestige alone
Include journalists who haven’t covered your topic in 90+ days
Store journalist data without a legitimate business purpose
Skip the LinkedIn employment verification step

Source 3: Publication Sitemap Scraping

Most news websites publish a sitemap, and many of those sitemaps include author URLs. Author pages typically list journalist names and beats, and sometimes contact information.

We wrote a script that crawls publication sitemaps, extracts author pages, and pulls byline data. Running this across 60 major publications gave us over 700 journalist records, of which about 580 were genuinely new contacts not in our database.

The approach:

Find the publication’s sitemap (usually at /sitemap.xml or /sitemap_index.xml)
Look for author-specific sitemaps or URLs containing /author/
Extract names and any associated metadata
Cross-reference with the publication’s recent articles to identify active beats

This works best with mid-size publications. The massive outlets like BBC have complex sitemap structures, but regional news groups and trade publications are straightforward.
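The approach above can be sketched roughly as follows. This is a minimal example, not the crawler described in this section. It assumes the sitemap XML has already been fetched, that it uses the standard sitemaps.org namespace, and that author pages live under `/author/`. The function names are hypothetical, and turning a URL slug into a name is only a guess that needs checking against the author page itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def author_urls(sitemap_xml, marker="/author/"):
    """Return every <loc> URL in the sitemap that contains the author marker."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if marker in url]


def name_from_author_url(url):
    """Guess a display name from the last path segment, e.g. jane-doe -> Jane Doe."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return " ".join(part.capitalize() for part in slug.split("-"))
```

A sitemap index works the same way: its `<loc>` entries point to child sitemaps, so the same parsing can be run one level down before filtering for author URLs.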

Source 4: LinkedIn Connections

If you have been networking in your industry, your LinkedIn connections are an untapped source of journalist contacts.

We exported all 8,100+ LinkedIn connections, filtered for media professionals (editors, journalists, reporters, correspondents), and found 580 media contacts we had never messaged.
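A filter like this is easy to sketch. The example below assumes the export is a CSV with a job-title column named `Position`. Check the header row of your own export, since the column name here is an assumption. The keyword list mirrors the roles named above, and `media_contacts` is a hypothetical helper.

```python
import csv
import io

# Job-title keywords taken from the roles listed above.
MEDIA_KEYWORDS = ("editor", "journalist", "reporter", "correspondent")


def media_contacts(csv_text, title_column="Position"):
    """Return the rows whose job title contains one of the media keywords."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = []
    for row in reader:
        title = (row.get(title_column) or "").lower()
        if any(keyword in title for keyword in MEDIA_KEYWORDS):
            matches.append(row)
    return matches
```

Substring matching is deliberately loose, so titles such as "Video Editor" will slip through. Treat the output as a shortlist for manual review, not a final list.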

The advantage: these are people who already accepted a connection request. There is a baseline relationship. A LinkedIn DM from a connection gets read far more reliably than a cold email.

We now maintain a dashboard that tracks which media connections have been contacted, which responded, and which should be avoided (because they explicitly asked not to be pitched). That kind of tracking prevents embarrassing double-pitches.

Key Takeaway

The best pitches answer one question: why should this journalist’s readers care about this right now?

Source 5: Email Pattern Engineering

Here is where it gets technical. Once you have journalist names and publications, you still need email addresses. And most journalists do not list their email publicly.

But email addresses follow patterns. Most publications use one of a few formats:

Pattern	Example
firstname.lastname@	john.smith@publication.com
firstname@	john@publication.com
firstinitial.lastname@	j.smith@publication.com
firstnamelastname@	johnsmith@publication.com

We built a pattern map covering 265 publication domains with their specific email formats. When we add a new journalist from The Telegraph or Express or Metro, we can generate a likely email address instantly.

How we built the pattern map:

Start with journalists whose emails we already know (from previous correspondence, public bios, etc.)
Extract the pattern for that publication domain
Apply it to other journalists at the same publication
Verify using email validation tools

This alone took our email coverage from roughly 25% to over 42% of our database. For a free, DIY approach, it is remarkably effective.
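To make the pattern-map idea concrete, here is a minimal sketch covering only the four formats in the table above. The names `PATTERNS`, `infer_pattern`, and `guess_email` are hypothetical. Real names with accents, spaces, or double-barrelled surnames would need extra handling, and every generated address is a guess until a validation tool confirms it.

```python
# Builders for the four formats in the table; each returns the part before "@".
PATTERNS = {
    "firstname.lastname": lambda first, last: f"{first}.{last}",
    "firstname": lambda first, last: first,
    "firstinitial.lastname": lambda first, last: f"{first[:1]}.{last}",
    "firstnamelastname": lambda first, last: f"{first}{last}",
}


def infer_pattern(first, last, known_email):
    """Return the name of the pattern that reproduces a known address, or None."""
    local_part = known_email.split("@", 1)[0].lower()
    first, last = first.lower(), last.lower()
    for name, build in PATTERNS.items():
        if build(first, last) == local_part:
            return name
    return None


def guess_email(first, last, domain, pattern):
    """Generate a candidate (unverified) address for another journalist."""
    local_part = PATTERNS[pattern](first.lower(), last.lower())
    return f"{local_part}@{domain}"
```

For example, knowing that John Smith is `j.smith@publication.com` identifies the `firstinitial.lastname` format, which then yields `j.doe@publication.com` as a candidate for Jane Doe at the same domain.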

The Merge: Deduplication and Scoring

The hardest part of building a multi-source database is merging everything without creating duplicates. A journalist might appear in your placement history, your competitor backlinks, AND your LinkedIn connections under slightly different name spellings.

Our deduplication process:

Normalize names (lowercase, remove middle initials, handle hyphenated surnames)
Match on name + publication domain first
Then match on email address for cross-publication matches
Manual review for fuzzy matches (similar names at the same publication)
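The first three steps can be sketched as below. This is a simplified illustration under stated assumptions, not our production merge. Each record is assumed to be a dict with `name`, `domain`, `source`, and an optional `email`. The field names and both function names are hypothetical. Fuzzy matching is left out because it ends in manual review anyway.

```python
def normalize_name(name):
    """Lowercase, split hyphenated surnames, and drop middle initials."""
    tokens = [token.strip(".") for token in name.lower().replace("-", " ").split()]
    last_index = len(tokens) - 1
    kept = [
        token
        for index, token in enumerate(tokens)
        if len(token) > 1 or index in (0, last_index)
    ]
    return " ".join(kept)


def dedupe(records):
    """Merge records that share name + domain, or that share an email address."""
    by_key = {}    # (normalized name, domain) -> merged record
    by_email = {}  # email -> the same merged record object
    for record in records:
        key = (normalize_name(record["name"]), record["domain"].lower())
        email = (record.get("email") or "").lower()
        target = by_key.get(key) or (by_email.get(email) if email else None)
        if target is None:
            target = dict(record)
            target["sources"] = {record["source"]}
            by_key[key] = target
        else:
            target["sources"].add(record["source"])
            if email and not target.get("email"):
                target["email"] = record["email"]
        if email:
            by_email[email] = target
    return list(by_key.values())
```

Keeping a `sources` set on each merged record preserves where a contact came from, which is useful later when tagging and prioritizing.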

After merging 17 different sources, our unified database settled at 5,909 unique journalist records. About 42% have verified email addresses. About 26% have LinkedIn profiles. And we tag every contact with their source, beat, engagement history, and a priority tier.

Keeping It Fresh

A database is only as good as its last update. We have a few systems for keeping ours current:

Bounce tracking: Every email that bounces gets flagged. If a journalist’s email bounces, we check if they changed publications.
Response tagging: Every response (positive, negative, or redirect) gets logged. This builds a picture of each journalist’s preferences over time.
Quarterly re-scraping: We re-run our sitemap and backlink scripts every quarter to catch new journalists and publication changes.
LinkedIn monitoring: Job change notifications for key contacts alert us when someone switches publications.
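A simple freshness check might look like the sketch below. It assumes each contact is a dict with a `bounced` flag and a `last_relevant_article` date. Both field names are hypothetical. The 60-day default is the recency threshold quoted earlier in this article and should be tuned to your own campaigns.

```python
from datetime import date, timedelta


def needs_review(contact, today, max_age_days=60):
    """Return the reasons, if any, why a contact should be re-checked."""
    reasons = []
    if contact.get("bounced"):
        reasons.append("email bounced")
    last_article = contact.get("last_relevant_article")
    if last_article is None or today - last_article > timedelta(days=max_age_days):
        reasons.append("no recent relevant coverage")
    return reasons
```

An empty list means the contact looks current. Anything else goes into a review queue before the next campaign.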

Do You Need 27,000+ Contacts?

No. For most campaigns, you are pitching 50 to 80 journalists. But having a large database means you can be selective. Instead of pitching everyone who might be relevant, you pitch the 50 people who are most likely to respond, based on their beat, their publication’s authority, and their past engagement with similar stories.
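One way to turn those three signals into a shortlist is a weighted score. The sketch below is purely illustrative. The weights, the 0 to 100 `authority` scale, the cap of five responses, and the field names `beats`, `authority`, and `past_responses` are all assumptions, not values from our database.

```python
def score(contact, campaign_beat):
    """Combine beat match, publication authority, and past engagement into one number."""
    beat_match = 1.0 if campaign_beat in contact["beats"] else 0.0
    authority = contact["authority"] / 100            # assumed 0-100 scale
    engagement = min(contact["past_responses"], 5) / 5  # capped so one contact cannot dominate
    # Hypothetical weights: beat relevance counts most, then engagement, then authority.
    return 0.5 * beat_match + 0.3 * engagement + 0.2 * authority


def shortlist(contacts, campaign_beat, size=50):
    """Return the highest-scoring contacts for a campaign."""
    ranked = sorted(contacts, key=lambda c: score(c, campaign_beat), reverse=True)
    return ranked[:size]
```

Whatever weights you choose, the point is the same as above: rank the whole database, then pitch only the top of the list.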

Start small. Even 200 well-researched, correctly targeted contacts will outperform a 10,000-name list from a commercial database. The value is not in the volume. It is in the accuracy and the freshness.

Want access to our journalist network for your campaign? When you work with Presslei, you get the benefit of 27,000+ contacts built over months of research and real outreach. Get in touch.

Frequently Asked Questions

How do you build a journalist database without paying for one?

Start with three free sources: scrape article bylines from publication sitemaps to find active journalists, export your LinkedIn connections to identify media professionals, and mine competitor backlinks to find journalists who cover your industry. Then enrich with email patterns based on each publication’s format.

Are paid media databases worth it?

For most small agencies and in-house teams, no. Paid databases like Cision or Muckrack charge $5,000 to $15,000 per year and often contain outdated contacts. A self-built database using public data is more current, more targeted, and free.

How do you find journalist email addresses?

Most publications follow predictable email patterns like firstname.lastname@publication.com. Once you identify the pattern for a publication, you can generate emails for any journalist there. Verify them using free tools before sending.