Textsearch Overlapping PDF Pages

Posted 4 days ago by Michael. Me

Michael. Me

Hello,

I just noticed an issue when running a text search for multiple words within a segment or sentence. Specifically, searching for words like "water" and "climate" within the same sentence or segment yields no results if the sentence/segment spans multiple pages in a PDF file. It appears that the search function does not work correctly for text fragments that extend beyond a single page.

This is a significant issue for me because I am analyzing large text corpora, and in many cases, sentences do not end at the bottom of a page. As a result, important keyword combinations might be overlooked, affecting the accuracy of the analysis.

Is there any workaround or solution to handle this scenario effectively?


Thanks in advance for your help!


PS: Someone had the same problem in 2021, but the page isn't available anymore (https://www.maxqda.com/support/forum/viewtopic.php?f=10&t=1580).



3 Comments


Khaled Alostath posted about 10 hours ago Admin

Hi Michael, 


To convert PDF documents to plain text in MAXQDA, follow these steps:

1. Import the PDF document into your MAXQDA project.

2. Select the PDF document(s) in the "Document System."

3. Use the "Insert PDF Text as New Document" function. This will extract the text and save it as a new text document, ignoring images and formatting.


Regarding the search issue, MAXQDA treats paragraphs in PDFs as ending at page boundaries, which can affect searches for keyword combinations across pages. This is a limitation of paragraph recognition in PDF documents.
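
To illustrate the effect outside MAXQDA, here is a minimal Python sketch. It is not MAXQDA's implementation; the sentence splitter, the `cooccurrences` helper, and the sample page texts are all assumptions made up for the example. It shows why a per-page search misses a keyword pair that a search over the joined text finds.

```python
import re

def sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def cooccurrences(text, words):
    # Return the sentences that contain every keyword (case-insensitive, whole words).
    hits = []
    for s in sentences(text):
        if all(re.search(rf"\b{re.escape(w)}\b", s, re.IGNORECASE) for w in words):
            hits.append(s)
    return hits

# Hypothetical page texts, as if extracted from a two-page PDF.
pages = [
    "Rainfall is declining. Access to clean water is shaped by",
    "a changing climate in many regions. Policy must respond.",
]
keywords = ["water", "climate"]

# Searching page by page treats the page break as a hard boundary.
per_page = [hit for page in pages for hit in cooccurrences(page, keywords)]

# Joining the pages first lets the sentence run across the break.
joined = cooccurrences(" ".join(pages), keywords)

print(per_page)  # []
print(joined)    # ['Access to clean water is shaped by a changing climate in many regions.']
```

Converting the PDF to a plain text document has the same effect as the join step here: the page break no longer separates the two halves of the sentence.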


Feel free to reach out to us if you have any questions.


Kind regards,
Khaled


Michael. Me posted about 13 hours ago

Dear Khaled,


Could you explain how to convert PDF documents to plain text within MAXQDA?

I am encountering an issue where the "within X sentences or paragraphs" search function does not work as expected. It seems that the search stops at the end of each PDF page, preventing it from detecting keyword combinations that span across pages. 


Thank you for your support! 

Michael


Khaled Alostath posted about 13 hours ago Admin

Dear Michael, 


When dealing with PDF files, MAXQDA's text search function may face limitations if sentences span across multiple pages. Here are some suggestions to address this issue: 

1. Consider converting your PDF documents to plain text format within MAXQDA. This can help ensure that sentences are not split across pages, allowing for more accurate text searches.

2. Use the "Within x sentences or paragraphs" option to broaden the search range, which might help capture keyword combinations that span across pages.

3. For critical analyses, manually review sections where important keyword combinations might occur to ensure nothing is overlooked. 
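
For the manual review in point 3, one rough way to narrow down where to look is to flag page breaks that fall in the middle of a sentence. The Python sketch below is only an illustration and not a MAXQDA feature; it assumes you already have the text of each page as a list of strings (exported with a tool of your choice), and the `open_page_breaks` helper and sample pages are hypothetical.

```python
def open_page_breaks(pages):
    # Return 1-based page numbers whose text does not end with sentence-final
    # punctuation, i.e. the last sentence probably continues on the next page.
    flagged = []
    for number, text in enumerate(pages[:-1], start=1):
        if not text.rstrip().endswith((".", "!", "?")):
            flagged.append(number)
    return flagged

# Hypothetical page texts for a three-page document.
pages = [
    "Rainfall is declining. Access to clean water is shaped by",
    "a changing climate in many regions.",
    "Policy must respond.",
]

print(open_page_breaks(pages))  # [1]
```

Only the breaks it reports (here the one after page 1) need a manual check for keyword combinations. The punctuation test is a heuristic, so pages ending in a heading, footnote, or abbreviation may be flagged or missed.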


Let me know if you have any questions.


Kind regards,
Khaled
