Microsoft 365 Converting PDF to Txt

FCMLE44

XLDnaute Impliqué
Supporter XLD
Hello,

I have a folder containing PDF files that need to be converted to txt.
I created a .bat command file with the content below:
FOR /R "D:\Test Prévoyance\PREVOYANCE\" %%i IN (*.pdf) do (D:\Test Prévoyance\Pdf2Text\Pdf2Text.exe "%%i" "D:\Test Prévoyance\PREVOYANCE\%%~ni.txt")

I wrote a small macro to launch it:
VB:
Sub PARAMS()

Dim Fichier As String

        Fichier = ThisWorkbook.Sheets("PARAMS").Cells(2, 2).Value

        Shell "cmd.exe /k cd " & Fichier & "&&PREVOYANCE.bat"

        MsgBox "Fichiers Textes créés"

End Sub


This does not create my txt files.
Does anyone have an idea?

Thanks
 

fanch55

XLDnaute Barbatruc
With the current version of pdf2text, and given how you are using it,
there is no need to go through Excel or a bat file.
Just press Win + R and enter the command below:
Code:
D:\Test Prévoyance\Pdf2Text\pdf2text -o D:\Test Prévoyance\Pdf2Text  D:\Test Prévoyance\Pdf2Text\*.pdf

or you can put the command in your Shell call ...
 

FCMLE44

XLDnaute Impliqué
Supporter XLD
What does the little -o stand for?
 

fanch55

XLDnaute Barbatruc
D:\Test Prévoyance\Pdf2Text\pdf2text -o D:\Test Prévoyance\Pdf2Text D:\Test Prévoyance\Pdf2Text\*.pdf
<--------- program called ---------><-- folder for the txt files --><------ files to convert ------->

PDFTron PDF2Text V9.3080104.
Copyright (c) 2001-2022 PDFTron Systems Inc., www.pdftron.com.

You are running a DEMO version of PDF2Text.
In the demo version, random words or pages will be replaced with the <DEMO> string.

Usage: pdf2text [<options>] file...

OPTIONS:

--file... arg A list of folders and/or file names to process.

-o [ --output ] arg The folder used to store output files. By
default, the output will be displayed on
screen.

-a [ --pages ] arg (=-) Specifies the list of pages to convert. By
default, all pages are converted.

-e [ --encoding ] arg (=UTF8) Output text encoding:
UTF8
UTF16
The default output encoding is UTF8.

-f [ --format ] arg (=plain) Output text formating:
plain
wordlist
textruns
xml
The default output format is 'plain' text.

--noligatures Disables expanding of ligatures using a
predefined mapping. Default ligatures are: fi,
ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st,
oe, OE.

--nodehyphen Disables finding and removing hyphens that
split words across two lines. Hyphens are often
used a the end of lines as an indicator that a
word spans two lines. Hyphen detection enables
removal of hyphen character and merging of text
runs to form a single word. This option has no
effect on Tagged PDF files.

--no_dup_remove Disables removing duplicated text that is
frequently used to achieve visual effects of
drop shadow and fake bold.

--punct_break Treat punctuation (e.g. full stop, comma,
semicolon, etc.) as word break characters.

--remove_hidden_text Enables removal of text that is obscured by
images or rectangles. Since this option has
small performance penalty on performance of
text extraction, by default it is not enabled.

--no_invisible_text Enables removing text that uses rendering mode
3 (i.e. invisible text). Invisible text is
usually used in 'PDF Searchable Images' (i.e.
scanned pages with a corresponding OCR text).
As a result, invisible text will be extracted
by default.

--use_z_order Use Z-order as reading order for text

--output_bbox Include bounding box information for each text
element. If the output format is 'XML' the
bounding box information will be stored in
'bbox' attribute. If the output format is
'wordlist' the coordinates of the bounding box
will precede the word.

--xml_words_as_elements Output words as XML elements instead of inline
text.

--xml_output_styles Include font and styling information.

--json_zones Load zoning information from JSON file

--wordcount Get the number of words on each page.

--charcount Get total number of characters on each page.

--pageinfo Get the width, height, media box, crop box, and
page rotation for every page.

--prefix arg The prefix for output text files. The output
filename will be constructed by appending the
prefix string, the page number, and the
appropriate file extension (e.g. myprefix1.txt,
myprefix2.xml, etc). The prefix option should
be used only for processing of individual
documents. By default, PDF filename will be
used as a prefix.

--digits arg The number of digits used in the page counter
portion of the output filename. By default, new
digits are added as needed; however this
parameter could be used to format the page
counter field to a uniform width (e.g.
myfile0001.txt, myfile0002.txt, etc).

--subfolders Process all sub-directory for every directory
specified in the argument list. By default,
sub-directories are not processed.

-c [ --clip ] arg User definable clip box. The default clip
region is crop box of the page.

--noprompt Disables any user input. By default, the
application will ask for a valid password if
the password is incorrect.

-p [ --pass ] arg The password for secured PDF files. Not
required if the input document is not secured
using the 'open' password.

--extension arg (=.pdf) The default file extension used to process PDF
documents. The default extension is ".pdf".

--verb arg (=1) Set the opt.m_verbosity level to 'arg' (0-2).

-v [ --version ] Print the version information.

-h [ --help ] Print a listing of available options.


--lic_key arg PDFTron SDK license key. License keys can be passed
using this option or in a separate .lic file.


Examples:
pdf2text my.pdf
pdf2text -o test_out/ex1 test/my.pdf
pdf2text --wordcount my.pdf
pdf2text -o test_out -a 1 -f xml --output_bbox my.pdf
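
To make the layout of the command line concrete, here is a minimal sketch in Python that assembles an argument list from a few of the options listed in the help text above (`-o`, `-e`, `-f`, `--subfolders`). The function name `build_pdf2text_args` and its parameters are made up for illustration; only the flags themselves come from the help output, and the paths in the demo are the ones quoted in this thread.

```python
from typing import List, Optional, Sequence


def build_pdf2text_args(
    program: str,
    inputs: Sequence[str],
    output_dir: Optional[str] = None,
    encoding: Optional[str] = None,
    text_format: Optional[str] = None,
    subfolders: bool = False,
) -> List[str]:
    """Assemble an argument list using only options shown in the help text."""
    args = [program]
    if output_dir is not None:
        args += ["-o", output_dir]    # folder that receives the output files
    if encoding is not None:
        args += ["-e", encoding]      # UTF8 or UTF16 according to the help text
    if text_format is not None:
        args += ["-f", text_format]   # plain, wordlist, textruns or xml
    if subfolders:
        args.append("--subfolders")   # also process sub-directories
    args += list(inputs)              # files and/or folders to process
    return args


if __name__ == "__main__":
    # Paths copied from the thread, for illustration only.
    print(build_pdf2text_args(
        r"D:\Test Prévoyance\Pdf2Text\pdf2text.exe",
        [r"D:\Test Prévoyance\PREVOYANCE"],
        output_dir=r"D:\Test Prévoyance\PREVOYANCE",
    ))
```

Keeping each argument as a separate list element means a path that contains a space stays a single argument, instead of being cut in two.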
 

FCMLE44

XLDnaute Impliqué
Supporter XLD
When I enter this, the result is the same:
C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text -o C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE\*.pdf

The PDF files are indeed in PREVOYANCE, and the TXT files have to end up there too.
 

fanch55

XLDnaute Barbatruc
The problem is the spaces in the names ...
Either create a bat file, or enclose the files/folders on the command line in double quotes.
Code:
Set Pgm="C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\Pdf2Text\pdf2text.exe"
Set Tgt="C:\Users\Frede\OneDrive\Bureau\Test Prévoyance\PREVOYANCE"
%Pgm% -o %Tgt% %Tgt%\*.pdf

Run it via cmd (the names are the ones on my PC, to show that it works ...):
1660736999505.png
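
As an illustration of why the spaces matter, here is a small Python sketch. The helper names `naive_split` and `quoted_command_line` are invented for this example. Splitting on whitespace is only an approximation of how an unquoted command line is cut into arguments, and `subprocess.list2cmdline` is the standard-library helper Python uses to build a Windows command line from a list.

```python
import subprocess
from typing import List


def naive_split(command_line: str) -> List[str]:
    """Split on whitespace only, roughly what happens to unquoted arguments."""
    return command_line.split()


def quoted_command_line(args: List[str]) -> str:
    """Join arguments, adding double quotes around any that contain spaces."""
    return subprocess.list2cmdline(args)


if __name__ == "__main__":
    # Folder name taken from the thread; it contains one space.
    folder = r"C:\Users\Frede\OneDrive\Bureau\Test Prévoyance"
    args = [
        folder + r"\Pdf2Text\pdf2text.exe",
        "-o",
        folder + r"\PREVOYANCE",
        folder + r"\PREVOYANCE\*.pdf",
    ]
    # Unquoted: each path with a space is cut into two pieces.
    print(naive_split(" ".join(args)))
    # Quoted: every path stays in one piece.
    print(quoted_command_line(args))
```

With the unquoted form, the four intended arguments turn into seven fragments, so the program name itself is no longer a valid path.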
 

TooFatBoy

XLDnaute Barbatruc
As fanch55 said, the problem comes from the spaces in the paths or in the file names.

Besides, I had not noticed it before, but in your original message you forgot the quotes around one of the three paths:
FOR /R "D:\Test Prévoyance\PREVOYANCE\" %%i IN (*.pdf) do (D:\Test Prévoyance\Pdf2Text\Pdf2Text.exe "%%i" "D:\Test Prévoyance\PREVOYANCE\%%~ni.txt")
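
For comparison, the same loop can be sketched in Python, where arguments are passed as a list and need no manual quoting. This is only a sketch under assumptions: the names `plan_conversions` and `convert` are invented here, the call uses the `-o` option from the help text rather than the two-path form of the batch line, and the exact names of the files the tool writes may differ from the `.txt` targets computed below.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def plan_conversions(root: Path) -> List[Tuple[Path, Path]]:
    """Pair every PDF under root (recursively) with a .txt target placed in root."""
    pairs = []
    for pdf in sorted(root.rglob("*.pdf")):
        if pdf.is_file():
            # Same idea as %%~ni.txt in the batch loop: base name plus .txt
            pairs.append((pdf, root / (pdf.stem + ".txt")))
    return pairs


def convert(program: Path, pdf: Path, output_dir: Path) -> None:
    """Run the converter on one file; raises CalledProcessError on failure."""
    subprocess.run([str(program), "-o", str(output_dir), str(pdf)], check=True)


if __name__ == "__main__":
    # Paths copied from the first message of the thread.
    root = Path(r"D:\Test Prévoyance\PREVOYANCE")
    program = Path(r"D:\Test Prévoyance\Pdf2Text\Pdf2Text.exe")
    for pdf, target in plan_conversions(root):
        print(pdf, "->", target)
        # convert(program, pdf, root)  # enable on a machine where the tool is installed
```

Printing the planned pairs first is a cheap way to check that the folder is found and that the PDF files are actually matched before running the converter.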
 

FCMLE44

XLDnaute Impliqué
Supporter XLD
Thanks.
Even with the quotes it does not work, and I don't understand why.
 
