Commit 67a512c

Add files via upload
0 parents  commit 67a512c

14 files changed

+1976
-0
lines changed

License

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
Copyright 2021 Hubert Tournier
2+
3+
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
4+
5+
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
6+
7+
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
8+
9+
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
10+
11+
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Makefile

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
NAME=unicode2ascii
2+
SOURCES=src/${NAME}/__init__.py src/${NAME}/main.py
3+
4+
# Default action is to show this help message:
5+
.help:
6+
@echo "Possible targets:"
7+
@echo " check-code Verify PEP 8 compliance (lint)"
8+
@echo " check-security Verify security issues (audit)"
9+
@echo " check-unused Find unused code"
10+
@echo " check-version Find required Python version"
11+
@echo " check-sloc Count Source Lines of Code"
12+
@echo " checks Run all the previous checks"
13+
@echo " format Format code"
14+
@echo " package Build package"
15+
@echo " upload-test Upload the package to TestPyPi"
16+
@echo " upload Upload the package to PyPi"
17+
@echo " distclean Remove all generated files"
18+
19+
check-code: /usr/local/bin/pylint
20+
-pylint ${SOURCES}
21+
22+
lint: check-code
23+
24+
check-security: /usr/local/bin/bandit
25+
-bandit -r ${SOURCES}
26+
27+
audit: check-security
28+
29+
check-unused: /usr/local/bin/vulture
30+
-vulture --sort-by-size ${SOURCES}
31+
32+
check-version: /usr/local/bin/vermin
33+
-vermin ${SOURCES}
34+
35+
check-sloc: /usr/local/bin/pygount
36+
-pygount --format=summary .
37+
38+
checks: check-code check-security check-unused check-version check-sloc
39+
40+
format: /usr/local/bin/black
41+
black ${SOURCES}
42+
43+
love:
44+
@echo "Not war!"
45+
46+
man/${NAME}.1.gz: man/${NAME}.1
47+
@gzip -k9c man/${NAME}.1 > man/${NAME}.1.gz
48+
49+
man/${NAME}.3.gz: man/${NAME}.3
50+
@gzip -k9c man/${NAME}.3 > man/${NAME}.3.gz
51+
52+
package: man/${NAME}.1.gz man/${NAME}.3.gz
53+
python -m build
54+
55+
upload-test:
56+
python -m twine upload --repository testpypi dist/*
57+
58+
upload:
59+
python -m twine upload dist/*
60+
61+
distclean:
62+
rm -rf build dist man/${NAME}.1.gz man/${NAME}.3.gz src/*.egg-info

README.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# Installation
2+
pip install [pnu-unicode2ascii](https://pypi.org/project/pnu-unicode2ascii/)
3+
4+
# UNICODE2ASCII(1), UNICODE2ASCII(3)
5+
This repository includes a command-line utility:
6+
* [unicode2ascii(1)](https://github.com/HubTou/unicode2ascii/blob/main/UNICODE2ASCII.1.md) - Unicode to ASCII command-line tool
7+
8+
And a Python library:
9+
* [unicode2ascii(3)](https://github.com/HubTou/unicode2ascii/blob/main/UNICODE2ASCII.3.md) - Unicode to ASCII Python library
10+

UNICODE2ASCII.1.md

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
# UNICODE2ASCII(1)
2+
3+
## NAME
4+
unicode2ascii - Unicode to ASCII command-line tool
5+
6+
## SYNOPSIS
7+
**unicode2ascii**
8+
\[-a|--analyze\]
9+
\[-t|--translated\]
10+
\[-u|--untranslated\]
11+
\[--debug\]
12+
\[--help|-?\]
13+
\[--version\]
14+
\[--\]
15+
16+
## DESCRIPTION
17+
The **unicode2ascii** utility filters Unicode characters from the standard input and replaces them with their standard or corrected Unicode equivalent,
18+
or with its internal equivalent when Unicode provides none.
19+
20+
Command-line options disable filter mode and instead analyze Unicode characters (**-a** option)
21+
or report (un)translated Unicode characters (**-t** and **-u** options) in a format suitable for updating the source code.
22+
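The filter mode described above can be pictured as a small stream transformer. The following is a minimal sketch using only the Python standard library, not the utility's actual implementation: the `filter_stream` name is hypothetical, and NFKD normalization followed by an ASCII encode stands in for the tool's own translation tables, so characters without a decomposition are simply dropped here.

```python
import io
import unicodedata


def filter_stream(source, sink):
    """Copy text from source to sink, keeping only ASCII output.

    Hypothetical helper: NFKD decomposition approximates the translation step.
    """
    for line in source:
        # Split accented letters into a base letter plus combining marks.
        decomposed = unicodedata.normalize("NFKD", line)
        # Drop everything that still is not ASCII (combining marks, symbols).
        sink.write(decomposed.encode("ascii", "ignore").decode("ascii"))


# In-memory demonstration of the filter.
demo_out = io.StringIO()
filter_stream(io.StringIO("Café déjà vu\n"), demo_out)
print(demo_out.getvalue(), end="")  # Cafe deja vu
```

Passing `sys.stdin` and `sys.stdout` instead of the `io.StringIO` objects would turn the sketch into a command-line filter.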
23+
### OPTIONS
24+
Options | Use
25+
------- | ---
26+
-a\|--analyze|Analyze Unicode characters. Disable filter mode
27+
-t\|--translated|Report translated Unicode characters. Disable filter mode
28+
-u\|--untranslated|Report untranslated Unicode characters. Disable filter mode
29+
--debug|Enable debug mode
30+
--help\|-?|Print usage and a short help message and exit
31+
--version|Print version and exit
32+
--|Options processing terminator
33+
34+
## ENVIRONMENT
35+
The UNICODE2ASCII_DEBUG environment variable can also be set to any value to enable debug mode.
36+
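As an illustration of the "set to any value" wording, a program could test for the mere presence of the variable rather than for a particular value. This is a sketch under that assumption, not the utility's code; the `debug_enabled` helper is hypothetical, and whether an empty value counts is also an assumption.

```python
import os


def debug_enabled(environ=None):
    """Return True when UNICODE2ASCII_DEBUG is present, whatever its value.

    Hypothetical helper; accepts a mapping so it can be tested without
    touching the real process environment.
    """
    env = os.environ if environ is None else environ
    return "UNICODE2ASCII_DEBUG" in env


print(debug_enabled({"UNICODE2ASCII_DEBUG": "0"}))  # True
print(debug_enabled({}))  # False
```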
37+
## FILES
38+
39+
## EXIT STATUS
40+
The **unicode2ascii** utility exits 0 on success, and >0 if an error occurs.
41+
42+
## SEE ALSO
43+
[unicode2ascii(3)](https://github.com/HubTou/unicode2ascii/blob/main/UNICODE2ASCII.3.md)
44+
[iconv(1)](https://www.freebsd.org/cgi/man.cgi?query=iconv)
45+
46+
## STANDARDS
47+
The **unicode2ascii** utility is not a standard UNIX/POSIX command.
48+
49+
It tries to follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for [Python](https://www.python.org/) code.
50+
51+
## HISTORY
52+
This utility was made for [The PNU project](https://github.com/HubTou/PNU).
53+
54+
## LICENSE
55+
This utility is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).
56+
57+
## AUTHORS
58+
[Hubert Tournier](https://github.com/HubTou)
59+
60+
## CAVEATS
61+
So far, only the following Unicode character sets are processed for missing ASCII equivalents:
62+
* C0 control characters
63+
* C1 control characters
64+
* Basic Latin characters
65+
* Latin-1 Supplement
66+
* Latin Extended-A
67+
* Latin Extended-B
68+
* Latin Extended Additional
69+
* IPA Extensions
70+
* Spacing Modifier Letters
71+
* Unicode symbols
72+
* General Punctuation
73+
* Number Forms
74+

UNICODE2ASCII.3.md

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# UNICODE2ASCII(3)
2+
3+
## NAME
4+
unicode2ascii - Unicode to ASCII Python library
5+
6+
## SYNOPSIS
7+
**import unicode2ascii**
8+
9+
*Boolean* unicode2ascii.**is_unicode_category**(String *character*, String *category*)
10+
11+
*Boolean* unicode2ascii.**is_unicode_letter**(String *character*)
12+
13+
*Boolean* unicode2ascii.**is_unicode_mark**(String *character*)
14+
15+
*Boolean* unicode2ascii.**is_unicode_number**(String *character*)
16+
17+
*Boolean* unicode2ascii.**is_unicode_punctuation**(String *character*)
18+
19+
*Boolean* unicode2ascii.**is_unicode_symbol**(String *character*)
20+
21+
*Boolean* unicode2ascii.**is_unicode_separator**(String *character*)
22+
23+
*Boolean* unicode2ascii.**is_unicode_other**(String *character*)
24+
25+
*String* unicode2ascii.**unicode_category**(String *category*)
26+
27+
*String* unicode2ascii.**unicode_to_ascii_character**(String *character*, [String *default* = ''])
28+
29+
*String* unicode2ascii.**unicode_to_ascii_string**(String *string*, [String *default* = ''])
30+
31+
unicode2ascii.**analyze_unicode_character**(String *character*)
32+
33+
## DESCRIPTION
34+
The **is_unicode_category**() function returns True if *character* belongs to the *category* Unicode category or False if not.
35+
36+
All the other **is_unicode_XXX**() functions return True if *character* belongs to the XXX category.
37+
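The category tests above map naturally onto the general categories exposed by Python's standard `unicodedata` module. The sketch below shows the idea only; it is not the library's implementation. The helper names are hypothetical, and it assumes that a *category* is a two-letter general category code such as `Lu`, with the first letter giving the major class.

```python
import unicodedata


def has_category(character, category):
    """True if the character's general category equals the given code."""
    return unicodedata.category(character) == category


def in_major_class(character, major):
    """True if the general category starts with the given letter (L, M, N, P, S, Z or C)."""
    return unicodedata.category(character).startswith(major)


print(unicodedata.category("é"))    # Ll
print(has_category("É", "Lu"))      # True
print(in_major_class("€", "S"))     # True
print(in_major_class("\u0301", "M"))  # True (combining acute accent)
```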
38+
The **unicode_category**() function returns a one-line description of the specified *category*.
39+
40+
The **unicode_to_ascii_character**() function returns the ASCII equivalent of a Unicode *character*, or the *character* unchanged when it is already plain ASCII.
41+
If there is no ASCII equivalent, it returns the *default* string ("" if not provided).
42+
43+
The **unicode_to_ascii_string**() function does the same for every character in the *string*.
44+
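The role of the *default* argument can be sketched with the standard library alone. This is an approximation for illustration, not the library's code: the `to_ascii_character` and `to_ascii_string` names are hypothetical, and NFKD decomposition replaces the library's own equivalence tables, so a character such as `ø`, which has no Unicode decomposition, falls back to the default here.

```python
import unicodedata


def to_ascii_character(character, default=""):
    """Return an ASCII approximation of one character, or default if none."""
    if character.isascii():
        return character  # already ASCII: returned unchanged
    decomposed = unicodedata.normalize("NFKD", character)
    # Keep only the ASCII pieces of the decomposition (drops combining marks).
    ascii_part = "".join(c for c in decomposed if c.isascii())
    return ascii_part or default


def to_ascii_string(string, default=""):
    """Apply to_ascii_character() to every character of the string."""
    return "".join(to_ascii_character(c, default) for c in string)


print(to_ascii_character("é"))               # e
print(to_ascii_character("ø", "?"))          # ?
print(to_ascii_string("naïve → café", "?"))  # naive ? cafe
```

The `ø` case shows why a tool of this kind needs its own table in addition to the Unicode decompositions.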
45+
The **analyze_unicode_character**() function returns all available information about the Unicode *character*.
46+
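For comparison, much of the information that the standard library holds about a character can be gathered in a few lines. The sketch below is illustrative only: `describe_character` is a hypothetical name, and the fields reported by the real **analyze_unicode_character**() function may differ.

```python
import unicodedata


def describe_character(character):
    """Collect basic Unicode properties of a single character."""
    return {
        "code_point": f"U+{ord(character):04X}",
        "name": unicodedata.name(character, "<unnamed>"),
        "category": unicodedata.category(character),
        "decomposition": unicodedata.decomposition(character),
    }


info = describe_character("é")
print(info["code_point"])     # U+00E9
print(info["name"])           # LATIN SMALL LETTER E WITH ACUTE
print(info["category"])       # Ll
print(info["decomposition"])  # 0065 0301
```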
47+
## ENVIRONMENT
48+
The UNICODE2ASCII_DEBUG environment variable can be set to any value to enable debug mode.
49+
50+
## SEE ALSO
51+
[unicode2ascii(1)](https://github.com/HubTou/unicode2ascii/blob/main/UNICODE2ASCII.1.md)
52+
[iconv(3)](https://www.freebsd.org/cgi/man.cgi?query=iconv&sektion=3)
53+
54+
## STANDARDS
55+
The **unicode2ascii** library tries to follow the PEP 8 style guide for Python code.
56+
57+
## HISTORY
58+
This library was made for [The PNU project](https://github.com/HubTou/PNU).
59+
60+
## LICENSE
61+
This library is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).
62+
63+
## AUTHORS
64+
[Hubert Tournier](https://github.com/HubTou)
65+
66+
## CAVEATS
67+
So far, only the following Unicode character sets are processed for missing ASCII equivalents:
68+
* C0 control characters
69+
* C1 control characters
70+
* Basic Latin characters
71+
* Latin-1 Supplement
72+
* Latin Extended-A
73+
* Latin Extended-B
74+
* Latin Extended Additional
75+
* IPA Extensions
76+
* Spacing Modifier Letters
77+
* Unicode symbols
78+
* General Punctuation
79+
* Number Forms
80+

man/unicode2ascii.1

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
.Dd June 19, 2021
2+
.Dt UNICODE2ASCII 1
3+
.Os
4+
.Sh NAME
5+
.Nm unicode2ascii
6+
.Nd Unicode to ASCII command-line tool
7+
.Sh SYNOPSIS
8+
.Nm
9+
.Op Fl a|--analyze
10+
.Op Fl t|--translated
11+
.Op Fl u|--untranslated
12+
.Op Fl -debug
13+
.Op Fl -help|-?
14+
.Op Fl -version
15+
.Op Fl -
16+
.Sh DESCRIPTION
17+
The
18+
.Nm
19+
utility filters Unicode characters from the standard input and replaces them with their standard or corrected Unicode equivalent,
20+
or with its internal equivalent when Unicode provides none.
21+
.Pp
22+
Command-line options disable filter mode and instead analyze Unicode characters (
23+
.Fl a
24+
option) or report (un)translated Unicode characters (
25+
.Fl t
26+
and
27+
.Fl u
28+
options) in a format suitable for updating the source code.
29+
.Ss OPTIONS
30+
.Op Fl a|--analyze
31+
Analyze Unicode characters.
32+
Disable filter mode
33+
.Pp
34+
.Op Fl t|--translated
35+
Report translated Unicode characters.
36+
Disable filter mode
37+
.Pp
38+
.Op Fl u|--untranslated
39+
Report untranslated Unicode characters.
40+
Disable filter mode
41+
.Pp
42+
.Op Fl -debug
43+
Enable debug mode
44+
.Pp
45+
.Op Fl -help|-?
46+
Print usage and this help message and exit
47+
.Pp
48+
.Op Fl -version
49+
Print version and exit
50+
.Pp
51+
.Op Fl -
52+
Options processing terminator
53+
.Sh ENVIRONMENT
54+
The
55+
.Ev UNICODE2ASCII_DEBUG
56+
environment variable can also be set to any value to enable debug mode.
57+
.Sh EXIT STATUS
58+
.Ex -std unicode2ascii
59+
.Sh SEE ALSO
60+
.Xr unicode2ascii 3 ,
61+
.Xr iconv 1
62+
.Sh STANDARDS
63+
The
64+
.Nm
65+
utility is not a standard UNIX/POSIX command.
66+
.Pp
67+
It tries to follow the PEP 8 style guide for Python code.
68+
.Sh HISTORY
69+
This utility was made for
70+
.Lk https://github.com/HubTou/PNU The PNU project
71+
.Sh LICENSE
72+
This utility is available under the 3-clause BSD license.
73+
.Sh AUTHORS
74+
.An Hubert Tournier
75+
.Sh CAVEATS
76+
So far, only the following Unicode character sets are processed for missing ASCII equivalents:
77+
.Bl -bullet
78+
.It
79+
C0 control characters
80+
.It
81+
C1 control characters
82+
.It
83+
Basic Latin characters
84+
.It
85+
Latin-1 Supplement
86+
.It
87+
Latin Extended-A
88+
.It
89+
Latin Extended-B
90+
.It
91+
Latin Extended Additional
92+
.It
93+
IPA Extensions
94+
.It
95+
Spacing Modifier Letters
96+
.It
97+
Unicode symbols
98+
.It
99+
General Punctuation
100+
.It
101+
Number Forms
102+
.El
